Abstract

Background

Respiratory syncytial virus (RSV) is a widespread respiratory pathogen, and RSV-related acute lower respiratory tract infections are the most common cause of respiratory hospitalization in children <2 years of age. Over the last 2 decades, a number of severity scores have been proposed to quantify disease severity for RSV in children, yet there remains no overall consensus on the most clinically useful score.

Methods

We conducted a systematic review of English-language publications in peer-reviewed journals published since January 2000 assessing the validity of severity scores for children (≤24 months of age) with RSV and/or bronchiolitis, and identified the most promising scores. For included articles, (1) validity data were extracted, (2) quality of reporting was assessed using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis checklist (TRIPOD), and (3) risk of bias was assessed using the Prediction Model Risk Of Bias Assessment Tool (PROBAST). To guide the assessment of the validity data, standardized cutoffs were employed, and we explicitly defined what was required for a score to be considered sufficiently validated.

Results

Our searches identified 8541 results, of which 1779 were excluded as duplicates. After title and abstract screening, 6670 references were excluded. Following full-text screening and snowballing, 32 articles, including 31 scores, were included. The most frequently assessed scores were the modified Tal score and the Wang Bronchiolitis Severity Score; none of the scores were found to be sufficiently validated according to our definition. The reporting and/or design of all the included studies was poor. The best validated score was the Bronchiolitis Score of Sant Joan de Déu, and a number of other promising scores were identified.

Conclusions

No scores were found to be sufficiently validated. Further work is warranted to validate the existing scores, ideally in much larger datasets.

Respiratory syncytial virus (RSV) is a common respiratory infection; it is estimated that by the age of 2 years most children will have experienced at least 1 RSV infection [1]. While the vast majority of RSV infections in infants are self-limiting and nonserious, presenting only with generic symptoms of a mild upper respiratory tract infection (eg, cough or runny nose), a fraction of infants will develop an acute lower respiratory tract infection, most commonly presenting as bronchiolitis or less commonly as pneumonia. We previously estimated that in 2019, there were 33.0 million cases of RSV-related acute lower respiratory tract infections in children aged <5 years, which resulted in 3.6 million hospital admissions and 101 400 RSV-attributable overall deaths [2]. As such, RSV-related acute lower respiratory tract infections are the most common cause of respiratory hospitalizations in children aged <5 years. Notably, the vast majority of RSV-related acute lower respiratory tract infections occur in low- and middle-income countries.

Over the last 2 decades, a number of different scoring systems have been proposed to quantify disease severity of RSV in children, both to aid clinical decision-making and to serve as an outcome measure or clinical endpoint in clinical trials of vaccines and therapeutics. The usefulness of these scores is primarily judged by their validity (face, discriminative, construct, criterion), reliability, responsiveness, and utility [3, 4].

A major review of severity scores, published almost a decade ago but still oft-cited, found all of the pediatric dyspnea scores to be insufficiently evaluated across all domains [3]. The literature base was reexamined in a systematic review and meta-analysis published in 2017, a review published in 2018, and most recently in a rapid review published in 2020 specifically looking to identify scores for resource-limited settings [5–7]. All of these similarly found the severity scores to have been insufficiently validated.

This lack of a validated severity score has a significant impact on clinical trials; a 2015 meeting of key academic, commercial, and regulatory stakeholders in RSV vaccine development identified the lack of “clinically meaningful and reproducible indicators” as the biggest challenge to RSV vaccine development [8]. The lack of consensus was similarly expressed in a recent review of RSV vaccines [9].

Given that it has been almost 3 years since the last review was conducted, we sought to reexamine the literature base to identify and report on efforts to validate clinical severity scores for use in children (≤24 months of age) with RSV and/or bronchiolitis and to synthesize the data to report on the criterion-concurrent and construct validity of the identified severity scores, as well as the included parameters of these scores. Based on this, we identified the most promising scores.

METHODS

Three online medical literature databases (Medline, Embase, and Global Health) were searched using the Ovid platform in June 2022 for English-language publications published in peer-reviewed journals since January 2000 on the validity of severity scores for children with RSV or bronchiolitis. The search strategies for each database can be found in Supplementary Annex 1; they were adapted from a recent systematic review on biomarkers for disease severity in RSV [10].

A severity score was defined as a tool used to quantify disease severity over the course of the illness; as such, single-purpose models (eg, models designed to only predict hospital admission) were excluded.

Covidence software was used to identify and automatically exclude duplicates [11]. After removing duplicates, we screened the titles and abstracts of the articles for relevance using predefined inclusion/exclusion criteria (Table 1). The inclusion/exclusion criteria were similarly adapted from the aforementioned biomarkers review [10].

Table 1.

Inclusion and Exclusion Criteria

Inclusion Criteria | Exclusion Criteria
Published in a peer-reviewed journal | Not published in a peer-reviewed journal
Published since 2000 | Published prior to 2000
Published in the English language | Published in any language other than English
Human RSV and/or bronchiolitis studies | Studies in animal models or cell lines, and studies of children without an RSV or bronchiolitis diagnosis
Relation explored between clinical measures and severity of RSV infection, and including a defined clinical severity score | Studies focused on treatment, diagnostics, or epidemiology of RSV infection
Children (≤24 mo old) with RSV and/or bronchiolitis | Studies in those >24 mo old with RSV and/or bronchiolitis
At least 50 children with RSV and/or bronchiolitis | <50 children with RSV and/or bronchiolitis

Abbreviation: RSV, respiratory syncytial virus.

For the remaining included articles, their full text was acquired and subsequently screened for relevance. The reference lists of articles identified for inclusion, as well as 3 previous reviews, were examined to identify additional relevant references (ie, snowballing) [3, 6, 7].

Data from the included studies were extracted into a standardized spreadsheet [12]. The World Bank's income level classification scheme was used to categorize the economies of the countries [13]. Data on the parameters included in each score (eg, presence of fever) were collected separately at the same time. Additionally, score names were standardized.

Given the widely observed poor quality of publications reporting prediction models, and specifically of those reporting severity scores for RSV, we employed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis checklist (TRIPOD), a 22-item checklist, to quantify the quality of reporting [5, 14–16]. The related Prediction Model Risk Of Bias Assessment Tool (PROBAST) was also employed to assess the risk of bias of included studies [14, 17]. Both checklists were completed for every included study.

Each of the abovementioned steps was conducted independently by 2 reviewers (E. P. and Z. S.); any uncertainty was resolved through consultation with a senior researcher (H. N.). We updated the searches up to 15 August 2023.

Given the heterogeneous nature of the included studies and the small amount of data on each severity score, only a narrative synthesis was made and a meta-analysis was not conducted. The review was registered with PROSPERO (CRD42022343781).

Quality Assessment of Validity of Identified Scores

Using the data extracted from the included studies, we assessed each of the identified scores for their face, construct (discriminative and convergent), and criterion-concurrent validity. We found, similarly to the 2014 review, a wide range of different uses of these terms and so have explicitly specified how we categorized and assessed the validity data (Supplementary Table 1) [3].

To guide our assessment, the same cutoffs as proposed by Hakizimana et al in their rapid review were used [7]. For the area under the receiver operating characteristic curve (AUROC), a score of <0.5 was classified as poor, 0.50–0.69 as low, 0.70–0.90 as moderate, and >0.90 as high; for Spearman correlation coefficient, we took 0–0.19 as very weak, 0.2–0.39 as weak, 0.4–0.59 as moderate, 0.6–0.79 as strong, and 0.8–1 as a very strong correlation. As Hakizimana et al did not specify cutoffs for the Pearson correlation coefficient, we used <0.10 as negligible, 0.10–0.39 as weak, 0.40–0.69 as moderate, 0.70–0.89 as strong, and >0.90 as very strong. For other measures, we made a subjective assessment informed by the above cutoffs. We considered a P value ≤ .01 as constituting statistical significance.
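
The banding described above can be expressed as a simple lookup. The following is a minimal Python sketch, not the procedure used in the review. The function names are hypothetical, correlation coefficients are banded on their absolute value, values falling between two printed bands (eg, 0.695) are assigned to the lower band, and a Pearson coefficient of exactly 0.90 is treated as very strong; each of these choices is an assumption.

```python
from bisect import bisect_right

# Lower bounds of each band above the lowest one (assumed interpretation).
SPEARMAN_BOUNDS = [0.2, 0.4, 0.6, 0.8]
SPEARMAN_LABELS = ["very weak", "weak", "moderate", "strong", "very strong"]
PEARSON_BOUNDS = [0.10, 0.40, 0.70, 0.90]
PEARSON_LABELS = ["negligible", "weak", "moderate", "strong", "very strong"]


def classify_auroc(auroc: float) -> str:
    """Band an AUROC: <0.5 poor, 0.50-0.69 low, 0.70-0.90 moderate, >0.90 high."""
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie in [0, 1]")
    if auroc > 0.90:
        return "high"
    if auroc >= 0.70:
        return "moderate"
    if auroc >= 0.50:
        return "low"
    return "poor"


def _classify_correlation(r: float, bounds, labels) -> str:
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    # Band on magnitude; the sign only gives the direction of the association.
    return labels[bisect_right(bounds, abs(r))]


def classify_spearman(rho: float) -> str:
    return _classify_correlation(rho, SPEARMAN_BOUNDS, SPEARMAN_LABELS)


def classify_pearson(r: float) -> str:
    return _classify_correlation(r, PEARSON_BOUNDS, PEARSON_LABELS)
```

For example, under these assumptions `classify_auroc(0.90)` falls in the moderate band, whereas `classify_auroc(0.91)` falls in the high band.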

We considered a score to be sufficiently validated if at least 2 external validation studies with a low risk of bias rating (as assessed by PROBAST) had assessed the criterion-concurrent, convergent, and/or discriminative validity for at least 2 separate outcomes each, and that performed at least moderately for each outcome. To identify promising scores (ie, scores that are currently insufficiently validated), we made a subjective assessment based on the scores that were deemed to be most likely to be sufficiently validated.
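
One way to read this definition is as a rule applied to the set of studies available for each score. The sketch below is illustrative only; the `ValidationStudy` structure and its field names are hypothetical, and it assumes that each outcome has already been judged as performing at least moderately or not.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidationStudy:
    external: bool               # external (not development or internal) validation
    low_risk_of_bias: bool       # overall PROBAST rating is low risk of bias
    # outcome name -> True if performance was at least moderate for that outcome
    outcomes: Dict[str, bool] = field(default_factory=dict)


def study_qualifies(study: ValidationStudy) -> bool:
    """External, low risk of bias, >=2 outcomes, all at least moderate."""
    return (
        study.external
        and study.low_risk_of_bias
        and len(study.outcomes) >= 2
        and all(study.outcomes.values())
    )


def sufficiently_validated(studies: List[ValidationStudy]) -> bool:
    """A score needs at least 2 qualifying studies under this reading."""
    return sum(study_qualifies(s) for s in studies) >= 2
```

Under this reading, a score supported by two external studies that each report moderate or better performance for two outcomes would still fail the rule if either study were rated at high or unclear risk of bias.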

RESULTS

Descriptive Statistics

Initial searches produced 7391 results (Figure 1), of which 59 articles were identified for full-text screening after title and abstract screening. Of these, 24 were included. Our updated search yielded 1150 results, of which 30 articles were identified for full-text screening after title and abstract screening. Of these, 6 were included.

Figure 1.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart. Abbreviations: ICU, intensive care unit; PICU, pediatric intensive care unit; RSV, respiratory syncytial virus.

Two additional relevant articles were identified through snowballing. As such, overall 32 articles were included, comprising 31 unique scores (Supplementary Table 2) [18–49]. The vast majority of the included studies used a prospective design (n = 27), most commonly a cohort study (n = 22), and the remaining 5 studies used either a purely retrospective design (n = 4) or combination of retrospective and prospective design (n = 1).
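
As a quick consistency check, the counts reported in this subsection can be tallied as follows. This is a small Python sketch using only figures stated in the text; the variable names are illustrative.

```python
# Search results reported in the text.
initial_results, updated_results = 7391, 1150

# Articles included from each source.
included_initial, included_update, included_snowballing = 24, 6, 2

total_results = initial_results + updated_results
total_included = included_initial + included_update + included_snowballing

# Study designs reported for the included articles.
prospective, retrospective_only, mixed_design = 27, 4, 1

print(total_results)   # 8541
print(total_included)  # 32
print(prospective + retrospective_only + mixed_design == total_included)  # True
```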

Four studies developed a new score, of which 1 included external validation in the same publication; the remaining 28 studies validated existing scores. Eight studies were multicenter studies. Twenty-five studies used data collected in secondary care, including 3 studies that also made use of data from the community; the remaining 6 studies used data collected in tertiary care, including 1 study that also made use of data from the community.

The most frequently used scores were the modified Tal (mTal) score and Wang Bronchiolitis Severity Score (WBSS), each of which was used in 5 studies. Four studies used the Bronchiolitis Score of Sant Joan de Déu (BROSJOD) and the Wood–Downes–Ferrés score (WDF); 3 studies used the Global Respiratory Severity Score (GRSS). The Bronchiolitis Severity Score (BSS), Escala de Severidad de la Bronquiolitis Aguda (ESBA), Freire model, modified Respiratory Index Score (mRIS), and modified Wood clinical asthma score (mWCAS) were each used in 2 studies. The remaining 21 scores were only evaluated once. Although Raita et al [42] claimed to use the Freire model—a model developed by Freire et al [30]—they excluded 1 of the parameters included in the original Freire model, so we considered it as a separate score and referred to it as the modified Freire model.

Most commonly, discriminative validity was assessed (n = 24). Seventeen studies assessed convergent validity and 4 assessed criterion-concurrent validity.

Seven articles used data from Spain; 5 from the United States; 4 from Israel; 2 each from Australia, France, Singapore, and Turkey; and 1 each from Canada, Colombia, Egypt, India, Ireland, Japan, New Zealand, Portugal, and the United Kingdom. The vast majority of the included data were from high-income countries (n = 28); only 3 studies used data from upper-middle-income countries (Turkey [n = 2] and Colombia [n = 1]), and 2 used data from lower-middle-income countries (Egypt and India). No included articles used data from any low-income country.

Severity Score Components

For 27 of the scores, we were able to identify the parameters used; however, we were unable to identify all of the parameters used in the 4 machine learning models proposed by Raita et al [42], as the authors only mention the 15 most important predictors. There was significant variation in the parameters used by each severity score model. After grouping synonymous terms (eg, respiratory rate and respiratory frequency), 52 unique parameters were included in the scores (Supplementary Table 3).

The mean number of parameters in each score was 5 (range, 3–10). Most commonly included was respiratory rate (n = 21); the next most common parameters included retractions (n = 13), oxygen saturation (n = 12), wheezing (n = 11), and heart rate (n = 6). The majority of parameters were used ≤3 times (n = 41).
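
The grouping and counting step described in this subsection can be sketched as below. This is an illustrative Python example, not the extraction procedure used in the review; the synonym map and the three example scores are hypothetical, and each parameter is counted at most once per score.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical synonym map: variant label -> canonical parameter name.
SYNONYMS: Dict[str, str] = {
    "respiratory frequency": "respiratory rate",
    "spo2": "oxygen saturation",
    "chest indrawing": "retractions",
}


def canonical(name: str) -> str:
    """Normalize case and whitespace, then map known synonyms."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, key)


def parameter_counts(scores: Dict[str, Iterable[str]]) -> Counter:
    """Count in how many scores each canonical parameter appears."""
    counts: Counter = Counter()
    for params in scores.values():
        # A set ensures a parameter is counted once per score.
        counts.update({canonical(p) for p in params})
    return counts


# Hypothetical scores, for illustration only.
example_scores = {
    "Score A": ["Respiratory rate", "Wheezing", "SpO2"],
    "Score B": ["Respiratory frequency", "Retractions", "Oxygen saturation"],
    "Score C": ["respiratory rate", "Chest indrawing", "Heart rate"],
}
counts = parameter_counts(example_scores)
```

In this toy example, `len(counts)` gives the number of unique parameters after grouping, and `counts.most_common(1)` returns the most frequently included parameter.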

Discriminative Validity

Twenty-four of the studies assessed the discriminative validity of the scores, mostly by assessing their ability to discriminate between those discharged or admitted to the hospital, and between those admitted to the pediatric intensive care unit (PICU) and those hospitalized but not admitted to the PICU. The WBSS was assessed in 5 articles; the BROSJOD in 4 articles; the WDF and mTal score in 3 articles each; and the ESBA, Freire, GRSS, mRIS, and mWCAS in 2 articles each; the remaining 14 scores were only evaluated once.

Anıl et al [20] reported that hospitalized patients had significantly higher WBSS than those discharged, as assessed by an odds ratio (OR). There were significant differences in pulse rate, respiratory rate, and oxygen saturation between those classified as mild, moderate, and severe (according to the WBSS) and a control group. They also reported significant differences in the pH and partial pressure of carbon dioxide between those with a severe WBSS and those in the control, mild, and moderate bronchiolitis severity groups. De Rose et al [26] reported high discriminative validity of the WBSS, as assessed by the AUROC, at predicting the need for respiratory support. They additionally reported a statistically significantly higher median WBSS in those needing respiratory support, and in those on nasal continuous positive airway pressure versus those on high-flow nasal cannula. Kubota et al [37] found that the WBSS had a moderate discriminative validity at differentiating among those hospitalized who required respiratory support. They additionally reported that the median WBSS among those hospitalized who required respiratory support was modestly, but statistically significantly, higher. Jacob et al [35] reported that the WBSS was moderately associated with nasogastric tube feeding according to its OR, but this result was not statistically significant (ie, P > .01). They also reported that the WBSS did not significantly predict desaturation days during hospitalization. Somech et al [49] reported statistically significant differences in the mean WBSS among those who were ambulatory, hospitalized, and admitted to the PICU.

Balaguer et al [21] found that the BROSJOD score had a moderate validity, as assessed by its volume under the surface, at discriminating by expert classification at admission, and a high validity after 24 and 48 hours. They also found statistically significant associations between the score and hospital length of stay (LOS), PICU LOS, and need for invasive mechanical ventilation; however, they found no association with need for noninvasive ventilation. Broadly consistent with these findings, Ricart et al [44] found large statistically significant differences in the mean LOS, days of oxygen therapy, days of nasogastric tube feeding, and maximum mean fraction of inspired oxygen among those with a more severe BROSJOD score. There were also large statistically significant differences in the percentage of those with a more severe BROSJOD score who were admitted to the PICU or required ventilation. In addition, Rodriguez-Gonzalez et al [46] reported that the BROSJOD score had a moderate ability at discriminating by need for respiratory support but did not significantly correlate with PICU admission. Granda et al [34] reported that the BROSJOD score had moderate ability at predicting any admission, need for supplemental oxygen, PICU admission within the next 48 hours, or death.

Bueno-Campaña et al [22] found that a high WDF score was moderately correlated with the need for respiratory support as assessed by its relative risk. Granda et al [34] found the WDF to have a moderate discriminative ability for predicting a range of relevant outcomes. Similarly, Rivas-Juesas et al [45] reported that the WDF and ESBA scores at admission both had a moderate ability at discriminating between those classified as severe and nonsevere. They also found the mean WDF and ESBA scores at admission to be statistically significantly higher in the severe group than in the nonsevere group. However, Ramos-Fernández et al [43] reported that the ESBA score at admission only had a poor ability at discriminating by admission to the PICU, but that the highest ESBA score was highly discriminative.

Caserta et al [23] reported a high discriminative validity of the GRSS, as assessed by its AUROC, at predicting admission and similar results when a subgroup analysis was conducted in those aged ≤3 months and 3–10 months. Unfortunately, however, they did not report the confidence intervals (CIs). They also found a statistically significant difference in mean GRSS among those admitted to the PICU and those hospitalized but not admitted to the PICU. When the GRSS was externally validated by Kubota et al [37], it (as well as the WBSS) was found to have a moderate discriminative validity at differentiating among those hospitalized who required respiratory support. They additionally reported that the median GRSS (and WBSS) among those hospitalized who required respiratory support was modestly, but statistically significantly, higher. Similarly, De Rose et al [26] reported a strong discriminative validity of the GRSS at predicting the need for respiratory support; however, they also found that the difference in median GRSS between those needing nasal continuous positive airway pressure and those needing high-flow nasal cannula was not statistically significant.

McCallum et al [39] reported that the mTal had a low-moderate discriminative ability as measured by the point estimate of the AUROC at predicting oxygen need at 12 hours and 24 hours; however, the CIs of the AUROCs were so wide that we disregarded these results. When mTal was externally validated by Golan-Tripto et al [33], it was found overall to have a moderate discriminative validity at differentiating based on need for oxygen support and hospital LOS ≥72 hours. Notably, the discriminative validity for oxygen support (but not hospital LOS) was statistically significantly higher among those with greater experience. Similarly, Granda et al [34] found mTal to have a moderate ability for predicting a range of relevant outcomes.

Chong et al [24] reported that the mRIS, a modified version of the Tal score (albeit different from the mTal score), had a fair ability at discriminating between those who required noninvasive respiratory support, but a poor ability at discriminating by admission, intravenous hydration, and LOS ≥2 days. Another publication [25] using a subset of the same dataset similarly reported a poor ability of the mRIS at discriminating by admission.

Freire et al [30] reported that their model had a moderate ability at discriminating among those hospitalized who required escalated care and those who did not; the performance was similar when internally validated using bootstrap validation. External validation by Granda et al [34] similarly found moderate ability of the Freire model for predicting for a range of relevant outcomes. When a modified version of the Freire model was evaluated by Raita et al [42], it was found to have a low ability at discriminating by positive pressure ventilation and intensive treatment use. Raita et al [42] also reported validity data for the 4 machine learning models they developed; all of the models had moderate discriminative ability at discriminating by positive pressure ventilation use and intensive treatment use.

Duarte-Dorado et al [28] reported statistically significant, albeit modest, differences in median mWCAS among patients at admission and discharge, and those hospitalized who required admission to the PICU. Granda et al [34] reported that the mWCAS, as assessed by AUROC, had a moderate ability at differentiating for a range of relevant outcomes.

Abbate et al [18] reported a statistically significant but weak correlation between the Modified WBSS and LOS. Amat et al [19] reported that the Wainwright severity score on admission had a moderate association with hospitalization (assessed using an unadjusted OR) and that those admitted to the PICU had a statistically significantly higher severity score compared to those hospitalized but not admitted to the PICU. Univariate analysis also identified a correlation with need for intensive care (but the magnitude was not reported) but not with LOS. De Rose et al [26] reported a strong discriminative validity of the Kristjánsson Respiratory Score (KRS) at predicting the need for respiratory support. Destino et al [27] reported a low discriminative ability, as assessed by its AUROC, for the Children's Hospital of Wisconsin Respiratory score (CHWRS) and Respiratory Distress Assessment Instrument (RDAI) at predicting admission. Garcia-Mauriño et al [32] reported fair discriminative validity of the Clinical Disease Severity Score at predicting admission, need for oxygen, need for positive pressure ventilation, and PICU admission. Granda et al [34] reported that the Respiratory Severity Score, Respiratory Clinical Score, Respiratory Score, and Bronchiolitis Risk of Admission Score had moderate ability at differentiating for a range of outcomes with no significant difference between the different scores. Krishna et al [36] reported a statistically significant association between the BSS and the type of respiratory support as well as significant differences in the heart rate and oxygen saturation between those classified as mild or moderate based on the BSS score. Özkaya et al [41] reported that the mBSS, a modified version of the WBSS, was moderately associated with admission, as assessed by the AUROC.

Convergent Validity

Seventeen studies assessed convergent validity; only the mTal, BROSJOD, WBSS, and GRSS scores were assessed more than once.

El Basha et al [29] found a strong correlation, as measured by the Spearman correlation coefficient, between the mTal and the duration of oxygen therapy; the correlation was statistically significantly stronger in term infants compared to preterm infants. Golan-Tripto et al [33] found the mTal to moderately correlate with duration of oxygen therapy and hospital LOS, but also reported significant variation by clinical severity. However, McCallum et al [39] reported only a weak correlation between the mTal score and hospital LOS.

Anıl et al [20] reported that WBSS moderately correlated with hospital LOS, whereas De Rose et al [26] reported only a very weak correlation between WBSS (as well as KRS) and LOS. Jacob et al [35] reported that the WBSS was the greatest predictor of hospital LOS, but a quantitative measure of its predictive ability was not reported; regardless, this finding was not statistically significant (ie, P > .01).

Caserta et al [23] found the GRSS to be moderately correlated with hospital LOS, whereas De Rose et al [26] found them to be very weakly correlated.

Balaguer et al [21] also reported that the Wood–Downes score strongly correlated with the BROSJOD score at admission, 24 hours, and 48 hours. They also reported that it significantly correlated with hospital and PICU LOS, although the magnitude was not reported. Rodriguez-Gonzalez et al [46] found the BROSJOD score to be moderately correlated with hospital LOS and duration of respiratory support, but not to correlate with PICU LOS.

Abbate et al [18] reported a statistically significant but weak correlation between the Modified WBSS and LOS. Amat et al [19] reported that the initial Wainwright severity score was not significantly correlated with hospital LOS on univariate analysis. Destino et al [27] found both the CHWRS and RDAI at admission to not correlate with LOS. Duarte-Dorado et al [28] found the mWCAS and Tal score to be strongly correlated at both admission and discharge. Marguet et al [38] found the Clinical Asthma Score to be only weakly correlated with hospital LOS. Rivas-Juesas et al [45] found the ESBA and WDF scores to be weakly correlated with each other. Siraj et al [48] reported that the BSS was not correlated with hospital LOS, weight-adjusted high-flow nasal cannula flow rate, or duration of high-flow nasal cannula therapy. McGinley et al [40] reported that the ReSViNet score was positively correlated with PICU admission, mechanical ventilation, hospitalization, and respiratory support requirement; however, they did not numerically report the magnitude of the association.

Criterion-Concurrent Validity

Only 4 studies assessed criterion-concurrent validity. Balaguer et al [21] reported a strong correlation, unusually assessed via the Kappa index, between the BROSJOD score and expert opinion at admission, 24 hours, and 48 hours. Gal et al [31] reported that the modified RDAI was correlated with transcutaneous partial pressure of carbon dioxide; this correlation remained after controlling for venous partial pressure of carbon dioxide and weight. Shete et al [47] reported the mTal score to be strongly correlated with oxygen saturation. Krishna et al [36] reported that the BSS was significantly associated with the Lung Ultrasound Score but did not report the magnitude.

TRIPOD: Quality of Reporting

The quality of reporting of the included articles, as assessed by the TRIPOD score of the included articles, was poor; the mean TRIPOD score was 52% (see Supplementary Table 2 for overall TRIPOD scores, and Supplementary Annex 2 for detailed TRIPOD scores). The reporting of model calibration, information around missing data, and summary characteristics of candidate predictors/score parameters was particularly poor.

PROBAST: Risk of Bias and Applicability

The overall risk of bias and applicability classifications, as assessed using the PROBAST framework, for each included article are listed in Supplementary Table 4 (see Supplementary Annex 3 for detailed PROBAST scores). All of the included articles had either serious methodological issues, most commonly in their analysis, or a poor quality of reporting, such that a judgment of the quality could not be made. The major methodological issues were small sample sizes, specifically datasets that included few participants with the outcomes being predicted, and, as noted above, insufficient reporting of calibration measures, quantity of missing data, and procedures for handling missing data.

DISCUSSION

We identified 31 unique scores from 32 articles and found that none of the identified scores were sufficiently validated. Across all 3 domains, the most promising score was the BROSJOD score; however, it does require further validation. The mTal score was the next best validated score. It is relevant to note the high degree of similarity in the parameters in these 2 scores. The methodological quality of all the included studies and the quality of reporting, systematically assessed using the PROBAST and TRIPOD checklists, respectively, was poor. The most commonly used score, the RDAI score, had very weak discriminative ability (borderline poor) and only weak convergent-criterion validity; we do not recommend further effort being taken to validate this score or its use.

Our finding that there is no sufficiently validated score is consistent with all of the previous reviews. The most promising scores we identified, namely BROSJOD and mTal, were similarly identified by Hakizimana et al [7]; they, however, also concluded that the Tal score and the Liverpool Infant Bronchiolitis Severity Score (LIBSS) (see below) were promising. In comparison to the reviews by Bekhof et al [3] and Rodríguez-Martínez et al [6], we included far fewer articles (and scores); the former included 60 articles (36 scores) and the latter included 77 articles (32 scores), whereas, as mentioned above, we included 32 articles (31 scores). This was primarily due to our more stringent inclusion criteria and our specific focus only on validity data rather than data reporting on the responsiveness, usability, or reliability of the scores. In contrast, however, we included >3 times the number of articles included by Hakizimana and colleagues’ rapid review [7] and Luarte-Martínez and colleagues' systematic review [5]. Our findings on the geographic distribution of the data sources used to validate these scores concur with the findings of Hakizimana et al [7], namely that the vast majority of these validation efforts were conducted in high-income countries. However, the best validated scores identified above seem feasible to implement in low-resource settings.

During the course of our searches, an additional promising score, the LIBSS, was identified, but unfortunately no studies evaluating its validity met our inclusion criteria. The LIBSS was developed as part of a PhD dissertation on the basis of a comprehensive literature review, consultations with stakeholders, a Delphi exercise, and a usability assessment, and was subsequently validated in a multicenter (n = 11) prospective cohort study; however, no peer-reviewed full-text article reporting the results of the validation study was identified [50].

This review has limitations. The major ones were the restriction of included articles to those published in English and the decision not to search the grey literature; as a result, some relevant articles may not have been included.

Further research is required to externally validate the BROSJOD, mTal, and LIBSS scores, ideally in low-income countries and in primary care settings. Such studies should be designed with reference to the PROBAST checklist or similar tools, and should report their findings in accordance with the TRIPOD checklist or similar tools, to ensure that they are both well designed and well communicated. Given that there are a number of promising scores, the scientific community should initially focus on validating or improving these scores and propose new scores only if necessary. Additionally, when assessing the validity of these scores, it would be useful to also conduct analyses with a threshold on the time of the outcome assessment (eg, discriminative validity of a score at predicting intensive care unit admission within 24 hours of taking the score), because the course of the disease is not always linear, and ignoring this may lead to systematic underestimation or overestimation of the actual validity of the score.
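
As an illustration only, the time-thresholded analysis suggested above could be sketched as follows. This is a minimal sketch and not the method of any included study: the data are invented, the function names `auc` and `outcome_within` are hypothetical, and discriminative validity is assumed to be summarized by the area under the receiver operating characteristic curve, computed here by pairwise comparison of cases and non-cases.

```python
from typing import List, Optional, Sequence


def auc(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    case has a higher score than a randomly chosen non-case (ties = 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    non_cases = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


def outcome_within(hours_to_event: Sequence[Optional[float]],
                   window_hours: float) -> List[bool]:
    """Binary outcome: did the event occur within `window_hours` of scoring?
    `None` means the event never occurred during follow-up."""
    return [h is not None and h <= window_hours for h in hours_to_event]


# Hypothetical data: severity score at presentation and hours from scoring
# to intensive care unit admission (None = never admitted).
scores = [3, 2, 8, 9, 4, 7, 10, 5]
hours_to_icu = [None, 60.0, 6.0, 12.0, None, 30.0, 3.0, None]

# Outcome at any time during follow-up versus within 24 hours of scoring.
auc_any = auc(scores, outcome_within(hours_to_icu, float("inf")))
auc_24h = auc(scores, outcome_within(hours_to_icu, 24.0))

print(f"AUC, admission at any time:   {auc_any:.2f}")
print(f"AUC, admission within 24 h:   {auc_24h:.2f}")
```

In this invented dataset, the child admitted at 60 hours had a low score at presentation, so the estimate without a time threshold is lower than the 24-hour estimate. With real data the difference could run in either direction.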

Supplementary Data

Supplementary materials are available at The Journal of Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.

Notes

PROMISE investigators. Harish Nair, Harry Campbell, and Richard Osei-Yeboah (University of Edinburgh); John Paget (Nivel); Philippe Beutels (University of Antwerp); Anne Teirlinck (National Institute for Public Health and the Environment, the Netherlands); Hanna Nohynek (Terveyden ja hyvinvoinnin laitos); Louis Bont (University Medical Center Utrecht); Andrew Pollard (University of Oxford); Peter Openshaw (Imperial College London); You Li (Nanjing Medical University); Jeroen Aerssens and Gabriela Ispas (Janssen); Veena Kumar (Novavax); Tin Htar, Elizabeth Begier, and Jessica Atwell (Pfizer); Charlotte Vernhes, Rolf Kramer, and Mathieu Bangert (Sanofi Pasteur); Gaël Dos Santos, Rachel Cohen, and Theo Last (GSK); Bahar Ahani (AstraZeneca); and Nuria Machin (TeamIT).

Author contributions. H. N. conceived the idea and served as third-person arbitrator. Z. S. and E. P. conducted the review. Z. S. authored the manuscript. E. P., Y. L., R. A. C., G. D. S., L. B., and H. N. commented critically on several drafts of the manuscript. The PROMISE investigators reviewed the manuscript prior to submission.

Disclaimer. This publication only reflects the authors’ views; the Joint Undertaking is not responsible for any use that may be made of the information contained herein.

Financial support. This project has received funding from the Innovative Medicines Initiative (grant agreement number 101034339). This Joint Undertaking receives support from the Horizon 2020 Framework Programme research and innovation program and the European Federation of Pharmaceutical Industries and Associations.

Supplement sponsorship. This article appears as part of the supplement “Preparing Europe for Introduction of Immunization Against RSV: Bridging the Evidence and Policy Gap.”

References

1. Borchers AT, Chang C, Gershwin ME, Gershwin LJ. Respiratory syncytial virus—a comprehensive review. Clin Rev Allergy Immunol 2013; 45:331–79.

2. Li Y, Wang X, Blau DM, et al. Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in children younger than 5 years in 2019: a systematic analysis. Lancet 2022; 399:2047–64.

3. Bekhof J, Reimink R, Brand PLP. Systematic review: insufficient validation of clinical scores for the assessment of acute dyspnoea in wheezing children. Paediatr Respir Rev 2014; 15:98–112.

4. Streiner DL. A checklist for evaluating the usefulness of rating scales. Can J Psychiatry 1993; 38:140–8.

5. Luarte-Martínez S, Rodríguez-Núñez I, Astudillo P, Manterola C. Psychometric properties of scales used for grading the severity of bronchial obstruction in pediatrics: a systematic review and meta-analysis. Arch Argent Pediatr 2017; 115:241–8.

6. Rodríguez-Martínez CE, Sossa-Briceño MP, Nino G. Systematic review of instruments aimed at evaluating the severity of bronchiolitis. Paediatr Respir Rev 2018; 25:43–57.

7. Hakizimana B, Saint G, van Miert C, Cartledge P. Can a respiratory severity score accurately assess respiratory distress in children with bronchiolitis in a resource-limited setting? J Trop Pediatr 2020; 66:234–43.

8. Roberts JN, Graham BS, Karron RA, et al. Challenges and opportunities in RSV vaccine development: meeting report from FDA/NIH workshop. Vaccine 2016; 34:4843–9.

9. Mazur NI, Higgins D, Nunes MC, et al. The respiratory syncytial virus vaccine landscape: lessons from the graveyard and promising candidates. Lancet Infect Dis 2018; 18:e295–311.

10. Öner D, Drysdale SB, McPherson C, et al. Biomarkers for disease severity in children infected with respiratory syncytial virus: a systematic literature review. J Infect Dis 2020; 222:S648–57.

11. Veritas Health Innovation. Covidence systematic review software. www.covidence.org. Accessed 1 September 2023.

12. Sheikh Z, Potter E, Li Y, et al. Validity of clinical severity scores for respiratory syncytial virus: a systematic review—data extraction sheet. Edinburgh DataShare. University of Edinburgh. 2023. https://datashare.ed.ac.uk/handle/10283/4804. Accessed 24 March 2023.

13. World Bank. World Bank country and lending groups—World Bank data help desk. 2022. https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups. Accessed 1 September 2023.

14. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170:51–8.

15. Collins GS, Reitsma JB, Altman DG, Moons K. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015; 162:735–6.

16. Moons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162:W1.

17. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019; 170:W1–33.

18. Abbate F, Depietri G, Tinelli C, et al. Impact of the publication of the Italian guidelines for bronchiolitis on the management of hospitalized children in Pisa, Italy. Pediatr Pulmonol 2023; 58:2267–74.

19. Amat F, Henquell C, Verdan M, Roszyk L, Mulliez A, Labbé A. Predicting the severity of acute bronchiolitis in infants: should we use a clinical score or a biomarker? J Med Virol 2014; 86:1944–52.

20. Anıl M, Göç Z, Avcı R, et al. B-type natriuretic peptide is a useful biomarker predicting disease severity in children with isolated bronchiolitis in the emergency department. Turk J Pediatr 2017; 59:561.

21. Balaguer M, Alejandre C, Vila D, et al. Bronchiolitis score of Sant Joan de Déu: BROSJOD score, validation and usefulness. Pediatr Pulmonol 2016; 52:533–9.

22. Bueno-Campaña M, Sainz T, Alba M, et al. Lung ultrasound for prediction of respiratory support in infants with acute bronchiolitis: a cohort study. Pediatr Pulmonol 2019; 54:873–80.

23. Caserta MT, Qiu X, Tesini B, et al. Development of a global respiratory severity score for respiratory syncytial virus infection in infants. J Infect Dis 2017; 215:750–6.

24. Chong S-L, Teoh OH, Nadkarni N, et al. The modified Respiratory Index Score (RIS) guides resource allocation in acute bronchiolitis. Pediatr Pulmonol 2017; 52:954–61.

25. Chong S-L, Lai OF, Castillo L, et al. Nasal high-mobility group box 1 and caspase in bronchiolitis. Pediatr Pulmonol 2018; 53:1627–32.

26. De Rose DU, Maddaloni C, Martini L, Braguglia A, Dotta A, Auriti C. Comparison of three clinical scoring tools for bronchiolitis to predict the need for respiratory support and length of stay in neonates and infants up to three months of age. Front Pediatr 2023; 11:1040354.

27. Destino L, Weisgerber MC, Soung P, et al. Validity of respiratory scores in bronchiolitis. Hosp Pediatr 2012; 2:202–9.

28. Duarte-Dorado DM, Madero-Orostegui DS, Rodriguez-Martinez CE, Nino G. Validation of a scale to assess the severity of bronchiolitis in a population of hospitalized infants. J Asthma 2013; 50:1056–61.

29. El Basha NR, Marzouk H, Sherif MM, El Kholy AA. Prematurity, a significant predictor for worse outcome in viral bronchiolitis: a comparative study in infancy. J Egypt Public Health Assoc 2019; 94:15.

30. Freire G, Kuppermann N, Zemek R, et al. Predicting escalated care in infants with bronchiolitis. Pediatrics 2018; 142:e20174253.

31. Gal S, Riskin A, Chistyakov I, Shifman N, Srugo I, Kugelman A. Transcutaneous PCO2 monitoring in infants hospitalized with viral bronchiolitis. Eur J Pediatr 2014; 174:319–24.

32. Garcia-Mauriño C, Moore-Clingenpeel M, Thomas J, et al. Viral load dynamics and clinical disease severity in infants with respiratory syncytial virus infection. J Infect Dis 2018; 219:1207–15.

33. Golan-Tripto I, Goldbart A, Akel K, Dizitzer Y, Novack V, Tal A. Modified Tal score: validated score for prediction of bronchiolitis severity. Pediatr Pulmonol 2018; 53:796–801.

34. Granda E, Urbano M, Andrés P, Corchete M, Garcinuño AC, Velasco R. Comparison of severity scales for acute bronchiolitis in real clinical practice. Eur J Pediatr 2023; 182:1619–26.

35. Jacob R, Bentur L, Brik R, Shavit I, Hakim F. Is capnometry helpful in children with bronchiolitis? Respir Med 2016; 113:37–41.

36. Krishna D, Khera D, Toteja N, et al. Point-of-care thoracic ultrasound in children with bronchiolitis. Indian J Pediatr 2022; 89:1079–85.

37. Kubota J, Hirano D, Okabe S, et al. Utility of the Global Respiratory Severity Score for predicting the need for respiratory support in infants with respiratory syncytial virus infection. PLoS One 2021; 16:e0253532.

38. Marguet C, Lubrano M, Gueudin M, et al. In very young infants severity of acute bronchiolitis depends on carried viruses. PLoS One 2009; 4:e4596.

39. McCallum GB, Morris PS, Wilson CC, et al. Severity scoring systems: are they internally valid, reliable and predictive of oxygen use in children with acute bronchiolitis? Pediatr Pulmonol 2013; 48:797–803.

40. McGinley JP, Lin GL, Öner D, et al. Clinical and viral factors associated with disease severity and subsequent wheezing in infants with respiratory syncytial virus infection. J Infect Dis 2022; 226:S45–54.

41. Özkaya AK, Yilmaz HL, Kendir ÖT, Gökay SS, Eyüboğlu İ. Lung ultrasound findings and bronchiolitis ultrasound score for predicting hospital admission in children with acute bronchiolitis. Pediatr Emerg Care 2018; 36:e135–42.

42. Raita Y, Camargo CA, Macias CG, et al. Machine learning-based prediction of acute severity in infants hospitalized for bronchiolitis: a multicenter prospective study. Sci Rep 2020; 10:10979.

43. Ramos-Fernández JM, Piñero-Domínguez P, Abollo-López P, et al. Validation study of an acute bronchiolitis severity scale to determine admission to a paediatric intensive care unit. An Pediatr (Engl Ed) 2018; 89:104–10.

44. Ricart S, Marcos MA, Sarda M, et al. Clinical risk factors are more relevant than respiratory viruses in predicting bronchiolitis severity. Pediatr Pulmonol 2012; 48:456–63.

45. Rivas-Juesas C, Rius Peris JM, García AL, et al. A comparison of two clinical scores for bronchiolitis. A multicentre and prospective study conducted in hospitalised infants. Allergol Immunopathol 2018; 46:15–23.

46. Rodriguez-Gonzalez M, Rodriguez-Campoy P, Estalella-Mendoza A, Castellano-Martinez A, Flores-Gonzalez JC. Characterization of cardiopulmonary interactions and exploring their prognostic value in acute bronchiolitis: a prospective cardiopulmonary ultrasound study. Tomography 2022; 8:142–57.

47. Shete S, Nagori G, Nagori P, Hamid M. Relation between pulse oximetry and clinical score in infants with acute bronchiolitis. Natl J Physiol Pharm Pharmacol 2014; 4:124.

48. Siraj S, Stark W, McKinley SD, Morrison JM, Sochet AA. The bronchiolitis severity score: an assessment of face validity, construct validity, and interobserver reliability. Pediatr Pulmonol 2021; 56:1739–44.

49. Somech R, Tal G, Gilad E, Mandelberg A, Tal A, Dalal I. Epidemiologic, socioeconomic, and clinical factors associated with severity of respiratory syncytial virus infection in previously healthy infants. Clin Pediatr 2006; 45:621–7.

50. van Miert C. Measuring clinical severity in infants with bronchiolitis. 2015. https://livrepository.liverpool.ac.uk/2037906/1/vanMiertCla_June2015_2037906.pdf. Accessed 20 August 2022.

Author notes

Potential conflicts of interest. Y. L. has received funding from the Wellcome Trust and GSK, outside the submitted work; and has received consulting fees from Pfizer. R. A. C. is an employee of the GSK group of companies, holds shares in the GSK group of companies, and has received other compensation from GSK, outside the submitted work; has received support from Westat (former employer) for attending meetings/travel; and has held stock or stock options from Westat. G. D. S. is an employee of the GSK group of companies and holds shares as part of his annual remuneration. L. B. has received funding through University Medical Center Utrecht from AbbVie, Janssen, the Bill & Melinda Gates Foundation, Nutricia Danone, MeMed Diagnostics, GSK, Novavax, AstraZeneca, Sanofi, Ablynx, Bavarian Nordic, MabXience, and Pfizer. H. N. has received funding from the Innovative Medicines Initiative (IMI) (grant to institution), the National Institute for Health and Care Research, Icosavax, and Pfizer; has received consulting fees from the World Health Organization (WHO), Pfizer, the Bill & Melinda Gates Foundation, and Sanofi; has received honoraria from AbbVie for educational events; has received support for attending meetings/travel from Sanofi; and has served on a data and safety monitoring board or advisory board for GSK, Sanofi, Merck, WHO, Janssen, Novavax, ReSVinet, Icosavax, and Pfizer. Z. S. has received support for attending meetings/travel from PROMISE/IMI to attend the PROMISE AGM and 12th International RSV Symposium; and is a shareholder of Evidence-Based Health Care Ltd. E. P. reports no potential conflicts of interest.

All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
