Anthony D Bai, Adam S Komorowski, Carson K L Lo, Pranav Tandon, Xena X Li, Vaibhav Mokashi, Anna Cvetkovic, Vanessa R Kay, Aidan Findlater, Laurel Liang, Mark Loeb, Dominik Mertz, for the McMaster Infectious Diseases Fellow Research Group, Methodological and Reporting Quality of Noninferiority Randomized Controlled Trials Comparing Antibiotic Therapies: A Systematic Review, Clinical Infectious Diseases, Volume 73, Issue 7, 1 October 2021, Pages e1696–e1705, https://doi.org/10.1093/cid/ciaa1353
Abstract
Antibiotic noninferiority randomized controlled trials (RCTs) are used for approval of new antibiotics and making changes to antibiotic prescribing in clinical practice. We conducted a systematic review to assess the methodological and reporting quality of antibiotic noninferiority RCTs.
We searched MEDLINE, Embase, the Cochrane Database of Systematic Reviews, and the Food and Drug Administration drug database from inception until November 22, 2019, for noninferiority RCTs comparing different systemic antibiotic therapies. Comparisons between antibiotic types, doses, administration routes, or durations were included. Methodological and reporting quality indicators were based on the Consolidated Standards of Reporting Trials reporting guidelines. Two independent reviewers extracted the data.
The systematic review included 227 studies. Of these, 135 (59.5%) studies were supported by the pharmaceutical industry. Only 83 (36.6%) studies provided a justification for the noninferiority margin. Both intention-to-treat (ITT) and per-protocol (PP) analyses were reported in 165 (72.7%) studies. The conclusion was misleading in 34 (15.0%) studies. Studies funded by the pharmaceutical industry were less likely to be stopped early because of logistical reasons (3.0% vs 19.1%; odds ratio [OR] = 0.13; 95% confidence interval [CI], .04–.37) and to show inconclusive results (11.1% vs 42.9%; OR = 0.17; 95% CI, .08–.33). The quality of studies decreased over time with respect to blinding, early stopping, reporting of ITT with PP analysis, and having misleading conclusions.
There is room for improvement in the methodology and reporting of antibiotic noninferiority trials. Quality can be improved across the entire spectrum, from investigators and funding agencies to the peer-review process.
There is room for improvement in the methodology and reporting of antibiotic noninferiority trials including justification of noninferiority margin, reporting of intention-to-treat analysis with per-protocol analysis, and having conclusions that are concordant with study results.
Clinical Trials Registration. PROSPERO registration number CRD42020165040.
Noninferiority randomized controlled trials (RCTs) are commonly used study designs to evaluate drug therapy [1]. They have unique methodological considerations including sample size calculations, a priori–defined noninferiority margins and inclusion of both intention-to-treat (ITT) and per-protocol (PP) analyses [2, 3]. Variation in the reporting of noninferiority RCTs can result in misinterpretation of results, overestimation of benefits, and underestimation of harms. A Consolidated Standards of Reporting Trials (CONSORT) statement was published as a guideline for the reporting of noninferiority RCTs [2, 3].
There is emerging evidence that the methodology and reporting quality of noninferiority RCTs have deficiencies when measured against the CONSORT standards. Major concerns include lack of blinding, no justification of noninferiority margin, reporting of only ITT analysis, and misleading conclusions [4–6].
In infectious diseases research, noninferiority RCTs are used to evaluate antibiotics. In a systematic review of noninferiority RCTs of drugs, approximately 20% of trials evaluated an anti-infective agent [6]. Almost all clinical trials for approval of new antibiotics were noninferiority RCTs in the past 30 years [7]. The US Food and Drug Administration (FDA) and European Medicines Agency have provided guidance on the use of noninferiority trials to support approval of antibacterial drug products [8, 9].
The use of only noninferiority trials for the evaluation and approval of new antibiotics has been controversial. There is a concern for lack of assay sensitivity, which is the ability of a study to distinguish between active and inactive treatment [10]. In a noninferiority trial, a new treatment may be shown to be noninferior to the control treatment. However, it is possible that both the new treatment and the control treatment were no better than placebo, so the noninferiority trial cannot provide reliable evidence of the effectiveness of a new therapy [10]. New antibiotics approved on the basis of noninferiority trials are usually from an existing antibiotic class and do not have better outcomes, reflecting stagnancy and lack of innovation [11]. In response, experts have suggested the use of innovative superiority trial designs to evaluate new antibiotics [7, 12]. In addition, the landscape of antibiotic research is evolving, with proposed changes such as a shift to nonprofit organizations [13] and prioritization of targets for drug development [14]. After considering and balancing feasibility, ethics, and incentive for the pharmaceutical industry, a noninferiority trial is nevertheless an acceptable trial design if done in a rigorous manner, as outlined in prior statements from the Infectious Diseases Society of America for different infection syndromes [15–17]. Therefore, it is all the more important for antibiotic noninferiority RCTs to be sound in methodology and reporting because they result in the approval of new antibiotics and changes to antibiotic prescribing in clinical practice and guidelines.
It is unclear if the methodological and reporting deficiencies and concerns found in noninferiority RCTs in general also exist in antibiotic noninferiority RCTs. The primary aim of this systematic review was to evaluate the methodological and reporting quality specifically in noninferiority RCTs on antibiotics.
METHODS
This review was conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (see checklist in Supplementary Text 1) [18]. The study protocol was registered with PROSPERO (CRD42020165040).
Data Sources and Selection Criteria
We searched MEDLINE, Embase, and the Cochrane Database of Systematic Reviews from inception to November 22, 2019. The detailed search strategy was developed with a research librarian (Supplementary Text 2). We used the FDA drugs database to supplement this search [19]. For new antibiotics that were approved, we read through the FDA antibiotic approvals and labels to find the clinical studies that supported the approval and were also published in journal articles.
We included studies published in English that were identified as noninferiority RCTs in humans comparing 2 or more systemic antibiotic regimens used to treat a bacterial infection. Studies were included if the treatment and comparison arms differed in terms of antibiotic type, dose, administration, and/or duration.
Trials comparing an antibiotic to placebo alone were excluded, as were commentaries, reviews, study protocols, secondary analyses, and conference proceedings. We excluded trial registrations where the results were not published in a journal article. Phase 2 and pilot studies were identified and excluded after full text reading.
Data Extraction
Reviewers screened abstracts after appropriate training to identify potentially relevant studies and extract full texts for reading. The first 300 abstracts that each reviewer screened were double checked by another independent reviewer for consistency. If consistent, the reviewer then screened abstracts independently. For the full text review, 2 independent reviewers read and extracted the data in duplicate onto a standardized extraction form. Disagreements between reviewers were resolved by discussion to reach consensus and, if necessary, adjudication by a third reviewer.
Variables Collected
Study characteristics of interest included journal, year of study, study center(s), study population, sample size, treatment arms, infectious disease syndrome, rationale for noninferiority design, funding, sample size calculation, outcomes, and interpretation of results. Funding was classified as supported by pharmaceutical industry, not supported by pharmaceutical industry, or unclear based on the conflict of interest and funding statements. Studies on new antibiotics were defined as studies that were conducted before or within 5 years of FDA approval of the antibiotic that the study was focused on.
We used a standardized definition to interpret the results that considers the point estimate, confidence interval (CI), and the noninferiority margin based on the FDA recommendations for superiority, noninferiority, inferiority, and inconclusive results (Figure 1) [8]. The study results were also deemed to be inconclusive if both ITT and PP analysis results were reported and were inconsistent with one another. We compared our conclusion of study results to the authors’ conclusions in the journal article. A misleading conclusion was defined as an authors’ conclusion that was discordant with our conclusion following the FDA definition described previously [4].

Figure 1. Interpretation of point estimate and confidence interval. Bars indicate confidence interval. Δ is the noninferiority margin. Interpretation is as follows for the new treatment compared to old treatment: A is superior; B and C are noninferior; D and E are inconclusive, where noninferiority would be rejected; F meets the noninferiority margin criteria, but is actually inferior as the lower bound is above 0; and G and H are inferior.
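The interpretation rule described for Figure 1 can be sketched in code. This is a minimal illustration rather than the authors' implementation. It assumes the effect is expressed as a difference in an unfavorable outcome (new minus old treatment), so that values above 0 favor the old treatment and Δ is positive, which is consistent with the caption's statement that a lower bound above 0 indicates inferiority. The function names `interpret_ci` and `overall_result` are hypothetical, the handling of bounds that fall exactly on 0 or Δ is an assumption, and case F is labeled inferior following the caption's wording.

```python
def interpret_ci(lower: float, upper: float, margin: float) -> str:
    """Classify one analysis from its CI bounds and the noninferiority margin.

    Assumes the effect is (new - old) for an unfavorable outcome, so values
    above 0 favor the old treatment and `margin` (delta) is positive.
    """
    if margin <= 0:
        raise ValueError("margin must be positive under this sign convention")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if upper < 0:
        return "superior"      # case A: whole CI favors the new treatment
    if lower > 0:
        return "inferior"      # cases F, G, H: whole CI favors the old treatment
    if upper < margin:
        return "noninferior"   # cases B, C (assumed): CI includes 0, stays below delta
    return "inconclusive"      # cases D, E (assumed): CI includes 0 and reaches delta


def overall_result(itt: str, pp: str) -> str:
    """Combine ITT and PP interpretations; disagreement counts as inconclusive."""
    return itt if itt == pp else "inconclusive"


# Hypothetical risk differences with a margin of 0.10
itt = interpret_ci(-0.04, 0.06, 0.10)   # noninferior
pp = interpret_ci(-0.03, 0.12, 0.10)    # inconclusive
print(overall_result(itt, pp))          # prints "inconclusive"
```

The second function reflects the rule that a study was deemed inconclusive when the ITT and PP results were both reported and disagreed.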
Primary Outcomes
The primary outcomes were quality indicators specific to noninferiority RCTs, which were selected from the checklist in the CONSORT guidelines [2, 3] and prior systematic reviews on the quality of noninferiority trials [4–6]. For each quality indicator, the reviewer determined if it was present or absent [4–6]. Quality indicators included inclusion of noninferiority in title, blinding, specification of noninferiority margin, justification of noninferiority margin, type of CI used, concordance of type I error rate used with CI, early stopping, reporting of ITT with PP analyses, handling of missing data, and comparison to historical control. As well, we recorded the use of a figure to illustrate point estimate, CI, and noninferiority margin. “Double-blinded” was defined specifically as blinding of the participant and individuals providing care. Early stopping because of logistics was defined as stopping the study before reaching target sample size for logistical reasons such as funding or recruitment. This definition excludes early stopping based on interim analyses results that met the prespecified early stopping rule.
Risk of Bias
Two independent reviewers assessed the risk of bias in duplicate using the Cochrane Collaboration’s tool for assessing risk of bias in randomized trials [20].
Comparison
The primary comparison was based on pharmaceutical funding or no pharmaceutical funding. The secondary comparison was an examination of trend over time by 3 publication year periods: 2007 or before, 2008–2013, and 2014 or after. This was based on the CONSORT statement publication years of 2006 and 2012 [2, 3]. We judged that the cutoff time points of 2007 and 2013 allowed sufficient time for the authors and journals to implement the newly published guidelines.
Statistical Analysis
Descriptive analyses included number (percentage) for categorical variables and median (interquartile range) for continuous variables. Comparisons between groups were done with Fisher’s exact test for categorical variables and the Wilcoxon rank-sum test for continuous variables. The odds ratio (OR) and its 95% CI were calculated for categorical variables using a logistic regression model. The studies included in the systematic review were heterogeneous in terms of clinical syndrome, antibiotics used, and outcomes. Therefore, a pooled meta-analysis was not attempted. All tests were 2-sided with a significance level of P < .05. All analyses were done with R, version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria).
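As an illustration of the odds ratio calculation, the sketch below recomputes the OR for early stopping because of logistics from the counts reported in Table 2 (4 of 135 industry-supported studies vs 16 of 84 nonpharmaceutical-supported studies). It is written in Python rather than R and uses a Wald interval on the log odds ratio, which is an assumption made for simplicity. The authors fitted a logistic regression model in R, so the interval bounds produced here need not match the published ones exactly. The function name `odds_ratio_wald` is hypothetical.

```python
import math


def odds_ratio_wald(events_a, n_a, events_b, n_b, z=1.96):
    """Odds ratio of group A vs group B (reference) with a Wald 95% CI."""
    a, b = events_a, n_a - events_a   # group A: with / without the event
    c, d = events_b, n_b - events_b   # group B: with / without the event
    if min(a, b, c, d) == 0:
        # A zero cell leaves the odds ratio undefined in this simple form
        raise ValueError("zero cell in the 2 x 2 table")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Early stopping due to logistics: 4/135 (group A) vs 16/84 (group B)
or_, lo, hi = odds_ratio_wald(4, 135, 16, 84)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The point estimate rounds to 0.13, in agreement with Table 2. The zero-cell check mirrors the rows marked N/A in the tables, where a cell count was too small to estimate an OR.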
RESULTS
Our literature search yielded 6017 records, which included 4009 unique abstracts. At the abstract screening stage, 3766 abstracts were excluded, leaving 243 studies for full text review. After full text review, 227 studies were included in the analysis (Figure 2). Over time, the number of antibiotic noninferiority trials has increased (Supplementary Table 1).

Study Characteristics
Of the 227 studies, the most common infectious disease syndromes studied were community-acquired pneumonia (15.9%), skin and soft-tissue infection (15.0%), and urinary tract infection (11.5%) (Table 1 and Supplementary Table 2). The most common rationale for noninferiority design included testing new antibiotics (44.1%), shorter duration (17.6%), and older antibiotics as alternative therapeutic options (16.3%).
Characteristic | All Studies (N = 227) | Group A: Industry-supported Studies (N = 135) | Group B: Nonpharmaceutical-supported Studies (N = 84) | A vs B OR (95% CI), B as Reference | A vs B P Value |
---|---|---|---|---|---|
Adults only | 182 (80.2%) | 117 (86.7%) | 58 (69.1%) | 2.91 (1.49–5.82) | .0029 |
Multicenter | 203 (89.4%) | 132 (97.8%) | 66 (78.6%) | 12.00 (3.89–52.54) | <.0001 |
Sample size per group, median (IQR) | 204 (114–313) | 254 (160–359) | 125 (70–245) | | <.0001 |
Infectious disease syndrome | | | | | <.0001 |
CAP | 36 (15.9%) | 28 (20.7%) | 8 (9.5%) | 2.49 (1.12–6.11) | |
SSTI | 34 (15.0%) | 28 (20.7%) | 6 (7.1%) | 3.40 (1.43–9.45) | |
Urinary tract infection | 26 (11.5%) | 14 (10.4%) | 10 (11.9%) | 0.86 (.36–2.08) | |
Intra-abdominal infection | 21 (9.3%) | 16 (11.9%) | 5 (6.0%) | 2.12 (.80–6.71) | |
Tuberculosis | 14 (6.2%) | 1 (0.7%) | 12 (14.3%) | 0.04 (.002–.23) | |
Helicobacter pylori | 12 (5.3%) | 2 (1.5%) | 9 (10.7%) | 0.13 (.02–.50) | |
HAP or VAP | 12 (5.3%) | 11 (8.2%) | 0 (0%) | N/A | |
Other | 72 (31.7%) | 35 (25.9%) | 34 (40.5%) | 0.51 (.29–.92) | |
Rationale | | | | | |
New antibiotics | 100 (44.1%) | 97 (71.9%) | 2 (2.4%) | 104.66 (30.85–655.83) | <.0001 |
Shorter duration | 40 (17.6%) | 10 (7.4%) | 28 (33.3%) | 0.16 (.07–.34) | <.0001 |
Alternative option | 37 (16.3%) | 18 (13.3%) | 17 (20.2%) | 0.61 (.29–1.26) | .1885 |
Easier administration | 27 (11.9%) | 19 (14.1%) | 7 (8.3%) | 1.80 (.75–4.80) | .2827 |
PO instead of parenteral | 18 (7.9%) | 3 (2.2%) | 15 (17.9%) | 0.10 (.02–.33) | <.0001 |
Better safety | 10 (4.4%) | 3 (2.2%) | 6 (7.1%) | 0.30 (.06–1.15) | .0893 |
Lower cost | 9 (4.0%) | 2 (1.5%) | 7 (8.3%) | 0.17 (.02–.70) | .0290 |
Narrower spectrum | 9 (4.0%) | 0 (0%) | 8 (9.5%) | N/A | .0004 |
Primary outcome | | | | | .0086 |
Clinical outcome | 185 (81.5%) | 119 (88.2%) | 60 (71.4%) | 2.98 (1.48–6.12) | |
Microbiological outcome | 34 (15.0%) | 13 (9.6%) | 19 (22.6%) | 0.36 (.17–.78) | |
Both clinical and microbiological outcome | 4 (1.8%) | 2 (1.5%) | 2 (2.4%) | 0.62 (.07–5.22) | |
Other | 4 (1.8%) | 1 (0.7%) | 3 (3.6%) | 0.20 (.01–1.60) | |
Recording of adverse events | 219 (96.5%) | 134 (99.3%) | 78 (92.9%) | 10.31 (1.72–196.57) | .0137 |
Conclusion based on results | | | | | <.0001 |
Noninferiority shown | 157 (69.2%) | 109 (80.7%) | 44 (52.4%) | 3.81 (2.10–7.05) | |
Superiority shown | 5 (2.2%) | 5 (3.7%) | 0 (0%) | N/A | |
Inferiority shown | 10 (4.4%) | 6 (4.4%) | 4 (4.8%) | 0.93 (.26–3.73) | |
Inconclusive | 55 (24.2%) | 15 (11.1%) | 36 (42.9%) | 0.17 (.08–.33) | |
Eight studies had unclear funding and so were not included in group A or B.
Abbreviations: CAP, community-acquired pneumonia; CI, confidence interval; HAP, hospital-acquired pneumonia; IQR, interquartile range; N/A, not applicable, where numbers were too few in a cell within the 2 × 2 table to make an accurate estimate of the odds ratio; OR, odds ratio; PO, route by mouth; SSTI, skin or soft-tissue infection; VAP, ventilator-associated pneumonia.
Only 83 (36.6%) studies had a justification for the specified noninferiority margin (Table 2). Both ITT and PP analysis results were reported in 165 (72.7%) studies. The authors’ conclusions were misleading in 34 (15.0%) studies (Supplementary Table 3). Of the 34 misleading conclusions, the authors concluded noninferiority, whereas the study results showed inconclusive results in 27 (79.4%) studies.
Quality indicator | All Studies (N = 227) | Group A: Industry-supported Studies (N = 135) | Group B: Nonpharmaceutical-supported Studies (N = 84) | A vs B OR (95% CI), B as Reference | A vs B P Value |
---|---|---|---|---|---|
Noninferiority in title | 59 (26.0%) | 24 (17.8%) | 32 (38.1%) | 0.35 (.19–.65) | .0013 |
Double-blinded | 120 (52.9%) | 101 (74.8%) | 18 (21.4%) | 10.89 (5.80–21.36) | <.0001 |
Noninferiority margin specified | 221 (97.4%) | 135 (100%) | 80 (95.2%) | N/A | .0207 |
Noninferiority margin justified | 83 (36.6%) | 44 (32.6%) | 37 (44.1%) | 0.61 (.35–1.08) | .1131 |
Justification by clinical basis | 32 (14.1%) | 5 (3.7%) | 27 (32.1%) | 0.08 (.03–.20) | <.0001 |
Justification by prior studies | 37 (16.3%) | 21 (15.6%) | 15 (17.9%) | 0.85 (.41–1.78) | .7091 |
Justification by guidelines | 2 (0.9%) | 1 (0.7%) | 1 (1.2%) | 0.62 (.02–15.80) | >.9999 |
Justification by regulatory bodies | 42 (18.5%) | 31 (23.0%) | 10 (11.9%) | 2.21 (1.05–4.99) | .0500 |
Justification by effect of control | 18 (7.9%) | 11 (8.2%) | 7 (8.3%) | 0.98 (.37–2.75) | >.9999 |
Adequate information for sample size recalculation | 152 (67.0%) | 80 (59.3%) | 68 (81.0%) | 0.34 (.18–.64) | .0010 |
2-sided 95% CI or 1-sided 97.5% CI used | 193 (85.0%) | 129 (95.6%) | 59 (70.2%) | 9.11 (3.77–25.60) | <.0001 |
Discordance between type I error rate used and CI | 9 (4.0%) | 2 (1.5%) | 6 (7.1%) | 0.20 (.03–.87) | .0568 |
Early stopping due to logistics | 21 (9.3%) | 4 (3.0%) | 16 (19.1%) | 0.13 (.04–.37) | .0001 |
Analysis used | | | | | .1326 |
ITT only | 41 (18.1%) | 23 (17.0%) | 17 (20.2%) | 0.81 (.40–1.64) | |
PP only | 21 (9.3%) | 8 (5.9%) | 11 (13.1%) | 0.42 (.16–1.08) | |
ITT and PP | 165 (72.7%) | 104 (77.0%) | 56 (66.7%) | 1.68 (.91–3.08) | |
Handling of missing data | 69 (30.4%) | 50 (37.0%) | 19 (22.6%) | 2.01 (1.10–3.80) | .0358 |
Imputation of missing data | 18 (7.9%) | 9 (6.7%) | 9 (10.7%) | 0.60 (.22–1.59) | .3182 |
Worst case scenario | 55 (24.2%) | 41 (30.4%) | 14 (16.7%) | 2.18 (1.13–4.43) | .0253 |
Sensitivity analyses | 11 (4.9%) | 9 (6.7%) | 2 (2.4%) | 2.93 (.73–19.53) | .2111 |
Point estimate with CI | 205 (90.3%) | 129 (95.6%) | 70 (83.3%) | 4.30 (1.65–12.60) | .0033 |
Figure of point estimate, CI, and noninferiority margin | 44 (19.4%) | 24 (17.8%) | 20 (23.8%) | 0.69 (.35–1.36) | .3012 |
Comparison to historical control | 20 (8.8%) | 13 (9.6%) | 7 (8.3%) | 1.17 (.46–3.24) | .8137 |
Misleading conclusion | 34 (15.0%) | 13 (9.6%) | 19 (22.6%) | 0.36 (.17–.78) | .0105 |
Eight studies had unclear funding and so were not included in group A or B.
Abbreviations: CI, confidence interval; ITT, intention-to-treat analysis; N/A, not applicable, where numbers were too few in a cell within the 2 × 2 table to make an accurate estimate of the odds ratio; OR, odds ratio; PP, per-protocol analysis.
8 studies had unclear funding and so were not included in group A or B.
Abbreviations: CI, confidence interval; ITT, intention-to-treat analysis; N/A, not applicable, where numbers were too few in a cell within the 2 × 2 table to make accurate estimate of odds ratio; OR, odds ratio; PP, per-protocol analysis.
Risk of Bias
Of 227 studies, 112 (49.3%), 89 (39.2%), and 41 (18.1%) were at high risk for performance bias, detection bias, and reporting bias, respectively (Table 3, Supplementary Table 4, Supplementary Figure 1).
Risk of bias domain | All Studies (N = 227) | Group A: Industry-supported Studies (N = 135) | Group B: Nonpharmaceutical-supported Studies (N = 84) | A vs B (B as Reference), OR (95% CI) | A vs B, P Value |
---|---|---|---|---|---|
Randomization | | | | | .0185 |
High risk | 5 (2.2%) | 1 (0.7%) | 3 (3.6%) | 0.20 (.01–1.60) | |
Low risk | 155 (68.3%) | 88 (65.2%) | 65 (77.4%) | 0.55 (.29–1.01) | |
Unclear | 67 (29.5%) | 46 (34.1%) | 16 (19.1%) | 2.20 (1.17–4.31) | |
Allocation concealment | | | | | .1211 |
High risk | 9 (4.0%) | 2 (1.5%) | 6 (7.1%) | 0.20 (.03–.87) | |
Low risk | 104 (45.8%) | 65 (48.2%) | 38 (45.2%) | 1.12 (.65–1.95) | |
Unclear | 114 (50.2%) | 68 (50.4%) | 40 (47.6%) | 1.12 (.65–1.93) | |
Performance bias | | | | | <.0001 |
High risk | 112 (49.3%) | 39 (28.9%) | 66 (78.6%) | 0.11 (.06–.21) | |
Low risk | 109 (48.0%) | 91 (67.4%) | 17 (20.2%) | 8.15 (4.37–15.88) | |
Unclear | 6 (2.6%) | 5 (3.7%) | 1 (1.2%) | 3.19 (.50–61.74) | |
Detection bias | | | | | <.0001 |
High risk | 89 (39.2%) | 28 (20.7%) | 54 (64.3%) | 0.15 (.08–.26) | |
Low risk | 131 (57.7%) | 102 (75.6%) | 28 (33.3%) | 6.18 (3.43–11.42) | |
Unclear | 7 (3.1%) | 5 (3.7%) | 2 (2.4%) | 1.58 (.33–11.19) | |
Attrition bias | | | | | .6293 |
High risk | 78 (34.4%) | 45 (33.3%) | 28 (33.3%) | 1.00 (.56–1.79) | |
Low risk | 143 (63.0%) | 85 (63.0%) | 55 (65.5%) | 0.90 (.50–1.58) | |
Unclear | 6 (2.6%) | 5 (3.7%) | 1 (1.2%) | 3.19 (.50–61.74) | |
Reporting bias | | | | | .0268 |
High risk | 41 (18.1%) | 17 (12.6%) | 21 (25.0%) | 0.43 (.21–.88) | |
Low risk | 186 (81.9%) | 118 (87.4%) | 63 (75.0%) | 2.31 (1.14–4.75) | |
Unclear | 0 (0%) | 0 (0%) | 0 (0%) | N/A | |
8 studies had unclear funding and so were not included in group A or B.
Abbreviations: CI, confidence interval; N/A, not applicable, where numbers were too few in a cell within the 2 × 2 table to make accurate estimate of odds ratio; OR, odds ratio.
Comparison of Studies by Funding
A total of 135 (59.5%) studies were supported by the pharmaceutical industry, 84 (37.0%) were not, and 8 (3.5%) had an unclear source of funding. Of the 84 studies not supported by the pharmaceutical industry, 30 (35.7%) were funded by the government and 54 (64.3%) by other public funding agencies (Supplementary Table 5). Compared with nonpharmaceutical-supported studies, industry-supported studies were less likely to show inconclusive results (11.1% vs 42.9%; OR = 0.17; 95% CI, .08–.33) (Table 1).
Industry-supported studies were less likely to be stopped early for logistical reasons (3.0% vs 19.1%; OR = 0.13; 95% CI, .04–.37) (Table 2). Of the 16 studies that were not supported by the pharmaceutical industry and were stopped early for logistical reasons, 11 (68.8%) had inconclusive results.
With respect to risk of bias, industry-supported studies were more likely to be at low risk for performance bias (67.4% vs 20.2%; OR = 8.15; 95% CI, 4.37–15.88), detection bias (75.6% vs 33.3%; OR = 6.18; 95% CI, 3.43–11.42), and reporting bias (87.4% vs 75.0%; OR = 2.31; 95% CI, 1.14–4.75) (Table 3).
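As a rough check on how the odds ratios in Table 2 relate to the underlying counts, the sketch below recomputes the odds ratio for early stopping due to logistics (4 of 135 industry-supported studies vs 16 of 84 nonpharmaceutical-supported studies). The function name `odds_ratio_wald` and the Wald (log-scale) confidence interval are assumptions made for illustration; this section does not state how the published intervals were calculated, so the approximate limits need not match the reported ones.

```python
import math

def odds_ratio_wald(a, n1, c, n2, z=1.96):
    """Sample odds ratio of group 1 vs group 2 with an approximate Wald CI.

    a, c   : number of studies with the characteristic in groups 1 and 2
    n1, n2 : total number of studies in groups 1 and 2
    This is an illustrative approximation, not the authors' method.
    """
    b, d = n1 - a, n2 - c
    if min(a, b, c, d) == 0:
        # With an empty cell the sample odds ratio cannot be estimated.
        raise ValueError("zero cell: odds ratio not estimable")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Early stopping due to logistics: group A 4/135, group B 16/84 (reference).
or_, lo, hi = odds_ratio_wald(4, 135, 16, 84)
print(f"OR = {or_:.2f}, approximate 95% CI {lo:.2f} to {hi:.2f}")
```

The point estimate rounds to the 0.13 reported in Table 2. The Wald interval is only one of several possible interval methods, which is why its upper limit can differ from a published one.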
Comparison of Studies Over Time
Forty-four (19.4%) studies were published in 2007 or earlier, 67 (29.5%) between 2008 and 2013, and 116 (51.1%) in or after 2014. Newer studies were more likely to justify the noninferiority margin (11.4%, 28.4%, and 50.9%, respectively; P < .0001) (Table 4). Although the differences were not statistically significant, newer studies were less likely to be double-blinded (68.2%, 53.7%, and 46.6%, respectively; P = .0511) or to report both ITT and PP analyses (81.8%, 73.1%, and 69.0%, respectively; P = .1035). The proportion of studies stopped early for logistical reasons increased over time (4.6%, 6.0%, and 12.9%, respectively; P = .1746), and misleading conclusions were more frequent in the 2 later periods than in the earliest one (6.8%, 22.4%, and 13.8%, respectively; P = .0797), although neither difference was statistically significant. Moreover, newer studies were more likely to be at high risk for performance bias (43.2%, 46.3%, and 53.5%, respectively; P = .0350) and detection bias (25.0%, 38.8%, and 44.8%, respectively; P = .0214) (Table 5).
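The comparison across the 3 publication periods can be illustrated with a Pearson chi-square test on the counts for noninferiority margin justification (5 of 44, 19 of 67, and 59 of 116 studies). This is a minimal sketch under stated assumptions: the helper `chi_square_2x3` is hypothetical, and the chi-square test is a swapped-in approximation, because this section does not state which test produced the reported P values.

```python
import math

def chi_square_2x3(events, totals):
    """Pearson chi-square test for a 2 x 3 table (characteristic yes/no by period).

    events : studies with the characteristic in each of the 3 periods
    totals : total studies in each of the 3 periods
    Returns (statistic, degrees_of_freedom, p_value).
    The p-value uses exp(-x / 2), the exact upper tail of a chi-square
    distribution with 2 degrees of freedom, so exactly 3 groups are required.
    """
    if len(events) != 3 or len(totals) != 3:
        raise ValueError("this sketch handles exactly 3 groups")
    n = sum(totals)
    yes_total = sum(events)
    stat = 0.0
    for ev, tot in zip(events, totals):
        # Compare observed and expected counts for the 'yes' and 'no' cells.
        for observed, margin in ((ev, yes_total), (tot - ev, n - yes_total)):
            expected = tot * margin / n
            stat += (observed - expected) ** 2 / expected
    return stat, 2, math.exp(-stat / 2)

# Noninferiority margin justified: 2007 or before, 2008-2013, 2014 or after.
stat, df, p = chi_square_2x3([5, 19, 59], [44, 67, 116])
print(f"chi-square = {stat:.2f}, df = {df}, P = {p:.2g}")
```

Under this approximation the P value falls well below .0001, in line with the value reported for this row. For rows with very small counts, an exact test would be the more appropriate choice.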
Comparison of Methodology and Reporting Quality of Studies by Publication Year
Characteristic | 2007 or Before (N = 44), N (%) | 2008–2013 (N = 67), N (%) | 2014 or After (N = 116), N (%) | P Value |
---|---|---|---|---|
Supported by pharmaceutical industry | 34 (77.3%) | 49 (73.1%) | 52 (44.8%) | <.0001 |
OR (95% CI) | Reference | 0.80 (.32–1.92) | 0.24 (.10–.51) | |
Noninferiority in title | 5 (11.4%) | 8 (11.9%) | 46 (39.7%) | <.0001 |
OR (95% CI) | Reference | 1.06 (.33–3.72) | 5.13 (2.03–15.72) | |
Double-blinded | 30 (68.2%) | 36 (53.7%) | 54 (46.6%) | .0511 |
OR (95% CI) | Reference | 0.54 (.24–1.19) | 0.41 (.19–.83) | |
Noninferiority margin specified | 43 (97.7%) | 67 (100%) | 111 (95.7%) | .2769 |
OR (95% CI) | Reference | N/A | 0.52 (.03–3.32) | |
Noninferiority margin justified | 5 (11.4%) | 19 (28.4%) | 59 (50.9%) | <.0001 |
OR (95% CI) | Reference | 3.09 (1.12–9.99) | 8.07 (3.22–24.71) | |
Justification by clinical basis | 2 (4.6%) | 7 (10.5%) | 23 (19.8%) | .0275 |
OR (95% CI) | Reference | 2.45 (.56–16.99) | 5.19 (1.44–33.30) | |
Justification by prior studies | 2 (4.6%) | 9 (13.4%) | 26 (22.4%) | .0167 |
OR (95% CI) | Reference | 3.26 (.79–22.10) | 6.07 (1.70–38.75) | |
Justification by guidelines | 0 (0%) | 1 (1.5%) | 1 (.9%) | >.9999 |
OR (95% CI) | Reference | N/A | N/A | |
Justification by regulatory bodies | 4 (9.1%) | 6 (9.0%) | 32 (27.6%) | .0015 |
OR (95% CI) | Reference | 0.98 (.26–4.05) | 3.81 (1.39–13.43) | |
Justification by effect of control | 1 (2.3%) | 5 (7.5%) | 12 (10.3%) | .2636 |
OR (95% CI) | Reference | 3.47 (.53–67.69) | 4.96 (.93–91.78) | |
Adequate information for sample size recalculation | 20 (45.5%) | 39 (58.2%) | 93 (80.2%) | <.0001 |
OR (95% CI) | Reference | 1.67 (.78–3.63) | 4.85 (2.31–10.40) | |
2-sided 95% CI or 1-sided 97.5% CI used | 38 (86.4%) | 59 (88.1%) | 96 (82.8%) | .6222 |
OR (95% CI) | Reference | 1.16 (.36–3.61) | 0.76 (.26–1.94) | |
Discordance between type I error rate used and CI | 1 (2.3%) | 2 (3.0%) | 6 (5.2%) | .7343 |
OR (95% CI) | Reference | 1.32 (.12–28.99) | 2.35 (.39–44.98) | |
Early stopping due to logistics | 2 (4.6%) | 4 (6.0%) | 15 (12.9%) | .1746 |
OR (95% CI) | Reference | 1.33 (.25–9.93) | 3.12 (.83–20.34) | |
Analysis used | | | | .1035 |
ITT only | 4 (9.1%) | 9 (13.4%) | 28 (24.1%) | |
OR (95% CI) | Reference | 1.55 (.47–6.04) | 3.18 (1.15–11.27) | |
PP only | 4 (9.1%) | 9 (13.4%) | 8 (6.9%) | |
OR (95% CI) | Reference | 1.55 (.47–6.04) | 0.74 (.22–2.89) | |
ITT and PP | 36 (81.8%) | 49 (73.1%) | 80 (69.0%) | |
OR (95% CI) | Reference | 0.60 (.23–1.51) | 0.49 (.20–1.12) | |
Handling of missing data | 17 (38.6%) | 14 (20.9%) | 38 (32.8%) | .0945 |
OR (95% CI) | Reference | 0.42 (.18–.97) | 0.77 (.38–1.61) | |
Imputation of missing data | 1 (2.3%) | 4 (6.0%) | 13 (11.2%) | .1461 |
OR (95% CI) | Reference | 2.73 (.39–54.39) | 5.43 (1.03–100.11) | |
Worst case scenario | 16 (36.4%) | 12 (17.9%) | 27 (23.3%) | .0894 |
OR (95% CI) | Reference | 0.38 (.16–.91) | 0.53 (.25–1.14) | |
Sensitivity analyses | 1 (2.3%) | 1 (1.5%) | 9 (7.8%) | .1236 |
OR (95% CI) | Reference | 0.65 (.03–16.77) | 3.62 (.65–67.71) | |
Point estimate with CI | 43 (97.7%) | 58 (86.6%) | 104 (89.7%) | .1400 |
OR (95% CI) | Reference | 0.15 (.01–.84) | 0.20 (.01–1.07) | |
Figure of point estimate, CI, and noninferiority margin | 1 (2.3%) | 5 (7.5%) | 38 (32.8%) | <.0001 |
OR (95% CI) | Reference | 3.47 (.53–67.69) | 20.95 (4.30–378.24) | |
Comparison to historical control | 4 (9.1%) | 5 (7.5%) | 11 (9.5%) | .9064 |
OR (95% CI) | Reference | 0.81 (.20–3.43) | 1.05 (.34–3.95) | |
Misleading conclusions | 3 (6.8%) | 15 (22.4%) | 16 (13.8%) | .0797 |
OR (95% CI) | Reference | 3.94 (1.20–17.85) | 2.19 (.68–9.76) | |
Abbreviations: CI, confidence interval; ITT, intention-to-treat analysis; N/A, not applicable, where numbers were too few in a cell within the 2 × 2 table to make accurate estimate of odds ratio; OR, odds ratio; PP, per-protocol analysis.
Risk of bias domain | 2007 or Before (N = 44), N (%) | 2008–2013 (N = 67), N (%) | 2014 or After (N = 116), N (%) | P Value |
---|---|---|---|---|
Randomization | | | | .0522 |
High risk | 1 (2.3%) | 1 (1.5%) | 3 (2.6%) | |
OR (95% CI) | Reference | 0.65 (.03–16.77) | 1.14 (.14–23.42) | |
Low risk | 23 (52.3%) | 45 (67.2%) | 87 (75.0%) | |
OR (95% CI) | Reference | 1.87 (.86–4.11) | 2.74 (1.33–5.69) | |
Unclear | 20 (45.5%) | 21 (31.3%) | 26 (22.4%) | |
OR (95% CI) | Reference | 0.55 (.25–1.20) | 0.35 (.17–.72) | |
Allocation concealment | | | | .0042 |
High risk | 1 (2.3%) | 2 (3.0%) | 6 (5.2%) | |
OR (95% CI) | Reference | 1.32 (.12–28.99) | 2.35 (.39–44.98) | |
Low risk | 10 (22.7%) | 37 (55.2%) | 57 (49.1%) | |
OR (95% CI) | Reference | 4.19 (1.83–10.24) | 3.28 (1.53–7.58) | |
Unclear | 33 (75.0%) | 28 (41.8%) | 53 (45.7%) | |
OR (95% CI) | Reference | 0.24 (.10–.54) | 0.28 (.12–.59) | |
Performance bias | | | | .0350 |
High risk | 19 (43.2%) | 31 (46.3%) | 62 (53.5%) | |
OR (95% CI) | Reference | 1.13 (.53–2.45) | 1.51 (.75–3.07) | |
Low risk | 24 (54.6%) | 31 (46.3%) | 54 (46.6%) | |
OR (95% CI) | Reference | 0.72 (.33–1.54) | 0.73 (.36–1.45) | |
Unclear | 1 (2.3%) | 5 (7.5%) | 0 (0%) | |
OR (95% CI) | Reference | 3.47 (.53–67.69) | N/A | |
Detection bias | | | | .0214 |
High risk | 11 (25.0%) | 26 (38.8%) | 52 (44.8%) | |
OR (95% CI) | Reference | 1.90 (.83–4.54) | 2.44 (1.15–5.48) | |
Low risk | 32 (72.7%) | 36 (53.7%) | 63 (54.3%) | |
OR (95% CI) | Reference | 0.44 (.19–.97) | 0.45 (.20–.93) | |
Unclear | 1 (2.3%) | 5 (7.5%) | 1 (.9%) | |
OR (95% CI) | Reference | 3.47 (.53–67.69) | 0.37 (.01–9.59) | |
Attrition bias | | | | .0387 |
High risk | 20 (45.5%) | 22 (32.8%) | 36 (31.0%) | |
OR (95% CI) | Reference | 0.59 (.27–1.28) | 0.54 (.26–1.10) | |
Low risk | 24 (54.6%) | 40 (59.7%) | 79 (68.1%) | |
OR (95% CI) | Reference | 1.23 (.57–2.67) | 1.78 (.87–3.63) | |
Unclear | 0 (0%) | 5 (7.5%) | 1 (.9%) | |
OR (95% CI) | Reference | N/A | N/A | |
Reporting bias | | | | .2974 |
High risk | 8 (18.2%) | 16 (23.9%) | 17 (14.7%) | |
OR (95% CI) | Reference | 1.41 (.56–3.81) | 0.77 (.31–2.03) | |
Low risk | 36 (81.8%) | 51 (76.1%) | 99 (85.3%) | |
OR (95% CI) | Reference | 0.71 (.26–1.79) | 1.29 (.49–3.18) | |
Unclear | 0 (0%) | 0 (0%) | 0 (0%) | |
OR (95% CI) | Reference | N/A | N/A | |
Abbreviations: CI, confidence interval; N/A, not applicable, where numbers were too few in a cell within the 2 × 2 table to make accurate estimate of odds ratio; OR, odds ratio.
Discussion
This systematic review assessed the methodological and reporting quality of antibiotic noninferiority trials. The increase in the number of antibiotic noninferiority trials over time likely reflects an overall increase in new antibiotic development as a result of financial incentives and increased regulatory flexibility [21]. Overall, the majority of antibiotic noninferiority trials were reasonably well conducted and reported. Only a minority of trials had substantial deficiencies in the study design, reporting, or interpretation of results. Deficiencies included a lack of justification of the noninferiority margin, lack of reporting of ITT with PP analyses, and having misleading conclusions. With the exception of justification of the noninferiority margin, there may be a trend toward lower methodological and reporting quality over time. Industry-supported studies were less likely to be stopped early and to report inconclusive results.
Deficiencies identified in our systematic review were also previously found in systematic reviews of noninferiority trials not specific to antibiotics. Justification of the noninferiority margin was missing in 63% of studies in our systematic review, whereas it was missing in 54% to 80% of studies in prior systematic reviews [4–6]. Le Henanff et al found misleading conclusions in 12% of studies [4], and we found misleading conclusions in 15% of studies. Reporting of both ITT and PP analyses occurred in 42%–54% of studies in prior systematic reviews [4–6, 22], which was lower than the 73% of studies found in our systematic review. This may be due to the requirement of both ITT and PP analyses in regulatory body guidelines on noninferiority RCTs [6].
Similar to our review, a prior systematic review showed that industry-supported studies were more likely to be of better methodological quality and to have favorable outcomes [23]. This was attributed to possible publication bias, where studies with unfavorable results may have been withheld from publication by the pharmaceutical industry [23]. Our study results suggest that another factor contributing to more favorable results could be that pharmaceutical industry studies had enough resources to see studies to completion. Finally, FDA guidance likely acted as quality assurance for pharmaceutical industry trials. Infectious Diseases Society of America activism has contributed to the publication of noninferiority trial guidance for different infection syndromes by the FDA [24–28]. These guidance documents outline standards such as noninferiority margin and population analysis [24–28]. Pharmaceutical industry trials must uphold these high standards to receive approval for the drug.
A prior systematic review on noninferiority trials of all drugs showed no improvement over time before and after publication of the CONSORT statements [6]. We found a possible trend of declining quality over time. This difference may have occurred because the prior systematic review included studies only up to 2009 [6], whereas our systematic review included additional studies from 2009 to 2019.
Our review has several implications that could improve the methodological and reporting quality of future antibiotic noninferiority RCTs. First, the reporting quality of antibiotic noninferiority trials may be declining over time, despite publication of the CONSORT guidelines [2, 3]. One possible explanation is the exponential increase in the number of noninferiority trials. In the past, a noninferiority design was rare and novel. The investigators who conducted such trials were likely more familiar with and attentive to the study design and reporting. The decreasing proportion of industry-funded studies over time may also contribute to the decrease in quality, because industry-supported studies were more likely to fulfill certain quality indicators, such as blinding and not having misleading conclusions. In this systematic review, we have highlighted deficiencies including lack of justification of the noninferiority margin, lack of reporting of both ITT and PP analyses, and having misleading conclusions. In the future, authors and journals should focus on these areas during the study design, manuscript writing, and peer review process.
Second, we found that studies not supported by the pharmaceutical industry were more likely to be stopped early for logistical reasons, leading to inconclusive results. This suggests that feasibility factors may be overlooked at the study design and grant application review stages by government and other public funding agencies. Funding agencies should require pilot/feasibility data before funding large noninferiority trials; however, these agencies should have a low threshold to support such pilot/feasibility studies. It is of utmost importance that noninferiority trials in infectious diseases can be conducted without the support of the pharmaceutical industry to optimize the use of existing antibiotics, for example, through studies on shorter durations, oral instead of parenteral routes, better safety, and lower cost.
Our study has several strengths. First, we undertook a comprehensive and inclusive literature search of 4 databases. Second, the data extraction was systematic and rigorous with completion by 2 independent reviewers. Third, the assessment of reporting quality and risk of bias was thorough and based on established guidelines [2, 3, 20] as well as prior studies to improve comparability [4–6].
There are several limitations that merit mentioning. First, we excluded publications not in English; however, only 22 (0.5%) studies were excluded based on language alone. Second, quality based on the reporting of the published article may not necessarily reflect the quality of the study itself. Discrepancies between protocols or trial registry entries and the published article are not an uncommon occurrence [29]. Therefore, it is theoretically possible that a rigorously conducted study may not be well reported. Third, our strict expectations for methodological and reporting quality may not be practical for all trials to fulfill. For example, blinding for a noninferiority trial on intravenous vs oral antibiotics would necessitate an intravenous placebo for an extensive period. This would be impractical and expose the oral antibiotic treatment arm to unnecessary and significant risk associated with long-term intravenous catheters. However, these were rare occurrences.
In conclusion, we found room for improvement in the methodology and reporting of antibiotic noninferiority RCTs that authors and journals can work on during the study design, manuscript writing, and peer review process. Publication of better quality noninferiority studies ensures that antibiotic therapy used in clinical practice is in accordance with the best possible evidence. Although noninferiority trials provide almost all of the current evidence on new antibiotics, it should be acknowledged that noninferiority trials might limit the possibility of major advances over existing antibiotics [11]. Therefore, we hope to see a shift toward testing for superiority in future antibiotic trials [7, 12].
Supplementary Data
Supplementary materials are available at Clinical Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.
Notes
Author contributions. A. D. B., M. L., and D. M. conceived and designed the study. A. D. B., A. S. K., C. K. L. L., P. T., X. X. L., V. M., A. C., V. R. K., A. F., and Y. L. performed abstract screening and data extraction from full text. A. D. B. performed the analysis and wrote a first draft of the manuscript. All authors reviewed and revised the manuscript. All authors approved the final manuscript to be submitted.
Acknowledgments. The authors thank Neera Bhatnagar for her guidance on search strategy.
Potential conflicts of interest. M. L. reports a contract with the WHO to help update antibiotics in the Essential Medicines List. All other authors declare that they have no competing interests. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.