A Comparative Analysis of Statistical Methods to Estimate the Reproduction Number in Emerging Epidemics, With Implications for the Current Coronavirus Disease 2019 (COVID-19) Pandemic

Abstract Background As the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic continues its rapid global spread, quantification of local transmission patterns has been, and will continue to be, critical for guiding the pandemic response. Understanding the accuracy and limitations of statistical methods to estimate the basic reproduction number, R0, in the context of emerging epidemics is therefore vital to ensure appropriate interpretation of results and the subsequent implications for control efforts. Methods Using simulated epidemic data, we assess the performance of 7 commonly used statistical methods to estimate R0 as they would be applied in a real-time outbreak analysis scenario: fitting to an increasing number of data points over time and with varying levels of random noise in the data. Method comparison was also conducted on empirical outbreak data, using Zika surveillance data from the 2015–2016 epidemic in Latin America and the Caribbean. Results We find that most methods considered here frequently overestimate R0 in the early stages of epidemic growth on simulated data, the magnitude of which decreases when fitted to an increasing number of time points. This trend of decreasing bias over time can easily lead to incorrect conclusions about the course of the epidemic or the need for control efforts. Conclusions We show that true changes in pathogen transmissibility can be difficult to disentangle from changes in methodological accuracy and precision in the early stages of epidemic growth, particularly for data with significant over-dispersion. As localized epidemics of SARS-CoV-2 take hold around the globe, awareness of this trend will be important for appropriately cautious interpretation of results and subsequent guidance for control efforts.

The reproduction number, R, is a key epidemiological parameter that quantifies the average number of new infections caused by a single infected individual. When a pathogen emerges in an entirely susceptible population, this parameter is referred to as the basic reproduction number, R 0 . When some population-level immunity exists, the parameter is referred to as the effective reproduction number, R e . These reproduction numbers provide valuable information about a pathogen's potential for spread in a population and the associated implications for control efforts [1][2][3]. Reproduction number estimates for any 1 pathogen are time and context specific, with factors such as contact patterns, population immunity, and behavioral change contributing to variability in these estimates [4,5]. For these reasons, the reproduction number is usually monitored over time to track the progress of an outbreak. The instantaneous reproduction number, R t , estimates the average number of secondary infections generated by an infected individual at time t, and sequential estimates over the course of an outbreak can provide valuable insights into the need for interventions and/or the effectiveness of control programs already in place. In the case of the Walling a and Teunis method, R t is used to refer to the case reproduction number, which quantifies transmission within a cohort of individuals-that is, individuals with the same date of infection or symptom onset-providing a retrospective quantification of transmissibility.
Numerous mathematical and statistical methods have been developed to estimate the reproduction number of an emerging pathogen [6][7][8][9][10][11][12]. The choice of method to be used largely depends on the available data, with more data typically allowing the parameterization and use of more complex methods. However, in the early stages of an outbreak, epidemiological data are often sparse and highly uncertain. In the case of emerging pathogens, data can be particularly limited, as surveillance systems may be unprepared for the detection and reporting of a new pathogen. In addition, limited understanding of the dynamics of a new pathogen can often inhibit the parameterization and use of more complex transmission models. This has particularly been the case for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was first reported in China in December 2019 and has reached more than 235 countries and territories as of October 2020 [13]. Though new evidence is rapidly emerging on aspects of transmission dynamics, such as generation intervals and the proportion of asymptomatic infections [14], uncertainties remain as to the duration of immunity, the role of seasonality, and the effects of immunological cross-reactivity with endemic human coronaviruses. Incorporating such uncertainties into mechanistic models of transmission can prove challenging in early outbreak contexts, where statistical methods are often the only available tool to infer the level of transmission from a limited amount of data, such as a time series of reported case numbers. Quantifying local transmission patterns of SARS-CoV-2 has been, and will continue to be, critical for guiding the pandemic response [15][16][17][18]. Understanding the accuracy and limitations of these methods in the context of real-time outbreak analysis scenarios is crucial to optimally inform outbreak response activities, including the need for new control efforts or the relaxation of efforts already in place.
Here, we conduct a comparative performance analysis of 7 commonly used statistical methods to estimate the reproduction number at sequential time points during the early stages of an emerging epidemic, in line with a real-time outbreak analysis scenario.

METHODS
Incidence data were simulated using a deterministic compartmental SEIR (susceptible, exposed, infectious, recovered) model of transmission, assuming closed populations of variable sizes and homogeneous mixing; random, time-constant basic reproduction numbers (R 0 ); a variable number of infections seeding the epidemics; underreporting; and different levels of stochasticity in the reporting and infection process (details are given in Supplementary Methods S1. Estimating an average, fixed R 0 , we assessed the performance of each statistical method by fitting it to the simulated epidemic data at sequential time points in the epidemic growth phase, in line with a real-time outbreak analysis scenario. A detailed description of each of the 7 statistical methods is given in Supplementary Methods S2. A fixed gamma distribution for the generation time interval, with a mean of 20 days and standard deviation of 7.4 days, was assumed to be known for all methods [19]. We initiated the analysis 6 weeks into each epidemic, fitting to the first 6, 9, 12, and 15 weeks, and so on (approximating to 2, 3, 4, 5, etc. generation times), up to the peak of the epidemic, simply defined as the week with the maximum number of cases from the entire epidemic curve. The performance of each method was assessed at each of these stages, with methods being fit to an increasing number of data points. We used a number of metrics to assess method performance: bias, calculated as the absolute difference between the estimated and true R 0 values; coverage, calculated as the proportion of times the estimated 95% confidence intervals (CIs) included the true R 0 value; Pearson correlation coefficient; width of 95% CIs; and the root mean square error (RMSE). To ensure a systematic comparison of performance, we only compared the results from sections of the epidemic time series where all methods were able to produce estimates of R 0 .
We applied each method to weekly case-notification data from the 2015-2016 Zika epidemic in Latin America and the Caribbean. National-level data were available for 37 countries, 3 of which reported their peak incidence within 6 weeks of the first reported case and were subsequently removed from the analysis. In the same manner as the simulated data, we applied each of the 7 methods to an increasing number of data points (using the first 6, 9, 12, 15, etc. weeks), up to the peak of the epidemic.

RESULTS
Biases in estimates of R 0 (estimated R0-actual R 0 ) when fitted to an increasing number of time points in the case time series (6,9,12, and 15 weeks) of simulated data are shown in Figure 1 and Supplementary Figure S2 for the scenario of data with no added noise (ie, assuming that a constant proportion of infected individuals are detected and reported as cases). Estimates of R 0 were frequently overestimated at all time points assessed, the magnitude of which were generally greater for higher values of actual R 0 (Supplementary Figure S2). As expected, higher values of R 0 used for data simulation generally resulted in epidemics that peaked at earlier time points, subsequently resulting in a reduction in the number of epidemic simulations where the growth phase continues beyond 9, 12, and 15 weeks, and so on, and resulting in an underrepresentation of high R 0 simulations at the later epidemic stages assessed (Supplementary Figure S2). To ensure a systematic analysis of methodological performance at sequential time points in the epidemic time series, the assessment of performance was restricted to epidemic simulations that peaked at or after 15 weeks, highlighted by the blue points in Supplementary Figure S2.
The average bias between estimated and actual R 0 values was highest in the earliest stages assessed for all methods considered ( Figure S4), and negative binomial noise (Supplementary Figure S5). Coverage of actual R 0 values-that is, the proportion of simulations in which the 95% CI contained the actual value of R 0 -decreased when fitted to an increasing number of time points in the case time series for all methods assessed and in all noise scenarios (Figure 2). This decrease in coverage of actual R 0 values corresponds to reduced uncertainty-that is, width of Figure 2. Comparative analysis of the performance of the 6 methods at different stages of the epidemic. Columns represent the 3 data noise scenarios explored (no noise, Poisson noise, and negative binomial noise). Rows represent different performance metrics: absolute bias (the absolute average difference between estimated and true R 0 values), uncertainty (95% confidence interval width), coverage (proportion of times in which the true R 0 value is within the estimated 95% confidence intervals), PCC, and RMSE. Abbreviations: BR, Bettencourt and Ribeiro; EG_Lin, linear exponential growth rate method; EG_MLE, maximum likelihood exponential growth rate method; EG_P, Poisson exponential growth rate method; PCC, Pearson correlation coefficient; RMSE, root mean squared error; WP, White and Pagano method; WT, Wallinga and Teunis.
Downloaded from https://academic.oup.com/cid/article/73/1/e215/5932264 by Imperial College London Library user on 12 July 2021 the 95% CIs-surrounding the estimates of all methods when fitted to an increasing number of time points, as shown in Figure 2. In the scenario of data with no noise, coverage ranged from 46% (Bettencourt & Ribeiro) to 96% (Wallinga and Teunis) at 6 weeks and from 25% (linear exponential growth) to 75% (Wallinga and Teunis) at 15 weeks. When fitted to simulations with Poisson random noise, coverage ranged from 54% (EpiEstim) to 92% (Wallinga and Teunis) at 6 weeks and from 50% (Poisson and maximum-likelihood exponential growth) to 74% (Wallinga and Teunis) at 15 weeks. For simulations with negative binomial random noise, coverage ranged from 51% (EpiEstim) to 98% (linear exponential growth) at 6 weeks and from 45% (Poisson and maximum likelihood exponential growth) to 90% (linear exponential growth) at 15 weeks. A significant increase in the levels of uncertainty in the presence of increasing random noise was observed for each of the exponential growth methods (Figure 2). This increase in uncertainty was most notable for the linear exponential growth method, with average CI widths (averaged across all time points assessed) of 0.87, 2.70, and 4.13 when fitted to data with no noise, mild noise (Poisson), and high noise (negative binomial), respectively. Only minor increases in uncertainty were observed for the EpiEstim (1.55, 1.61, and 1.73 for no noise, mild noise, and high noise, respectively), White and Pagano (2.24, 2.33, and 2.52 for no noise, mild noise, and high noise, respectively) and Bettencourt and Ribeiro methods (1.35, 1.76, and 1.84 for no noise, mild noise, and high noise, respectively). No significant differences in levels of uncertainty were observed for the Walling a and Teunis method with the addition of random noise to the data (3.74, 3.49, and 3.48 for no noise, mild noise, and high noise, respectively).
The Pearson correlation coefficient between estimated and actual R 0 values was close to 1 for all methods in the scenario of data with no added noise at all time points, though a slight reduction in this correlation was observed at 15 weeks for all methods except linear exponential growth ( Figure 2). Reduced correlations between estimated and actual R 0 values were observed for the scenarios of data with random noise at all stages of the epidemic, the most significant reduction of which was observed at the earliest time point assessed (6 weeks). The RMSE decreased when fitted to increasing numbers of data points for all methods considered and in all data noise scenarios. Trends in method-specific RMSE were similar for data with no noise and Poisson noise, though RMSEs increased significantly in the presence of negative binomial noise for all methods, particularly in the earlier epidemic stages assessed. Figure 3 shows the R 0 estimates obtained from fitting each of the 7 methods to empirical outbreak data, with the examples of French Guyana, Martinique, Puerto Rico, and the US Virgin Islands highlighting empirical outbreaks where estimates of R 0 decreased when fitting to an increasing number of time points in the case time series during the phase of early epidemic growth.
The estimates obtained for all Latin American and Caribbean countries are shown in Supplementary Figure S7. R 0 estimates were generally higher in the early stages of the epidemic for all methods assessed, with estimates decreasing gradually over time. Confidence interval widths were also observed to generally decline when fitted to an increasing number of data points. Frequent inconsistencies between estimates of R 0 produced by different methods on the same data were observed, particularly during the early growth phase of the epidemics, such as in French Guyana, Martinique, and Brazil.
Density distributions of bias in R 0 estimates when fitting to epidemic simulations using different generation time distributions are shown in Figure 4. Trends in bias over time were similar when fitted to simulations for Zika and Ebola generation time distributions. A larger average bias was observed for most methods and time points when fitted to simulations for the SARS generation time distribution. Biases in estimates of R 0 when the mean generation time is incorrectly specified are shown in Supplementary Figure S8. Overestimation of the mean generation interval resulted in an overestimation of R 0 estimates, while underestimation of the mean generation interval resulted in an underestimation of R 0 estimates, as compared to results where the mean generation interval was correctly specified. A further sensitivity analysis regarding the assumed reporting fraction showed reduced bias at the earliest fitting stage (6 weeks) when assuming 100% of infection reporting, as compared to assuming only 20% of infection results in reported cases (Supplementary Figure S9).

DISCUSSION
In the early stages of an outbreak response, efforts are often dedicated to estimating pathogen transmissibility in order to provide information on the potential for spread in the current population and to inform the type and scale of interventions required for control [20,21]. Our results show variable accuracy of R 0 estimates obtained, both between individual methods and across different stages of the early growth phase of the epidemic. Estimates of R 0 obtained for the earliest stages of the epidemic assessed in this analysis (ie, at 6 and 9 weeks, corresponding to approximately 2 and 3 generation time intervals, respectively) were associated with larger bias and uncertainty for all methods assessed. Using simulated epidemic data, we found that estimates of R 0 frequently overestimated the actual R 0 value used in the simulation process, particularly in the early fitting stages, even when the true case time series is observed. Estimates of R 0 became increasingly accurate for all methods when fitted to an increasing number of weeks in the case time series. A higher average absolute bias was observed for all methods when fitting to simulated data with added random noise, particularly in the earlier stages assessed (Supplementary Figure S6).
Whilst the overestimation of R 0 in the early stages of an epidemic may be preferable to underestimation for purposes relating to the planning of outbreak response activities, reductions in the magnitude of this overestimation over time can potentially lead to misconceptions about trends in transmissibility. The trend of a decreasing average R 0 estimated in the early stages of epidemic growth was also observed for some countries when fitting to national case surveillance data from the 2015-2016 Zika epidemic in Latin America and the Caribbean (eg, French Guyana, Martinique, Puerto Rico, and the US Virgin Islands, as shown in Figure 3). The limited efficacy of existing interventions against vector-borne pathogens such as Zika suggest the possibility that methodological bias could explain these trends, rather than reductions in transmission at that time. In addition, an early analysis of the SARS-CoV-2 epidemic in China estimated a declining trend in the reproduction number, from 7.93 (95% CI, 5.00-12.00) on 29 December 2019 to 2.60 (95% CI, .57-5.17) on 18 January 2020 when using the Wallinga and Teunis method, attributing this reduction to the effectiveness of prevention and control measures taken at that time [22]. However, the largest decrease in R t estimated by Liu et al [22] was observed between 29 December 2019 and 2 January 2020, prior to the implementation of significant interventions, from which point the estimates of R largely stabilized for the remainder of the period of estimation. Awareness of this bias is crucial for critical evaluation of the transmissibility estimates obtained in the early epidemic stages and for accurately interpreting subsequent implications for control [23].
In the current context of the SARS-CoV-2 pandemic, caution is required when using these methods for the estimation of R 0 . As countries expand surveillance systems to better manage the pandemic, the assumption of constant reporting of cases implicit in these methods likely does not hold true until the capacity of the surveillance system has stabilized and a consistent case definition is applied. In these contexts, whilst testing capacity is growing, the use of hospital admission or death data, where available, may be preferable for inferring R 0 if the reporting of these data is believed to be constant in time. Delays in the time from infection to hospitalization and/or death, however, result in significant lags between when transmission  events occur and when they can be quantified. Regardless of the metric used for estimation of R 0 , we urge caution when interpreting short-term fluctuations in these estimates, as variable data quality and methodological accuracy may play a role in these trends.
There are a number of limitations in our analysis, including the simplicity inherent in the assumptions of homogenous mixing in fully susceptible and closed populations that were used to simulate the case data. These assumptions do not reproduce the subexponential growth dynamics that can occur in empirical epidemics, as driven by complex spatial structures, local contact networks, and other socio-behavioral factors not accounted for in this analysis [24,25]. Many methods exist that account for additional complexities in the epidemic process, such as simultaneous estimation of R and the serial interval, the use of household or contact tracing data, or the importation of cases from external populations [26][27][28]. The results of this analysis are based on weekly epidemic case-time series, which may not fully reflect results obtained when daily case reports are available. However, whilst the larger number of data points provided by daily time series may potentially improve methodological performance in the early epidemic stages, daily case counts may be subject to larger overdispersion and substantial day-of-the-week effects. In addition, noise was added to the case time series in the form of random errors to reflect stochasticity in the infection and reporting processes, inherently assuming the noise to be random. This analysis does not account for systematic forms of noise/bias, such as reporting delays or changes in health-seeking behavior or case definitions. Here, we show that R 0 estimates frequently overestimate the actual R 0 values even in the absence of noise, when we assume that a constant proportion of infections is accurately detected and reported, when the true generation time is known, and when the assumption of exponential growth is met (both in the simulation model and in the estimation method). Our finding that the bias in R 0 estimates persists even in the absence of underreporting in the simulation stage suggests that the bias may be attributable to methodological differences, which warrants further research. The difference in bias observed at the earliest fitting stage under different assumptions of reporting rates can be explained by missing infections at the beginning of the case time series, when underreporting is present. Delayed detection and reporting of cases at the beginning of an epidemic is common, particularly for newly emerged pathogens, and caution should be taken when considering the first reported cases, which for many pathogens are unlikely to be the first acquired infections.
In this analysis, we highlight the varying strengths and limitations of 7 commonly used statistical methods for estimating R 0 in emerging epidemics. We show how the performance of these methods can vary over time when fit to increasing amounts of data, in line with a real-time outbreak analysis scenario, and demonstrate the sensitivity of methodological performance to varying levels of random noise in the data. This has important implications for the ongoing SARS-CoV-2 pandemic, as we show that true changes in transmissibility in the early stages of epidemic growth may be difficult to disentangle from changes in methodological accuracy and precision, particularly for data with significant overdispersion. Cautious interpretation is warranted when using these methods to infer transmission patterns in the early stages of epidemic growth, as apparent declines in R 0 estimates may be misattributed to the effectiveness of control efforts, and lead to incorrect conclusions about the course of the epidemic. Several generations of pathogen transmission may need to be observed for accurate R 0 estimation by these methods. An awareness of this trend in bias over time is crucial for appropriate interpretation of R 0 estimates and any subsequent implications for the planning of outbreak response activities.

Supplementary Data
Supplementary materials are available at Clinical Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.
Notes M. O. and C. H. contributed equally to this work. The data and R code necessary to reproduce this analysis are available on GitHub: https://github. com/meganodris/R0-methods-comparison.