Capturing the time-varying drivers of an epidemic using stochastic dynamical systems

Epidemics are often modelled using non-linear dynamical systems observed through partial and noisy data. In this paper, we consider stochastic extensions in order to capture unknown influences (changing behaviors, public interventions, seasonal effects etc). These models assign diffusion processes to the time-varying parameters, and our inferential procedure is based on a suitably adjusted adaptive particle MCMC algorithm. The performance of the proposed computational methods is validated on simulated data and the adopted model is applied to the 2009 H1N1 pandemic in England. In addition to estimating the effective contact rate trajectories, the methodology is applied in real time to provide evidence in related public health decisions. Diffusion driven SEIR-type models with age structure are also introduced.


Introduction
Epidemic models are often used to simulate disease transmission dynamics, detect emerging outbreaks (Unkel and others , 2012), and assess public health interventions (Boily and others, 2007). In order to capture the dynamics of epidemics, the main focus is generally made on their intrinsically 0 To whom correspondence should be addressed. dynamic elements such as the depletion of susceptibles or the population immunity evolution. Nevertheless, there are time-varying extrinsic factors that are crucial to the epidemic course. These may include social cycles (holidays), public interventions and climatic variations. This has been illustrated for diseases such as cholera, malaria (Cazelles and others , 2005;Ionides and others, 2006) or influenza (Shaman and Kohn, 2009). These studies were conducted either by relating climatic and incidence time-series (Cazelles and others, 2005), which does not disentangle the effect of intrinsic and extrinsic factors, or by experimentally assessing the virus resistance in different climatic conditions (Shaman and Kohn, 2009) requiring an extrapolation to the population scale. Overall, the time-varying nature of epidemics poses a challenging statistical problem stressing the need for suitable computational tools (Ferguson, 2007).
This paper considers a flexible modelling framework that encompasses time-varying aspects of the epidemic via stochastic differential equations. We aim at providing robust inferential procedures, incorporating the uncertainty associated with key parameters and accounting for data and model limitations. In order to provide an accurate and feasible computational toolbox, we provide Markov Chain Monte Carlo (MCMC) algorithms utilising recent developments such as particle MCMC (PMCMC) algorithms (Andrieu and others, 2010) and adaptive techniques (Roberts and Rosenthal, 2009). Modelling aspects are presented in Section 2, while the computational framework is presented in Section 3. In Section 4 we evaluate the performance of the proposed adaptive PMCMC schemes on simulated data. In Section 5 we present various applications of the methodology to the 2009 A/H1N1 pandemic, and conclude, in Section 6, with some relevant discussion. Further simulations can be found in the Supplementary Materials.

Epidemic models with time-varying coefficients
We adopt a SEIR model as a guide in this paper, although the methodology can be applied to other dynamical systems. The model is set in (1); S accounts for susceptible, E for infected but not infective, I for infective, and R for removed individuals. New infections occur at a rate βS t It N , implying that the susceptible individuals make effective contacts at rate β (the effective contact rate), and only a fraction It N of these contacts are made with infective individuals. The average period spent in compartments E and I is given by k −1 and γ −1 respectively.
The basic reproduction number, R 0 , represents the number of secondary infections from a primary infected individual in a fully susceptible population. A related quantity is the effective reproduction number, R t , refers to the number of secondary cases from an infected individual at time t. R t is a context-dependent quantity of high interest to policy makers as it indicates the possibility for the epidemic to grow (R t > 1) or to decrease (R t < 1) (Anderson and May, 1992). Epidemic models can be quite detailed (including individual characteristics, geographic information etc.) or basic, such as the SEIR model, that geographically aggregates the cases and assumes deterministic transmission processes, occurring at a given frequency each time infected and susceptible meet. The latter are easier to estimate and interpret, but are based on strong assumptions that could lead to poor inference. In this paper we adopt stochastic extensions of the deterministic SEIR models. The additional dynamic error is likely to contain structural mis-specifications and can subsequently be explored and potentially revised. We focus on large-scale epidemics, for which random effects in transmission processes can be considered to be well-approximated deterministically (Kurtz, 1981). We adopt the paradigm that attributes the model limitations mainly to the time varying nature of the effective contact rate, henceforth denoted as β t , rather than to the variability in individual characteristics or in transmission processes.
An early approach to estimate R t can be found in Fine and Clarkson (1982). It can be implemented through discrete generation models or by reconstructing the chain of transmission (Cauchemez and others , 2006;Griffin and others, 2011). However, as R t estimates contain both the effects of evolving transmissibility and immunity, quantitative conclusions can hardly be generalised to situations where the immunological situation is different. We therefore concentrate on estimating β t rather than R t . A number of approaches use a finite-dimension function space for the trajectory of β t . Low-dimensional examples can be found in Cauchemez and others (2008), in which β t is modeled as a piece-wise linear function. In some higher-complexity models, as in  and Ionides and others (2006), β t is estimated freely with a few-weeks resolutions. Loosely speaking, as the number of parameters for the trajectory of β t increases, model-induced biases fade out at the expense of the variance. A compromise is required to improve robustness and is often controlled through a regularising parameter. For example, in He and others (2011), β t is estimated using cubic splines, calibrated via AIC.

Diffusion driven epidemic models
We consider models where diffusion processes are used for some of the coefficients in (1). Although alternative formulations are possible, as discussed in Section 2.1, we focus on β t to get (2) where µ x (·) denotes the drift, σ x (·) the volatility and h(·) is a positive-valued function. The assigned diffusion may capture features such as behaviour changes, preventive measures, seasonal effects, holidays etc. When prior knowledge on β t is available, it can be reflected in µ x (·) and σ x (·); e.g. if the contact rate is expected to converge, an Ornstein Uhlenbeck process can be chosen. Other options may include a sigmoid or a sinusoidal form; see for example (Rasmussen and others, 2011). In absence of prior information or when the researcher wants to impose little restrictions, a Brownian motion can be used, with µ x (·) ≡ 0 and σ x (·) ≡ σ (i.e. θ x = σ). This model, with h(·) ≡ log(·), is henceforth denoted as BM. The obtained output can be either reported or used as an exploratory tool to construct a more structured model; see Section 5.3 for an application. The choice of BM implies a continuous, yet non-differentiable, path satisfying the Markov property. In cases where β t is believed to evolve as a smooth function in time, higher order Brownian motions could be used. Loosely speaking, these may be regarded as equivalent to non-parametric approaches such as cubic splines (Wahba, 1990), with the model in (2) imposing a prior on β t and σ being a regularising factor. The rate β t can be perceived as a product of a smooth and a rough component; the former being a population average of the intrinsic transmission procedure and latter containing extrinsic factors such as the amount of contact among individuals. It is therefore important to build a framework that contains both smooth and rough models.
The above model can be estimated with an Extended Kalman Filter (EKF), as in Cazelles and Chau (1997). EKF allows for fast computations, but is based on Taylor and Gaussian approximations whose error could be non-negligible; see Supplementary Materials for a relevant simulation experiment. Nevertheless, the EKF can still be used as a tool to construct efficient proposal distributions for MCMC schemes. It can also be used to optimize sequential Monte Carlo (SMC) algorithms, but either at a strong computational cost (Särkkä and Sottinen, 2008) or crude time discretisations (Dukic and others, 2009). Next, we develop a general framework for efficient MCMC schemes that allow for good approximations.

Data augmentation via MCMC for diffusion driven epidemic models
This section presents a general inferential framework for diffusion-driven epidemic models. We adopt the Bayesian paradigm to incorporate parameter uncertainty and prior information in the estimates of β t trajectories. The problem can also be cast as estimating partially observed hypoelliptic diffusions, thus presenting various difficulties (Pokern and others, 2009). We begin by setting the model and justifying the need for data augmentation. Existing MCMC algorithms are considered but they can lead to extremely inefficient MCMC chains. We address the issue by taking advantage of the specific model structure to construct adaptive PMCMC schemes.

Model and data augmentation setup
For ease of exposition we focus on models satisfying (2), but the framework covers models with different ODE systems or more time-varying coefficients, as in Section 5.3. Being in continuous time, t can take any value between t 0 and t n . We denote the path of the ODE states vector V t = {S t , E t , I t , R t } between observation times t i and t j by V i:j . The data, y 1:n = {y t 1 , .., y tn }, usually provide information for I t at specific times (prevalence data) or for integrals of V t (incidence data). In either case, we assume that they are obtained with error as the collection procedure is typically associated with additional uncertainty. The noise distribution is denoted with P y with density f (y 1:n |V 0:n , θ y ). Note that, in the model of (2), V t can be written as a deterministic function, g(·), of x t and the parameters θ v = (k, γ, V 0 ). This function is the solution of the ODE and can be written as an intractable time integral involving x t . Hence, the model becomes y 1:n |V 0:n , θ y ∼ P y (y 1:n |V 0:n , θ y ), V 0:n = g(x 0:n , θ v ) Denote with P x the distribution of the diffusion x t defined from the SDE above. We require the existence of a unique weak solution which translates into some mild assumptions on µ x (.) and σ x (.); e.g. locally Lipschitz with a linear growth bound, see for example Øksendal (2003). The distribution of P x may also be viewed as a prior on x t , or else β t . The model can now be defined from P y , P x , and the assigned priors on θ = {θ y , θ v , θ x }, denoted by π(θ) π(x 0:n , θ|y 1:n ) ∝ f (y 1:n |V 0:n , θ y ) × dP x × π(θ) Given direct observations on x t , it would have been possible to draw approximationfree inference on dP x using the approach of Beskos and others (2006). However, this is not possible in our case given the non-linear functionals in g(·) that render (3.4) intractable.We proceed by discretizing the path of x t , and therefore of β t and V t . More specifically, we introduce m points between each pair of successive observation times t i and t i+1 (i = 0, 1, . . . , n − 1). When referring to the discrete representation of a path, the superscript dis will be used; for example for a step δ = 1 m+1 , the discrete skeleton of x t will be denoted by x dis 0:n = {x 0 , x δ , x 2δ , . . . , x tn }. The presence of x dis 0:n allows for approximations of (4) through the Euler-Maruyama scheme to evaluate dP x Moreover, given x dis 0:n ,the ODE can be solved numerically to obtain V dis 0:n and evaluate f (·). The approximation error can be made arbitrarily small by increasing the user-specified parameter m.

Data augmentation via Gibbs schemes
Model (3) can be put in the context of Chib and others (2006), Golightly and Wilkinson (2008) or Kalogeropoulos (2007). In these approaches, a Gibbs scheme can be used to sample from the joint posterior in (4) of x dis 0:n and θ. The data augmentation algorithm alternates between drawing x dis 0:n given θ, and updating θ conditional on the augmented path x dis 0:n . The MCMC protocol ensures that the chain provides samples from the marginal posteriors of x dis 0:n and θ. Nevertheless, the properties of the algorithm may become unacceptably poor. There are two essential issues associated with such schemes. The first concerns the non-trivial step of sampling on the diffusion pathspace of x t . The second problem is caused by the high posterior correlations between x dis 0:n and θ, leading to reducible chains as m increases .
The majority of the literature on data augmentation schemes for diffusions handles the conditional updates of x dis 0:n with an independence sampler. As it is difficult to find good proposal distributions for the entire x dis 0:n , the path is usually split into blocks. Overlapping blocking strategies are essential to ensure that all points are updated and continuity of the path is retained. An alternative way to update x dis 0:n is to use the particle filter via the Particle Gibbs algorithm of Andrieu and others (2010). But unless the issue of high posterior correlation between x dis 0:n and θ is resolved, none of these schemes will improve the overall MCMC performance. The problem is caused by the quadratic variation process of Thus, the conditional posterior of σ converges to a point mass as δ tends to 0. In practice this translates into an increasingly slow MCMC algorithm with a convergence rate of O(m) . Schemes with a fixed m (Cori and others, 2009) could work in some occasions but the approximation error could be substantial. In some cases, the problem can be tackled with suitable reparametrisation. The approach of  involves transforming to a diffusionẋ t with unit volatility. An alternative scheme is offered by Chib and others (2006) where the driving Brownian motion of x t is being used. In these algorithms the ODE states vector V dis 0:n becomes a function of σ,ẋ 0:n and θ v . Hence, in a Metropolis step, every proposed value of σ * is associated with the corresponding values of V dis 0:n * .
This succeeds into breaking the perfect dependence between V dis 0:n and σ, even for m → ∞. But since components of V dis 0:n (or functionals thereof) are observed with error, the associated proposed values V dis 0:n * should be close to the data for the move to be accepted. As the observation error becomes small and the data increase, this becomes increasingly difficult and leads to very small moves for σ and poor MCMC mixing. More details and simulations supporting this argument are provided in the Supplementary Materials (Appendix E). Consequently, we overcome this issue by updating x dis 0:n and θ jointly via the PMCMC algorithm, which is essential as it is not straightforward to implement joint updates with the other approaches mentioned in this section.

Adaptive Particle Markov Chain Monte Carlo algorithms
Particle filters are SMC algorithms used to recursively explore conditional densities in state space models (Doucet and Johansen, 2011). For given values of θ, N particles (x j i ) are sequentially propagated from t 0 to t n . In various time steps t i , the trajectories that best fit the data y 1:i are given more weight through resampling. Algorithm 1 shows how they can be applied in our context. The quantity L i+1 (θ) provides unbiased estimates of p(y 1:i |θ)

Algorithm 1 Particle Filter algorithm
..,N from p(x 0 |θ) and calculate (Ṽ j 0 ) j=1,...,N by solving the ODE (for example with the Euler scheme) , end for and the resampling step is essential to control the variance of that estimate over time. Algorithm 1 also provides a random sample from p(x 1:i |y 1:n , θ). In order to sample from π(x 1:n , θ|y 1:n ), the PMCMC algorithm can be used. PMCMC was introduced in Andrieu and others (2010) and successfully inte-grates particle filters in MCMC algorithms. Its implementation is presented in Algorithm 2. The issues of Section 3.2 are now addressed as x dis 0:n and θ Algorithm 2 Particle MCMC algorithm (particle Marginal Metropolis Hastings version)

Initialise:
Set current θ value,θ, to an initial value. Use Particle Smoother (PS) according to Algorithm 1 to computep(y 1:n |θ) = L(θ) and samplexθ 1:n from p(x 1:n |y 1:n ,θ) for It = 1 to N Iterations do Sampleθ * from Q(θ, .) Use PS to compute L(θ * ) and samplexθ * 1:n fromp(x 1:n |y 1:n ,θ * ) Doθ =θ * (andxθ 1:n =xθ * Recordθ andx θ 1:n end for are sampled jointly. In other words x dis 0:n is being numerically integrated out, while a sample from its posterior is obtained at each MCMC iteration. While the PMCMC algorithm is theoretically valid even for a single particle, large values of N are usually required for reasonably stable acceptance rates and large moves in the θ space; see the Supplementary Materials for a relevant simulation exercise. It is therefore essential to update the d-dimensional θ at once, marking the proposal Q(θ, .) crucial to the overall MCMC performance. In this paper we propose to use the adaptive Metropolis algorithm of Roberts and Rosenthal (2009). After transforming the parameters to take values in the real line we use a Normal distribution centered at the current value of θ and with covariance given by ǫΣ. Static random walk metropolis proposals set Σ = I d or Σ =Σ and tune ǫ to obtain acceptance rate of 0.234. Adaptive schemes change the value ǫ for each iteration i through diminishing adaptation; e.g. by ǫ i+1 = exp {log(ǫ i ) + α n 1 (AccRate − 0.234)} where α 1 = 0.999 and 'AccRate' denotes the acceptance rate up to iteration i. The covariance matrix Σ i+1 can also be updated as where α 2 is usually set to 0.05, Σ i is the posterior covariance matrix estimated by the draws up to i and Σ 0 should be specified in advance. In this paper we enhance the above adaptive algorithms utilising information from the EKF to estimate the covarianceΣ or Σ 0 . One choice, EK-Mode, is the observed information matrix at the mode identified by EKF, evaluated through numerical differentiation. Another choice, EK-MCMC, is to run an approximate MCMC scheme based on the EKF approximation of the likelihood and compute the posterior covariance from the draws. Note that the computational burden of these methods is marginal with regards to the PMCMC. As demonstrated in Section 4, the use of EKF can result in substantial improvement.

Simulation Experiments
The proposed algorithms are illustrated and tested on simulated data in this section. We focus on the BM model, where log(β t ) follows a Brownian motion with volatility σ, corresponding to the case of little information on the shape of β t . The trajectories of β t were drawn either from the BM model itself (experiment 1) or from a deterministic sigmoid curve (experiment 2). The data y i , i = 1, . . . , 50 represent noisy observations of weekly new cases of the epidemic week i kE t dt. We complete the model by assigning a Normal distribution to each log(y i ) with mean log( week i kE t dt) and variance τ 2 . The parameters were tuned to obtain realistic epidemic incidence curves, and observations were generated setting τ = 0.1. The assigned priors were informative for k, γ and R(t 0 ) and vague for E(t 0 ), I(t 0 ), σ and τ , as in Section 5.1. We used 3,000 particles and 100,000 MCMC iterations after a long burn-in period. Fig. 1 shows estimates and 95% pointwise credible intervals of the path, provided by the adaptive PMCMC initialized with EK-MCMC. The posterior output is in good agreement with the simulation trajectories suggesting that the underlying trajectory of β t can be estimated reasonably well from the partial and noisy observations considered. More can be found in the Supplementary Materials (appendix C), where we also considered a value of τ = 0.05 and obtained similar results. Next, we use the data of experiment 1 to compare the proposed adaptive PMCMC schemes. Comparison is made in terms of the effective sample size ESS = (1 + 2 i≥1 η(i)) −1 , with i η(i) being the sum of the lagged sample auto-correlations, as in Geyer (1992). We record the minimum ESS among the MCMC components and multiply by 100 to monitor the percentage of the total iterations that can be considered as independent. We consider three covariance matrices for each of the two adaptive algorithms defined in Section 3.3: I d and the ones from EK-Mode and EK-MCMC. For the schemes that adapt ǫ the minimum ESS was 0.008% (I d ), 0.19% (EK-Mode) and 0.54%(EK-MCMC), whereas for the schemes that adapt Σ we got 0.57%, 1.24% and 1.38% respectively. Clearly, adapting Σ is crucial to obtain a reasonable performance, unless the matrices obtained from EK-Mode or EK-MCMC are used. The proposed adaptive algorithms induce substantial improvement that is expected to intensify as the dimension of θ increases.

Data, model and estimates
The proposed methodology is illustrated on data from the A/H1N1(2009) pandemic in England between June and December 2009. The data consists of estimates of weekly ILI cases y 1:n given by the Health Protection Agency (Baguelin and others, 2010). The estimates were obtained from the recorded ILI cases among a selected sample of GPs. They accounted for over-reporting due to similarities in symptoms with other respiratory diseases, based on subsequent virological positivity tests. Corrections for asymptomatic infections and the patients' propensity to consult were also made. Overall the two datasets are different by a multiplicative coefficient c = 10, whose value is also supported by a further serological survey (Miller and others, 2010). In our analysis c is initially held fixed to 10, but this choice is explored further in Section 5.2. We adopt a model that admits noisy data to reflect the associated uncertainty. The noise model of Section 4 was used, combined with a BM formulation of P x . Vague priors, N >0 (0, 10 6 ), were put on τ , σ and β 0 . The priors for k and γ were obtained from additional data sources (Baguelin and others, 2010), the results of which are summarised through Normal distributions that place 95% probability in a symmetric manner between 1.55 and 1.63 days for the latent period k −1 , and between 0.93 and 1.23 days for the infectious period γ −1 . A Dirichlet distribution was used for the initial proportions in compartments S, E, I, R, constraining the mean of the one in R to be 0.15, its variance 0.15 2 , and the means of the other initial proportions to be equal.
The adaptive EK-MCMC algorithm was applied to the data and Fig. 2 depicts the incidence curve together with the posterior mean and pointwise 95% credible intervals. Estimates of β t are also displayed indicating various changes over time. The changes in β t are consistent with the argument that schools closure for holidays have been driving the epidemic: different values are observed during school and holidays periods, appearing to be synchronised with schools opening and closing. Posterior summaries for the static parameters, as well as a sensitivity analysis on the priors can be found in the Supplementary Materials. These suggest that inference is quite sensitive to the choice of prior for k and γ, but not for the remaining parameters. It would be interesting to repeat the procedure under an evidence synthesis framework and vague priors.
5.2 Application in real time. Was the first wave waning due to depletion of susceptibles?
In this section the methodology of the paper is applied in real time, i.e. considering partial datasets from June 2009 up to the 20th of July, the 7th of September and the 26th of October. Each time the algorithm is run from scratch to provide samples from the joint posterior π(x 1:i , θ|y 1:i ). From a computational cost point of view this procedure can be improved further by utilising previous MCMC runs, for example under the SMC 2 framework (Chopin and others, 2011). We did not pursue this direction further, as the PMCMC algorithm runs quite fast (less than 2 hours on a standard PC).
In order to reduce uncertainty, especially at early stages, the value of τ was set to 0.1 rather than being estimated as in Section 5.1. We otherwise use the same model as before. A model with integrated Brownian motion was also fit but BM was chosen in terms of DIC; see Supplementary Materials (Appendix C). The main results are shown in Fig. 3. On August 1 st , the first wave of the epidemic had waned, incidence rates were decreasing and schools had closed. There were two competing scenarios to explain the epidemic decline: (i) holidays had caused the waning of the epidemic by lowering the effective contact rate. Hence, a similar or stronger wave could occur when schools would reopen in September in colder climatic conditions. (ii) The epidemic had stopped independently of holidays because a critical proportion of the population had been infected, conferring a sufficient level of herd immunity to stop the epidemic. In this case, no second wave was to be expected in September. On August 1 st there was great uncertainty around the value of c (Baguelin and others, 2010), which is crucial in distinguishing between the two scenarios. We therefore conducted the following exercise.
The PMCMC algorithm, run up to August 1st, provides samples from the posterior of the difference in β t between July 13 th (before the decrease in incidence) and August 1 st . For c = 10, the 97.5% point of this posterior is −0.32, indicating a decrease in β t . The latter supports scenario (i), as the competing scenario is associated with a zero-decrease in β t . Nevertheless, as this value depends on c, the algorithm was run for different values of it ranging from 20 to 150. The results appear on Fig. 4. Note that the 97.5% point of interest increases as a function of c and reaches 0 for a correction factor close to 70. As this level seemed unrealistic (Baguelin and others, 2010), the experiment provides evidence in favour of scenario (i) highlighting the danger of a second wave in September, that actually occurred. Such evidence can be important for decision-makers, especially when considering implementations of preventive measures as vaccines.

A multiple age group diffusion driven SEIR model
The analysis of Section 5.1 can be used to construct more structured models. For example, the effect of holidays is evident and may differ from children to adults, thus casting doubts on the assumption of a homogeneous population. It seems more natural to consider a model with two age groups (c:children and a:adults) and target all possible effective contact rates among them. In our notation β ca refers to the effective contact rate from children to adults and S c denotes the number of susceptible children. For reasons of parsimony we assign Brownian motions to log(β cc t ), log(β aa t ) and treat β ca , β ac as constant. We also set we set β ca = β ac = b, in line various multiple age groups epidemic models in different settings (e.g. Whitaker and Farrington (2004)). The dynamic part of the model is now given by The data from the A/H1N1(2009) pandemic provide incidence estimates for children and adults separately so they can be used to estimate the model of (7). If only final outcome data were available, not all effective contact rate parameters would be estimable. However, the temporal dataset provides extra information by the relative variation of susceptible and infective population in adults versus children. We applied the EK-MCMC scheme, which was essential in order to obtain reasonable MCMC performance. Fig.  5 depicts the results. Unlike earlier attempts with versions of a multi-group model with a single diffusion driving all contact rates, the fit appears to be good. The trajectory of children seems to be similar with that of Fig. 2 thus stressing their role to the evolution of the epidemic. More details, including posterior summaries for the parameters and information about the priors can be found in the Supplementary Materials (Appendix C).

Discussion
In this paper we examined epidemic models where some of the parameters are represented by diffusions or integrals thereof. The main motivation was to account for various time varying drivers (virus evolution, seasonality, schools closure, etc), while maintaining a simple interpretation. We present a unified framework that supports data augmentation MCMC schemes based on fine partitions on the diffusion path. The associated approximation error can be controlled by the user without affecting the MCMC performance and can be viewed as an extension of the approaches by ; Chib and others (2006) to the more challenging observation regime of this paper. The consideration of the algorithms in a continuous time setting revealed major issues associated with Gibbs data-augmentation schemes. This justifies the use of particle MCMC, which updates paths and parameters jointly, while pointing directions for future research on Gibbs schemes. We also presented a computational machinery based on the PMCMC algorithm (Andrieu and others, 2010), that was integrated in an adaptive MCMC context. We consider EKF based adaptive algorithms that can offer substantial improvement, especially in cases with many static parameters. This paper is one of the first applications of PMCMC in epidemic models and data; standard PMCMC schemes were also used in Rasmussen and others (2011).
Initially we relied on a simple SEIR model but such an analysis can be viewed as an exploratory tool towards more structured models; e.g. the age-structured model of Section 5.3 that appears to be an improved representation of reality. This approach can help in developing richer models and testing alternative scenarios for public health interventions, or to bring further insights on extrinsic factors such as climate on the dynamics of epidemics. Moreover, this framework can support multiple sources of data, of potentially different nature: Rasmussen and others (2011) has shown how time series and genealogies can be combined in a PMCMC inference framework for more informative estimates. While we worked mainly with influenza time series, the developed methodology can be applied to other cases; current work considers its application as part of the CHARME project (Boily and others, 2007). The presented approach may also be thought as an alternative to the white noise modeling of environmental stochasticity introduced in Bretó and others (2009), as it offers to the possibility to capture the dynamics of environmental drivers. A potential next step will be to combine environmental with demographic stochasticity, modelling infections as Poisson processes which rates depend on a time-varying β.
The inferential framework presented in this article shares the "plug and play" feature of the Iterated Filtering methodology. While extra care and further study is required for specific models or datasets, its algorithmic aspects can be decoupled from the modeling aspects. This provides the possibility to develop generic inference packages: we are currently working towards its integration in a generic inference platform inspired from the R package POMP.

Supplementary Materials
Supplementary material is available online at http://biostatistics.oxfordjournals.org. It contains implementation details for the PMCMC algorithm (Appendix A), a comparison with the EKF (Appendix B), additional information on Sections 4 and 5 (Appendix C). Also the sensitivity analysis (Appendix D) and a detailed exposition of the issues of Section 3.2 (Appendix E).    Figure 5: Offline estimates of the effective contact rate among children and adults during the A/H1N1 2009 influenza pandemic using a 2-classes age-structured model and age-specific incidence data. Green dots indicate observed incidence estimates among each age group provided by the Health Protection Agency (first and second panels). Black dotted lines indicate the mean of the pointwise posterior density. Dark and light blue areas respectively indicate 50% and 95% credible intervals of the posterior density. Holidays are indicated by a light grey area. Here we provide supplementary information for various parts of the main paper. Appendix A illustrates the effect of various algorithmic parameters and suggestions on how they can be set in practice. Appendix B presents a simulation based comparison of the particle filter with the Extended Kalman Filter (EKF), whereas in Appendix C we provide more details on the analyses of Sections 4 and 5 of the main paper. Appendix D contains a sensitivity analysis for the priors assigned to the parameters of the model in Section 5.1. Finally, in appendix E, more details are given regarding the formulation of the models in continuous time and their potential implications on the associated data augmentation schemes.

Determining the Euler discretization time-step
In general, solutions of the nonlinear ODEs encountered in epidemic models are not available in closed form. In order to evaluate π(x 0:n , θ|y 1:n ), the trajectory of the system needs to be discretised according to a given time-step δ to provide an approximate solution. The Euler approximation ensures that as δ tends to 0,π δ (x 0:n , θ|y 1:n ) converges to π(x 0:n , θ|y 1:n ). In practice a sequence of decreasing δ values is chosen and quantities such as E[p δ (σ|y 1:n )] or E[p δ (τ |y 1:n )], are monitored. For sufficiently small values of δ, a convergence is generally observed, as shown in Fig. 1 for two different datasets of weekly influenza data. In this case, a δ = 0.1 day would be a reasonable choice. We note at this point that the computational cost is of O(δ −1 )

Determining an optimal number of particles
The PMCMC algorithm is theoretically valid regardless of the number of particles N parts used in the particle smoother, as shown in Andrieu and others (2010). Nevertheless, the smaller the number of particles used in the particle smoother, the more noisy the estimate of the likelihoodp(y 1:n |θ) becomes. This noise has a negative impact on the acceptance rate of the MCMC algorithm run in the θ space. Consequently, N parts has to be big enough so that it won't affect the acceptance rate, keeping in mind that the cost is of O(N parts ). Hence, a compromise needs to be achieved. Fig. 2 shows how the acceptance rate increases as the number of particles gets higher. Note that the acceptance rate has a plateau form, indicating that it is perhaps not worthwhile to increase N parts beyond some point. We repeat the experiment for two different values of the measurement error parameter (τ ). This figure shows that when the observational noise decreases, more particles are needed, which is explained by the fact that the particles need to be 'closer' to the observations.

Appendix B: Assessing the validity and limitations of the Extended Kalman approximation
Here, we compare the particle filter with the EKF approach to assess the the extent of the gain of avoiding the approximations of the latter. A set of 100, 7-month long, time-series of weekly influenza cases were drawn from the BM model. In order to ensure realistic epidemic datasets, we 'reverse-engineered' randomly selected influenza time-series (y Goog,j 1:n ) 100 j=1 from the freely available Google FluTrend data (Ginsberg and others , 2008). For each of the datasets, we obtained estimates of (β Sim,j 0:n ) 100 j=1 and the corresponding parameters (θ j ) 100 j=1 . These quantities were then used to generate influenza time-series (y Sim,j 1:n ) 100 j=1 . The static parameters of the model (initial condi-tions, k, γ, σ and τ ) were assumed known, to isolate the problem of estimating β t from accounting for parameter uncertainty and perform more relevant comparisons. We compare the following two estimators of β Sim,j i : β F ilt,j i =Ê(β j i |y j 1:i , θ * j ) obtained from the filtering distribution whereÊ(.) denotes Monte Carlo estimates of the relevant expectations, and the the EKF estimatorβ EKF,j i =Ẽ EKF (β j i |y j 1:i , θ * j ) whereẼ EKF (.) denotes expectation under EKF. The performance of the estimators is measured through their bias and Mean Squared Error (MSE). The results indicate a better performance forβ F ilt,j i . The bias of the estimates provided by the EKF is 0.0285 while use ofβ F ilt,j i reduces the bias by about 78% (0.0063). The corresponding relative reduction in MSE is smaller (10%, 0.0270 to 0.0242), indicating a bias-variance tradeoff. Use of the smoothing distribution estimatorβ Sm,j i =Ê(β j i |y Sim,j 1:n , θ * j ) is associated with a further 87% (0.0032) reduction in the MSE, while keeping the bias at the same low levels. The estimatorsβ F ilt,j i andβ Sm,j i are associated with a tolerable computational cost of 2 hours on a standard PC. In conclusion, the bias introduced by the Extended Kalman approximation is non-negligible with regards to the level of accuracy that can be obtained with exact particle methods on this type of datasets. Nevertheless, this study has shown the approximation to be robust and motivates the use of the approximated model as a proxy for the exact one, for example to initialize the particle MCMC algorithm in the way presented in this article.
Appendix C: details on the simulations and results of sections 4 and 5 Assessing the validity of the particle MCMC through simulations This section deals with a series of experiments that were conducted in order to assess the validity of the particle MCMC, and to illustrate on different examples how the trajectory β 1:n could be captured from noisy weekly cases observations. Experiment 1 is based on a trajectory of β 1:n simulated from a random walk model with volatility σ 2 = 0.07 2 . Two different corresponding epidemic datasets have been generated, for given and equal initial conditions and biological parameters, respectively with observational noise τ = 0.1 (experiment 1.a) and τ = 0.05 (experiment 1.b). Similarly, two epidemic datasets resulting from an effective contact rate following a significantly decreasing sigmoid were generated with respectively τ = 0.1 (experiment 2.a) and τ = 0.05 (experiment 2.b).
For each of these datasets, our proposed methodology was run to estimate σ, τ and β 1:n . cFig. 1 of the main text contains estimates of the incidence time series as well as β 1:n for the experiments with τ = 0.1 (Exps 1.a and 2.a). Corresponding figures for Experiments 1.b and 2.b are shown in Fig.  3 of this document. Moreover, Table 2 presents the mean, median and 95% credible intervals for the estimates of σ and τ in each of the experiments. In experiment 2 the estimates seem to be in good agreement with the true values, as the latter are contained in the 95% credible intervals. The aim was to assess the robustness of the proposed methodology to model misspecification, fitting a Brownian motion to a smooth sigmoid curve. The algorithm performs reasonably well, succeeding in capturing the trajectories of β t and all the parameters except for τ which is slightly underestimated. A potential explanation for this is that part of the variability is absorbed from the volatility parameter of the Brownian motion.

A/H1N1 pandemic
We provide here with the corresponding trace plots (Fig. 4) for the parameter estimates of Section 5.2. The trace plots indicate good mixing of the PMCMC algorithm for every parameter. This was achieved by following the procedure presented in Appendix A.

Age-structured model: children and adults
Additional information on the analysis of section 5.3. can be found in Table  3.

Illustration of alternative approaches on the real time example
In Fig. 5, we repeat the real-time analysis, conducted in the Section 5.2 of the main paper, under two alternative approaches. First, we consider a model with an integrated Brownian motion (iBM) on x t , implying smoother β t trajectories as opposed to the non-differentiable paths induced by the Brownian motion (BM) formulation. The choice between those models is not trivial and could depend on the context of the epidemic. We decided to adopt the model with Brownian motion (BM) on the basis of the Deviance Information Criterion (DIC) of Spiegelhalter and others (2002); as we can see from Fig. 3 of the main document and Fig. 5 of the SM, the BM model is consistently better in that respect. Nevertheless, there is uncertainty on the performance of the DIC criterion in the setting of this. It would be quite interesting to explore this further in the future and compare with alternative model choice criteria.
The second approach adopts and applies the methodology of maximum likelihood via iterated filtering (MIF), introduced in (Ionides and others, 2006). Initially, estimates θ are obtained by maximising p(y i:n |θ) subject to some constraints set by the priors. Second, a particle filter was run with θ fixed to its estimated value. As expected, given that we are not accounting for parameter uncertainty, the resulting pointwise 95% credible intervals are narrower; roughly 50% on the 6-month dataset and even more at early stages with less information on θ.

Appendix D: Sensitivity analysis
We explore in this section the robustness of the obtained estimates. We concentrate on the example of Section 5.1 in the main paper, where the observational noise was estimated and informative priors were used for the initial proportion of immune individuals in the population (R(0)), and the lengths of the latent (k −1 ) and infectious periods (γ −1 ). For each of these quantities, the mean of the prior densities have been tilted by respectively -20%, -10%, +10% and +20%. In all cases but for the tilted parameters themselves, the median of the corresponding posterior densities lie both in the 95% and 50% credible intervals of the untitled case. The resulting medians estimates are shown in table 4, along with the originally obtained summary statistics. Furthermore, for all the runs with tilted priors, the resulting β 1:n trajectories remain completely in the original 50% credible intervals.

Appendix E: details of Gibbs data augmentation scheme
In this section we provide more details on the Gibbs schemes discussed in section 3.3 of the main text. Stochastic epidemic models presented in this paper can be written as dx t = µ x (x t , θ x )dt + σ x (x t , θ x )dB t , 0 < t < t n y 1:n |V 0:n , θ y ∼ P y (y 1:n |V 0:n , θ y ), V 0:n = g(x 0:n , θ v ) where V t represents the ODE states vector observed trough partial and noisy data y 1:n . The rest of the model is defined in section 3.1. Since it contains intractable densities we work with the time discretised versions x dis 0:n and V dis 0:n and proceed using the Euler approximating scheme. A Gibbs algorithm alternates between updating the trajectories of x dis 0:n (and consequently V dis 0:n ) given θ, and vice versa. Nevertheless, as the Euler time step δ goes to 0, the quadratic variation process of x t uniquely determines the value of θ x in σ x (.) and the algorithm degenerates . In practice this translates into a mixing time of O(m).
In the context of diffusion driven epidemic models this problem was dealt with suitable reparametrisations such as the ones in Chib and others (2006) or Kalogeropoulos (2007). The latter uses the Lamperti transform, i.e.
Assuming that σ x (·; θ x ) is continuously differentiable, an application of Ito's lemma provides the SDE of the transformed diffusion u t as: where Let P u denote the distribution of u t . Girsanov formula provide its density with respect to that of a standard Brownian motion, denoted by W, and the state vector can be written as The model can be defined from (3), (6) and (1). It contains intractable quantities but can be accurately approximated given the time discretisation of the diffusion path. An alternative reparametrisation, defined in discrete time, was suggested in Chib and others (2006). It uses the transformation below In our setting the driving Brownian motion of x dis 0:n , denoted by w dis 0:n and provided by (5), can be used to provide a discrete skeleton of the state vector V dis 0:n = h w (w dis 0:n , x 0 , θ x , θ v ).
The model is now given by w dis 0:n , that can be transformed to x t for which the Euler-Maruyama approximation can be used, and 1 which can be approximated using the discretised state vector V dis 0:n . Data augmentations schemes can be used for the models above. Gibbs versions of such schemes will alternate between updating u dis 0:n (or w dis 0:n ) and consequently V dis 0:n given θ, and θ conditional on either u dis 0:n (or w dis 0:n ). The first step can be done either by the overlapping block strategies in Chib and others (2006) and Kalogeropoulos (2007) or with a particle filter in the context of a particle Gibbs algorithm. The second step of updating θ x given u dis 0:n or w dis 0:n is usually implemented through a random walk Metropolis algorithm: • Let θ c x and V disc 0:n be the current values of θ x and V dis 0:n respectively. Propose θ * x from q(θ * x |θ c x ).
• Compute V dis * 0:n = h u (u 0:n , x 0 , θ * x , θ v ) • Accept with probability 1 ∧ π(θ * x , V dis * 0:n |y 1:n , u dis * 0:n θ v , θ y )q(θ c x |θ * x ) π(θ c x , V disc 0:n |y 1:n , u disc 0:n θ v , θ y )q(θ * x |θ c x ) For the Chib and others (2006) formulation, u dis 0:n can simply be replaced with w dis 0:n in the algorithm above. Unfortunately both of the above algorithms may perform poorly. The problem is that every proposed value of θ x implies a proposed trajectory of the ODE states vector V t . As parts or functionals of this trajectory are observed with error, the proposed value of θ x will not be accepted unless its associated V t trajectory is close to these observations. Consequently, only small steps can be made on the θ x space and the algorithm mixes very slowly. The problem intensifies as the noise variance becomes smaller and as the time horizon of the epidemic increases. Implementations of such algorithms in the simulated and real data of this paper are in line with this argument. Figure 6 displays the posterior draws for σ in the dataset of the simulation experiment 1 of Section 4 in the main paper. The posterior draws of σ were obtained from a particle Gibbs algorithm combined with the algorithm above; note that in this model u dis 0:n and w dis 0:n are equal. In order to isolate the problem, the algorithm was run on σ and β t only, and all the other parameters were held fixed at the values they where simulated from (a value of τ = 0.1 was used). The 'true value' of σ was 0.07 and we used a δ = 0.1. As clearly shown in the traceplot the mixing of the chain is quite poor, thus casting doubts on the reliability of its output. The difference in mixing quality with the corresponding traceplot of Fig. 4 (bottom middle plot), corresponding to the PMCMC algorithm, is substantial.
To sum up, both formulations of Gibbs data augmentation schemes (with or without reparametrisation) are very likely to lead to inaccurate and inefficient MCMC algorithms. The use of particle MMH algorithms, termed as PMCMC in this paper, is therefore essential and the main paper focused on its implementation on diffusion driven epidemic models. PMCMC seems to provide a solution to the problem, but future research on Gibbs schemes with alternative reparametrisations addressing this problem, would be very helpful.