Pair-based likelihood approximations for stochastic epidemic models

Summary
Fitting stochastic epidemic models to data is a non-standard problem because data on the infection processes defined in such models are rarely observed directly. This in turn means that the likelihood of the observed data is intractable, in the sense that it is very computationally expensive to obtain. Although data-augmented Markov chain Monte Carlo (MCMC) methods provide a solution to this problem, employing a tractable augmented likelihood, such methods typically deteriorate in large populations due to poor mixing and increased computation time. Here, we describe a new approach that seeks to approximate the likelihood by exploiting the underlying structure of the epidemic model. Simulation study results show that this approach can be a serious competitor to data-augmented MCMC methods. Our approach can be applied to a wide variety of disease transmission models, and we provide examples with applications to the common cold, Ebola, and foot-and-mouth disease.


Proofs of Lemmas 1 and 4
We give two methods of proof for the first result in lemma 1; the other parts of both lemmas can be proved in a similar fashion. First note that

τ_kj = 0 · 1{i_j < i_k} + (i_j − i_k) 1{i_k < i_j < r_k} + (r_k − i_k) 1{i_j > r_k}.

Thus

E_{g_j,g_k}[exp(−βτ_kj)] = E_{g_j,g_k}[1{i_j < i_k}] + E_{g_j,g_k}[exp(−β(i_j − i_k)) 1{i_k < i_j < r_k}] + E_{g_j,g_k}[exp(−β(r_k − i_k)) 1{i_j > r_k}],   (1)

and all three terms in (1) can be evaluated directly; for instance, if r_j < r_k then

E_{g_j,g_k}[1{i_j < i_k}] = 1 − δ_j (δ_j + δ_k)^{−1} exp(−δ_k (r_k − r_j)),   (2)

E_{g_j,g_k}[exp(−β(i_j − i_k)) 1{i_k < i_j < r_k}] = δ_k (δ_k + β)^{−1} δ_j (δ_j + δ_k)^{−1} exp(−δ_k (r_k − r_j)),   (3)

and the third term is zero since i_j < r_j < r_k, whence 1{i_j > r_k} = 0. Combining (2) and (3) yields the first result in lemma 1. Alternatively, a direct probabilistic proof is as follows. From (1) we have

E_{g_j,g_k}[exp(−βτ_kj)] = P_{g_j,g_k}(i_j < i_k) + E_{g_j,g_k}[exp(−β(i_j − i_k)) | i_k < i_j < r_k] P_{g_j,g_k}(i_k < i_j < r_k).   (4)
Both terms in (4) can be evaluated by considering a reverse-time construction of i_j and i_k, as follows (see also the proof of lemma 2 below for a similar approach). Recall that r_j < r_k. Start at time r_k and run a Poisson process of rate δ_k backwards in time. If a point occurs at time t ∈ (r_j, r_k) then set i_k = t, run a new Poisson process of rate δ_j backwards in time from r_j, and set i_j equal to the time of its first point. If not, at time r_j increase the Poisson process rate to δ_j + δ_k; when the first point appears at time t < r_j, set i_j = t with probability δ_j (δ_j + δ_k)^{−1}, and otherwise set i_k = t. In the former case reduce the Poisson process rate to δ_k, and in the latter reduce it to δ_j. Continue back in time; the next point to appear gives the remaining infection time. It is straightforward to see that this construction yields i_j and i_k such that r_j − i_j and r_k − i_k are independent Exp(δ_j) and Exp(δ_k) random variables, respectively. Under this construction, i_k < i_j if and only if (i) the Poisson process has no point in (r_j, r_k), which has probability exp(−δ_k(r_k − r_j)), and (ii) independently of this, i_j is chosen when the first point of the rate-(δ_j + δ_k) Poisson process appears, which has probability δ_j (δ_j + δ_k)^{−1}. Hence

P_{g_j,g_k}(i_j < i_k) = 1 − δ_j (δ_j + δ_k)^{−1} exp(−δ_k(r_k − r_j)),

so that the first term in (4) is in agreement with (2). Next, since i_j < r_j < r_k we have

P_{g_j,g_k}(i_k < i_j < r_k) = P_{g_j,g_k}(i_k < i_j) = δ_j (δ_j + δ_k)^{−1} exp(−δ_k(r_k − r_j)).   (5)

Finally, to evaluate the expectation term in (4), consider a Poisson process of rate β that runs backwards in time from i_j, independently of the rate-δ_k Poisson process that is used if i_k < i_j. The expectation is simply the probability that the former process has no points before the first point of the latter process, yielding

E_{g_j,g_k}[exp(−β(i_j − i_k)) | i_k < i_j < r_k] = δ_k (δ_k + β)^{−1},   (6)

which along with (5) yields the same expression as (3). The remaining cases in lemmas 1 and 4 can be proved using either direct calculation or probabilistic arguments.
For the case of Erlang distributed infectious periods, the probabilistic arguments require splitting infectious periods into exponentially-distributed stages. Full details can be found in Stockdale (2018).
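As a numerical sanity check of the exponential case, the first result of lemma 1 for r_j < r_k, namely E[exp(−βτ_kj)] = 1 − β(β + δ_k)^{−1} δ_j(δ_j + δ_k)^{−1} exp(−δ_k(r_k − r_j)), can be verified by Monte Carlo using the forward representation i = r − Exp(δ). The sketch below is our own illustration (function names and parameter values are arbitrary), not code from the paper:

```python
import math
import random

def tau_kj(i_j, i_k, r_k):
    """Time for which k exerts infectious pressure on j before j's infection."""
    if i_j < i_k:
        return 0.0
    if i_j < r_k:
        return i_j - i_k
    return r_k - i_k

def mc_estimate(beta, delta_j, delta_k, r_j, r_k, n=200_000, seed=1):
    """Monte Carlo estimate of E[exp(-beta * tau_kj)], sampling i = r - Exp(delta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        i_j = r_j - rng.expovariate(delta_j)
        i_k = r_k - rng.expovariate(delta_k)
        total += math.exp(-beta * tau_kj(i_j, i_k, r_k))
    return total / n

beta, delta_j, delta_k, r_j, r_k = 1.3, 0.8, 1.1, 0.0, 0.5
p = delta_j / (delta_j + delta_k) * math.exp(-delta_k * (r_k - r_j))
closed_form = 1.0 - beta / (beta + delta_k) * p
mc = mc_estimate(beta, delta_j, delta_k, r_j, r_k)
```

With 200,000 samples the Monte Carlo estimate agrees with the closed form to within a few thousandths.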

Proof of Lemma 2
Note that in the following, we focus exclusively on the individuals in K and ignore all other individuals in the population. For ease of exposition, re-label the members of K as 1, . . . , K. Given these individuals' removal times r_1 < . . . < r_K, their infection times i_1, . . . , i_K can be constructed as follows. Start at time r_K and consider the Markov process (S̃(t), Ĩ(t)) : t < r_K which runs backwards in time such that (i) (S̃(t), Ĩ(t)) → (S̃(t) + 1, Ĩ(t) − 1) at rate δĨ(t); (ii) at each removal time, Ĩ(t) increases by 1. Here S̃(t) and Ĩ(t) denote respectively the numbers of susceptibles and infectives in K at time t, and initially (i.e. just before time r_K) there are no susceptibles and one infective. Transitions of type (i) generate infection times, and at each such event one of the currently infected individuals is selected uniformly at random to be the individual who is infected at that time. The process terminates as soon as S̃(t) = K, which occurs when the last of the required infection times is generated; call this time i_a.
Consider the total time during which infectious pressure is exerted, i.e.

T = ∫_{−∞}^{r_K} S̃(t)Ĩ(t) dt.

This quantity can be similarly constructed in reverse time as follows. For u < r_K define

T(u) = ∫_u^{r_K} S̃(t)Ĩ(t) dt,

and observe that, starting at r_K and moving backwards in time, T(u) is a piecewise linear function that increases at rate S̃(t)Ĩ(t) at time t. Next, consider a random time-transformation in which (i) the clock runs at rate Ĩ^{−1} when there are Ĩ ≥ 1 infectives, and (ii) the clock stops running whenever Ĩ = 0. This has no impact on the ultimate value of T, because in each time period during which S̃(t)Ĩ(t) does not change, T increases by the same amount in both the original and transformed time scales. In the transformed time scale, still in reverse time, (i) T increases at rate S̃ when there are S̃ susceptibles, and removal times do not affect this rate because they do not change the number of susceptibles, while (ii) the number of susceptibles increases by one at rate δ, corresponding to an increase at rate δĨ(t) in the original time-scale.
Combining these observations means that, in the transformed time-scale, T is a function which starts at zero, increases at rate 1 as soon as the first susceptible appears, then increases at rate 2 once the second susceptible appears, and so on until all K susceptibles appear. Since the times at which susceptibles appear are the points of a Poisson process of rate δ, it follows that

T = Σ_{j=1}^{K−1} j Y_j,

where the Y_j are independent Exp(δ) random variables, and the result follows since j Exp(δ) ∼ Exp(δ/j).
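The reverse-time construction above can also be simulated directly, giving a numerical check of lemma 2: since only the counts matter for T, it suffices to track S̃ and Ĩ. The sketch below is our own illustration (function names and parameter values are arbitrary); it compares the sample mean of T with E[Σ_{j=1}^{K−1} j Y_j] = K(K − 1)/(2δ), which by the lemma should hold for any choice of removal times:

```python
import random

def reverse_time_T(removals, delta, rng):
    """Simulate the reverse-time process for lemma 2 and return T = int S*I dt.

    removals: sorted removal times r_1 < ... < r_K. Start at r_K with one
    infective and no susceptibles; going backwards, an infection occurs at
    rate delta * I and turns an infective into a susceptible, and each
    removal time passed adds an infective.
    """
    K = len(removals)
    t = removals[-1]
    pending = list(removals[:-1])  # removal times still ahead (going back)
    S, I = 0, 1
    T = 0.0
    while S < K:
        # backward time to the next infection event (infinite if no infectives)
        dt_inf = rng.expovariate(delta * I) if I > 0 else float("inf")
        next_removal = pending[-1] if pending else -float("inf")
        if t - dt_inf > next_removal:
            # infection happens before we reach the next removal time
            T += S * I * dt_inf
            t -= dt_inf
            S, I = S + 1, I - 1
        else:
            # we pass a removal time first; memorylessness lets us resample
            T += S * I * (t - next_removal)
            t = next_removal
            I += 1
            pending.pop()
    return T

rng = random.Random(7)
removals = [0.3, 0.9, 1.4, 2.0]  # K = 4, arbitrary values
delta, n = 1.0, 100_000
mean_T = sum(reverse_time_T(removals, delta, rng) for _ in range(n)) / n
# lemma 2 gives E[T] = K(K - 1) / (2 * delta) = 6 here
```

Changing the removal times leaves the sample mean (and distribution) of T unchanged, in line with the lemma.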

Preliminaries
| is the number of elements of N_n that (i, j) and (k, l) have in common, which can equal 0, 1 or 2. We require the following result, which is essentially Theorem 2.1 of Barbour and Eagleson (1985) restricted to our setting.

Simulation study
In this section we assess the performance of the PBLA and ED approximations using simulated data. Throughout we assume that β_ij = βN^{−1} for all i, j, and that infectious periods are identically distributed, since it seems most natural to assess the methods in the most basic setting. More complex settings are considered in the real-life data examples in the main text. We will also be concerned with the basic reproduction number R_0, defined as the number of infections caused by a single infective in an infinite population of susceptibles (see e.g. Andersson and Britton, 2000). Our main focus is on the standard PBLA method, since we found that the alternative approximations described in section 3.3 of the main text were either numerically similar or performed less well (see section 2.4 below for details).

Exponential infectious periods
We first consider the performance of the PBLA and ED methods as various combinations of β, γ, R_0 and N vary. Specifically, in each setting we simulate a large number of data sets, typically 500 or 1000, each consisting of removal times r_1, . . . , r_n. For each data set we then find maximum likelihood estimates of (β, γ) under both the PBLA and ED approximations using standard numerical optimisation methods. To provide a visual representation of the estimates we use them to create kernel density estimates.
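For concreteness, removal-time data of this kind can be generated by a standard Gillespie simulation of the SIR model with pairwise infection rate β/N and Exp(γ) infectious periods. The sketch below is illustrative (our own, not the exact code used in the study):

```python
import random

def simulate_removals(N, beta, gamma, rng):
    """Gillespie simulation of the general stochastic epidemic (SIR) with
    infection rate beta/N per infective-susceptible pair and Exp(gamma)
    infectious periods; returns the ordered removal times r_1, ..., r_n."""
    t, S, I = 0.0, N - 1, 1  # one initial infective
    removals = []
    while I > 0:
        inf_rate = beta * S * I / N
        rem_rate = gamma * I
        t += rng.expovariate(inf_rate + rem_rate)
        if rng.random() < inf_rate / (inf_rate + rem_rate):
            S, I = S - 1, I + 1   # infection event
        else:
            I -= 1                # removal event
            removals.append(t)
    return removals

rng = random.Random(42)
r = simulate_removals(N=200, beta=2.0, gamma=1.0, rng=rng)
n = len(r)  # the final size of the outbreak
```

Repeating this call gives the collection of simulated data sets to which the PBLA and ED likelihoods are then fitted.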
2.1.1 Varying β and γ with N and R_0 fixed
Figure 1 illustrates that both the PBLA and ED methods are largely unaffected by variations in the actual values of β and γ, at least while N and R_0 are fixed. The PBLA method appears to perform slightly better in terms of being able to estimate the true parameter values.
2.1.2 Varying N with β and γ fixed
Figure 2 shows the impact of varying N while β and γ are fixed. Again we see that there is little variation, and that PBLA performs better than the ED method. Note that since R_0 is fixed, increasing N leads to larger values of the outbreak size n, and so here we see evidence that both approximation methods are able to cope with large n values.

2.1.3 Varying R_0
Figure 3 shows how variation in R_0 impacts the performance of the PBLA and ED methods, specifically for R_0 = 0.5, 1.5 and 2. For both approximations, the performance deteriorates as R_0 increases. To understand this, first note that the value of N − n contributes to the true likelihood (5), in this case via the product of φ terms. As the proportion of the population infected, n/N, increases, the relative importance of the φ terms compared to the other terms in (5) decreases, and thus to obtain a good overall approximation it becomes increasingly important to have a good approximation for the ψ terms. However, the latter are themselves the subject of the least accurate approximation, essentially because each of the ψ_j terms is itself approximated by a product. Roughly speaking, we found that provided n/N is less than around 0.7, the relative importance of the φ terms is sufficient to provide good approximations for β and γ. However, for larger values of n/N this is no longer the case, and the overall approximation suffers. From a practical point of view this does not seem particularly restrictive, since real-life outbreaks rarely infect such high proportions of the susceptible population.

Erlang infectious periods
Suppose now that I_j ∼ Γ(m, γ) for all j = 1, . . . , n, where m is a positive integer; then R_0 = βmγ^{−1}. We now briefly show that the broad conclusions of the simulation study for the case of exponential infectious periods also hold true in this setting. We also show that the approximations improve as the variance of the infectious period decreases while its mean is kept fixed.
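The stage-splitting device mentioned in the proofs above rests on the fact that a Γ(m, γ) infectious period is the sum of m independent Exp(γ) stages, with mean m/γ and variance m/γ². A quick numerical illustration of this (our own sketch; names and values are arbitrary):

```python
import random

def erlang_via_stages(m, gamma, rng):
    """Sample a Gamma(m, gamma) infectious period as the sum of m
    independent Exp(gamma) stages."""
    return sum(rng.expovariate(gamma) for _ in range(m))

rng = random.Random(3)
m, gamma, n = 3, 2.0, 200_000
samples = [erlang_via_stages(m, gamma, rng) for _ in range(n)]
mean = sum(samples) / n                           # close to m / gamma = 1.5
var = sum((x - mean) ** 2 for x in samples) / n   # close to m / gamma**2 = 0.75
```

Keeping the mean m/γ fixed while increasing m shrinks the variance m/γ², which is exactly the regime in which the approximations below are seen to improve.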
2.2.1 Varying β and γ with N and R_0 fixed
Figure 4 illustrates that both the PBLA and ED methods are largely unaffected by variations in the actual values of β and γ, at least while N and R_0 are fixed. The PBLA method appears to perform slightly better in terms of being able to estimate the true parameter values.

2.2.2 Varying N with β and γ fixed
Figure 5 shows the impact of varying N while β and γ are fixed. The results are broadly similar to the exponential infectious period case, with relatively little variation in the mode of the estimates as N increases, while the variance of the estimates decreases with N. The latter feature suggests that having more data enables better estimation, in the sense that the average error is reduced. We illustrate this graphically in Figure 6, which shows mean square error values as N varies. This error reduction is clearly a desirable property of both the PBLA and ED approximation methods, but one that is not obviously true at first sight, since larger data sets lead to more approximations being used in both methods.

2.2.3 Varying R_0 with m and γ fixed
Figure 7 shows the impact of varying R_0 whilst keeping the infectious period distribution parameters m and γ fixed. As for the exponential infectious periods case, and for the same underlying reasons, estimation of both β and γ becomes less accurate as R_0 increases.

If the mean of the infectious period distribution, m/γ, is kept fixed, then increasing m reduces the variability of the infectious period. In this situation we expect the accuracy of both the ED and PBLA methods to improve, since the underlying assumptions of independence between different components of the likelihood become less unrealistic. Figures 8 and 9 illustrate that this is indeed the case. What is most striking is the marked improvement in estimation from m = 1 to m = 2, while increasing m further provides only modest additional gains.

Computational performance
One motivation for the PBLA approach is to dispense with the need for data augmentation. However, since the PBLA likelihood can be costly to compute, it is natural to ask how it compares to the standard data-augmented MCMC (DA-MCMC) method described in O'Neill and Roberts (1999), which generates samples from the posterior density π(β, γ|r). To address this question we first generated several simulated data sets, consisting of removal times, from the SIR model. On each data set we then ran both the standard DA-MCMC algorithm, with β, γ and all unknown infection times updated at each iteration, and an MCMC algorithm with the PBLA likelihood in which β and γ were updated separately at each iteration using a Gaussian random walk proposal mechanism. Both algorithms were initialised in stationarity, following burn-in, and then run for the same number of iterations, n_s. We then calculated the effective sample size ESS = n_s (1 + 2 Σ_{k=1}^{∞} ρ(k))^{−1}, where ρ(k) denotes the sample autocorrelation at lag k, and where the sum was truncated at lag M if ρ(M + 1) < 0.05. We assigned independent Exp(10^{−4}) prior distributions to both β and γ. Figure 10 shows the effective sample size per second for DA-MCMC and PBLA for exponential infectious periods and for Erlang infectious periods with shape parameters m = 2, 5. In each case PBLA eventually outperforms DA-MCMC as N increases, although the improvement is less marked as m increases, due to the increasing cost of computing the PBLA likelihood.
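The truncated-sum ESS calculation can be sketched as follows; this is an illustrative implementation of the rule described above (not the exact code used), applied here to an AR(1) chain standing in for MCMC output:

```python
import random

def effective_sample_size(chain, cutoff=0.05):
    """ESS = n_s / (1 + 2 * sum_{k=1}^{M} rho(k)), where the sum is truncated
    at the first lag M with rho(M + 1) < cutoff."""
    n = len(chain)
    mean = sum(chain) / n
    c0 = sum((x - mean) ** 2 for x in chain) / n  # lag-0 autocovariance
    s = 0.0
    for k in range(1, n):
        rho = sum((chain[i] - mean) * (chain[i + k] - mean)
                  for i in range(n - k)) / (n * c0)
        if rho < cutoff:
            break
        s += rho
    return n / (1 + 2 * s)

# illustrative chain: AR(1) with lag-1 autocorrelation 0.9, mimicking a
# sticky MCMC trace for beta or gamma
rng = random.Random(0)
chain = [0.0]
for _ in range(5000):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))
ess = effective_sample_size(chain)  # much smaller than len(chain)
```

Dividing the ESS by the wall-clock run time gives the effective sample size per second used in Figure 10.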

Comparison of alternative approximations
The main text describes five possible likelihood approximations, namely (i) the standard approximation (section 3.2), (ii) using f_i for expectations (section 3.3.1), (iii) separating all χ and ψ terms (section 3.3.2), (iv) approximating the product of ψ terms (section 3.4.2), and (v) use of the central limit theorem (section 3.4.2). Note that the last two approximations were developed for the special case of the general stochastic epidemic, whereas the first three are more general.
Our numerical experiments showed that the standard approximation generally performed best, although approximations (iv) and (v) are less numerically intensive. Figure 11 shows typical results based on 1000 simulations for the general stochastic epidemic model. Two key observations are that the f_i expectation method is generally inferior to the others, and that the central limit theorem approximation improves as N increases, as expected.