Estimating Epidemic Incidence and Prevalence from Genomic Data

Abstract Modern phylodynamic methods interpret an inferred phylogenetic tree as a partial transmission chain providing information about the dynamic process of transmission and removal (where removal may be due to recovery, death, or behavior change). Birth–death and coalescent processes have been introduced to model the stochastic dynamics of epidemic spread under common epidemiological models such as the SIS and SIR models and are successfully used to infer phylogenetic trees together with transmission (birth) and removal (death) rates. These methods either integrate analytically over past incidence and prevalence to infer rate parameters, and thus cannot explicitly infer past incidence or prevalence, or allow such inference only in the coalescent limit of large population size. Here, we introduce a particle filtering framework to explicitly infer prevalence and incidence trajectories along with phylogenies and epidemiological model parameters from genomic sequences and case count data in a manner consistent with the underlying birth–death model. After demonstrating the accuracy of this method on simulated data, we use it to assess the prevalence through time of the early 2014 Ebola outbreak in Sierra Leone.

Since bootstrap particle filters are usually applied exclusively to hidden Markov models (HMMs) in which the sequence of hidden states is produced by a Markov process and the sequence of observations are uncorrelated emissions from the hidden state, it is not necessarily clear that our application is appropriate. However, as we show below, the joint probability P (O, E) takes the same form as a HMM, despite the presence of backward-time correlations between the elements of the observed process O. We demonstrate that this equivalence of form is sufficient for our application of PMMH to be correct.

Form of the phylodynamic likelihood
As in the manuscript, O, the union of the sampled tree T and the unsequenced samples S (i.e. sample observations that do not correspond to nodes on the tree) is composed of a sequence of "observations" O j which include the portion of the sampled tree between observed events (coalescence events or sequenced/unsequenced sampling events) j − 1 and j, including the event j itself (but excluding event j − 1).
Given that the sampled phylogeny is produced by a backward-time Markov process conditional on the trajectory (eq. 7 in the manuscript), the joint probability of the observations and the trajectory can be written (S1) Note that here we omit conditional dependence on the model parameters A 0 , η ,σ and T from P (E, O) and P (E) for notational brevity.
Given that the trajectories are generated forward in time by a continuoustime Markov process, it is of course also possible to write Again, for brevity we also omit the dependence on the initial conditions of the trajectory.
Combining these two equations yields When conditioning on O j+1 , only the portion of the trajectory in the interval corresponding to O j affects the probability of observing the partial tree. We therefore find This has the form of a HMM with autocorrelated observations. Since the observations (which make up the sampled tree we are computing the proba-bility for) are constant during the particle filter calculation, this correlation between observations does not harm the validity of the algorithm.
This can be seen by considering that the above expression has the form where w j (E j ) is the weight corresponding to simulated "particles" drawn from the distributions P (E 1 ) and P (E j |E j−1 ). This is precisely the form of the joint probability distribution targeted by the bootstrap particle filter and, by extension, PMMH. This correspondence is sufficient, since one could easily construct a sequence of artificial observations y j with emission probabilities defined to be P (y j |E j ) ≡ w j (E j ), causing the equation to be interpreted as the joint probability for the observations and hidden state of a true HMM.
Despite this formal equivalence, there remains a subtle yet important distinction between the algorithm presented in the manuscript and the standard application of particle filters to HMMs. In contrast to the standard situation, the correlations between subsequent observations O j in our application (which manifest as the dependence of the weights not only on O j but also on O j+1 ) mean that the average of the weights computed for observation j cannot be directly interpreted as the marginal probability P (O 1 , . . . , O j ). Instead, only the product of these averages is meaningful and can correspond to the full marginal joint probability. While this does not impact our use of the algorithm, this acknowledgement of this fact may help the reader appreciate that information from both "past" and "future" observations are taken into account in the inference of each portion of the prevalence trajectory.

Correctness demonstration for large particle numbers
As further evidence that bootstrap particle filtering can indeed be applied to this system, we can easily show that the algorithm consistently estimates the marginal probability for the observations given the phylodynamic model parameters in the limit of a large number of particle simulations.

Observation 1
The first step of the filtering algorithm involves sampling M independent partial trajectories E By the law of large numbers we see that the arithmetic mean for the first set of weights approaches In the same limit, the sampling distribution forĒ where the second line holds provided the limits in the numerator and denominator exist, and that the limit in the denominator is non-zero.

Observation j
Now consider the j-th step, where we have M partial trajectories E where we define f j−1 (E j−1 ) to be limiting distribution of the resampled particle states for observation j − 1. Following the same arguments above, the arithmetic mean for the n th set of weights approaches giving a recursion relation for f j (E j ).
Using this relation, we find that (S10)

Observation N
Finally, for the N th observation, the weight of a particle with state E (a) (S11) Thus, the arithmetic mean of the weights in the N th interval is The product of the arithmetic means of the weights in each interval therefore yields N j=1 Z 1 = P (O) (S13) which is the marginal density of the observed process (sampled phylogeny and unsequenced samples), as required.