PDE-limits of stochastic SIS epidemics on networks

Stochastic epidemic models on networks are inherently high-dimensional and the resulting exact models are intractable numerically even for modest network sizes. Mean-field models provide an alternative but can only capture average quantities, thus offering little or no information about variability in the outcome of the exact process. In this paper we conjecture and numerically prove that it is possible to construct PDE-limits of the exact stochastic SIS epidemics on regular and Erd\H{o}s-R\'enyi networks. To do this we first approximate the exact stochastic process at population level by a Birth-and-Death process (BD) (with a state space of $O(N)$ rather than $O(2^N)$) whose coefficients are determined numerically from Gillespie simulations of the exact epidemic on explicit networks. We numerically demonstrate that the coefficients of the resulting BD process are density-dependent, a crucial condition for the existence of a PDE limit. Extensive numerical tests for Regular and Erd\H{o}s-R\'enyi networks show excellent agreement between the outcome of simulations and the numerical solution of the Fokker-Planck equations. Apart from a significant reduction in dimensionality, the PDE also provides the means to derive the epidemic outbreak threshold linking network and disease dynamics parameters, albeit in an implicit way. Perhaps more importantly, it enables the formulation and numerical evaluation of likelihoods for epidemic and network inference as illustrated in a worked out example.


Introduction
An epidemic is a complex phenomenon that arises from a pathogen spreading over the contact structure of a population. Similar spreading phenomena occur in various disciplines, from biology and social sciences to engineering. Unsurprisingly, much modelling effort has been put into studying spreading processes on networks, as they offer a natural framework to mimic real-life contact patterns [3] and the important heterogeneities within these. The use of networks is extremely intuitive with each individual encoded as a node, and all its contacts (to other individuals) as links. Unfortunately, the resulting exact probabilistic model does not scale well with the size of the network, N . Even when relatively simple models, such as susceptible-infected (SI) or susceptibleinfected-susceptible (SIS), are considered, the exact model has 2 N equations, which quickly becomes intractable.
To address this high dimensionality, mathematical descriptions often focus on population-level statistics (e.g. expected number of infected people at any given time). This has led to a number of so-called mean-field models [21,31,30], offering a good first approximation of the evolution of some population-level or averaged quantities. These include pairwise models based on moment closure techniques [20,21], effective-degree [24], edge-based-compartmental [27] models and even PDE models [34]. All such mean-field models share a number of caveats [33]. For example, (i) in general the agreement between these and the exact stochastic model breaks down close to the epidemic threshold, (ii) there are very few cases where it is possible to prove mathematically that the mean-field model is the limit of the exact stochastic process (this has only been done for SIR epidemics and configuration networks [8,18]) and (iii) they give no estimate of the variability observed in the exact process. It is also well-known that such mean-field models only work for a limited class of networks; epidemics on clustered networks are not well-understood, except for idealised clustered networks.
There are ongoing efforts to try to understand and answer rigorous mathematical questions when it comes to analyse or approximate dynamical processes on networks, see [32] for a recent summary. Progress in this area is usually achieved by bringing in and combining results and techniques from different areas of mathematics. One particularly promising prospect for SIS epidemics on networks is to consider them as Birth-and-Death (BD) processes. In a recent paper [9], we conjectured and confirmed numerically that SIS epidemics are well captured by BD processes, whose rates encode characteristics of both the network structure and the epidemic dynamics. This was tested on Regular, Erdős-Rényi and Barabási-Albert networks. This choice was motivated by the intuition that epidemic spread is driven by the 'birth' of new infected nodes. However, this occurs at a rate which is proportional to the number of S-I (active) links, and these are readily observable during explicit stochastic simulations of the epidemic on networks.
In this paper we build on the above observation and take the next natural step, that is, to consider the large N limit of the BD process, i.e., the one-dimensional PDE (Fokker-Planck equation). We focus on Regular and Erdős-Rényi networks and show that the resulting rates in the BD process are density-dependent such that the limit is well defined in the sense of [19]. We compute the rates numerically and also provide a parametric form for them. We show that the resulting PDE agrees well with the output of explicit simulations of stochastic epidemics on networks. The existence of the PDE limit has multiple advantages. First, it reduces further the dimensionality of the system. Second, it gives us the opportunity to compute an epidemic threshold even in an implicit form. Third, it provides the means to get a handle on the variability of the stochastic process with the solution of the PDE providing a likelihood that can computed cheaply and efficiently for inference purposes. Finally, the good agreement between the PDE and the exact process provides further evidence that the BD model may indeed serve as a valid approximation of the exact process (the relation between the exact, BD and PDE-limit model is illustrated in figure 1) and, where a formal proof, beyond numerical validation, may be possible. This paper is structured as follows. In Section 2 we briefly outline the Birth-and-Death approximation of SIS epidemics as in [9]. In Section 3 we numerically test and prove that the conditions for the existence of the large N limit, that is, the existence of the PDE, are met. This is done for Regular, Erdős-Rényi networks. We then show that the solutions of the partial differential equations agree well with the empirical distributions based on simulations of the true process. In Section 4 we draw some conclusions and outline further research directions.
Master equation Figure 1: Schematic illustration of various approximations of the exact stochastic SIS epidemics on networks. The PDE limit comes as a result, and further confirms the validity, of the Birth-Death approximation conjectured in [9].

Birth-and-Death Approximation of SIS Epidemics
We briefly describe the model proposed in [9] and which conjectures that exact stochastic SIS epidemics on networks can be approximated by BD processes. A standard SIS model on an undirected unweighted network G with N is considered, where each node is either susceptible (S) or infected (I). Infected nodes spread infection to their neighbours with constant per-link rate τ and recover with rate γ (independent of the network). This stochastic process results in a continuous time Markov chain on a space state of cardinality 2 N , which forbids analysis even for relatively small values of N . Instead, we consider the population-level count of infected nodes, defined as where I i is an indicator function equal to 1 if node i is infected at time t and 0 otherwise. k(t) ∈ [0, N ], where k(t) = 0 indicates the state where no infection is present in the network. Given the stochasticity of the process, k(t) is itself a random variable taking values on state space of cardinality (N + 1), which is computationally a lot more computationally. We note that each time an infection/recovery occurs, the value of k(t) changes by discrete jumps of size ±1, respectively. This has led to the conjecture [9] that the population-level count process can be approximated by a Birth-and-Death process, governed by the following master equation: where p k (t) is the probability of having k infected nodes at time t, c k = γk is the global recovery rate when k nodes are infected, and a k is the rate at which the population goes from k to k + 1 infected individuals.
The approximation is exact in the case of a complete or fully connected network, where the a k rates are given by the expression a k = τ k(N − k). In the general case, the a k 's are random variables themselves, since the rates at which infections happen are the product of τ times the total number of S − I links in the network, a random variable that reflects the topology of the network and the way in which the epidemic positions the k infected nodes on the network. This means that the epidemic at population-level is not Markovian, making an exact treatment much more difficult and still out of reach.
However, by using the master equation, we can recast this process as a Markovian one using a suitable approximation of each rate a k . A natural proposal is a quantity that captures the average rate of infection, weighted by the time spent in the observed states, that is: where ξ k are the observed counts of the number of S-I links on the network when k infected nodes are present and and t ξ k is the lifetime of this particular state. This quantity is responsible for driving the epidemic: The higher the number of S-I links, the larger the rate of generating more infected nodes. The a k 's can be obtained by averaging over many realisations of the epidemic on different realisations of networks from the same family. This can also be interpreted as averaging out stochasticity at link-level and keeping it at population-level. Hence, the variability in epidemic paths will be due to the stochasticity of the master equation itself, guaranteeing the Markov property of the Birth-Death process. The solution of equation 1 with these proposed rates has been shown to be in excellent agreement with the average from many simulations for various network models and epidemics [9].

Fokker Planck equation as a limit of the Birth-Death process
Master equation 1 can be used as a starting point to build its continuous (in space) limit, i.e., the Fokker-Planck equation [14,21]. The idea is to approximate the solution p k (t) by considering it as a discretisation of a continuous function f (t, x) in the interval [0, 1], defined as For the large N limit to exist, it is known [21,23,11,29,4] that the rates of the master equation need to satisfy the following density condition: It is worth noting that condition (3) is not guaranteed to hold for any network model, and must therefore be validated on the network models of interest. We expect this condition to hold for networks we refer to as frequency-dependent networks, whereby local topology is preserved as N increases (regular networks, for example). In Section 3, we explore in more detail which network models satisfy this condition.
In the density-dependent case, it can be shown [21,29,4] that f (t, x) satisfies the following forward Fokker-Planck equation: with initial condition f (0, x) = δ(x − x 0 ), where the diffusion coefficient σ 2 (x) and the drift µ(x) are related to the a k and c k rates via: Boundary conditions are naturally emerging from two considerations: (1) if the process hits k = 0 at some time (disease-free state) it will stay there forever, (2) the number of infected nodes cannot be greater than N at any given time. Such physical constraints translate naturally in this framework into Dirichlet and Robin boundary conditions: Fokker-Planck equations of this kind have been extensively studied numerically, especially in the biological context of population random genetic drifts [10,6,2,5], as well as analytically [13,35,22].
In particular, in [22], this equation is studied in the limit of large t to characterise the so-called quasi-steady state [26,7] (where the only steady state possible is absorption at 0), whereas in [5,6] various numerical schemes to solve such equations are employed and compared in terms of numerical instabilities and performance. In the Appendix we describe our numerical scheme of choice, which is an adaptation of a finite volume method (FVM) scheme already described in [6].  Table 1: Values of network and epidemic parameters for the benchmark scenarios chosen to test the PDE limit of large networks. R 0 has been computed using the formula R 0 = τ k τ +γ as described in [21].

Validation of the density dependence condition
In order to use eq. (4) we need to verify that the rates of the BD process satisfy condition (3).
Recoveries are independent of the network, therefore, the condition is automatically satisfied as the expression for these rates is c k = γk. Infection rates, instead, need to be inspected more closely, as their values are dependent on the topology of the network. Indeed, condition (3) has an important physical interpretation: whilst globally epidemics might look very different at different network sizes, locally the network structure may not change. For example if we consider a regular network with degree m, each node maintains the same number of neighbours, independent of network size. Thus regular networks are excellent candidates to test the density-dependence assumption. This is similarly true for Erős-Rényi networks with fixed average degree. On the contrary, if we  Table 1 and with N = 1000. In each panel 10 realisations of the epidemics are plotted,. The parameters used to generate such networks are reported in table [1], higher epidemics corresponding to higher values of R 0 .
consider a Barabási-Albert network model, we observe that whilst the average degree remains constant, the hubs become more connected as network size increases, which in turn means that the variance of the degree distribution also increases. This growing influence of the hubs with network size will drastically affect the spreading pattern. For this reason, we do not expect the densitydependent condition to hold for Barabási-Albert networks. Finally, fully-connected networks enter this framework by noting that the infection rates have an analytical expression a k = τ k(N − k). However, these clearly violate condition (3). This can be corrected by requiring that τ scales as τ /N = ct in the limit of large N . This seems to be in line with the frequency-dependent consideration since the rescaling means that the average number of neighbours a node can infect does not simply increase with N . This case is well-known in the literature [17,1], albeit in a SDE formulation, so we limit our treatment of it to reporting the exact Fokker-Planck equation for the fully connected network, i.e.
To test the scaling hypothesis, the infection rates, based on eq. (2), computed for different values of N and different networks are plotted in figures 3 and 4. Using eq. (3), these rates are rescaled and plotted again in the same figures confirming that they define a universal rate. When N is scaled, a little variability between the (k, a k ) scaled curves emerges, due to the high degree of stochasticity of the process. However, the difference is so small that the Fokker-Planck equation and its solution appear insensitive to the exact choice of the rates. For here onwards, we use the rates corresponding to the N = 1000 case.

Comparing PDE and simulations
Since the limit of large N is of interest it is beneficial to have a continuous function that fits the discrete a k rates (2). In [9], the following three-parameter model was proposed: Corresponding scaled rate (k, a k N ) curves. The scaling hypothesis can be checked by noticing that the higher the values of N , the closer the scaled curves get to the limiting universal curve. As N increases, the differences between scaled rates decrease. In the inset, the small mismatch between curves with N ≥ 1000 are highlighted using a 30x zoom. For completeness, the (k/N, γk/N ) curve is also reported (in black); it intercepts the scaled curves around the steady state. This model can be fitted to the a k curves via a least-square approach, by minimizing the following cost: In [9], we showed that this approach leads to good agreement with simulations from different network classes, in particular, regular and Erdős-Rényi networks as considered in this paper.
In the following, we make use of this function to model the infection rates of master equation (1). However, using a simple function to model the complexity of the a k rates is adding an additional layer of approximation to our approach. Therefore, in addition to eq. (6) we also consider a cubic spline of the a k rates, as it provides a much better fit to the rates based on eq. (2) and therefore yields better results. To summarise, the rates of infection are first found based on simulations via eq. (2). As this approach produces a discrete function that cannot be used as is in the Fokker-Planck equation, we propose two alternatives: (a) the (C, a, p) model, eq. (6), and (b) a spline. The PDE is considered with both rates and the numerical solution of the PDE is computed via a Finite Volume Method (several other numerical schemes [28,5,6] were tested) as it guarantees that the solution remains non-negative and preserves mass, see Appendix.
To show the agreement between the Fokker-Planck equation (4) and results from simulations on networks, we selected six combinations of network and epidemic parameters, as described in Table 1.
We selected three networks from each family (regular and Erdős-Rényi) and tuned the parameters such that for each family we could get three epidemics with different characteristics, i.e., transient and quasi-steady state. To show this, in figure (2) we illustrate a few realisations of epidemics on networks of size N = 1000 for each scenario. Further, we provide the reproduction number R 0 = τ k τ +γ [21] for each of this scenario. Parameters were chosen so that, for each family, the three quasi-steady states showed a prevalence of circa 0.25, 0.5 and 0.75, respectively. To find the (k, a k ) curves via minimisation of (7), we generated data as follows: for each scenario, we created 50 realisations of the network, and on each we ran 200 realisations of the epidemic, half of which started from k 0 = 1, the other half from k 0 = N . This was done in order to obtain observations over the whole range of possible values of the infected nodes. Indeed, when epidemics start from low k 0 values, they only very rarely reach a prevalence much higher than the quasi-steady-state.
The numerical solutions of equation (4) are compared with results based on Gillespie simulations [15,16], see figures 5 and 6. The same quality of agreement holds for all scenarios we tested, as long as the size of the network is ≥ 1000. For small networks, there is a finite-size effect that does not allow for as good a fit. Interestingly enough, although there are small differences between different a k curves, as long as N ≥ 1000, the exact choice of N has little impact on the numerical solutions of the PDE. This supports our conjecture that there is indeed a large N limit and, therefore, a universal scaled a k curve which is approached as N increases. As can be seen, the spline consistently leads to a better approximation. This is simply due to a tighter fit to the discrete data compared to the fit based on the (C, a, p) model. We note, however, that the (C, a, p) model captures the trend of the epidemic and the quasi-steady state is fitted well.
To realise the comparisons provided in figures 5 and 6 we proceeded as follows. We considered the same network realisations and epidemic parameters used to find the (k, a k ) rates. We fixed the initial condition to be k 0 = 1 and ran 200 simulations on each realisation. Each individual path  Figure 6: Same scenario as in figure 5, but using the parameters given in the fourth row of Table 1, i.e., the first parameter configuration for Erős-Rényi networks.
was then sampled at regular times in order to build the empirical distribution p(x, t). Note that all simulations were kept, even those that died out early. This is because the numerical scheme preserves the total probability and can account for these early extinctions.
The PDE with the (C, a, p) model is Our three-parameter model, (C, a, p), can be used to derive the epidemic threshold. In terms of the PDE, see equation 8, and as figures 5 and 6 show, an epidemic is supercritical when the drift term is positive. This implies that the epidemic threshold is equivalent to at the start of the epidemic, that is x ≃ 0. The fact that the values of p are typically close to one leads to taking the limit x → 0 in the expression within the brackets above leads to This expression reduces to the well-known condition R 0 = τ γ ≥ 1 for fully connected networks. Indeed, scaled rates for such networks can be computed exactly to be a(x) = τ N 2 x(1 − x), meaning that C = τ N 2 , a = 0 and p = 1 in this case. This equation is implicit, as, of course, both C and a depend on the network and epidemic parameters in a non-trivial way. Therefore it cannot be used as it is, but it offers an interesting interpretation since a determines whether the (k, a k ) curves are left-or right-skewed, see [9]. Furthermore, the topology of the underlying network plays an important role in determining the shape of this curve; for example, Barabási-Albert networks lead to (k, a k ) curves with a left skew [9]. Thus, all else being constant, networks with high degree heterogeneity are more likely to see the threshold go past the critical value.

Inference of infection rates using the Fokker-Planck approximation
In this last section, we provide a simple example of the usefulness of the Fokker-Planck approximation: inferring epidemic and network parameters given data. Specifically, we consider the case in which a single trajectory of BD (or Gillespie simulation of the epidemic on an explicit network) process is observed at discrete time-steps, i.e.: where (k 1 , . . . , k n ) ∈ {0, . . . , N } n and (t 1 , . . . , t n ) ∈ [0, T ] n are the sets of states and times (0 ≤ t 1 < · · · < t n ≤ T ), respectively. To set up the inference, we express the likelihood using the transition probability function of a BD process as follows (using the independence of increments and time homogeneity): Unfortunately, for a large state space, these transition probabilities are numerically expensive to compute. Additionally, inferring the full set of rates a, c may not be efficient. Instead, we recast this problem as that of inferring the C, a, p parameters of the Fokker-Planck approximation. This is a much more tractable numerical problem, that can still provide useful information about the underlying network and epidemic, as showed in [9]. This amounts to replacing the previous likelihood function with the following: where f (t, x; x ′ , C, a, p) is the transition probability density obtained from equation (4), the coefficients are given by the C, a, p model, x i = k i N for all i ∈ [1, n] and the initial data is a Dirac delta at location x ′ ∈ (0, 1). In this example, f is computed numerically using the finite-volume numerical scheme described in the Appendix. To illustrate the accuracy of this approach, we consider a set of parameters from the choices of Table 1 (figure 7 shows the behaviour of the system when parameters are those on the 5th row of Table 1, i.e. C = 1.36e − 05, a = 3.44e − 2, p = 9.7e − 1) and generate a trajectory from a single realisation of the SIS epidemic on a Erdős-Rényi network of size 1000, via Gillespie algorithm. This dataset is shown in figure 7 and consists of n = 30 distinct data points taken from the epidemic curve. These are then scaled to [0, 1] (taking x i = k i N for all i ∈ [1, n]). The dataset is then used to find a Maximum Likelihood Estimator (MLE) by simply maximizing the likelihood function from equation (3.3) with respect to C, a, p, that is finding (Ĉ,â,p) = arg max L F P (y; C, a, p) .
To show that this method provides a good estimate, we simply plot the MLE rates, (Ĉ,â,p), against the rates that were used to simulate the data in figure 7. The true and estimated rates are indeed= in good agreement. We repeated the inference scheme for all benchmark cases in Table 1 (not shown). The agreement was similar to that shown in figure 7 (right panel).
It is worth noting that the goodness of inference depends on how many points the dataset contains and also how much of the transient and the quasi-steady state is captured. In the transient, the drift dominates and the process is more stochastic. On the other hand, in the quasi-steady state, the drift coefficient tends to 0 and fluctuations around it are mainly due to diffusion. Hence, both regimes are needed if drift and diffusion are to be inferred correctly. Data in the transient or in the quasi-steady state alone can lead to sub-optimal inference as different parameter combinations that provide good fit can be found. The example reported in figure 7 is an illustration of how useful the PDE limit of epidemics on finite networks can be in a network and epidemic inference setting. Further, the approximation that we provide can also be used in a Bayesian approach, by first setting a prior over the parameters C, a, p for instance.

Conclusions
In this paper we conjectured the existence of PDE limits for exact SIS epidemics on regular and Erdős-Rényi networks. The key to our approach is to use a BD approximation which then has a PDE limit provided that the coefficients of the BD process are density-dependent. Hence, one of the main challenges was to verify, at least numerically, that this was the case. We could do this for regular and Erdős-Rényi networks and argued that 'frequency-dependent' networks are likely to satisfy this condition without imposing further scaling on other parameters. Intuitively this means that simply increasing the number of nodes in a network will not change what a node experiences locally, e.g. the number or distribution of neighbours it has.
To solve the PDE numerically, we employed a second order in time finite volume method whose stability was proven in [6]. We compared such numerical solutions to probability distribution sampled from the Gillespie simulation. The agreement is in general good and, as expected, it becomes excellent as N increases. The existence of the PDE limit is not surprising, given that the coefficients of the BD process are density dependent. However, it is important to note that the agreement between the solutions of the PDEs and empirical distributions based on simulations provides strong support for the validity of the BD process, strengthening the evidence provided in [9], and thus closing the loop illustrated in figure 1.
A PDE perspective on epidemics provides several efficiency gains. The first is to do with computational efficiency and the possibility to quantify variability. More importantly perhaps, the solution of the PDE serves as a likelihood which can be very efficiently computed/evaluated and can form the basis of many networks and epidemic inference models, see Section 3.3. This is in contrast with approaches where the networks are explicitly modelled [25] and computational complexity can make inference out of reach.
At least two separate avenues of future research emerge. First, and perhaps most importantly, a theoretical justification for the Birth-and-Death approximation is still needed, if indeed that is possible. Second, there is a need to investigate the extent to which this method can be extended to other network families and epidemic dynamics. Nevertheless, given that handling exact epidemic models on networks is still challenging even for networks of modest size, we believe that proposing new ways to approximate epidemics is worthwhile and may contribute new modelling and analysis perspectives. space derivatives, so to balance out the required space precision, we matched it with a second-order discrete time derivative. This improved the stability of the solution.
The difference between instances of FVMs is how the current term is discretised. In [6], several different schemes are explored. In particular, the FVM that performed better was the so-called central scheme, in which j n where σ 2 (x i ) = a(x i ) + γx i and µ(x i ) = a(x i ) − γx i .
The initial condition f (x, 0) = δ(x − x 0 ) is approximated by a Normal distribution f (x, 0) ≈ N (x 0 ,σ) withσ ≪ 1. The stability of the solution with respect to the varianceσ is discussed in [6], and we have chosenσ = h 2 . The mismatch that can be seen in figures 5 and 6 at 0 is due to the fact that the algorithm cannot reproduce a δ in 0, and should not be considered a problem, as the mass outside of 0 is correctly computed by the numerical solver. To test whether absorption at 0 could have been a problem for the solver, we repeated the calculation allowing for a small re-infection rate at 0 ǫ > 0, without noticing differences in the results. Our implementation is available online at https://github.com/Fdl1989/PDElimitofepidemics.