Predicting and forecasting the impact of local outbreaks of COVID-19: use of SEIR-D quantitative epidemiological modelling for healthcare demand and capacity

Abstract Background The world is experiencing local/regional hotspots and spikes in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19 disease. We aimed to formulate an applicable epidemiological model to accurately predict and forecast the impact of local outbreaks of COVID-19 to guide the local healthcare demand and capacity, policy-making and public health decisions. Methods The model utilized the aggregated daily COVID-19 situation reports (including counts of daily admissions, discharges and bed occupancy) from the local National Health Service (NHS) hospitals and COVID-19-related weekly deaths in hospitals and other settings in Sussex (population 1.7 million), Southeast England. These data sets corresponded to the first wave of COVID-19 infections from 24 March to 15 June 2020. A novel epidemiological predictive and forecasting model was then derived based on the local/regional surveillance data. Through a rigorous inverse parameter inference approach, the model parameters were estimated by fitting the model to the data in an optimal sense and then subsequent validation. Results The inferred parameters were physically reasonable and matched up to the widely used parameter values derived from the national data sets by Biggerstaff M, Cowling BJ, Cucunubá ZM et al. (Early insights from statistical and mathematical modeling of key epidemiologic parameters of COVID-19, Emerging infectious diseases. 2020;26(11)). We validate the predictive power of our model by using a subset of the available data and comparing the model predictions for the next 10, 20 and 30 days. The model exhibits a high accuracy in the prediction, even when using only as few as 20 data points for the fitting. Conclusions We have demonstrated that by using local/regional data, our predictive and forecasting model can be utilized to guide the local healthcare demand and capacity, policy-making and public health decisions to mitigate the impact of COVID-19 on the local population. Understanding how future COVID-19 spikes/waves could possibly affect the regional populations empowers us to ensure the timely commissioning and organization of services. The flexibility of timings in the model, in combination with other early-warning systems, produces a time frame for these services to prepare and isolate capacity for likely and potential demand within regional hospitals. The model also allows local authorities to plan potential mortuary capacity and understand the burden on crematoria and burial services. The model algorithms have been integrated into a web-based multi-institutional toolkit, which can be used by NHS hospitals, local authorities and public health departments in other regions of the UK and elsewhere. The parameters, which are locally informed, form the basis of predicting and forecasting exercises accounting for different scenarios and impacts of COVID-19 transmission.


Description of Mathematical Model
We propose a novel model which breaks down the typical infectious compartment of a Susceptible-Exposed-Infectious-Recovered-Dead (SEIR-D) model into two compartments, one strand to model individuals who become infected with COVID-19 and will be going to hospital, and the other strand to model individuals who won't need to go to hospital and thus remain undetected by healthcare requirements. The objective of separating the strands like this is so we can fit the model to available data presented to us by the Sussex Clinal Commissioning Group (CCG) and local County Council authorities (Brighton & Hove, East and West Sussex) using our novel inference algorithm detailed in Sections 2.1 and 2.2.
The mathematical model takes the form of a temporal epidemiological dynamic system of ordinary differential equations governed by the interactions depicted in Figure 1 supported by non-negative initial conditions:Ṡ = −β U + I N S, t ∈ (0, T ], In this setting, let N denotes the total regional population in the Sussex regions of the UK (with N approximately 1.7 million). Then, S(t) denotes the proportion of the total population N who are susceptible to the disease, COVID-19. These become exposed to the disease, i.e. they are carrying the disease but are not currently infectious, to form the E(t) subpopulation at rate λ(t) which represents the current infectivity. The rate λ(t) is the product between β, the average transmission rate, and the probability that a contact to an individual leads to an infection (U (t) + I(t))N −1 . The E(t) subpopulation is in an incubation compartment and individuals become infectious at an average rate γ E . A proportion of E(t) becomes infectious but, in spirit of the healthcare demand, remain undetected, denoted U (t), with probability p. The other proportion of E(t) becomes infectious and will require hospitalisation in the future, denoted I(t), with probability 1 − p. The subpopulation that does not require hospitalisation can either progress to recover, at an average rate γ U with probability 1−m U , to form the recovered population, denoted by R U (t), or die, at an average rate γ U with probability m U , to form the dead population, denoted by D U (t). Considering the infectious population that will be going to hospital, these individuals will become hospitalised, denoted by H(t), and thus be in hospital care at an average rate γ I . Once in hospital, patients can evolve in two separate pathways, a proportion of the hospitalised population can fully recover at an average rate γ H to form the subpopulation denoted by R H (t). Alternatively, if they can not recover, then they die whilst in hospital at an average rate µ H , to form the dead population denoted by D H (t).
In the spirit of epidemiological models of this nature, β denotes the average transmission rate, γ −1 E denotes the average incubation time, p denotes the average proportion of infectious individuals who will not require hospital treatment, γ −1 U denotes the average infectious period, m U denotes the average infected fatality ratio for undetected cases, γ −1 I denotes the average infectious period from becoming infectious to being admitted to hospital, γ −1 H denotes the average hospitalisation period for those who recover and µ −1 H represents the average hospitalisation period for those who die.

Methodology for parameter inference
Now that we have formulated the full model system to be studied, we next present the methodology for parameter inference given reliable datasets as mentioned in the previous section. Indeed, we were presented with hospital data, made up of daily hospital admissions (the red dashed arrow between I(t) and H(t) in Figure 1), daily hospital discharges (the red dashed arrow between H(t) and R H (t) in Figure 1) and daily hospital bed occupancy (the red dashed H(t) compartment in Figure 1), as well as weekly death data, which was split into weekly hospital deaths (the blue double dashed arrow between H(t) and D H (t) in Figure 1) and weekly outside of hospital deaths (the red dashed arrow between U (t) and D U (t) in Figure 1). We note here that the weekly death data outside of hospital is dominated by deaths in care homes. In this section we will go into the mathematical details of the inference algorithm described in the Inferring model parameters given hospital datasets section of the manuscript. The algorithm is two fold. First we exploit the linear relationships between mortality in hospitals and discharges from hospitals. Second we derive the observational model, a representation of equations (1)-(5) described only in terms of the model parameters and compartments that are captured by the mathematical interpretation of the data, and use an inverse approach to fit that to data using the maximum likelihood estimate. These two methods combined give us values for all the model parameters and initial conditions.

Mathematical notation
Given that we have hospital discharge data and those that die in hospital, we start by considering (5), where we describe the number of individuals moving from the hospital compartment H(t) to the hospital recovered R H (t) and then we describe the number of individuals moving from the hospital compartment H(t) to the death in hospital compartment D H (t). We then seek to find a linear relationship between the two descriptions and use linear regression analysis to estimate the relationship between them.
We begin by considering the hospital discharged data which is available for 84 consecutive days starting March 24th, 2020. We denote the set of data points by C H , with each data point denoted as C H,i , for i = 1, . . . , 84. By considering (5) and (7), the rate of individuals being discharged in a given day from hospital is given by γ H H. This allows us to compute the number of discharges per day in the following way where t is a given day. We assume that the data is perturbed with Gaussian noise of unknown mean, denotedx H , and unknown standard deviation, denoted σ H , which leads to the following relationship where i denotes a time point of the data and ξ H,i ∼ N (0, σ 2 H ). We now consider the data for those who die in hospitals. The data for deaths in hospitals is available on a weekly basis, starting on week 14 (week ending April 3rd, 2020) up to week 24 (week ending June 12th, 2020). We denote the set of data points by C w D H , with each data point denoted as C w D H ,j , for j = 14, . . . , 24. By considering (5) and (9), the rate of individuals dying in hospital in a given day is given by µ H H. In a similar fashion, we can calculate the number of deaths per day in the following manner Since the data is weekly rather than daily, the number of deaths per week is calculated by We again assume that the data is perturbed with Gaussian noise of unknown mean, denoted byx D , and unknown standard deviation, denoted by σ D , which leads to the following relationship where t j is the last day of week j and ξ D,j ∼ N (0, σ 2 D ).

Regression analysis
From (10) and (12) we find which is a consequence of equations (5), (7) and (9), and gives light to one of the parameters, γ H µ −1 H , to be estimated. We are going to treat the daily deaths as unknowns in our model, by solving the least squares problem defined by (11) and (14), together with For simplicity, we assume that the covariance for the weekly mortality data is seven times larger than the covariance for the daily discharged data, i.e. σ 2 D = 7σ 2 H . The rationale of this assumption is that the relationship assumes that on the same time scale, mortality and discharged data would have the same noise levels, and that mortality data is composed of independent daily measurements. We note that this is a conservative assumption, since data collection methods are diverse.

Inference for hospital model
In this section we present the inference equations and the observational model we derived in order to infer the parameters and initial conditions. The concept is as follows: we define variables that describe the data we have, in terms of the model compartment and parameters, and express the model to only be in terms of these new variables and the model parameters. This establishes what parameters we can identify and infer from the data.

Mathematical notation
The data we have to utilise is daily hospital admissions, daily hospital discharges, daily hospital bed occupancy and weekly non-hospital deaths. Note, we do not consider hospital deaths as we have already used this data to find the relationship on γ H µ −1 H . In terms of the hospital data, considering (5), we know thatḢ = H in − H out := admitted − (discharged + died) and thus describes the daily hospital admissions, and describes the daily hospital discharges. Similarly, considering (6), we have that describes the daily deaths outside of hospital. Note, for ease of notation we consider daily death data rather than weekly, and we have denoted (18) this way so as not to confuse with (10), since although it is the same data, the variables are being used in different settings. So far we have not introduced the hospital bed occupancy data, since this is described directly by H(t) we will include this straight into the log-likelihood function to be fit directly to the data. Before we derive the observational model we re-introduce the full model for ease of exposition, where we have omitted the deaths and recovered compartments (equations (6)-(9)), namelẏ Here we have also denotedβ = βN −1 andμ H = 1 + µ H γ −1 H for ease of notation.

Observational Model
We first look to manipulate equations (20)-(24) to only be in terms of the compartments U (t), I(t) and H(t) and their derivatives so that an easy application of calculus gives us the reduced equations in terms of c Ad , c Dis and c D U . Since we have three variables, we look to derive three equations in terms of U , I and H. One notices that (24) is already in terms of the required compartments. We can use (23) to gain E in terms of the required compartments, namely which we can then use in (22) to gain our second necessary equation. To gain the third equation we utilise (25) to manipulate (20) and (21) to be in terms of the required compartments, which leads us to the following system of equationsḢ ...
For ease of notation, let us denote by x :=ċ Ad , y :=ċ D U and z :=ċ Dis . Thus, using (17)-(19) in (26)-(28), the observational model for (20)-(24) is Note, the observational model involves all the parameters so given enough data points, and since the Jacobian is invertible, this implies structural identifiability.

Inference
Now that we have the observational model of (20)-(24) in terms of the data, we can proceed to fit the data and gain the inferred parameter values by minimising a log-likelihood function. In a similar manner to Section 2.1 we denote the set of daily hospital admissions data as C Ad , with each data point denoted as C Ad,i , for i = 1, . . . , 84. Similarly, we denote the set of daily hospital discharge data as C Dis , with each data point denoted as C Dis,i , for i = 1, . . . , 84. We denote the set of daily hospital bed occupancy data as C HO , with each data point denoted as C HO,i , for i = 1, . . . , 84. Finally, we denote the set of weekly death data outside of hospital as C w D U , with each data point denoted as C w D U ,j , for j = 14, . . . , 23. Assuming that the data is collected subject to centered Gaussian errors, scaling the mortality data error to account for weekly data, we have the following negative log-likelihood function to minimise where I and J are defined as before. We have introduced the weight w 1 = 10 −1 so that the data is comparable, i.e. so that the magnitude of the daily hospital occupancy is of the same magnitude as the daily hospital admissions and discharge data. Without any constraints, (32) has multiple local minimisers. We therefore impose two physically relevant constraints based on the following assumptions. First, we assume that the population within the model, excluding the recovered and death compartments, is between 90% and 100% of the total population N that we consider. This reflects the fact that an unknown fraction of the population is already immune or are not susceptible, due to the shielding programmes in place, for example. Second, we assume that the effective reproductive number, defined as at the beginning on the simulation is less than 1 but greater than 0.6. This reflects the effect of the lockdown on the population dynamics and avoids non-feasible parameters that involve very high infection rates leading to close to 100% of infections in a short period of time, which is unrealistic in the time period of the data we are considering. Note, R 0 is the basic reproductive number and is equivalent to R t when S 0 = N , which is typically comparable to the very beginning of an epidemic, i.e. when there are very few individuals infected. This is not necessarily the case here as we don't know how many individuals have already been infected at the beginning of the lockdown period, when our first bit of data is, and thus the beginning of our simulation, hence why we explicitly differentiate between notation here. Using the method of next-generation matrices described in [4], we obtain the following expression for the basic reproductive number Utilising a similar method as in Section 2.2.2, one can obtain the initial conditions S 0 , E 0 , U 0 and I 0 in terms of the data, whereby we already have H 0 since H 0 = C HO,1 by definition. Including the constraints, the negative log-likelihood then reads where, for ease of notation, we have left the initial conditions and effective reproductive number in their original notation. To aid clarity in notation, we have introduced N 0 = S 0 +E 0 +U 0 +I 0 +H 0 . For the simulation, we took the weights w 2 = 10 2 and w 2 = 10 −9 to make all the terms of the log-likelihood comparable. We minimise (35) using the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm with box constraints (L-BFGS-B) [1,5,6]. The box constraints were used to ensure positivity of the relevant parameters as well as positivity of the initial conditions.

Robustness analysis of parameters
Due to the size of the model along with the number of parameters, we don't consider a large-scale investigation into parameter sensitivity but conduct a rudimentary investigation. In order to measure the sensitivity of each parameter we calculate the change in the log-likelihood (in %) for a change in each of the parameters independently, i.e. we only change one parameter at a time. The optimal parameters (and thus chosen parameters in Table 1 in the manuscript) make the smallest mismatch, i.e. the smallest output of the log-likelihood, of all the parameter values considered in the minimisation algorithm, and so we calculate the relative percentage change using the perturbed parameter from the optimal parameters. We note here that the observed value from the log-likelihood will never be zero due to the error in the data. We increase and decrease each of the fitted parameters by up to 1% from its optimal value and demonstrate the change in the log-likelihood in Table 2. As one can see, γ U and m U are not sensitive to changes at all but seemingly β, γ E , p, γ I and γ H are. We postulate that this is in fact a structural issue of the model and the data rather than a parameter issue. If all the other parameters are fixed, both p and γ E control the number of individuals we see in each of the branches describing U or I. We have data describing the number of individuals in the I branch, using hospital capacity data, and so both these parameters are actually very well characterised by the data, and thus any change in their values, with all the other parameters fixed, produces a large mismatch. Similarly, with p and γ E fixed, γ I controls the number of individuals moving from I to H and γ H controls the number of individuals leaving H, and again, since we have the hospital admissions, occupancy and discharge data available, this value is also well characterised by the data. We also postulate that γ U and m U are not as sensitive as the other parameters due to the lack of death data. It is expected that β is sensitive given that it is the main driving force behind generating infections in the model. These assertions are backed up by the results in Table 2 and Figure 2. For these results we took 30-50 data points and fitted the parameters to the data. One can see that γ E , p and γ I have a very small range within the different means, and they have a small range of standard deviations. Concentrating on the standard deviations, this means that the parameters would mostly not even be changing by 0.01% demonstrating that the actual fitting of the data is not sensitive at all. Qualitatively, changing each of the parameters by 1% results in an almost identical forecasting picture for the majority of the parameters, as can be seen in Figure 3. The reason that p looks largely different is due to the fact that p is much larger in value than the other parameters and so a 1% shift results in a larger range of values. For ease of demonstration we only demonstrate the hospital bed occupancy.