Probabilistic forecasting of daily COVID-19 admissions using machine learning
B. Rostami-Tabar et al.

Accurate forecasts of daily Coronavirus-2019 (COVID-19) admissions are critical for healthcare planners and decision-makers to better manage scarce resources during and around infection peaks. Numerous studies have focused on forecasting COVID-19 admissions at the national or global levels. Localized predictions are vital, as they allow for the planning and redistribution of resources, but they are also scarce and harder to get right. Several possible indicators can be used to predict COVID-19 admissions. The inherent variability in the admissions necessitates the generation and evaluation of the forecast distribution of admissions, as opposed to producing only a point forecast. In this study, we propose a quantile regression forest (QRF) model for probabilistic forecasting of daily COVID-19 admissions for a local hospital trust (an aggregation of three hospitals), up to 7 days ahead, using a multitude of different predictors. We evaluate point forecast accuracy as well as the accuracy of the forecast distribution using appropriate measures. We provide evidence that QRF outperforms univariate time series methods and other more sophisticated benchmarks. Our findings also show that lagged admissions, total positive cases, daily tests performed, and the Google grocery and Apple driving mobility indices are the most salient predictors. Finally, we highlight areas where further research is needed.


Introduction
As a response to the Coronavirus-2019 (COVID-19) pandemic, governments worldwide have enacted a series of public health and social measures aiming to restrict the spread of the virus (see WHO, 2021). Integral to the reasoning for these measures is the notion of protecting healthcare provision from reaching capacity, both for COVID-19 patients and other essential health services. There are many instances where hospital capacity has been at least locally reached, resulting in the redirection of patients (to other hospitals) and cancellations of essential operations (as doctors are moved to COVID-19-specific wards) (Bobashev et al., 2020; Weissman et al., 2020). Even when this is not the case, 'many countries face health workforce challenges, including shortages, maldistribution and misalignment between population health needs and health worker competencies' (World Health Organization, 2020, p. 12).
To allocate resources intelligently, then, hospitals and regional authorities need to have forward visibility of estimated admissions (Gallivan & Utley, 2005). These forecasts need to be localized (region level, e.g. the area covered by 1-3 hospitals) and short- to mid-term in terms of the time they cover. Localized rather than aggregate (country-level) daily forecasts allow for the planning and reshuffling of resources between different wards and jurisdictions as needed.
This need for localized, daily forecasts, however, presents challenges in terms of modelling. Daily admissions time series are count data, i.e. non-negative integer time series. As we move to finer granularities (e.g. from global/country to local/region, and from monthly to daily), and during periods when a virus is subdued, these count data can often become low volume. Crucially, given the inherent variability in the admissions time series, it is meaningful to model the full forecast distribution of admissions for planning, rather than to focus only on a point forecast (in line with recent recommendations in the wider forecasting literature, see Goltsos et al., 2021).
Recent peaks of infections of the Omicron family of variants tell us that forecasting COVID-19 admissions remains relevant (BBC, 2022). It has also been pointed out that 'changes in climate and land use will lead to opportunities for viral sharing among previously geographically isolated species of wildlife' (Carlson et al., 2022). This increased circulation of viruses might very well trigger different emergencies, perhaps similar to those COVID-19 has brought. Accurate long-term prediction of the number of COVID-19 patients needing admission is challenging, see Ioannidis et al. (2022). Prediction accuracy for COVID-19 hospital admissions at shorter horizons has been encouraging (see, for example, Bekker et al., 2023); however, the literature on short-term forecasting is still rather scarce, and our study aims to contribute to this area of research.
In this paper, we aim to forecast daily COVID-19 admissions up to 7 days ahead by utilizing various predictors, all of which are publicly available. We examine whether these variables can explain the variability in daily COVID-19 admissions and use significant predictors to build a forecasting model. We propose a Quantile Regression Forest (QRF) model and show that it outperforms univariate benchmarks (exponential smoothing and ARIMA families) and sophisticated feature-based benchmarks [linear regression (LR), ARIMAX, Facebook Prophet]. We generate both point and probabilistic forecasts and evaluate model performance using the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) for point forecasting and the Ranked Probability Score (RPS) to assess probabilistic forecast accuracy.
The remainder of this article is structured as follows: In section 2, we review the literature and position our work within it; in section 3, we present various data sets used in this study, followed by discussing the modelling approach and benchmark methods in section 4. We then describe the experimental setup and performance evaluation metrics in section 5; in section 6, we present and discuss our results; in section 7, we summarize our findings and present ideas for future research.

Related research
Forecasting COVID-19-related variables has evolved rapidly, with numerous mathematical, statistical and computational intelligence models proposed to forecast epidemic-related variables at the global, country and regional levels. The mathematical models have a long history (Kermack & McKendrick, 1927), whereby the modelling relies on dividing the population into compartments such as Susceptible, Exposed, Infected, Removed and Dead. They are referred to as SEIRD models and comprise a series of differential equations (Fanelli & Piazza, 2020; Hamzah et al., 2020; Martelloni & Martelloni, 2020; Sardar et al., 2020). Statistical models include LR, ARIMA and exponential smoothing with trend and seasonal components (Benvenuto et al., 2020; Feroze, 2020; Rostami-Tabar & Rendon-Sanchez, 2021). Computational intelligence models include neural networks, deep learning, clustering techniques and agent-based models (Balcan et al., 2009; Fong et al., 2020; Hazarika & Gupta, 2020; Moftakhar et al., 2020; The Gleam Project, 2020; Nikolopoulos et al., 2021; Qian et al., 2021). There are also hybrid models, such as the one proposed by Zaplotnik et al. (2020), which combines social networks (to mimic the connections between people) and a virus transmission model in a data assimilation framework.
The forecasts are usually aimed at supporting government decision-making and planning in hospital networks, although other key aspects, such as supply chain stress, are emerging (Ivanov, 2020; Nikolopoulos et al., 2021). Forecasting tools for local operations (1-3 hospitals) appear to be less common than those for the regional (4-10 hospitals) or country level (10+), despite their importance in the context of strained and changing resource planning and operation. At a local level, forecasts of admissions-related variables are produced following two general approaches. The first involves starting from the national level and downscaling to the region and city, to finally reach the local level. The second involves starting at the local level, supporting or guiding the models with the aid of national or regional estimates. The rest of this section summarizes relevant research on forecasting COVID-19 admissions-related variables; to grasp the context, we also include approaches for levels above hospitals (Table 1 summarizes the models).

Zhao et al. (2020) modelled admissions at the hospital level in Switzerland by use of geographical downscaling of projections from the regional level. The total occupancy of a hospital was divided into groups covering existing and COVID-19 patients. For modelling COVID-19 patients, the total number of ICU cases projected through a SEIR model was spatially disaggregated. Future patients requiring ICU care were assigned geographically (by latitude and longitude) to each administrative subdivision (canton) via stratified random sampling inside the canton, with a weighting factor based on population density. Each case was assigned to the nearest hospital in travel time. The probability of attending a hospital was calculated through Monte Carlo simulations.
The authors strike a balance between complexity and precision that is insightful: model at the national and regional levels with epidemiological models, and then disaggregate to model admissions and ICU usage at the hospital level via geographic location and population density.

Deasy et al. (2020) forecast bed occupancy in England. COVID-19 diagnosed cases per commissioning region are used to forecast the likely distributions of new COVID-19 infections. Subsequently, mortality rates and the required percentage of ICU care by region, obtained from previous literature, are used to estimate ICU admissions and ICU mortality per case. They employed Monte Carlo simulation to build a link between the expected daily incidence distribution and the expected excess ICU bed occupancy due to COVID-19.

Qian et al. (2021) built the Capacity Planning and Analysis System (CPAS), where projections are made first of admissions at the hospital level, then at the regional level and finally at the national level in the UK. The patient distribution is calculated first, followed by daily admissions based on a trend forecaster (which uses an enhanced compartmental model). An agent-based model simulates outcomes for admitted patients, based on risk profiles produced by an individual risk predictor (fed with patient records), and the simulated outcomes are then aggregated to the desired level. The compartmental model that drives the trend forecast is embedded in Bayesian hierarchical models and Gaussian processes to cope with the scarcity of observations. This is an example of a system driven by a high-level model (compartmental) that produces forecasts from the hospital to the national level.

The utility of short-term forecasts of COVID-19 hospital activity to inform public health policy has been demonstrated during the pandemic (Funk et al., 2020; Leclerc et al., 2021). Growth models (Harvey, 1984) and hierarchical forecasting methods (Nenova & May, 2016) have been used by Gitto et al. (2021) to predict the demand for ICU beds at the national and regional levels in Italy during the most severe period of the outbreak. The hierarchical models were used to guarantee consistency between national and regional forecasts.
From these forecasting approaches, several observations can be made. It seems that models for admissions at the local level benefit from complex systems. However, the study of their accuracy is not systematic, probably because of their use in earlier stages of the pandemic and the complexity inherent in their training. In addition, the dependencies between different levels, from national to hospital, are likely to make them more challenging to maintain than simpler models. Amid the changing circumstances, with assumptions and applicability of models being constantly challenged, it is important to be able to directly forecast admissions at the local level (1-5 hospitals) without having to rely on complex hierarchies and associated data which may be hard to obtain. This is the focus of the present article.

Data
This study has been motivated by a real-life forecasting problem faced by data analysts in a Clinical Commissioning Group (CCG), managing three hospitals in the East Midlands region of England (local level).
The variable we aim to forecast is the total daily number of COVID-19 patients admitted to the three hospitals of a region (covered by the CCG); we refer to this as daily COVID-19 admissions or simply admissions. To that end, this study utilizes various sources of daily time series data collected at the CCG level, including (i) daily COVID-19 admissions, (ii) COVID-19 confirmed cases by age group, (iii) calls to the UK COVID-19 hotline (NHS111) by age group, (iv) Google and Apple human mobility, (v) COVID-19 tests conducted and (vi) the percentage of the population vaccinated with the first and second doses. We share the collected data used in this article to improve reproducibility (Boylan, 2016).
The admission data and all predictors are collected for a region corresponding to a CCG in the East Midlands region of England between 18 March 2020 and 30 June 2021. Therefore, they all correspond to the same area/geographical location. We use the time series from 18 March 2020 up to the end of March 2021 as in-sample (training) and from the first of April 2021 until the end of June 2021 as out-of-sample (test). In this section, we briefly describe these data sets and their sources.

Daily COVID-19 admissions
The daily COVID-19 admissions data are obtained from the official UK government website for data and insights on Coronavirus.1 This database contains the total number of daily COVID-19 admissions in the three hospitals of a CCG in the East Midlands region of England. We have aggregated daily admissions across the three hospitals to obtain the total daily COVID-19 admissions time series, which is illustrated in Fig. 1. There are three peaks in the time series, corresponding to three observed waves. Therefore, the series contains multiple short piecewise trends. The time series does not show any day-of-week seasonality, and there is no difference in hospital admissions between working days and weekends. However, there are significant autocorrelation coefficients, as shown in Fig. 2. This indicates that the number of admissions in the future is associated with its past values.
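The significance of the autocorrelation coefficients mentioned above is usually judged against the white-noise bands of the sample ACF. A minimal sketch of that computation (the function and series here are illustrative, not the actual admissions data):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation coefficients at lags 1..max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.sum(y * y)
    return np.array([np.sum(y[k:] * y[:-k]) / denom for k in range(1, max_lag + 1)])

def acf_band(n):
    """Approximate 95% significance band for a white-noise series of length n."""
    return 1.96 / np.sqrt(n)
```

Coefficients lying outside `±acf_band(n)` are taken as significant, which is what Fig. 2 displays for the admissions series.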
It should be noted that the number of admissions is very low for many days; in fact, there are many days without any COVID-19 patients admitted to the hospitals. This adds another level of complexity to forecasting daily admissions at the local level, compared with the national or global level.

Potential predictors
Various predictors are used to forecast future COVID-19 daily admissions, which are described below.
• Confirmed positive cases
The number of daily COVID-19 confirmed cases is extracted from Public Health UK.2 Data are obtained by age group, and confirmed cases are calculated for the age bands over and under 70. We consider total positive cases and total cases per age group (age ≥ 70 and age < 70).
• Phone call data (NHS111 calls)
The number of daily phone calls received at the National Health Service 111 (NHS 111) line corresponding to the CCG is substantial. Data are publicly available and extracted from NHS Digital.3 Data are obtained for different age bands and are aggregated into calls from over (age ≥ 70) and under 70 (age < 70). The threshold of age ≥ 70 for NHS111 calls and positive cases is chosen based on the data granularity provided for age groups and on recommendations from the literature (Bonanad et al., 2020; Sattar et al., 2020).
• Human mobility
We use publicly available data on human mobility from Google and Apple. Both data sets contain aggregated and anonymized information on users' movement on their online platforms. Google data show the percentage change in the length of time people spend in different types of locations (i.e. residential, retail, transit, grocery and workplace). Apple mobility data indicate the change in the volume of people driving, walking or taking public transit in their communities.
The mobility data provided by Google and Apple describe the change in people's behaviour after the implementation of social distancing measures with respect to a baseline. The baseline is the median value of each measure over the 5-week period 3 January-6 February 2020 for Google, and the value on 13 January 2020 for Apple. The provided data are normalized using pre-pandemic data to adjust for differences in population and mobility levels across the area considered in this study. Detailed definitions of the human mobility measures are freely accessible for Google4 and Apple5.
• COVID-19 tests conducted
This data set contains the total number of daily tests conducted, including (rapid) lateral flow device and polymerase chain reaction tests. It includes confirmed positive, negative or void COVID-19 virus test results. The data set has been downloaded from the Public Health England Coronavirus website.6

• Percentage of population vaccinated
We also use the time series of the percentage of the population from the region receiving the first and second doses of COVID-19 vaccines. The data set is extracted from the UK government website.7

2 https://coronavirus.data.gov.uk/downloads/demographic/cases/specimen_date-latest.csv
3 https://digital.nhs.uk/data-and-information/publications/statistical/mi-potential-covid-19-symptoms-reported-through-nhspathways-and-111-online/latest
4 https://www.google.com/covid19/mobility/
5 https://aws.amazon.com/marketplace/pp/prodview-ucdg4nudicxzi#offers
6 https://coronavirus.data.gov.uk/details/testing
7 https://coronavirus.data.gov.uk/details/download

Figure 3 indicates the correlation between daily COVID-19 admissions and the potential predictors. The number of daily admissions on any particular day might be associated with the various predictors described in section 3.2 and their past values (lags). The sample cross-correlation function (CCF) illustrated in Fig. 3 can be used to identify lags of predictors with significant correlation coefficients. Each dot indicates the linear association, ranging from minus one to plus one, between the time series of daily COVID-19 admissions and one lagged predictor. The Y-axis shows the different predictors and the X-axis the number of lags. For example, consider Google transit and lag = −1: the CCF value shows the correlation between the COVID-19 admissions on day t and the Google transit percentage change on day t − 1. If lag = 0, we measure the correlation on day t. Figure 4 presents the correlation between the potential predictors at the same period (i.e. lag = 0). It is clear that some predictors are highly correlated and provide similar information. This should be considered when selecting predictors to forecast daily COVID-19 admissions, to avoid multicollinearity.
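Each CCF value of the kind plotted in Fig. 3 is a plain Pearson correlation between the target and a lagged copy of a predictor. A minimal sketch (function name and data are illustrative):

```python
import numpy as np

def sample_ccf(y, x, lag):
    """Correlation between y[t] and x[t + lag]; lag = -1 pairs y[t] with x[t-1]."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if lag < 0:
        y_al, x_al = y[-lag:], x[:lag]
    elif lag > 0:
        y_al, x_al = y[:-lag], x[lag:]
    else:
        y_al, x_al = y, x
    return float(np.corrcoef(y_al, x_al)[0, 1])
```

Scanning `lag` over a window (e.g. −21 to 0) for each predictor reproduces one row of a CCF plot like Fig. 3.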

Relationship between daily admissions and potential predictors
In addition to investigating the linear association between predictors and admissions illustrated in Figs 2-4, we also explore potential non-linear associations between these variables. To that end, we examine the scatterplot of each predictor versus admissions at the same period t (i.e. lag = 0), visualized in Fig. 5. We observe that the association between hospital admissions and some predictors, such as the percentage of the population vaccinated, the total number of tests conducted and Apple mobility (transit), is not necessarily linear. We also investigated the association between lagged predictors [e.g. t − 1 (lag 1), t − 2 (lag 2), ..., t − 21 (lag 21)] and COVID-19 admissions at time t; the shape of the plots and the strength of the relationships do not seem to be materially different, and these plots are omitted.
Based on the preliminary analysis of the collected data, we should consider a toolbox of forecasting models that can take into account the following features:
• Multiple predictors, their lags and their linear/non-linear association with daily COVID-19 admissions.
We note that the preliminary data analysis is based on the in-sample time series of admissions and all predictors. We consider several forecasting models that can capture the systematic information outlined above.

Forecasting models
In this section, we first present the constructed model to forecast daily admissions, followed by a brief description of benchmark methods.

QRFs
In this study, we employ a QRF, as proposed by Meinshausen (2006), to generate probabilistic forecasts of daily COVID-19 admissions. QRFs are an extension of the random forests proposed by Breiman (2001). In random forests, each tree is grown using a bagged version of the training data (i.e. a random subset of the data selected with replacement), and a random subset of features is employed for split-point selection at each node. A random forest comprises an ensemble of trees, such that the final prediction for a new test feature vector is the averaged response of all trees.
We use regression trees as they have been shown to be accurate in generating forecasts for a range of applications, including predicting the risk of emergency admission using electronic health records (Rahimian et al., 2018), forecasting patient waiting times in an emergency department (Arora et al., 2020), real-time prediction of patient readiness for discharge (Bishop et al., 2021) and predicting the risk of hospital admission (Hong et al., 2018). In the context of this work, an attractive feature of QRF is that the forecasts from this method are based solely on realizations of the historical data, which helps ensure that the predictions of admissions are positive integer count values. This attribute of QRF is particularly useful as it avoids the issues of truncating the distribution values below zero and of rounding fractional prediction values. Although random forests have commonly been used to approximate the conditional mean of the target variable, QRF allows for estimating the full conditional distribution using a non-linear and non-parametric modelling framework.

Using QRF, our main objective is to model the total number of admissions for $n$ days, $y = \{y_1, y_2, \ldots, y_n\}$ (target vector of size $n \times 1$), as a function of $m$ features $X = \{x_1, x_2, \ldots, x_m\}$ (feature matrix of size $n \times m$). Note that $y_i$ denotes the number of admissions for day $i$, which is modelled using an $m$-dimensional feature vector $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$, where $x_i \in \mathbb{R}^m$. Specifically, we aim to train a model $\psi$, with model parameters denoted $\theta$, to estimate the conditional distribution of admissions for day $i$ as $\hat{F}(y_i \mid x_i) = \psi(x_i, \theta)$. Using regression trees, for a given test feature vector, say $x_{\text{test}}$, that belongs to a leaf node $l(x_{\text{test}}, \psi)$, we calculate the weight associated with a tree as

$$w_i(x_{\text{test}}, \psi) = \frac{\mathbb{1}\{x_i \in l(x_{\text{test}}, \psi)\}}{\#\{j : x_j \in l(x_{\text{test}}, \psi)\}}.$$

Using the above weights, the forecast from a single tree is obtained as a weighted average of the target variable, i.e. $\sum_{i=1}^{n} w_i(x_{\text{test}}, \psi) \times y_i$.
To obtain forecasts using an ensemble of regression trees, the weights are averaged over all $K$ trees $(\psi_1, \psi_2, \ldots, \psi_K)$ as $w_i(x_{\text{test}}) = \frac{1}{K} \sum_{j=1}^{K} w_i(x_{\text{test}}, \psi_j)$. The conditional distribution for admissions is then calculated as $\hat{F}(y \mid X = x_{\text{test}}) = \sum_{i=1}^{n} w_i(x_{\text{test}}) \, \mathbb{1}\{y_i \le y\}$. For further details on QRF, please refer to Meinshausen (2006).

In this study, we employed 25 features ($m = 25$) in the model to generate a one-step-ahead forecast for admissions ($t+1$). We used lagged admission values up to a week before [$t, t-1, \ldots, t-6$], while for all other variables, we considered the most recent lagged value ($t-1$). Our feature matrix is rich, as it includes a range of different predictors, including lagged admissions, confirmed COVID-19 cases, NHS 111 calls and human mobility data obtained from Google and Apple, as discussed earlier in section 3. The names and brief descriptions of the 25 features used in the modelling are provided in Table 2.
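The weighting scheme described above can be sketched in a few lines on top of scikit-learn's `RandomForestRegressor`. This is a simplified illustration, not the implementation used in the study; note also that, unlike Meinshausen's original formulation, the leaf counts here are taken over all training rows rather than only each tree's bagged sample, a common approximation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def qrf_weights(rf, X_train, x_new):
    """Weights w_i(x_new): for each tree, training rows sharing a leaf with
    x_new get uniform weight within that leaf; weights are then averaged
    over the K trees (Meinshausen-style)."""
    train_leaves = rf.apply(X_train)                # (n_train, n_trees)
    new_leaves = rf.apply(x_new.reshape(1, -1))[0]  # leaf of x_new per tree
    n, k = train_leaves.shape
    w = np.zeros(n)
    for j in range(k):
        in_leaf = train_leaves[:, j] == new_leaves[j]
        w[in_leaf] += 1.0 / in_leaf.sum()           # uniform within the leaf
    return w / k                                    # average over K trees

def qrf_quantile(rf, X_train, y_train, x_new, q):
    """q-quantile of the weighted empirical CDF F(y | x_new)."""
    w = qrf_weights(rf, X_train, x_new)
    order = np.argsort(y_train)
    cdf = np.cumsum(w[order])                       # non-decreasing, ends near 1
    idx = np.searchsorted(cdf, q)
    return y_train[order][min(idx, len(y_train) - 1)]
```

Because every quantile returned is a realized historical value of the target, the forecasts of admissions are automatically non-negative integers, as noted above.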
To generate forecasts of the daily admissions for horizons (h) varying from 1 day to 7 days ahead, we construct a horizon-specific QRF model. For forecasting admissions for, say, the next day (horizon h = 1), we estimate a QRF model by minimizing the 1-day-ahead error on the out-of-bag (OOB) samples using only the training data. For a given horizon (or model), we estimate the QRF hyperparameters, namely: (1) the number of trees, (2) the minimum leaf size and (3) the number of features used for split-point selection. The hyperparameter tuning was based on minimizing the mean absolute deviation calculated over the OOB observations. Each bagged tree makes use of around two-thirds of the randomly selected observations; the remaining one-third of the observations, not used for constructing the tree, is referred to as the OOB data and was used only for hyperparameter tuning. Having identified the optimal hyperparameters, the QRF model was trained using the entire training data.
This modelling strategy results in a total of seven QRF models, i.e. one model for each horizon. Note that the hyperparameters were tuned separately for each QRF model. Once estimated, the hyperparameter values were kept fixed and used to retrain each of the seven QRF models on the entire training data, and these models were then used to generate forecasts for the out-of-sample data. Note that the QRF model estimated by minimizing the h-step-ahead error on the OOB observations was used to generate h-step-ahead forecasts for the complete out-of-sample period. An attractive feature of this modelling scheme is that it negates the need to generate forecasts of the independent variables. Its shortcomings, which stem from building horizon-specific models, are that the forecasts for different horizons are treated as independent and that the approach is computationally expensive.

Figure 6 shows a simple example of probabilistic forecasts of 1-day-ahead COVID-19 admissions for the complete out-of-sample period, generated using QRF. In the figure, the actual admission counts are denoted by black dots, the median of the probabilistic forecast is issued as the point forecast, and the shaded regions correspond to the 50 and 90% prediction intervals of the forecast distribution. The probabilistic forecasting perspective attempts to quantify the uncertainty of the forecast admissions.
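The horizon-specific (direct) scheme described above amounts to rebuilding the training set for each h: the same feature vector (seven lagged admission values plus the most recent lag of each exogenous variable) is paired with a target shifted h days ahead. A minimal sketch of that data preparation (the array layout is illustrative):

```python
import numpy as np

def make_direct_dataset(adm, exog, h, n_lags=7):
    """Feature rows: adm[t], adm[t-1], ..., adm[t-n_lags+1], exog[t-1, :];
    target: adm[t + h]. One such data set is built per horizon h."""
    adm = np.asarray(adm, dtype=float)
    exog = np.asarray(exog, dtype=float)
    rows, targets = [], []
    for t in range(n_lags - 1, len(adm) - h):
        lagged_adm = adm[t - n_lags + 1 : t + 1][::-1]  # most recent first
        rows.append(np.concatenate([lagged_adm, exog[t - 1]]))
        targets.append(adm[t + h])
    return np.array(rows), np.array(targets)
```

Fitting one model per h on its own `(X, y)` pair, as done here with seven QRF models, avoids having to forecast the predictors at the cost of the longer horizons losing a few training rows.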
Risky decisions tend to be concentrated at the extremes: unexpectedly high hospital admissions create crowding (Au et al., 2009) and inefficiency, while unexpectedly low admissions may create a waste of resources. Probabilistic forecasts are thus tools to support decision-makers/planners not only with the most likely outcomes but also with these low-probability, high-cost situations (Rostami-Tabar & Ziel, 2022). Different hospital managements may have different attitudes to risk, depending upon the availability of resources, and thus it becomes imperative to consider the whole distribution when making decisions. Probabilistic forecasts can and should be seen as a core ingredient of any risk management practice within healthcare (Rostami-Tabar et al., 2023).

Benchmarks
We use various benchmark methods to compare their forecast accuracy with that of the QRF model. Exponential smoothing and ARIMA models are among the most widely used approaches in time series forecasting. We also consider time series LR, ARIMAX and Prophet to account for the association between predictors and daily admissions.
• Time series LR
Time series LR is widely used in practice and is a common benchmark even for complex forecasting tasks. The LR algorithm learns to construct a weighted sum of its input features. We use the TSLM() function in the fable package (O'Hara-Wild et al., 2020). The LR model uses trend(), season() and all predictors described in Table 2. The function trend() accounts for a linear trend, and season() creates dummy variables for the day of the week. We denote by Y_t the number of patients admitted to the hospital on day t. The predictors selected for inclusion in the LR model are described in Table 2. In building the LR model, we first create a model to forecast one step ahead.
To select the predictors to include in the model, forward stepwise selection based on the corrected Akaike Information Criterion (AICc) is used. In choosing predictors, we account for the linear association with admissions and the existing collinearity between predictors. These decisions are informed by the data analysis presented in Figs 3-5. To forecast admissions h steps ahead, we use the lags of the predictors chosen for the one-step-ahead forecast rather than forecasting the predictors; therefore, we always use the actual information up to day t to forecast the future. It is interesting to note that we also tested the approach of forecasting the predictors, which led to worse performance.
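Forward stepwise selection under AICc is a greedy loop: at each step, add the candidate predictor that most reduces AICc, and stop when no addition improves it. A simplified Gaussian-likelihood sketch (not the fable implementation used in the study; here k counts the estimated regression coefficients, including the intercept):

```python
import numpy as np

def aicc(rss, n, k):
    """Gaussian AICc from the residual sum of squares (simplified convention)."""
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward_stepwise(X, y, names):
    """Greedy forward selection minimizing AICc; returns selected names in order."""
    n = len(y)

    def fit_rss(cols):
        # Least-squares fit with an intercept plus the chosen columns
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        return float(resid @ resid)

    selected, remaining = [], list(range(X.shape[1]))
    best = aicc(fit_rss([]), n, 1)   # intercept-only model
    improved = True
    while improved and remaining:
        improved = False
        scores = [(aicc(fit_rss(selected + [c]), n, len(selected) + 2), c)
                  for c in remaining]
        score, c = min(scores)
        if score < best:
            best, improved = score, True
            selected.append(c)
            remaining.remove(c)
    return [names[c] for c in selected]
```

The AICc penalty term grows quickly when k approaches n, which is why it is preferred over plain AIC for the short daily series considered here.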
• ARIMAX
Although ARIMA uses only time series information, an ARIMAX model can be constructed by adding predictors to the ARIMA model. We use the same information as the ARIMA model plus all predictors used in the LR model, except the lagged hospital admission values. These are removed because ARIMA models already account for autoregressive lagged values. We employ the ARIMA() function in the fable package in R to train and estimate the parameters of the ARIMAX models and to forecast daily COVID-19 admissions (O'Hara-Wild et al., 2020).

• Prophet
Prophet is a forecasting procedure created by Facebook. The method has been shown to be accurate for daily time series, especially the kind of daily data seen at Facebook. The model accounts for multiple seasonalities, a piecewise trend and any number of predictors. The idea behind the piecewise trend is that if the time series follows different linear trends over different regions of the series, it can be modelled as connected linear pieces (Taylor & Letham, 2018).
The response variable on day t can be written as the sum of several components plus an error term:

$$Y_t = g(t) + s(t) + h(t) + Xreg(t) + \varepsilon_t,$$

where g(t) represents a piecewise linear trend, s(t) captures various types of seasonality, h(t) captures the holiday effects, Xreg(t) accounts for the predictors used in the LR model and ε_t is the error term. We should note that, given the short length of the daily admissions time series, modelling holiday effects is not feasible, and they have not been considered in this study.
Prophet is an automated method, which makes it easy to use in many organizations. It is robust to missing data and shifts in the trend, and typically handles outliers well (Taylor & Letham, 2018). We use the fable interface for prophet() in the fable.prophet package in R (O'Hara-Wild, 2020).
The experiment could benefit from including other benchmarks, such as Theta (Assimakopoulos & Nikolopoulos, 2000), which would keep the running time low; moreover, in cases where more data are available, it would be worth investigating methods based on artificial neural networks (ANNs), including deep learning.

Experimental setup
We first split the data set into training (80%) and test (20%) sets. We apply the forecasting models to the training set and evaluate the forecast performance of the proposed QRF approach and the benchmarks on the test set. Evaluation is conducted using a rolling-origin forecasting study on the test set with re-estimation. The forecasting horizon is chosen as up to h = 7, corresponding to 7 days, which allows decision makers to use the forecasts to inform decisions on resource allocation and planning in the health service. It should be noted that, given the nature of the daily time series of COVID-19 admissions, where a considerable number of observations are zero or low volume, the benchmark methods may generate negative forecast values. Therefore, we first transform the time series using sqrt(), then train the models and produce the forecasts using all benchmarks. Next, the generated forecasts are transformed back to the original scale (rounded to the nearest integer) and truncated at zero to avoid producing negative forecasts.
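The transform-and-back-transform step can be sketched as follows. This is a minimal illustration of one reasonable ordering of the operations described above: truncation is applied on the transformed scale before squaring, so that negative benchmark forecasts map to zero rather than to spurious positive counts:

```python
import numpy as np

def to_sqrt(y):
    """Transform the count series before fitting the benchmark models."""
    return np.sqrt(np.asarray(y, dtype=float))

def from_sqrt(f):
    """Back-transform forecasts: truncate negatives, square, round to integer."""
    f = np.clip(np.asarray(f, dtype=float), 0.0, None)
    return np.rint(f ** 2).astype(int)
```

A round trip leaves integer counts unchanged, while negative transformed-scale forecasts come back as zero admissions.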

Forecast accuracy evaluation
In this study, we consider point forecast accuracy metrics, as well as a metric to evaluate the accuracy of the forecast distribution. We evaluate the point forecasts using RMSE and MAE. MAE is appropriate for the median and RMSE for the mean forecast.
For evaluating the forecasts of the discrete probability distributions, we employ the ranked probability score (RPS), which was introduced by Epstein (1969). The RPS quantifies the difference between the actual observations and the cumulative distribution of forecasts, such that a smaller score is better.
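For a forecast distribution given as a probability mass function over counts, the RPS accumulates the squared differences between the forecast CDF and the step-function CDF of the observation. A minimal Python sketch (illustrative only; the study's computations were done in R/Matlab):

```python
def rps(pmf, observed):
    """Ranked probability score for a discrete forecast distribution.
    pmf: probabilities for counts 0, 1, ..., len(pmf) - 1.
    observed: the actual count. Smaller scores are better."""
    cdf, score = 0.0, 0.0
    for k, p in enumerate(pmf):
        cdf += p
        # CDF of the observation: 0 below the observed count, 1 at and above it.
        outcome_cdf = 1.0 if k >= observed else 0.0
        score += (cdf - outcome_cdf) ** 2
    return score

# A perfectly sharp, correct forecast scores 0.
print(rps([0.0, 1.0, 0.0], 1))  # 0.0
```

Because the score is cumulative, it penalizes probability mass placed far from the observation more heavily than mass placed nearby, which is the "sensitive to distance" property noted above.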
For both point and probabilistic accuracy metrics, we first report the overall performance of each model by averaging the score of each measure computed for all rolling origins and all horizons. We also report the accuracy measure for each model and each forecasting horizon. To that end, we aggregate the score of different rolling origins for a given horizon.

Empirical results and discussion
In this section, we first discuss the important predictors of daily COVID-19 admissions derived from QRF, and then report the forecasting performance of the QRF model and the different benchmarks for point and probabilistic forecasts.
In the context of this study, identifying the most salient features can help provide valuable insights into the key drivers of admission counts, which can potentially be of practical benefit to hospitals and policymakers.
In Fig. 7, we plot the feature importance scores for the most salient features, whereby the ranking of features was obtained using random forests (for a horizon of 1 day). The feature rankings were calculated using only the OOB observations and were obtained using the permutation feature importance measure; for details, see Breiman (2001). Specifically, to measure the influence of a feature in predicting the response, we calculate the change in model error over the OOB observations when that feature is permuted. Note that permuting the more salient features results in a larger increase in the prediction error. We considered features that could potentially improve the forecast accuracy without making the feature matrix high-dimensional. We considered multiple lags (more than 1 day) for other exogenous features, but this did not result in any noticeable improvement in the modelling. It could be surmised that some of the features (such as vaccination status, number of cases, etc.) tend to vary smoothly, and this variation could potentially be captured in the hospital admissions data over longer lead times. Interestingly, to predict the admission count for a given day, lagged admission counts turn out to be strong predictors, which is broadly in keeping with the autocorrelation and partial autocorrelation in the admissions time series (Fig. 2).
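The permutation idea described above can be sketched in a few lines of Python. This is a generic stdlib illustration, not the paper's OOB-based random forest implementation; `predict`, `X` and `y` below are hypothetical toy inputs.

```python
import random

def permutation_importance(predict, X, y, error_fn, seed=0):
    """Increase in prediction error when one feature column is shuffled,
    breaking its association with the response.
    predict: function mapping a list of feature rows to predictions."""
    rng = random.Random(seed)
    base = error_fn(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        # Rebuild the feature matrix with column j permuted.
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(error_fn(y, predict(X_perm)) - base)
    return scores

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

# Toy model that only uses the first feature: permuting it raises the
# error, while permuting the irrelevant second feature does not.
predict = lambda X: [row[0] for row in X]
X = [[i, 0] for i in range(20)]
y = list(range(20))
scores = permutation_importance(predict, X, y, mae)
print(scores[0] > scores[1])  # True
```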
Using the feature rankings obtained from RF for each horizon, we generated out-of-sample forecasts using only the 5, 10, 15 and 20 most salient features and compared their performance with a model that employed all 25 features. Specifically, the QRF model performance for different numbers of most salient features was investigated for each horizon. Using only the 5 or 10 most salient features resulted in the worst performance overall across all horizons. Moreover, the 20 most salient features provided a slightly better performance for 1- and 2-day-ahead forecasts than using all 25 features. For longer horizons (3- to 7-day-ahead forecasts), employing all 25 features during the modelling gave the most accurate performance.
Features relating to the total number of positive COVID-19 cases and the number of daily tests performed were also salient, as one might expect. For mobility, features corresponding to Google grocery and Apple driving were salient; however, the mobility features ranked much lower than lagged admission counts. Based on these results, it could be argued that the temporal correlations in the admissions count time series can be accommodated using lagged admission counts at a local hospital level. Table 3 presents the overall forecasting performance of all models for both the point forecast and the forecast distribution. The results indicate that the QRF model outperforms the benchmarks on all forecast accuracy measures. Next, we look at the performance across different forecasting horizons.
In Fig. 8(a and b), we plot the RMSE and MAE for all models considered in this study for forecast horizons ranging from 1 day to 7 days ahead. As discussed earlier, we generate forecasts using a rolling forecast origin. Thus, the error metrics for the h-step-ahead forecasts are calculated using n_test − h + 1 observations (where n_test denotes the total number of observations in the test data). From Fig. 8(a and b), it is evident that QRF and ETS are the most competitive models, with QRF being the best performing model by far, except for the 1-day horizon, where the difference between QRF, ETS and LR is negligible.
The model rankings were consistent when the evaluation was based on the RPS, see Fig. 8(c). We used the RPS to quantify the accuracy of the forecasts of the discrete probability distributions, as it is strictly proper and sensitive to distance (for details, see Murphy, 1969, 1970). The performance of the QRF was considerably better than that of all the other models used in this study. Compared with the other methods, the superior performance of the QRF could potentially be attributed to the following facts: (1) QRF appropriately treated the admissions data as count data rather than continuous. QRF generated forecasts that were non-negative integer counts (as the model predictions were realizations of admission counts from the historical data). Since actual admission counts cannot be negative or fractional, admission forecasts from all methods except QRF had to be truncated to exclude any negative predictions, and positive fractional predictions had to be rounded to the nearest integer.
(2) QRF provided a non-parametric modelling approach that did not rely on any assumptions about the underlying residual distribution and, crucially, allowed for a non-linear relation between the predictors and the response, as evident from Fig. 5. (3) We used a horizon-specific modelling approach for the QRF. Specifically, we estimated seven different horizon-specific QRF models, with the hyperparameters of each model tuned separately using the training data. For the time series models, such as ARIMA, ARIMAX and ETS, we instead used the traditional approach of estimating a single model to generate forecasts across multiple horizons, although the model parameters were re-estimated at each rolling origin.
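Point (1) above can be illustrated with a small sketch: when forecast quantiles are read off an empirical distribution of observed counts, as in QRF, they are non-negative integers by construction, so no truncation or rounding is needed. This is an illustrative stdlib example, not the QRF implementation used in the study; `leaf_counts` is a hypothetical set of historical admission realizations.

```python
def empirical_quantiles(samples, probs):
    """Quantiles of an empirical distribution of historical counts.
    Every returned value is an actually observed non-negative integer."""
    s = sorted(samples)
    n = len(s)
    # Nearest-rank quantile definition (one of several common conventions).
    return [s[min(n - 1, max(0, int(p * n)))] for p in probs]

leaf_counts = [0, 0, 1, 2, 2, 3, 5, 8]  # hypothetical admission counts
print(empirical_quantiles(leaf_counts, [0.1, 0.5, 0.9]))  # [0, 2, 8]
```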
The proposed modelling framework could be used in practice, as the predictors are publicly accessible and the models employed in this study can generate predictions in real time. We use two broad categories of predictors: (i) lagged values of daily COVID-19 admissions, which are accessible via electronic health record systems, and (ii) population-level predictors, such as confirmed cases of COVID-19, vaccination status, call centre data and mobility data, all of which are freely available online. In terms of computational speed, having identified the optimal hyperparameters, the total time to train a QRF model and generate forecasts for the next day was 0.47 s (in Matlab 2021b ©Mathworks, using a 2.2 GHz processor with 32 GB RAM). It is also worth noting that the policy and operations of the hospitals considered in this study are determined by the NHS, the publicly funded healthcare system in England. Thus, the methods developed for the CCG considered in this study could potentially be adapted for other groups in England that operate under the operational framework of the NHS.

Conclusion
An accurate estimate of the number of COVID-19 patients admitted to hospitals will help hospitals and regional authorities to allocate resources more effectively. Although many research studies have considered forecasting COVID-19 hospital admissions, less attention has been given to admissions at local levels. Accurate forecasting of COVID-19 admissions at the local level is crucial, but it presents various challenges to which many time series forecasting methods are not well suited.
The daily time series of COVID-19 admissions has the following attributes, which make forecasting challenging: (i) admissions are count data, which require the forecasts to be integer as well; and (ii) they are low volume and contain many zeros, which may result in negative forecasts when using time series forecasting methods. Moreover, the main focus of this research has been on modelling the full probability distribution of COVID-19 admissions, not just a point estimate.
We use the total daily time series of COVID-19 admissions in a Commissioning Group in England to train the models and evaluate forecast accuracy. The performance of the proposed QRF model is compared with ETS, ARIMA, LR, ARIMAX and Prophet using two point forecast error measures (RMSE and MAE) and a probabilistic forecast accuracy measure (RPS). The methodology developed in this study is based on aggregated data for a subset of hospitals. However, we believe that it could be adopted for each individual hospital, given common features such as trends, low-volume observations and the presence of zeros.
In this study, we built a QRF model that guarantees the generation of strictly non-negative forecasts. A limitation of this study is that we only have daily data for slightly more than a year. In hospitals where feature-rich data may be available for longer, an interesting line of future work would be to employ ANNs, including deep learning, for probabilistic forecasting of admission counts.
We should note that, given the uncertainty around future COVID-19 hospital admissions, it is crucial to generate a probabilistic forecast, which allows a decision-maker to assess the potential risk of a decision and hence make a more informed one. Therefore, in addition to point forecasts, we also provide the distribution of future daily COVID-19 admissions. Moreover, we investigate the usefulness of various predictors, including COVID-19 phone call data, confirmed cases, the number of virus tests performed, the percentage of the population vaccinated and human mobility data from Google and Apple, in forecasting daily COVID-19 admissions.
Results indicate that the proposed QRF model outperforms the benchmarks in terms of both point forecast and predictive distribution accuracy. We observe that the accuracy of the QRF model is consistent across different horizons, and the gain in forecast accuracy becomes more prominent as the forecasting horizon increases. Our findings indicate that lagged COVID-19 admissions, the number of positive COVID-19 cases, the number of daily tests performed, Google grocery and Apple driving are among the strongest predictors of COVID-19 admissions for a given day.
The generalizability of our findings could be considered in the context of (a) modelling demand under unprecedented conditions, such as a pandemic, and (b) using population-level features to improve forecast accuracy at a local hospital level. Even before COVID-19, it was observed that the emergence of novel pandemic agents is essentially unpredictable (Morse et al., 2012), and given the increased risk of cross-species viral transmission (Carlson et al., 2022), this might not be the last pandemic. Although the generalizability of our findings might be difficult to assess, the overall conditions in which the research framework was designed and implemented can serve as a reference for similar future events. Forecasting COVID-19 admissions at the hospital level is a challenging task, and our results can provide insights for similar situations in which the demand for hospital resources driven by a highly transmissible virus declines as the vaccinated share of the population increases. Another important insight relates to incorporating mobility data, which capture how people move around and interact, and which can improve forecast accuracy. This suggests that population-level features can be essential predictors in emergency situations, as they can capture the spread of an infectious disease or the movement of people during a natural disaster. In addition, this study highlights the importance of identifying and using leading predictors, such as calls to emergency telephone lines, the rate of positive cases and the proportion of the population vaccinated, to improve prediction accuracy. Another point worth considering is the handling of low-volume data, which can be an issue in emergency situations. Traditional time series methods may not be effective when dealing with count data, especially when values are near zero; in such cases, rounding or truncation may be necessary to use conventional time series methods.
However, we found that the QRF method performed better with count data, suggesting that it may be a useful approach for emergency situations where low-volume data are a concern. In summary, the paper provides several insights that can be applied to other emergency situations. By incorporating population-level features, identifying appropriate leading predictors and using appropriate modelling techniques, it is possible to develop accurate forecasts that help emergency personnel better prepare for and respond to emergencies. On a smaller note, it is interesting to observe how quickly a dramatic event such as COVID-19 is forgotten once it is no longer in the spotlight. In emergencies such as a pandemic, our difficulty or incapacity to learn from the past increases the risk that, in the future, the right pieces will not be in place to confront a problem. The risks of accelerating such forgetfulness should be carefully assessed, especially under the climate crisis.
Given the importance of forecasting hospital admissions in emergency situations such as a pandemic, research into any of the following areas would appear to be merited:
• We believe that different predictors might have varying importance during different waves of the pandemic. Therefore, investigating which predictors should be used in the forecasting model as the pandemic evolves is an important question, and developing a flexible forecasting model that can adapt to such changes would be valuable.
• Similarly, it may not be prudent to rely on one model to perform well across all horizons or waves. Forecast combinations could prove beneficial, as they have in other areas of forecasting (e.g. in the M4 forecasting competition, all six top-performing methods employed forecast combinations of some kind; Makridakis et al., 2018). In COVID-19-related research, the efficacy of combining interval and distributional forecasts has been demonstrated for modelling cumulative mortality due to COVID-19. It would be worth investigating the accuracy of combined forecasts from different models for each horizon (and/or wave).
• Forecasting the risk score of admissions, i.e. predicting the probability of a threshold being exceeded instead of the exact number for a given day, is an interesting avenue. This may provide a more straightforward tool for a decision-maker to deal with uncertainty.
• Given the nature of the admission time series at the local level, employing a temporal aggregation approach might improve forecast accuracy (Rostami-Tabar et al., 2013, 2014).
• Finally, although this study focuses on the local level, it is important to investigate the usefulness of the predictors discussed in this paper for forecasting COVID-19 admissions at more aggregate (e.g. national) levels.
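The risk-score idea in the bullets above, predicting the probability that admissions exceed a capacity threshold rather than the exact count, follows directly from any discrete forecast distribution. A minimal illustrative Python sketch (the pmf below is hypothetical):

```python
def exceedance_probability(pmf, threshold):
    """Probability that the count exceeds a threshold, read off a
    discrete forecast distribution: P(Y > t) = 1 - F(t).
    pmf: probabilities for counts 0, 1, ..., len(pmf) - 1."""
    return sum(p for k, p in enumerate(pmf) if k > threshold)

# Hypothetical forecast pmf over admission counts 0..4.
pmf = [0.1, 0.2, 0.4, 0.2, 0.1]
print(exceedance_probability(pmf, 2))  # 0.2 + 0.1
```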

Data availability statement
Data used in this study are accessible via the GitHub repository: https://github.com/bahmanrostamitabar/Covid19_hospital_admission_data.

Funding
There is no funding for this project.

Conflict of Interest
There is no conflict of interest.