-
PDF
- Split View
-
Views
-
Cite
Cite
Jacqueline D. Seufert, Andre Python, Christoph Weisser, Elías Cisneros, Krisztina Kis-Katos, Thomas Kneib, Mapping Ex Ante Risks of COVID-19 in Indonesia using a Bayesian Geostatistical Model on Airport Network Data, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 185, Issue 4, October 2022, Pages 2121–2155, https://doi.org/10.1111/rssa.12866
- Share Icon Share
Abstract
A rapid response to global infectious disease outbreaks is crucial to protect public health. Ex ante information on the spatial probability distribution of early infections can guide governments to better target protection efforts. We propose a two-stage statistical approach to spatially map the ex ante importation risk of COVID-19 and its uncertainty across Indonesia based on a minimal set of routinely available input data related to the Indonesian flight network, traffic and population data, and geographical information. In a first step, we use a generalised additive model to predict the ex ante COVID-19 risk for 78 domestic Indonesian airports based on data from a global model on the disease spread and covariates associated with Indonesian airport network flight data prior to the global COVID-19 outbreak. In a second step, we apply a Bayesian geostatistical model to propagate the estimated COVID-19 risk from the airports to all of Indonesia using freely available spatial covariates including traffic density, population and two spatial distance metrics. The results of our analysis are illustrated using exceedance probability surface maps, which provide policy-relevant information accounting for the uncertainty of the estimates on the location of areas at risk and those that might require further data collection.
1 Introduction
In December 2019, local hospitals in the Chinese city of Wuhan in Hubei province reported a surge of pneumonia cases, which was later confirmed to have been caused by the virus species severe acute respiratory syndrome-related coronavirus (SARS-CoV-2, CSG, 2020; Zhu et al., 2020). The disease associated with the newly discovered virus became known as COVID-19. By the end of December 2021, the virus has spread across the globe, infecting about 277 million people and leading to 5.4 million deaths (WHO, 2022).
Indonesia's population has been particularly affected by the virus, with an estimated total of about 4.3 million confirmed cases and 144,000 deaths on by the end of the year 2021 (WHO, 2022). The demographic characteristics and geographical configuration of Indonesia make it extremely vulnerable to the spread of contagious diseases. It is the largest archipelago in the world, has the fourth largest population and represents a major hub connecting East Asia, South Asia and Oceania. The impact of COVID-19 on the Indonesian public health has been exacerbated by a high prevalence of comorbidities and clinical risk factors. About 39% of its population regularly consumes tobacco, and the country counts the seventh largest number of diabetic cases (10.7 million) as well as the third largest number (29.1 million) of prediabetic cases globally (Brake et al., 2020; Del Rio & Malani, 2020).
The estimation of national and local infection rates in Indonesia remains challenging for two main reasons. First, the country lacks of sufficient testing capacities (Rachman et al., 2020). While health centres are accessible in the metropolitan area of Jakarta and in the province of Bali, their spatial coverage is sparse and they are poorly accessible in remote provinces such as Papua and West Papua (Rachman et al., 2020). As of 7 December 2020, Indonesia has tested on average about 0.13 individuals per 1000 inhabitants, which is a relatively low testing rate in comparison with neighbouring countries, such as the Philippines (0.3), Malaysia (0.54) and Singapore (5.15) (Hasell et al., 2020). Second, asymptomatic and mild cases have been systematically excluded from testing (van Empel et al., 2020). Since about 80% of COVID-19 cases are asymptomatic or exhibit mild symptoms (Vital Surveillances, 2020), a large number of infected individuals have not been detected, which partly explains the discrepancy between the reported and real numbers of COVID-19 cases in Indonesia (van Empel et al., 2020).
Global studies have shown that population density is a key factor that determines human-to-human transmission of infectious diseases (Hughes, 2020; Kucharski et al., 2020). The COVID-19 transmission among humans tends to occur indoors, especially during cold weather periods (Ahlawat et al., 2020). COVID-19 initially spread from Wuhan to other regions through the worldwide air and sea traffic network (Python et al., 2022; Zhang et al., 2020). The role of global transport has also been observed previously in the spread of the severe acute respiratory syndrome (SARS), another coronavirus disease (Tatem et al., 2006). Due to modern transport systems' capacity to carry contagious diseases over long distances in a short time, local strategies are often rendered ineffective. In a global study, Gunthe and Patra (2020) found a strong positive correlation between importation risk, estimated from global air traffic and the official reported numbers of COVID-19 cases imported from abroad. Maier and Brockmann (2020) assessed the risk for major airports in the early stages of the COVID-19 pandemic and provide insights into the most likely transmission routes from a location in the impacted region. This model is the foundation of the first stage of the approach proposed in this paper.
For the case of Indonesia, quantitative research has mainly focused on examining the role of meteorological factors on the spread of COVID-19 in the metropolitan area of Jakarta, but has not yet paid attention to socio-economic drivers of the disease. Tosepu et al. (2020) documented a weak negative correlation between temperature and virus infections in Jakarta. Asyary and Veruswati (2020) showed a weak positive correlation between sunlight exposure and COVID-19 recovery. Rendana (2020) found a low negative correlation between wind speed and early COVID-19 infections.
We aim to provide a method to assess the ex ante risk of COVID-19 at a fine spatial scale across Indonesia which can be applied when the availability of data is limited (or biased). Spatial regression models based on current surveillance and prevalence survey data would be an obvious choice for this; the sparse spatial coverage and the selection bias associated with the systematic exclusion of asymptomatic cases would reduce the reliability of such models based exclusively on official COVID-19 data. As an alternative or complementary approach, we suggest a two-step modelling procedure that relies on a minimal set of routinely available input data related to the Indonesian flight network, traffic and population data, and geographical information allowing us to assess the ex ante spatial importation risk (i.e. the risk before the virus started to spread in Indonesia) as well as its uncertainty.
The general workflow of our approach is summarised in Figure 1. In the first step, we utilise the importation risk data for all international airports of Indonesia available from the global model on disease spread by Maier and Brockmann (2020), to build an initial risk assessment for all flight connections associated with these airports based on information on flight connections and flight volume. We then complement these data with various characteristics of the Indonesian domestic flight network based on flight data from December 2019 to January 2020, which is prior to the global COVID-19 outbreak (aviationstack, 2020). The data cover specifically a period from 12 January 2019 to 1 July 2020. The data set contains all available flight connections to, from and between Indonesian airports during this period. The data are not publicly available but can be purchased from specialised providers. With this information, we estimate an additive model with the connection-specific risk as response variable and the network characteristics as potential determinants. This Airport Network Model finally serves as the basis for predicting the risk for all flight connections from which, in turn, the importation risk for all 78 Indonesian airports can be determined. As the output of our first step, we obtain localised importation risks at each domestic airport.
![Workflow for predicting the ex ante COVID-19 infection risk in Indonesia at a fine spatial scale. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-1.jpeg?Expires=1748057090&Signature=aVjuZ5RY9MxxoQP8HNNfjg-jJtRMat~dltE1KkQAHKyms2P22YiTK6T3xs2LyEURzk3oSUjG7nPOP~ahYOmzbNfCrPaaQqZ9juSJpuSlonOL-LXFfpqNV8gCHLJ4kqsssr0E1qoKZO9ck~ln43fLSe~n5E-bSMiqUrAog9g7NYs9XAsEdxA-tvrrWPtFl2iVnu6tBgQ-4ppCUyb0ERUEC5jEwKAVbuIdBmbC445e7JwVaf7KTlNwiTLsRgcweQS4OERofEpk361pAoZOp~OC7CBs5IUskwC3b00jcyEvjjx43lPFhUIS-xzCxk3HOE93veUzShOPO6aSUhgH3MImPQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Workflow for predicting the ex ante COVID-19 infection risk in Indonesia at a fine spatial scale. [Colour figure can be viewed at https://academic.oup.com/]
In the second step, we aim at propagating the ex ante risk from the domestic airports to all of Indonesia at a resolution. For this, we implement a Bayesian geostatistical model (the Spatial Propagation Model) using a set of relevant spatial covariates that describe the local traffic and population density and spatial proximity to the airports and highly populated islands. The Spatial Propagation Model is validated through a leave-one-out cross-validation approach. Furthermore, the predictions are compared with official COVID-19 infections data between March and May 2020 (Kawal COVID, 2021).
The methodology suggested in this paper can help policymakers design and implement preventative measures in the presence of data sparsity and selection bias in the early stages of an epidemic disease outbreak. By assessing the ex ante distribution of COVID-19 risk while accounting for its uncertainty, it not only provides policymakers with early information on the potential spatial spread of the disease but also highlights areas where early testing may be the most helpful. Hence, it can help reduce the socioeconomic damages of potential future outbreaks of COVID-19 variants or other contagious diseases.
This paper is organised as follows: Section 2 implements the prediction of the ex ante importation risk at the 78 Indonesian airports while Section 3 deals with the spatial propagation of the ex ante risk to all of Indonesia. Section 4 discusses the findings and limitations of our study and Section 5 provides a summary of the findings and suggests potential future research paths that could extend our work.
2 Airport Network Model: Predicting The Ex Ante Importation Risk at Airports
2.1 General strategy
Before starting to discuss the ingredients of the Airport Network Model in detail, we summarise the general strategy visualised in Figure 2. Our aim is to determine ex ante importation risks for all 78 Indonesian airports forming the flight network represented in Figure 2a where the vertices in the outer circle represent the airports while the edges connecting the vertices represent the 438 flight connections between these airports. Ex ante COVID-19 importation risks are available for the seven international Indonesian airports (green vertices in Figure 2a) from Maier and Brockmann (2020) (data freely available here: http://rocs.hu-berlin.de/corona/docs/model/importrisk). A total of 132 flights are associated with these international airports (green edges (training) in Figure 2b) such that a risk weight can be assigned to the corresponding edges in the network. To achieve this, we proceed as illustrated in Figure 2c, we calculate the total number of outgoing flights for each airport, and assign a corresponding risk weight to each outgoing edge (i.e. flight connection) for the seven international airports. Assume, for example that airport 1 (green vertex) has a COVID-19 risk equal to 0.8 and the following number of departure flights to its connected airports (with a total number of departure flights of 10): 4 (edge 1-2), 3 (edge 1-3), 2 (edge 1-4) and 1 (edge 1-5). Hence, the relative risk weights are assigned to the connections (green edges) as follows: 0.8 × 4/10 (edge 1-2), 0.8 × 3/10 (edge 1-3), 0.8 × 2/10 (edge 1-4) and 0.8 × 1/10 (edge 1-5).
![Airport Network Model workflow. (a) Initial network with the seven international airports highlighted in green, (b) Training network of flights associated with these seven airports, (c) Connection risk assignment, (d) All domestic flight connections, (e) Predicted risk for these connections and (f) Predicted risk in all airports. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-2.jpeg?Expires=1748057090&Signature=VZyRo9QXTwb7mx5sXU6XRD02Jo-Yc8qQzrVU7vuZrRh49ZS2V3Ber0TzwBrpY3FxiPEjt3hIIHUACl-fc6yPfwh-br7fAMTNAiZ~J7pYcsMaMuvj49wpNaUpTYqQEbIhJvkPUgXqfLicTNpmmoxKRMNjAaC1qC8jTg-nwWpdm0NT6jmzwYowwbYUYU32Myg7~0AL5zhTihhyeeeX3kycd4cKwXK1X-lw2O~8tfxV~5kzciZXUgJuA-py8PYyof7SjafNTnddDVPHD79G07GviWA4yH9wigxFLlb6pbc0oXf4mioHYfIiXWvjRos-hMQ5phYtEsClc7Bsdb3R7MtrKw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Airport Network Model workflow. (a) Initial network with the seven international airports highlighted in green, (b) Training network of flights associated with these seven airports, (c) Connection risk assignment, (d) All domestic flight connections, (e) Predicted risk for these connections and (f) Predicted risk in all airports. [Colour figure can be viewed at https://academic.oup.com/]
Using the risk value attributed to each of the 132 connections associated with the seven airports, we implement a generalised additive model (GAM, cf. Section 2.3 below for details) to predict the COVID-19 risk for the 306 remaining connections (red edges in Figure 2d) using various network characteristics as covariates. Based on the predicted risk weights that are obtained for all 438 connections (green edges in Figure 2e), the final step of the Airport Network Model consists of assigning COVID-19 risks to the 78 airports, including those not in Maier and Brockmann (2020)'s data (71 red vertices in Figure 2e). We calculate the COVID-19 risk for each airport (green vertices, Figure 2 (f)), as a normalised strength defined as the weighted sum of all incident flight connections risk.
2.2 Network characteristics
The Airport Network Model predicts the ex ante COVID-19 risk for the 78 airports of the domestic Indonesian network based on various characteristics of this network. These network characteristics are calculated from flight connection data (available for a charge from aviationstack, 2020), which contain the locations of all Indonesian domestic airports along with their domestic flight frequencies for the period from 1 December 2019 to 7 January 2020. Most flights transit through Jakarta airport (CGK) and a few other large airports such as Bali (DPS), Sedati (SUB) and Makassar (UPG). Figure A1 illustrates the distribution of the frequency of flights.
From the primary data sources (aviationstack, 2020; Maier & Brockmann, 2020), we extract two types of metrics: flight (or connection-specific) metrics that describe airport connectivity, flight frequency, volume and geographical distances between each respective Indonesian airport, and network metrics that describe the role of each airport within the structure of the Indonesian flight network.
The network characteristics considered in the following can be roughly grouped into metrics associated with vertices or nodes (i.e. airports) and edges (i.e. flight connections). Vertex metrics include variables associated with the centrality of each airport, such as incoming and outcoming strength, eigenvector centrality, betweenness centrality and transitivity. Edge metrics include measures of connectivity between airports, such as topological distance, similarity, connectivity and Euclidean distance. Furthermore, we consider the structure of the Indonesian flight network by including indices of traffic volume and the frequency of departure flights.
For all metrics discussed in the following, let A denote the adjacency matrix encoding the network structure of the N = 78 airports of the Indonesian flight network. It is an (N × N) matrix with one encoding a direct flight connection, and zero otherwise. Note that we are considering a directed network representation, where the row index in A refers to the departure airport while the column index represents the destination. An airport (vertex in the network) is considered incident on a flight connection (edge in the network), if it is the destination of that edge.
2.2.1 Vertex metrics
- 1.
Strength: We consider two measures of strength representing the total incoming traffic () and total outgoing traffic () for each airport v (Kolaczyk, 2009).
- 2.
Eigenvector centrality: The eigenvector centrality (, also called eigencentrality or prestige score) is defined as the largest eigenvalue of the adjacency matrix A of the network, representing a measure for the total influence of a vertex in a network (see Kolaczyk, 2009, for details). Compared to the strength, it does not only consider direct flight connections, but also indirect influences via the neighbours. In a flight network, airports connected with influential hubs have a higher eigenvector centrality than those linked with less connected hubs, since they may be exposed to a larger number of indirect connections.
- 3.
Transitivity: The transitivity (, also called clustering coefficient) is a measure for the number of links between the vertices within the local neighbourhood of a vertex relative to the number of possible links between them (see Wang et al., 2017, for details).
- 4.
Betweenness: The number of shortest paths between any pair of vertices that pass through a vertex v defines the betweenness centrality (, Freeman, 1978). This metric provides information on the mediating role of a vertex in a network. In our case study, a high value of is expected in Jakarta airport (CGK in Figure A1) since a large number of domestic flights in Indonesia go through this airport and it therefore has a central role in the Indonesian flight network.
2.2.2 Edge metrics
- 1.
Shortest path distance: The shortest path distance () is defined as the minimum number of edges between a pair of airports u and v (Kolaczyk, 2009, p. 17).
- 2.
Similarity: Similarity () combines information from the adjacency matrix with information on the degree (i.e. the number of incoming edges) of the vertices in a network, see Dice (1945). Therefore, this metric provides information on the number of mutual connections an airport v has with an airport u.
- 3.
Connectivity: The airport connectivity () between two vertices u and v represents the minimum number of vertices that have to be removed from a network to isolate airport u from airport v (White & Harary, 2001). Connectivity is useful to assess the cohesiveness of vertices within a network.
- 4.
In addition, we consider the Euclidean distance (in km) between each pair of airports (), the number of flight connections per route () and the proportion of departures per route ().
2.3 Additive model for the flight-specific risk weights
We use a GAM to estimate the ex ante COVID-19 risk weights for each domestic flight connection in Indonesia and subsequently predict the COVID-19 infection risk at each airport. Initial risk values for each flight connection l = {1, …, 132} associated with the seven main airports in Indonesia are derived from Maier and Brockmann (2020) (see Appendix A.2 for further details). We log-transform the initial risks due to data imbalances—a large number of observations have low risk values—such that a normal distribution can be reasonably assumed. Models based on a normal distribution exhibit a better predictive performance compared to a gamma distribution (see Appendix A.2.1 for detailed results). As potential covariates, we consider the metrics discussed in the previous section. Colizza et al. (2006) point out that flight metrics are unlikely to be linearly associated with airport risk. Furthermore, Zhang and Fu (2009) argue that a link between flight connectivity and risk is more likely to be non-linear. Accordingly, we consider smooth, non-linear effects of the vertex metrics via penalised splines (Wood, 2006) while restricting the edge metrics to linear effects.
To select the covariates, we carry out a cross-validation procedure and split our data into a training (70%) and a testing (30%) set. From 4139 models, we pre-select 20 candidates based on the Akaike information criterion (AIC) with a stepwise procedure. Based on an out-of-sample procedure, we compute the root mean squared error (RMSE) and mean average error (MAE) metrics for each pre-selected candidate (further details in Appendix A.2.2).
2.4 Results
We report the results of three models with the best predictive performance scores. The Parametric Model includes all covariates while assuming linear effects for all of them. The Full Model includes all investigated covariates (with linear effects for the vertex metrics and smooth, non-linear effects for the edge metrics). The Stepwise Model includes three edge metrics and four vertex metric covariates, leading to the model equation:
where represents the overall intercept, are parametric effects of the vertex metrics betweeness BET, transitivity TRA, inward strength IN.STR, and outward strength OUT.STR, and , and are non-linear effects of the edge metrics connectivity CONNECT, number of flights NB.FLY and proportion of departures DEP.FLY.
Table 1 provides a summary of the results for our three model specifications. The results show that the Full Model and Stepwise Model have both a higher predictive performance compared to the Parametric Model. Their RMSE and MAE scores are similar. Following the parsimony principle, we choose the Stepwise Model since it exhibits a lower degree of complexity for a similar predictive performance.
Airport Network Model. Summary for the Parametric Model, the Full Model, and the (Stepwise Model)
. | . | Parameter estimation: mean (st. errors) . | ||
---|---|---|---|---|
Parameters . | Type . | Parametric Model . | Full Model . | Stepwise Model . |
Linear covariates | ||||
IN.STR | Vertex metric | −35.56* | −56.10*** | −38.74*** |
(14.88) | (13.92) | (10.52) | ||
OUT.STR | Vertex metric | 34.09* | 54.79*** | 36.06*** |
(14.51) | (14.27) | (10.30) | ||
BET | Vertex metric | −2.87** | −3.26*** | −2.92*** |
(0.94) | (0.67) | (0.66) | ||
TRA | Vertex metric | −49.29*** | −38.07*** | |
(12.83) | (10.75) | (9.00) | ||
CEN | Vertex metric | −3.23* | ||
(1.60) | ||||
CONNECT | Edge metric | 0.18*** | ||
(0.03) | ||||
NB.FLY | Edge metric | |||
(0.00) | ||||
DEP.FLY | Edge metric | 7.02* | ||
(2.31) | ||||
intercept | −10.19*** | −4.60** | −6.93*** | |
(1.69) | (1.64) | (1.19) | ||
Smooth terms | ||||
SIM | Edge metric | 0.00 | ||
(9.00) | ||||
CONNECT | Edge metric | 1.23** | 1.10* | |
(9.00) | (9.00) | |||
NB.FLY | Edge metric | 5.47*** | 5.31*** | |
(9.00) | (9.00) | |||
DEP.FLY | Edge metric | 1.74* | 2.12*** | |
(9.00) | (9.00) | |||
G.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
E.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
Predictive performance | ||||
In-sample | ||||
AIC | 517.25 | 424.87 | 425.96 | |
0.59 | 0.81 | 0.80 | ||
Out-of-sample | ||||
RMSE | 0.007937 | 0.000148 | 0.000129 | |
MAE | 0.000774 | 0.000044 | ||
Num. obs. | 132 | 132 | 132 |
. | . | Parameter estimation: mean (st. errors) . | ||
---|---|---|---|---|
Parameters . | Type . | Parametric Model . | Full Model . | Stepwise Model . |
Linear covariates | ||||
IN.STR | Vertex metric | −35.56* | −56.10*** | −38.74*** |
(14.88) | (13.92) | (10.52) | ||
OUT.STR | Vertex metric | 34.09* | 54.79*** | 36.06*** |
(14.51) | (14.27) | (10.30) | ||
BET | Vertex metric | −2.87** | −3.26*** | −2.92*** |
(0.94) | (0.67) | (0.66) | ||
TRA | Vertex metric | −49.29*** | −38.07*** | |
(12.83) | (10.75) | (9.00) | ||
CEN | Vertex metric | −3.23* | ||
(1.60) | ||||
CONNECT | Edge metric | 0.18*** | ||
(0.03) | ||||
NB.FLY | Edge metric | |||
(0.00) | ||||
DEP.FLY | Edge metric | 7.02* | ||
(2.31) | ||||
intercept | −10.19*** | −4.60** | −6.93*** | |
(1.69) | (1.64) | (1.19) | ||
Smooth terms | ||||
SIM | Edge metric | 0.00 | ||
(9.00) | ||||
CONNECT | Edge metric | 1.23** | 1.10* | |
(9.00) | (9.00) | |||
NB.FLY | Edge metric | 5.47*** | 5.31*** | |
(9.00) | (9.00) | |||
DEP.FLY | Edge metric | 1.74* | 2.12*** | |
(9.00) | (9.00) | |||
G.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
E.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
Predictive performance | ||||
In-sample | ||||
AIC | 517.25 | 424.87 | 425.96 | |
0.59 | 0.81 | 0.80 | ||
Out-of-sample | ||||
RMSE | 0.007937 | 0.000148 | 0.000129 | |
MAE | 0.000774 | 0.000044 | ||
Num. obs. | 132 | 132 | 132 |
***p < 0.001; **p < 0.01; *p < 0.05.
Airport Network Model. Summary for the Parametric Model, the Full Model, and the (Stepwise Model)
. | . | Parameter estimation: mean (st. errors) . | ||
---|---|---|---|---|
Parameters . | Type . | Parametric Model . | Full Model . | Stepwise Model . |
Linear covariates | ||||
IN.STR | Vertex metric | −35.56* | −56.10*** | −38.74*** |
(14.88) | (13.92) | (10.52) | ||
OUT.STR | Vertex metric | 34.09* | 54.79*** | 36.06*** |
(14.51) | (14.27) | (10.30) | ||
BET | Vertex metric | −2.87** | −3.26*** | −2.92*** |
(0.94) | (0.67) | (0.66) | ||
TRA | Vertex metric | −49.29*** | −38.07*** | |
(12.83) | (10.75) | (9.00) | ||
CEN | Vertex metric | −3.23* | ||
(1.60) | ||||
CONNECT | Edge metric | 0.18*** | ||
(0.03) | ||||
NB.FLY | Edge metric | |||
(0.00) | ||||
DEP.FLY | Edge metric | 7.02* | ||
(2.31) | ||||
intercept | −10.19*** | −4.60** | −6.93*** | |
(1.69) | (1.64) | (1.19) | ||
Smooth terms | ||||
SIM | Edge metric | 0.00 | ||
(9.00) | ||||
CONNECT | Edge metric | 1.23** | 1.10* | |
(9.00) | (9.00) | |||
NB.FLY | Edge metric | 5.47*** | 5.31*** | |
(9.00) | (9.00) | |||
DEP.FLY | Edge metric | 1.74* | 2.12*** | |
(9.00) | (9.00) | |||
G.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
E.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
Predictive performance | ||||
In-sample | ||||
AIC | 517.25 | 424.87 | 425.96 | |
0.59 | 0.81 | 0.80 | ||
Out-of-sample | ||||
RMSE | 0.007937 | 0.000148 | 0.000129 | |
MAE | 0.000774 | 0.000044 | ||
Num. obs. | 132 | 132 | 132 |
. | . | Parameter estimation: mean (st. errors) . | ||
---|---|---|---|---|
Parameters . | Type . | Parametric Model . | Full Model . | Stepwise Model . |
Linear covariates | ||||
IN.STR | Vertex metric | −35.56* | −56.10*** | −38.74*** |
(14.88) | (13.92) | (10.52) | ||
OUT.STR | Vertex metric | 34.09* | 54.79*** | 36.06*** |
(14.51) | (14.27) | (10.30) | ||
BET | Vertex metric | −2.87** | −3.26*** | −2.92*** |
(0.94) | (0.67) | (0.66) | ||
TRA | Vertex metric | −49.29*** | −38.07*** | |
(12.83) | (10.75) | (9.00) | ||
CEN | Vertex metric | −3.23* | ||
(1.60) | ||||
CONNECT | Edge metric | 0.18*** | ||
(0.03) | ||||
NB.FLY | Edge metric | |||
(0.00) | ||||
DEP.FLY | Edge metric | 7.02* | ||
(2.31) | ||||
intercept | −10.19*** | −4.60** | −6.93*** | |
(1.69) | (1.64) | (1.19) | ||
Smooth terms | ||||
SIM | Edge metric | 0.00 | ||
(9.00) | ||||
CONNECT | Edge metric | 1.23** | 1.10* | |
(9.00) | (9.00) | |||
NB.FLY | Edge metric | 5.47*** | 5.31*** | |
(9.00) | (9.00) | |||
DEP.FLY | Edge metric | 1.74* | 2.12*** | |
(9.00) | (9.00) | |||
G.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
E.DIST.AIRPORT | Edge metric | 0.00 | ||
(9.00) | ||||
Predictive performance | ||||
In-sample | ||||
AIC | 517.25 | 424.87 | 425.96 | |
0.59 | 0.81 | 0.80 | ||
Out-of-sample | ||||
RMSE | 0.007937 | 0.000148 | 0.000129 | |
MAE | 0.000774 | 0.000044 | ||
Num. obs. | 132 | 132 | 132 |
***p < 0.001; **p < 0.01; *p < 0.05.
The results of the Stepwise Model indicate that all parametric coefficients are highly significant (p < 0.001). The estimated means of the coefficients associated with IN.STR and OUT.STR are about −44 and 42, respectively. Hence, a one percentage point increase in in-strength or out-strength, corresponds to a log-risk reduction of about 44% or a log-risk increase of 42%, respectively, while keeping the other covariates constant. In line with the theory of importation risk (Maier & Brockmann, 2020), the results show that airports with a larger frequency of departing flights are more likely to act as hubs and end up at risk. The negative sign of IN.STR shows that more remote airports that count a relatively larger number of landing but fewer departing flights are associated with a lower risk.
Airports that have a larger ability to control the traffic flow passing through the network, and hence demonstrate higher levels of betweenness (BET), are expected to show a smaller importation risk of the disease. According to the results, a one percentage point increase in BET corresponds to a (log) risk reduction of about 2.7%. Airports with high values of betweenness might be associated with stronger authority over neighbouring clusters or may be closer to the periphery of clusters. High values of transitivity are associated with high potential for clustering. Therefore, we expect a higher risk in areas that are outside clusters, which is confirmed by the negative effect of TRA. We expect an increase of risk along the smooth terms (visualised in Figure 3) measuring connectivity (CONNECT), the number of flight connections per route (NB.FLY) and proportion of departures per route (DEP.FLY). More connected and busy airports are likely to be at higher risk due to the potential spread of the virus via their many flight connections (Coelho et al., 2020; Daon et al., 2020).
![Airport Network Model. Mean estimated marginal effect (red lines) on the log COVID-19 risk of three edge metric covariates. Top-left: number of flight connections per route (NB.FLY); top-right: connectivity (CONNECT); bottom-left: proportion of departures per route (DEP.FLY). The blue dashed lines indicate 95% pointwise confidence intervals. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-3.jpeg?Expires=1748057090&Signature=Bc4wCd4NLlEdAxXyOsvQqC3tu0nZBxmUP6XGAc6zbsJzeEMhFymD3BOYAypoOIDzwdXJEnx5R1~hSexKDeNN2D67KtmKiguiU-8t2sqzaURdImDrKjTu91rvRp5xZLDm6a2ab5lWIKJwtAeIDA0EgSabIAKGZYAORhL9A8LEClVZD1i3T97Lc7vlxiF8W3r2x5g0Maw9wIXMCZ-CcqbL990G75tj0~6atRI1u0dmzUqxmsA5xgu7849qt-eR2XP5jw308qWv5U1wGJRYdLTkO1K44JXC95YvQXjj0-k5MRN2kUntCxaZPN-WUdiRydpKeWUFE0lIz15-3a2Bz-18Ng__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Airport Network Model. Mean estimated marginal effect (red lines) on the log COVID-19 risk of three edge metric covariates. Top-left: number of flight connections per route (NB.FLY); top-right: connectivity (CONNECT); bottom-left: proportion of departures per route (DEP.FLY). The blue dashed lines indicate 95% pointwise confidence intervals. [Colour figure can be viewed at https://academic.oup.com/]
The results (top-right, Figure 3) corroborate that a higher connectivity between airports is associated with an increase in the resilience of the network, which provides opportunities for the virus to spread easily and hence increases the risk (White & Harary, 2001). A higher traffic volume tends to be associated with higher risk prevalence (Bogoch et al., 2016), which appears in line with the positive trend (red line) observed in the effects of the number of flights on COVID-19 risk (top-left). However, the evidence of a positive trend in both variables is weak given the relatively large confidence intervals (CIs) (blue dashed lines).
The estimated effects of the departing flight volume (bottom-left) on COVID-19 are not conclusive. The absence of a clear tendency is illustrated by the wide CIs. The large uncertainty observed in this analysis might result from the relatively low number of observations (132 flight connections) and the presence of outliers.
2.5 Risk estimates
Based on the specifications of the selected Airport Network Model (Equation 1), we perform out-of-sample predictions to compute the risk values for all 438 connections within the full Indonesian network (cf. Figure 2e). The predicted values are then used as edge weights to calculate the normalised ex ante COVID-19 risk for each of the 78 airports on the Indonesian archipelago (cf. Figure 2f).
Figure 4 shows the value of the mean (log) COVID-19 risk (red dots) predicted at each airport along with uncertainty (bars) given by 95% CIs which provide a lower and upper estimate of (log) COVID-19 risk. The main airports such as Jakarta (CGK), Sultan Hasanuddin International Airport (UPG) and Surabaya (SUB) exhibit the highest predicted (log) COVID-19 risk.
![Predicted ex ante COVID-19 risk at airports. Ex ante COVID-19 (log) risk (x-axis) predicted for all (78) domestic airports in Indonesia (y-axis). Red dots correspond to the mean estimation of the log ex ante COVID-19 risk. Bars represent 95% confidence intervals, which provide a lower and upper estimate of (log) COVID-19 risk. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-4.jpeg?Expires=1748057090&Signature=zNL7pGq9r0rg-npKYGsDMUMWPsBjTQaBg3dHyqNCp8xMmjrOMmbUlp9GVOgvaSLkfJHm-R5p-T5Z1HghB9RFnH~Wg2OMhd5QW6ZRgg7hcTy3KQMrPM~oiyGIJ20ADKR7w2JT0-lOBPKz9EVE-w9O4Z-BrtCbtXetGlSZjgvCTREV4xHAQ2f7NmIDV-y6wzipE1SFTWmAX~qwiCAXIN--uAjS~cNXAV78wAAj0OtRL07RYmoya98VsqkycsMGdsucAQx-qlwvDXY7OkiqYMKAMslmxxpofxQFRqoiDVQe5vfI9q719lj7dxHV~EpzAC4GAqyNiXAqk0wbyt857jdsag__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Predicted ex ante COVID-19 risk at airports. Ex ante COVID-19 (log) risk (x-axis) predicted for all (78) domestic airports in Indonesia (y-axis). Red dots correspond to the mean estimation of the log ex ante COVID-19 risk. Bars represent 95% confidence intervals, which provide a lower and upper estimate of (log) COVID-19 risk. [Colour figure can be viewed at https://academic.oup.com/]
3 Spatial Propagation Model: A Hierarchical Bayesian Geostatistical Model To Predict Ex Ante Covid-19 Infection Risk
The results of the Airport Network Model provide an estimate of the ex ante COVID-19 risk along with uncertainty for each connection in the domestic Indonesian airport network. To account for the uncertainty in the estimation of the ex ante COVID-19 risk at the airports, we build three scenarios, which provide a lower (scenario I), mean (scenario II) and upper (scenario III) estimation of the ex ante COVID-19 risk at the airports. For each scenario (I-III), we extend the prediction of ex ante COVID-19 risk over the entire archipelago at a fine spatial scale using a hierarchical Bayesian geostatistical model. The models include a set of spatial correlates associated with the risk of transmission and account for spatial dependencies (Diggle & Giorgi, 2018).
3.1 Data
We propagate the ex ante COVID-19 risk from each airport into a spatial grid which covers the Indonesian archipelago. The Spatial Propagation Model considers four covariates related to an intensified spread of COVID-19: airport access, traffic intensity, proximity to Java and Bali, and population. We measure airport access () as the travel distance from each grid cell to the nearest airport. Travel distances are computed following Weiss et al. (2018) high-resolution friction maps. The spread of an infectious disease is expected to increase with the intensity of commuting and travelling (cf. Xu et al., 2019). We approximate travel intensity () using Meijer et al. (2018) road network density maps. Proximity to Java is proxied by the Euclidean distance to the islands Java and Bali (). We gather population size data from the High Resolution Settlement Layer (HRSL) population data set (Facebook Connectivity Lab and Center for International Earth Science Information Network, 2021) that we aggregated at pixel-level (). Larger population density increases the pool of individuals to be infected and quickens the spread of the virus (Bhadra et al., 2021). Population is considered to be a major driver associated with the transmission of the COVID-19 at various spatial scales (Python et al., 2022; Zhang et al., 2020). Therefore, we expect more vulnerability to COVID-19 outbreaks close to densely populated areas.
All covariates are scaled multiplicatively to values from 0.01 to 0.99, which avoids numerical difficulties during the fitting process (Gómez-Rubio, 2020) and ease the comparison among the covariate estimates. Figure 5 shows the values (log scale for illustration purposes) of the three distance-based risk and population covariates aggregated within grid-cells in Indonesia.
![Covariates used in the Spatial Propagation Model. (a) log population size (POP), (b) log travel time to the nearest airport (ACCESS), (c) log traffic density (TRAFFIC), and (d) log Euclidean distance to Java and Bali (DIST.JAVA). All covariates are defined on a 5×5 km2 grid. Grey pixels are regions without population which are excluded in this study. Green dots in panel (b) indicate the location of the investigated airports (78) from which log COVID-19 risk are taken as response in the Spatial Propagation Model. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-5.jpeg?Expires=1748057090&Signature=PCZPaFf-cmmWFPvVwEucLPgYbQQuz19SouzSTJth-XXMA93uFRxzpOQbccnR7D~PgpiuMhv~-dfhl~IRuV1yTZ8qXU-HB9bc9UecYZAoX3Fa9KrAdZbusYqz-ll8KiKYmkLCoFyL3ZcB-FKtyJoq1hjZRaumKjD56wMh8XXy89wMGVqRvfU7BFVebEUgWpFla~fMRtjGhY8HTM-BJhuVRZNnYCay3rphpHQu4U8ggr2Fo8nQSqfijYU-A4fxmSWu3mwdGMjPNyXHkOpZbPN3568PDMQzxAr5--cASVJOfx9MW6qVdq~o4Rm7KUhz7l7a45uTk-PMHkrDyLqr0h8OFQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Covariates used in the Spatial Propagation Model. (a) log population size (), (b) log travel time to the nearest airport (), (c) log traffic density (), and (d) log Euclidean distance to Java and Bali (). All covariates are defined on a grid. Grey pixels are regions without population which are excluded in this study. Green dots in panel (b) indicate the location of the investigated airports (78) from which log COVID-19 risk are taken as response in the Spatial Propagation Model. [Colour figure can be viewed at https://academic.oup.com/]
3.2 Model specification
The model selection procedure (Section 3.3) results in the following specification of the Spatial Propagation Model:
where is the ex ante COVID-19 log-risk value observed in the 78 grid cells i associated with the domestic airports (see Section 2). We assume that follows a Gaussian distribution with expectation and precision of the Gaussian errors .
In addition to the selected covariates assumed to be associated with an intensified spread of the disease, our model features a spatial Gaussian process, to account for spatial dependencies. More precisely, we consider a stationary Gaussian field with a Matérn covariance function (Eq. 3):
where denotes the Euclidean distance between two locations , is the marginal variance of the Gaussian random field, is the modified Bessel function of second kind and order λ > 0 (quantifying the degree of smoothness of the process) and κ > 0 is the scaling parameter associated with range which can be interpreted as the distance above which the spatial correlation becomes negligible (Lindgren & Rue, 2015).
For computationally efficient inference, we consider the stochastic partial differential equation (SPDE) approach available in the R package R-INLA utilising Integrated Nested Laplace Approximations (INLA; Rue et al., 2009; Lindgren et al., 2011). R-INLA has been widely adopted in epidemiological modelling (see, e.g. Bhatt et al., 2015; Cameletti et al., 2013; Golding et al., 2017; Noor et al., 2014). INLA can accommodate a large class of models (Rue et al., 2009) is often faster than approaches based on Markov chain Monte Carlo, and provides diagnostics that are easily interpreted and reproducible (Wang et al., 2018). For further details on INLA, see Lindgren et al. (2011), Rue et al. (2009) and Bakka et al. (2018) review on R-INLA applications.
The Bayesian nature of the investigated spatial models allows us to flexibly set informative priors in order to mitigate the problem of having a low sample size (78 airports). However, small-sample estimates are ineluctably sensitive to the specification of the prior distribution. Therefore, we take particular care to set reasonable priors to obtain reliable and interpretable results. To do so, we use prior knowledge to impose relatively strict priors on the fixed effects parameters. Using R-INLA default flat priors (range between −93 and 93) on the variance of the fixed effects would lead to implausible or inaccurate estimates. We assume an a priori negative effect of and , since it is likely that COVID-19 risk is higher in densely populated areas. Note that is considered during the model selection procedure but the selected model with the highest predictive performance does not include it (Section 3.3). Similarly, we assume an a priori positive effect of and on COVID-19 risk (cf. Python et al., 2022; Xu et al., 2019; Zhang et al., 2020). Further details on the prior specifications are provided in Appendix A.3.2.
As we do not have particular a priori knowledge on the extent and magnitude of the spatial structure of the spread of COVID-19 in Indonesia, we use penalised complexity priors (Fuglstad et al., 2019; Simpson et al., 2017) for the spatial effect. Penalised complexity priors are set on the two hyperparameters and associated with the spatial random field ξ (Equation 2). This requires us to provide a threshold and an a priori probability that the true values of each investigated parameter are above or below the threshold. We choose the following weakly informative priors on the range and marginal variance of ξ: , with measuring the maximum Euclidean distance between two airports from a set of all observed airports (78) across Indonesia and . The sensitivity to the choice of different priors is discussed in Appendix A.3.5.
3.3 Model selection procedure
We use the widely applicable information criterion (WAIC) (Watanabe & Opper, 2010) and two predictive performance metrics to select the model: the conditional predictive ordinate (CPO, Pettit, 1990) and the probability integral transform (PIT, Dawid, 1984). is the posterior probability of observing conditioned on all other observations that is
where contains the observed (log) risk with the observation omitted. We compare the CPO resulting from seven models, which use a different subset from all possible non-null subsets composed of three covariates (results in Appendix, Table A2).
is given by the conditional (given all other observations) cumulative distribution function of a replicate observation evaluated at the observed value , i.e.
If the forecast is perfect and the distribution of is continuous, has a uniform distribution (Wang et al., 2018).
We also use CPO as a cross-validation measure of fit using the log pseudo marginal likelihood (LPML) which is defined as (Gneiting et al., 2007):
Larger values of LPML indicate a better fit in terms of predictive performance.
3.4 Results
The results of the best performing model predicting the ex ante COVID-19 risk at Indonesian airports are shown in Table 2 (detailed results are shown in Appendix A.2.2). The selected model includes the following covariates (on a log scale): distance to airports (), traffic intensity () and population ()). Table 2 reports the mean, median, 95% lower bound CI, 95% upper bound CI and the standard deviation (std) of the estimated parameters. None of the CIs associated with the fixed effects coefficients contain zero.
Spatial Propagation Model: Baseline scenario. Mean, median, 95% credible intervals with lower () and upper () bounds, and standard deviations (std) associated with the covariate effects (Fixed effects), precision of the unstructured errors () and hyperparameters of the spatial field (range and marginal variance )
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
intercept | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 |
Random effects | |||||
Spatial range | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 |
Marginal variance | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 |
Precision | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 |
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
intercept | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 |
Random effects | |||||
Spatial range | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 |
Marginal variance | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 |
Precision | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 |
Spatial Propagation Model: Baseline scenario. Mean, median, 95% credible intervals with lower () and upper () bounds, and standard deviations (std) associated with the covariate effects (Fixed effects), precision of the unstructured errors () and hyperparameters of the spatial field (range and marginal variance )
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
intercept | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 |
Random effects | |||||
Spatial range | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 |
Marginal variance | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 |
Precision | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 |
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
intercept | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 |
Random effects | |||||
Spatial range | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 |
Marginal variance | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 |
Precision | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 |
As expected, the logged travel time to the nearest airport () has a negative effect, which is in line with the literature (Knipl et al., 2013). A location 1% farther away from an airport decreases the (log) COVID-19 risk by 0.5% on average while keeping all other explanatory variables constant. Both logged traffic intensity () and logged population () are associated with an increase in the logged COVID-19 risk, which is consistent with the literature (Bhadra et al., 2021; Python et al., 2022; Zhang et al., 2020).
The resulting estimation of the spatial range () provides complementary information on the initial spread of COVID-19 risk. The estimated average distance above which spatial dependencies become negligible is 2.89 degrees, which correspond to about 300 km. This suggests that the ex ante COVID-19 risk observed in Jakarta is unlikely to be associated with the risk observed in Bali—953 kilometres away from the capital.
3.5 Model validation
In order to systematically validate our model, we use a leave-one-out cross-validation (LOOCV) procedure (Vehtari et al., 2017). We visually assess the uniformity of the PITs using a Q–Q plot, which plots the sample quantiles together with the theoretical quantiles associated with the PIT estimated for each of the 78 hold-out data. The good alignment of the observations (dots) with the assumption of uniformity (diagonal line) in the Q–Q plot illustrated in Figure 6 support the assumption that the cross-validated PITs follow a uniform distribution. These results suggest a relatively good model fit.

Model validation. Q–Q plot assessing the uniformity of the PITs associated with the predictions of ex ante COVID-19 risks () for i = {1, …, 78} hold-out data in a leave-one-out cross-validation (LOOCV) procedure. Perfect uniformity is represented by the diagonal line. Dots represent the values of the theoretical quantiles (x-axis) of the estimated PITs along with their corresponding estimated sample quantiles (y-axis).
Figure 7 illustrates the uncertainty associated with the predictions of ex ante COVID-19 risk. For each of the hold out points, the figure shows the median (green points) and predicted 95% CI (red segments) along with the true value (blue points). While most true values (blue points) fall into the predicted 95% CI, one can observe that most CIs are relatively wide, which suggests a relatively high uncertainty associated with the predictions. Note that the model seems to underestimate the risk for extremely high risk values (see the blue points that lie above the red segments in Figure 7).
![Model validation. Mean (green points) and predicted 95% CI (red segments) together with the true observed values, log(RISKi), (blue points) for each hold-out airport i = {1, …, 78}, produced by a leave-one-out cross-validation (LOOCV) procedure. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-7.jpeg?Expires=1748057090&Signature=QWZd~x8Sc6NTRez6dfy8G-0EHSg6HbJMfHN6WoulaNXk6iJC--ldyXGosLgea4XFSISnPkm9llm4A8n8o~E0Z~9xobesvGbiy~89PLrbEd1TrN8e1JrN~Rts-p~iyGonZi~bggWr6L3Z4CghQ9IgH9ZWlSJp7KKAN7qjRi7CzaZd0uo88BNNXeZfB8F8dAuAjbf1T2~kSgXFUNDlWstZHpxt1MG2dfFiW0HIdGBZa0-jl-9Ri2-Qg-mp81e7MAwamS83x3lv3co41Xm2G7ebsZ4rSqbwucWvtN8UJCPjzcqJfKGy4Vpmzfu6nlW2vALvpFOZh8uvHTjihc-acS-zJQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Model validation. Mean (green points) and predicted 95% CI (red segments) together with the true observed values, , (blue points) for each hold-out airport i = {1, …, 78}, produced by a leave-one-out cross-validation (LOOCV) procedure. [Colour figure can be viewed at https://academic.oup.com/]
As an ex post plausibility check for our modelling approach, we compare our risk predictions with external data on early COVID-19 incidence in Indonesia. For this purpose, we rank Indonesian regencies by the total number of cases that are reported by the middle of March, April and May 2020. The comparison is imperfect as the official surveillance data is likely to under-report COVID-19 cases in the first months that followed the initial spread. Furthermore, bias is likely to affect the spatial distribution of the data and increase over time since spatially targeted public health interventions are carried out more frequently over time, which distorts the observed ranking of COVID-19 cases at regency level especially in the later months. Other determinants of the spatial spread of the disease factors, including internal migration linkages and local public health capacity, may also introduce additional biases. These limitations notwithstanding, we observe a relatively good fit between the regency-level average of the predicted risk and incidence data in mid-March, mid-April and mid-May (see results in Appendix A.3.4).
3.6 Mapping the ex ante COVID-19 risk across Indonesia
Within a Bayesian framework, we use the posterior distribution estimation of the parameters. The output of the two-step modelling process consists of an estimation of the posterior distribution associated with the parameters of the Spatial Propagation Model which is used to make spatial inference about the ex ante COVID-19 risk in Indonesia. We consider three scenarios to account for the uncertainty in the estimation of the first step (Airport Network Model) that is propagated in the second step (Spatial Propagation Model). For each scenario, we map the posterior median and the standard deviation of the (log) ex ante COVID-19 risk into a fine spatial grid (5 × 5 resolution). Figure 8 illustrates the results. We observe that the patterns are consistent among the scenarios. As expected, the highest median risk values are observed in populated areas such as the islands of Java and Bali, while remote areas such as Papua show a smaller initial importation risk (panels (a–c)). However, areas with spatially sparse observations in regions such as Papua and northern Kalimantan exhibit higher uncertainty in the estimation of COVID-19 risk, with high standard deviation values (panels (d–f)) relative to the median.
![Maps of the ex ante (log) COVID-19 risk in Indonesia. The maps show the (a–c) median and (d-f) standard deviation of the (log) ex ante COVID-19 risk for three scenarios (I: lower (first column), II: mean (second column), III upper estimations (third column) of the response) in Indonesia predicted at fine spatial grid (5 × 5 km2). [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-8.jpeg?Expires=1748057090&Signature=Nqc430dEVcmiwBknpRSuxJfVozXPhVLwqfvXdKNTfu4djzF3DWFHQnGfR0IimAJ5HYTRjWkJog6x6KeiEN30FfPTJD71E8X3sHoyMWWZCGZ1V3zL7FRFGX8RAVQn89xSq36rLlny~DSellMTGYQmfMGIcaMae9-plWO2m10EiRflADQamEmJZOicDBUbHwthS9WK5hC~VU2fjNqNZPudJHWh5x8wdtsnqlNgGr4H-D8CUYo6LOUlhWibxlpaZV-RouOb~q1lS94EvcndIYioXMGmDoOzUrUp740laFkK~8eXZfWA5s5~SrRjEAJnHusfdh4U4fCKLP8iUNDsgLblhw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Maps of the ex ante (log) COVID-19 risk in Indonesia. The maps show the (a–c) median and (d-f) standard deviation of the (log) ex ante COVID-19 risk for three scenarios (I: lower (first column), II: mean (second column), III upper estimations (third column) of the response) in Indonesia predicted at fine spatial grid (5 × 5 ). [Colour figure can be viewed at https://academic.oup.com/]
While these maps help to visualise the results, they are especially difficult to interpret for policymakers. To determine areas with high COVID-19 risk, end users need to consult at least two maps: (a) identifying locations with high median risk (based on, e.g., panels (a–c) of Figure 8); and (b) checking if the uncertainty about the prediction (based on, e.g. panels (d–f)) is small enough to avoid a false identification of high-risk areas. Equally difficult is the detection of areas at low COVID-19 risk or areas that require further data collection.
To simplify the interpretation of the maps, we calculate the exceedance probability for each grid cell k = {1, …, 388,740} that covers the Indonesian archipelago. The resulting maps provide an estimated probability that the ex ante COVID-19 risk is larger than a threshold value c, . This allows for the easy identification of high-risk areas. In practice, thresholds have to be chosen according to the policymakers' needs and can vary with their strategies and available resources.
Figure 9 illustrates such exceedance probability surface maps. We use two thresholds which can be adjusted e.g. in a policy-making context. Panels (a–c) show the probability of the log COVID-19 risk exceeding −2.2 for scenarios I–III. Panels (d–f) show the probability of the log COVID-19 risk exceeding −1 for scenarios I–III.
![Exceedance probability maps. Probability of log COVID-19 risk exceeding −2.2 in scenarios I (a), II (b) and III (c). Probability of log COVID-19 risk exceeding −1 in scenarios I (d), II (e) and III (f). The spatial resolution is 5×5 km2. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-9.jpeg?Expires=1748057090&Signature=C5sU3C4a3BWcX7UEXV9oExqR6SCAW9lQpKlYuFA954HUdPyzgIOV2rotBfquGDxTeaYNISfi~QLf-lGFt3mhKEeuPF~6t87Dl5LPXauWjmJO5H~6kU6n0gMac66asIXUTuZyleyqiLZrn7rMB1~BVp3wjXvOA4kp0-WCKuSRRCMe2V7ZcjFR-lm2k7snsZ4tJO~F292jnJRTfkbW7Gol62STRz6pSgYiupyOnJe9GquFsS9sYrhQwwyhC7RpbqoWO8YhCRVhbmelfe9JAIbsUaact8Pq-d6mwkpaTDx6j1n12TmHZhZw5BlVpmkHVxcyZYFpMuaXBHRvVhZWUy7Byg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Exceedance probability maps. Probability of log COVID-19 risk exceeding −2.2 in scenarios I (a), II (b) and III (c). Probability of log COVID-19 risk exceeding −1 in scenarios I (d), II (e) and III (f). The spatial resolution is . [Colour figure can be viewed at https://academic.oup.com/]
With very limited resources, policymakers may favour a high threshold in the exceedance probability map to highlight regions at elevated risk requiring direct intervention. Once more resources are available, the threshold can be reduced, increasing the area considered at high risk. When choosing a relatively high threshold, the regions considered at elevated risk include the metropolitan area around Jakarta and regions between Surabaya and Malang (Figure 9a–c). With a lower threshold, the spatial coverage of high risk areas is extended to almost all territories of Java, South Sumatra, South Kalimantan, Makassar, Bali and Mataram (Figure 9d-f). The choice of the threshold is crucial. Consequently, in practical applications, it should be selected with care by end users.
Besides, maps showing the (log) probability of COVID-19 risk not exceeding a relatively large threshold (Figure 9a–c) highlight larger areas at low risk than those using smaller thresholds (Figure 9d–f). For a given threshold, areas with a low exceedance probability in the interval (0,0.25] are informative as well. They may highlight areas where the spatial distribution of the observations is too sparse such that additional data collection might be required. The information on statistical uncertainty conveyed by the exceedance probability maps is crucial for policymakers who need further information from additional sampling in areas that require surveillance.
4 Discussion
The first step (Airport Network Model) and the second step (Spatial Propagation Model) exhibit important limitations since both models rely on simplifying assumptions and relatively scarce data. Estimates from the Airport Network Model crucially depend on the quality of the initial COVID-19 risk estimates for the seven main entry airports provided by Maier and Brockmann (2020). Furthermore, the skewed distribution of the risk data makes the choice of an appropriate likelihood function for the Airport Network Model challenging. To mitigate this issue, we log the risk values such that normality assumptions appear valid for the resulting transformed data (see the diagnostics plot in Appendix A.2.1). Quantifying uncertainty associated with the estimates from the Airport Network Model remains challenging. To account for it, we computed the estimates under three scenarios, which provide a lower, mean and upper estimation of COVID-19 risk based on the estimated mean and 95% CIs. For each corresponding scenario, we fit a Spatial Propagation Model and compared the results. The relatively high consistency in the results suggests that the Spatial Propagation Model is robust to variations of the response (i.e. the output of the Airport Network Model).
Our Airport Network Model relies on airports' traffic volumes to derive Indonesia's COVID-19 importation risk. This seems a reasonable assumption for island nations like Indonesia, Australia or New Zealand, but would face more significant limitations in case of countries where a substantial part of international travel occurs via land. Even in our setting, approximating the number of passengers at risk of being infected by COVID-19 only by the number of flight connections ignores a potential association between airplane capacity and urbanisation. Small airplanes are more likely to fly to provincial airports. Therefore, our simplified approach may overestimate the risk in remote areas. We also ignored maritime transport routes such as ferries that are widely used across the Indonesian archipelago, but are a less substantial means of international travel. However, as the island of the arrival airport might not always coincide with the final destination of passengers, our approach does not capture maritime transmission routes. We also exclude Singapore as a final possible destination although it hosts a large share of international flights to Indonesia. Some passengers who enter Indonesia from Singapore via ferry will not necessarily enter the domestic Indonesian flight network. Despite these limitations, it appears reasonable to suppose that the COVID-19 importation risk is still fairly well described by airplane connectivity because cross-border traffic in Indonesia predominantly occurs via air transportation.
We identified several shortcomings associated with the Spatial Propagation Model. It assumes both stationarity (and hence invariance to translation) and isotropy (invariance to rotation). Processes impacting the ex ante COVID-19 risk might be non-stationary, with the risk being clustered in one area, but incurring randomly in other regions, or anisotropic such that the ex ante risk increases systematically in some directions. Future studies investigating non-stationary models as well as dynamic models including cell phone or real traffic data might be able to capture more complex spatio-temporal processes associated with the ex ante spread of COVID-19 through the SPDE/GMRF-approach implemented in R-INLA.
The hierarchical Bayesian geostatistical model allows us to set up informative priors to benefit from prior knowledge and mitigate issues associated with the relatively low sample size of the observed data. While our prior sensitivity analysis (see Appendix A.3.5) suggests that limited changes in the prior specifications do not drastically alter the main results, the model ineluctably remains sensitive to important variations in the priors. In this context, we put particular emphasis on choosing relevant priors in line with the literature to increase the reliability of our results while ensuring that they remain interpretable.
By mainly focusing on transport networks, the Spatial Propagation Model ignores potential drivers of the ex ante COVID-19 risk such as the proportion of people with pre-existing conditions or with a lifestyle associated with a higher risk. To mitigate the effects of the omitted variable bias, the spatial dependence structure, represented by a Gaussian Markov random field, is introduced in the model, which partially accounts for unobserved drivers.
5 Conclusion
This paper examined the log-transformed ex ante risk of COVID-19 in Indonesia by applying a Bayesian hierarchical geostatistical model based on risk values obtained through a network analysis of flight connections in Indonesia. The novelty of this paper lies in its ability to systematically capture the ex ante COVID-19 risk both in urban as well as rural areas in Indonesia in the presence of scarcity of spatial data, which typically occurs in the early stages of a disease outbreak. Our approach provides tools for policymakers to identify areas that would require further surveillance and areas vulnerable to a potential future outbreak of a new COVID-19 variant or another similar contagious respiratory disease.
The suggested spatial statistical approach adds to the exploratory literature by assessing the role of major infrastructural and socio-economic drivers of COVID-19 (Bhadra et al., 2021; Dadar et al., 2020; Meijer et al., 2018). We found that the island of Java was exposed to a high initial risk. This seems plausible because of the important socioeconomic role of the island in the archipelago. Furthermore, we identified other metropolitan areas at risk to be affected at the beginning of the pandemic. Our findings corroborate that urban areas tend to be more exposed than peripheral islands to an initial spread of COVID-19. Sparsely populated areas and those with a low traffic density—both effects are associated with less degree of urbanisation—seem less at risk. Our study brings support to previous findings showing the vulnerability of urban centres to COVID-19 risk and their role in facilitating its spread (Carozzi et al., 2020).
A rigorous model validation remains challenging in the context of predicting risk ex ante, which is inherently affected by the absence of data. However, our results show a good alignment of predictions with external data at a fine spatial scale, for several months after the initial spread of COVID-19 in Indonesia. Future work collecting additional data based on a design aiming at reducing various sources of bias would improve the reliability of the procedure used to assess the validity of our model.
Within a Bayesian model, we generated exceedance probability risk maps to illustrate estimates of the ex ante COVID-19 risk and its uncertainty in different scenarios and at policy-relevant spatial scales across Indonesia. A collaboration between the analyst who calibrates the model and policymakers is required to define sensible thresholds to identify local hotspots and areas that require further data collection (Moraga, 2019). We believe that initiating and maintaining a close dialogue between statisticians and policymakers is crucial to ensure successful interventions aiming at mitigating COVID-19 risk in vulnerable areas.
Acknowledgements
We declare no competing interests. This project has received funding from the National Key Research and Development Programme of China (2021YFC2701905), the Zhejiang University Fundamental Research Funds for the Central Universities (2021QN81029), the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 101031461, the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)–Project ID 192626868–SFB 990 in the framework of the collaborative German-Indonesian research project CRC990, as well as the Campus-Institute Data Science, University of Göttingen. The authors of this work take full responsibilities for its content. We would like to thank the Associate Editor and the reviewers for their thoughtful comments which have led to numerous improvements of our work. We would like to thank Bertha Lovita Dwi Intania Permana and Isabelle Wilhelm for their assistance in translation tasks, data collection and proofreading and Baoli Liu for assistance on the illustrations. We especially thank Wisnu Harto Adiwijoyo for sharing crowd-sourced panel data of Indonesian COVID-19 infection and fatality cases.
References
This appendix provides further details on data processing and the implementation of the two-stage approach consisting of the Airport Network Model and the Spatial Propagation Model. All statistical analyses were conducted in R 4.0.2 (R Core Team, 2020). The code and publicly available data for replication can be accessed (https://github.com/JacDo/Indonesia) via Github.
A.1 Airport network model
Data
The aim of the Airport Network Model is to predict the ex ante COVID-19 risk in each investigated airport (78) of the domestic Indonesian network. To do so, it uses: (a) COVID-19 importation risk data for seven major Indonesian airports from Maier and Brockmann (2020), (b) covariates associated with 132 flight connections in the Indonesian domestic airports (aviationstack, 2020).
The COVID-19 importation risk data for seven major Indonesian airports is obtained from Maier and Brockmann (2020)'s model (data freely available here: http://rocs.hu-berlin.de/corona/docs/model/importrisk, which provides a relative importation risk of COVID-19 value (in %) for the seven main airports in Indonesia. The model is based on global air transportation network data (3893 airports and 51476 directed flight connections). The risk values for the seven main Indonesian airports are the following: CGK (Jakarta airport: 0.154%), DPS (Denpasar-Bali: 0.238%), SUB (Surabaya: 0.029%), UPG (Makassar: 0.014%), Medan Kuala Namu (MDC: 0.023%) and Yogyakarta (JOG: 0.012%). Details on the model can be found in Maier and Brockmann (2020). Figure A1 provides an overview on the Indonesian domestic flight network from December 2019 and January 2020.
![Indonesian domestic flight network. Flight connections and their frequency (denoted by line thickness) among the 78 domestic Indonesian airports from December 2019 to early January 2020. Data source: aviationstack (2020). [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-10.jpeg?Expires=1748057090&Signature=19hOT43oZ6i-tKraZq4P98UJBrBycf-j08sx5LnBWWs7clvcX~~u1Q0Bi4RLlbJknQ~hRe~fKxnMW~bFTLswF7Ka4TDigINTHj01OJM3Y35gosCn~60Mv13uH1NV5mNaw4jLMvW6cYdcf9p1pYvvB9s-u6D0GTbtzbYiuJzB-7qpWIbVMW2P8XmtWWceZpfodqhMytj5J10MSL7k1iznvVOmGQ~4zbIC65ayAY9HvqO~RSBE0KQabHE7FMIU79sNasjX0Z6c3DEcvyqVVzCtGjWTpbhOi3zA5bXYjc03hiPEiVOypA8ChfygdsfWfOzRlyJlaITv1LXkiGyJhnObdg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Indonesian domestic flight network. Flight connections and their frequency (denoted by line thickness) among the 78 domestic Indonesian airports from December 2019 to early January 2020. Data source: aviationstack (2020). [Colour figure can be viewed at https://academic.oup.com/]
The Airport Network Model includes several covariates, associated with network metrics: connectivity (CONNECT), number of flight connections per route (NB.FLY) and proportion of departures per route (DEP.FLY). Further, the Euclidean distance between airports (E.DIST.AIRPORT), geodesic distance between airports (G.DIST.AIRPORT), betweenness (BET), transitivity (TRA), inward strength (IN.STR), outward strength (OUT.STR), eigencentrality (CEN) and similarity (SIM). These metrics are computed from flight connection data which provides the locations as well as connection flow (flight frequencies) between domestic airports in Indonesia from December 2019 to January 2020. The data can be obtained from the provider (with a charge) here: https://aviationstack.com (aviationstack, 2020).
A.2 Assumptions of the airport network model
In this section, we discuss the assumptions underlying the Airport Network Model (Maier & Brockmann, 2020). Here, we estimate , the probability of importing a case at vertex m from an origin airport . We assume that passengers enter the network at and exit at v. We denote by the weighted links, with passenger flux . We assume a path , where n = 1, …, N are the labels of the vertices in the network, i is the index of the point of entry into the network and j is the index of steps L. The defined path is a sequence of L steps along a set consisting of L + 1 vertices.
For each possible path, we consider two metrics that can be expressed as a probability: (a) transition probability: the probability that the next position is m given that a passenger at vertex v travels to another vertex, expressed as where ; and (b) exit probability: the probability that a passenger exits the network upon arrival at vertex v is , where while the probability that the person will continue the journey is . An additional assumption is that the exit probability also depends on the starting point of the passenger: .
The probability that a path Γ is taken (path probability) can be defined as:
where . Moreover, we assume that at the initial starting airport. As is a function of the entry vertex, S has this additional dependency: , which assumes that the location of the starting point determines the set of paths that can be taken. Consider the probability that a path Γ begins at and ends at after t steps:
where is the subsets of paths starting at and ending at after a time t. Hence, path probability can written as:
which can be interpreted as the probability that a passenger exits the network at m at time t. The transition probability can be derived as . We can interpret as the probability that a traveller starts at , arrives at vertex m and leaves. For smaller airports in remote positions, a value of 1 is sensible since one can reasonably assume to be large as the passengers are likely to exit at m. Large airports often serve as a transition hub and therefore we assume that has a population in the surrounding which proportional to a power ι of the population size. The scaling exponent can be generalised to ι ≠ 1 (Maier & Brockmann, 2020).
Normality assumption
We visually assess the normality assumption of the response of the Airport Network Model based on a quantile–quantile (Q–Q) plot (Figure A2), which plots the quantiles of the observed data (COVID-19 ex ante risk) against the theoretical quantiles of the normal distribution. Normally distributed data should closely follow the red line (a). However, the histogram of the residuals (b) suggests some skewness in the data. While the distribution of the low-values of the residuals seems in line with normality, major deviations can be seen for larger residuals.
![Diagnostic plots of the Airport Network Model. We provide four diagnostic plots for the Airport Network Model: (a) Q–Q plot which plots the theoretical quantiles versus the observed response, (b) histogram of the residuals, (c) Q–Q of the logged response. The response is the COVID-19 ex ante risk of 132 connections among the airports in the domestic Indonesian network. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-11.jpeg?Expires=1748057090&Signature=ohEfPLjngg2VMr8rc5nrovwtwXPp8xSTubYFx2HCUfipP5l4KkQ3gn1EODjYcHVyInipaJeZtAKYBzxPU9frSI4Ck5jVI71fU-~mwqpIJ8DRBBCnQEJnYW0-SZBwrT~jzN5bihu5UopvlFACqf1bSOFbsSxd-jDm3PQobN2wGtdzOfXrLjSbekBfLi-hRjsBsehj1Hclq4NZGXA1RAGCnnLemmyM~dwoycZtaSwprLgt14QWIJrywqxOyo0w~ZbGSMRqyLwpEm9vG46cRdpfLyChP6k1OK3zRI7RhS26gjFt8lIquIyA8uLffY-pQtUJ4EvdxxeVA3yPPJqxoPk57w__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Diagnostic plots of the Airport Network Model. We provide four diagnostic plots for the Airport Network Model: (a) Q–Q plot which plots the theoretical quantiles versus the observed response, (b) histogram of the residuals, (c) Q–Q of the logged response. The response is the COVID-19 ex ante risk of 132 connections among the airports in the domestic Indonesian network. [Colour figure can be viewed at https://academic.oup.com/]
To mitigate the risk of violating the normality assumptions of the data, we logged the response. Despite the presence of a few outliers, the resulting log-transformed response variable appears to follow a normal distribution, as suggested by a relatively good alignment of the residuals with the solid line (Figure A2).
As a robustness test, we assess the predictive performance of an alternative probability distribution: the gamma distribution. The gamma distribution takes the original values, which are not logged (it cannot take non-positive values). To make a valid comparison between the results of the normal distribution function based on a logged response, we use the properties that a normal distribution of the log-response is equivalent to a log-normal distribution of the original response. We therefore plug the parameters estimated from the normal distribution into to the density of the log-normal distribution and compute quantile residuals to generate Q–Q plots as well as the AIC, RMSE and MAE to compare the models.
Figure A3 shows for the gamma distribution the Q–Q plot of the theoretical quantiles versus the observed response. It shows a relatively good fit for theoretical quantiles between -1 and 1 but relatively large deviance outside this range of values.
![Diagnostic plots of the Airport Network Model based on a gamma distribution function. Q–Q plot which plots the theoretical quantiles versus the deviance of the residuals. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-12.jpeg?Expires=1748057090&Signature=JucnjMDR9FRvKTLprasnirDaWtugK4QUwR6RV2DKDQUH2IP~roOUsAW20VbHuRZo6PNIhTS5XXN7zD7i9vzJViM31khNULOun-hO5AtqPsKswELx1xxL86gCTvK18Mm9-S0w4Mcym8xqzQ-LWfsyOoBIRKUpCcFFl15j~Racs6URt1Me1JIWiW7FYj9svO0WGnZUABx2haClomJ-FIdT~e2imIlEsJx~xP55dayG25P8XDzMSrTIuWFfJXG8Lcud34CT7Z5y9CNuuN75U384D~vmkCyyNNvO-z1u82RGHLwzYXXPIE8nTTh0RKWtMQDtx2MgXKDHkLXwe3JnPCgihw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Diagnostic plots of the Airport Network Model based on a gamma distribution function. Q–Q plot which plots the theoretical quantiles versus the deviance of the residuals. [Colour figure can be viewed at https://academic.oup.com/]
Table A1 shows the AIC, RMSE and MAE scores computed for the log-normal distribution and the gamma distribution functions. We compute two AIC values. First, we compute the AIC based on the maximum likelihood estimator (MLE). Second, we compute the AIC of the previously selected model specification for both likelihoods. The number of observations is 132 and the response is COVID-19 risk on a natural scale. The first AIC score assesses the fit of the response to a theoretical distribution via MLE and uses the log-likelihood calculated on the estimated parameters. The second one computes the AIC based on the log-likelihood of the respective models. While the first metric assesses the fit of the response to a theoretical distribution, the second one evaluates the goodness-of-fit of the models. The results suggest that the model based on a log-normal distribution has a better predictive performance (lower AIC, RMSE and MAE scores).
Model comparison summary. This table provides the root mean squared error (RMSE) and mean average error (MAE) scores and the Akaike information criterion (AIC) computed using maximum likelihood estimator (MLE) and computed to compare models based on a log-normal and gamma distribution functions
Distribution . | . | . | RMSE . | MAE . |
---|---|---|---|---|
Log-normal | −2716.11 | −2931.93 | 0.89 | 0.60 |
Gamma | −2653.97 | −2826.49 | 1.49 | 1.16 |
Distribution . | . | . | RMSE . | MAE . |
---|---|---|---|---|
Log-normal | −2716.11 | −2931.93 | 0.89 | 0.60 |
Gamma | −2653.97 | −2826.49 | 1.49 | 1.16 |
Model comparison summary. This table provides the root mean squared error (RMSE) and mean average error (MAE) scores and the Akaike information criterion (AIC) computed using maximum likelihood estimator (MLE) and computed to compare models based on a log-normal and gamma distribution functions
Distribution . | . | . | RMSE . | MAE . |
---|---|---|---|---|
Log-normal | −2716.11 | −2931.93 | 0.89 | 0.60 |
Gamma | −2653.97 | −2826.49 | 1.49 | 1.16 |
Distribution . | . | . | RMSE . | MAE . |
---|---|---|---|---|
Log-normal | −2716.11 | −2931.93 | 0.89 | 0.60 |
Gamma | −2653.97 | −2826.49 | 1.49 | 1.16 |
Model selection
The GAMs investigated in this study include both parametric and nonparametric covariates. The sets of these parametric and nonparametric covariates would amount to (a total of) over 18,000 possible combinations. For each vertex-based covariate, we can either include or exclude it. The same holds true for the edge-based covariates, but with the additional option to include them as a smooth term. It would be computationally challenging to assess the predictive performance of all potential models. To reduce the number of investigated models, we exclude models that exhibit combinations of covariates with high degrees of correlation (above 0.8), which is observed in models including eigenvector centrality, similarity and connectivity together. We obtain a total of 4139 potential models. Hence, we split our data in a training (70%) and a testing set (30%) and computed the AIC values of the models, and preselect 20 models that exhibit the lowest AIC (computed on the training data). The in-sample predictive performance of the 4139 models using AIC allows a comparison of a large number of models in a reasonable time. However, the AIC does not provide information about the out-of-sample predictive performance of the models. To select the best model among the 20 preselected models, we assess the out-of-sample predictive performance (using root mean squared error (RMSE) and mean average error (MAE) metrics) of the models on the testing set. To reduce the risk of overfitting, we applied a double penalty approach, following Marra & Wood (Marra & Wood, 2011). This includes two shrinkages on the smooth effects of the GAMs: (a) wiggliness penalty which affects functions in the range space, and (b) shrinkage penalty which affects functions in the penalty null space.
A.3 Spatial propagation model
Data
To predict the COVID-19 importation risk on a fine spatial grid in Indonesia, we use a Bayesian model based on several spatial covariates obtained at resolution covering the entire Indonesian archipelago (388,740 pixels). The spatial covariates include: travel time to the nearest airport () from Weiss et al. (2018), Euclidean distance to the islands Java and Bali (), and traffic network density (), both from Meijer et al. (2018), and population from Facebook Connectivity Lab and Center for International Earth Science Information Network (2021). To avoid numerical issues during the fitting process (Gómez-Rubio, 2020) and to ease comparison among the covariates, we scale each covariate X multiplicatively to values from 0.01 to 0.99 via the following formula:
Missing values are relatively few. The percentage of missing values per covariate is 0.42%, 0.43%, 0.45%, and 2.44%. We use population to remove uninhabited areas by not considering pixels where . Making inference only in regions where populations at risk is common practice in spatial epidemiological studies (see, e.g. Lucas et al. (2021); Weiss et al. (2019))
Prior specifications
The Bayesian setting adopted in this work requires us to set priors on the parameters of the Spatial Propagation Model. We use normally distributed priors ((mean, precision)) for the fixed-effects as follows:
: intercept;
: parameter for travel time to the nearest airport ();
: parameter for traffic density ();
: parameter for Euclidean distance to Java and Bali ();
: parameter for population ().
Furthermore, we use multivariate normally distributed priors on the spatial term (GMRF) ξ(s) (Equation 2):
with precision matrix Q composed of the scaling parameter κ and GMRF precision . The precision matrix is specified in the Matérn covariance function (Equation 3). Without specific a priori knowledge on the unstructured errors, we use R-INLA default priors, which assume .
We specify joint penalised complexity (PC) priors for the hyperparameters of the spatial term, as suggested by Fuglstad et al. (2019). The main principle here is to penalise deviations from a simple benchmark model. The approach allows users to use expert knowledge to define sensible priors through a definition of a threshold associated with exceedance probability. One of the main advantages of this approach lies in its invariance to reparameterisation. This facilitates the specification of the prior as we can construct it without taking the specific parameterisation into account (Simpson et al., 2012).
It is common to penalise complexity by shrinking the range () towards infinity (or very large values) and the marginal standard deviation () towards zero (or very small values) using expert knowledge (Simpson et al., 2012). Accordingly, we assess sensitivity to priors by investigating two cases where we set reasonable threshold values to favour a spatial field with a high range and small standard deviation. To do so, we set a low probability that is above a threshold and that is below a threshold, as follows: and .
Model selection
We compare the conditional predictive ordinate (CPO, Pettit, 1990) and (WAIC, Watanabe & Opper, 2010) of a total of 15 models (), which include all possible non-null subsets of a set of four covariates , , , and ). Table A2 shows the CPO and WAIC values along with the Kolmogorov–Smirnov (K–S) test results on the uniformity of the PITs and their kurtosis computed for each investigated model.
Model predictive performance. This table provides the conditional predictive ordinate (CPO) values for seven model specifications along with the widely applicable information criterion (WAIC). For the two models with highest CPO and lowest WAIC, we report the Kolmogorov–Smirnov (K-S) test on the uniformity of PITs as well as their kurtosis. All model specifications include an intercept and a spatial field (not shown)
. | Model . | CPO . | WAIC . | K-S . | Kurtosis . |
---|---|---|---|---|---|
1 | ACCESS + TRAFFIC + POP | −108.594 | 215.232 | 0.995 | 1.827 |
2 | TRAFFIC + POP | −108.326 | 214.548 | 0.900 | 1.859 |
3 | ACCESS + DIST.JAVA + TRAFFIC + POP | −110.162 | 218.315 | ||
4 | DIST.JAVA + TRAFFIC + POP | −111.063 | 220.787 | ||
5 | ACCESS + DIST.JAVA + POP | −116.044 | 230.438 | ||
6 | DIST.JAVA + POP | −116.699 | 231.435 | ||
7 | ACCESS + POP | −116.196 | 230.917 | ||
8 | POP | −116.565 | 229.608 | ||
9 | ACCESS + DIST.JAVA + TRAFFIC | −111.930 | 222.707 | ||
10 | DIST.JAVA + TRAFFIC | −113.816 | 226.785 | ||
11 | ACCESS + TRAFFIC | −110.811 | 220.167 | ||
12 | TRAFFIC | −111.388 | 221.575 | ||
13 | ACCESS + DIST.JAVA | −120.499 | 239.357 | ||
14 | DIST.JAVA | −121.445 | 242.540 | ||
15 | ACCESS | −122.733 | 240.160 |
. | Model . | CPO . | WAIC . | K-S . | Kurtosis . |
---|---|---|---|---|---|
1 | ACCESS + TRAFFIC + POP | −108.594 | 215.232 | 0.995 | 1.827 |
2 | TRAFFIC + POP | −108.326 | 214.548 | 0.900 | 1.859 |
3 | ACCESS + DIST.JAVA + TRAFFIC + POP | −110.162 | 218.315 | ||
4 | DIST.JAVA + TRAFFIC + POP | −111.063 | 220.787 | ||
5 | ACCESS + DIST.JAVA + POP | −116.044 | 230.438 | ||
6 | DIST.JAVA + POP | −116.699 | 231.435 | ||
7 | ACCESS + POP | −116.196 | 230.917 | ||
8 | POP | −116.565 | 229.608 | ||
9 | ACCESS + DIST.JAVA + TRAFFIC | −111.930 | 222.707 | ||
10 | DIST.JAVA + TRAFFIC | −113.816 | 226.785 | ||
11 | ACCESS + TRAFFIC | −110.811 | 220.167 | ||
12 | TRAFFIC | −111.388 | 221.575 | ||
13 | ACCESS + DIST.JAVA | −120.499 | 239.357 | ||
14 | DIST.JAVA | −121.445 | 242.540 | ||
15 | ACCESS | −122.733 | 240.160 |
Model predictive performance. This table provides the conditional predictive ordinate (CPO) values for seven model specifications along with the widely applicable information criterion (WAIC). For the two models with highest CPO and lowest WAIC, we report the Kolmogorov–Smirnov (K-S) test on the uniformity of PITs as well as their kurtosis. All model specifications include an intercept and a spatial field (not shown)
. | Model . | CPO . | WAIC . | K-S . | Kurtosis . |
---|---|---|---|---|---|
1 | ACCESS + TRAFFIC + POP | −108.594 | 215.232 | 0.995 | 1.827 |
2 | TRAFFIC + POP | −108.326 | 214.548 | 0.900 | 1.859 |
3 | ACCESS + DIST.JAVA + TRAFFIC + POP | −110.162 | 218.315 | ||
4 | DIST.JAVA + TRAFFIC + POP | −111.063 | 220.787 | ||
5 | ACCESS + DIST.JAVA + POP | −116.044 | 230.438 | ||
6 | DIST.JAVA + POP | −116.699 | 231.435 | ||
7 | ACCESS + POP | −116.196 | 230.917 | ||
8 | POP | −116.565 | 229.608 | ||
9 | ACCESS + DIST.JAVA + TRAFFIC | −111.930 | 222.707 | ||
10 | DIST.JAVA + TRAFFIC | −113.816 | 226.785 | ||
11 | ACCESS + TRAFFIC | −110.811 | 220.167 | ||
12 | TRAFFIC | −111.388 | 221.575 | ||
13 | ACCESS + DIST.JAVA | −120.499 | 239.357 | ||
14 | DIST.JAVA | −121.445 | 242.540 | ||
15 | ACCESS | −122.733 | 240.160 |
. | Model . | CPO . | WAIC . | K-S . | Kurtosis . |
---|---|---|---|---|---|
1 | ACCESS + TRAFFIC + POP | −108.594 | 215.232 | 0.995 | 1.827 |
2 | TRAFFIC + POP | −108.326 | 214.548 | 0.900 | 1.859 |
3 | ACCESS + DIST.JAVA + TRAFFIC + POP | −110.162 | 218.315 | ||
4 | DIST.JAVA + TRAFFIC + POP | −111.063 | 220.787 | ||
5 | ACCESS + DIST.JAVA + POP | −116.044 | 230.438 | ||
6 | DIST.JAVA + POP | −116.699 | 231.435 | ||
7 | ACCESS + POP | −116.196 | 230.917 | ||
8 | POP | −116.565 | 229.608 | ||
9 | ACCESS + DIST.JAVA + TRAFFIC | −111.930 | 222.707 | ||
10 | DIST.JAVA + TRAFFIC | −113.816 | 226.785 | ||
11 | ACCESS + TRAFFIC | −110.811 | 220.167 | ||
12 | TRAFFIC | −111.388 | 221.575 | ||
13 | ACCESS + DIST.JAVA | −120.499 | 239.357 | ||
14 | DIST.JAVA | −121.445 | 242.540 | ||
15 | ACCESS | −122.733 | 240.160 |
Model validation
To validate the model, we first assess the reliability of the parameter estimations. Figure A4 (top-left) shows the marginal posterior distribution of the fixed effects (covariate coefficients and intercept), the Gaussian unstructured error terms (precision ), and the hyperparameters (range and marginal standard deviation ) associated with the spatial field ξ. Figure A4 (top-right) displays the marginal posterior distribution of , , and ). Figure A4 (bottom-left) shows the marginal posterior mean and 95% CI of the spatial random effects. Moreover, Figure A4 (bottom-right) illustrates the marginal posterior mean and 95% CI of the linear predictor and fitted values.

Model diagnostics. Top-left: marginal posterior distribution of the fixed effects (covariate coefficients and intercept). Top-right: marginal posterior distribution of the precision of the Gaussian unstructured error terms and the hyperparameters. Bottom-left: marginal posterior mean and 95% CI of the spatial random effects. Bottom-right: marginal posterior mean and 95% CI of the linear predictor and fitted values.
To assess the validity of our model further, we compare the predicted ex ante COVID-19 risk with official surveillance COVID-19 data at regency level (514 second-level administrative divisions of Indonesia) collected from March 2020 to May 2020 (middle of each month) in Indonesia. We focus on a few months following the initial spread of COVID-19 in Indonesia, since the number of reported cases were less affected by government measures that were implemented and reinforced over time following the initial spread. The comparison of the model prediction with data at regency level does not constitute a formal assessment of the model validity. However, it offers an alternative to compare our model results with real data and thereby help us better understand the limits of our models and improve the interpretation of our results. The official surveillance COVID-19 is based on decentralised official online reports of Indonesia's province and regency administrations (38 different websites). A collective initiative of Indonesian volunteers and scientists archived and digitised the daily reports to centrally publish online at https://kawalcovid.id. Kawal COVID (2021) generously shared the data which documents the monthly aggregate infection cases. The original official reports provide infection statistics at the regency level following the Indonesian testing protocol, which includes individuals from two categories: persons under surveillance (ODP) and patients under surveillance (PDP). The first category includes individuals with mild acute respiratory infection symptoms who returned from a high-risk area or have a history of close contact with confirmed cases. The second category consists of patients with a moderate or severe acute respiratory infection such as pneumonia with similar travel history or close contact with infected individuals. We calculate infection cases at the 15th of each month as the sum of both ODP and PDP.
We note that the percentage of asymptomatic individuals is undetermined. A conservative estimate assumes that 15.6% of all infected individuals do not display symptoms (He et al., 2021). Workman (2020) evaluates different studies and assumes that approximately 30% of all disease progression occurs asymptomatically. Also, they estimate that re-infected people have a 48.8% chance of being asymptomatic. Figure A5 shows the predicted mean ex ante (log) COVID-19 risk aggregated for each of the 514 regencies in Indonesia.
![Predicted COVID-19 risk in Indonesia at regency level. The map shows the predicted log-transformed mean ex ante COVID-19 risk in each regency (514 in total) of Indonesia. Data period: December 2019 until early January 2020. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-14.jpeg?Expires=1748057090&Signature=hecKS0wLQXhamL6zpjCeynSwQlux423g-qjvjs-4aQq1A6o-Ne-K3Ts-D7fO3NYhJaIjpXk03pe4egfZkrLg4Dl~7R3sPb8dBtzARqLjMAIeapIQndh~KGN-t8ixBiD~p9SM4xVEute~F0IuCplEQhoDdU2qhPb2Ds7kbL4BMCq~x~mbKbqcExhjRuYjBoyZJCytG4By7m-JXMO~jtdWOhQNo86hQCEEgR1V3avVQMtqdcsUrr4L8afdB~VLFpCzhrLKRtcir-gXJRWreYRenQA5QzofLlCH4Qhx2MA~GIfSoAD6Qvirg1dppxWZsAc-fr4BbaBiYCpC1wxiovctTg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Predicted COVID-19 risk in Indonesia at regency level. The map shows the predicted log-transformed mean ex ante COVID-19 risk in each regency (514 in total) of Indonesia. Data period: December 2019 until early January 2020. [Colour figure can be viewed at https://academic.oup.com/]
We conduct Wilcoxon rank tests between the logged risk and the reported infections in mid-March, mid-April and mid-May 2020. For each month, we reject the null hypothesis that the difference between the pairs follows a symmetric distribution around zero. However, the results suggest a moderate to strong Spearman rank correlation between the logged mean risk and the reported infections in mid-March, mid-April, and mid-May 2020 (see Table A3).
Correlation test at regency level: summary. Wilcoxon rank test and Spearman rank correlation are calculated between the mean prediction of the Spatial Propagation Model and external data set. Time period: model prediction: December 2019–February 2021; the reported infections: mid-March, mid-April and mid-May 2020
Wilcoxon test p-value . | Spearman rank correlation . | Data collection time . |
---|---|---|
0.008 | 0.730 | mid-March 2020 |
<0.001 | 0.535 | mid-April 2020 |
<0.001 | 0.478 | mid-May 2020 |
Wilcoxon test p-value . | Spearman rank correlation . | Data collection time . |
---|---|---|
0.008 | 0.730 | mid-March 2020 |
<0.001 | 0.535 | mid-April 2020 |
<0.001 | 0.478 | mid-May 2020 |
Correlation test at regency level: summary. Wilcoxon rank test and Spearman rank correlation are calculated between the mean prediction of the Spatial Propagation Model and external data set. Time period: model prediction: December 2019–February 2021; the reported infections: mid-March, mid-April and mid-May 2020
Wilcoxon test p-value . | Spearman rank correlation . | Data collection time . |
---|---|---|
0.008 | 0.730 | mid-March 2020 |
<0.001 | 0.535 | mid-April 2020 |
<0.001 | 0.478 | mid-May 2020 |
Wilcoxon test p-value . | Spearman rank correlation . | Data collection time . |
---|---|---|
0.008 | 0.730 | mid-March 2020 |
<0.001 | 0.535 | mid-April 2020 |
<0.001 | 0.478 | mid-May 2020 |
We observe a reasonable correlation between the predicted risk and reported infections, which suggests a relatively good fit between the model predictions and this external source of data. The strength of the correlation appears to decrease over the month. This is expected since more measures were implemented by the Indonesian government over time to reduce the transmission in areas at risk. The validity of these results remain subject to various threats, however. First, the official surveillance data is likely to omit cases given the biased sampling procedure employed to generate the data, which systematically ignores asymptomatic cases. Second, we observe temporal discrepancies between the data used in the Spatial Propagation Model to predict the ex ante risk of COVID-19 (December 2019 to January 2020) and the data set collected between March 2020 and May 2020 (in the middle of each month).
Robustness tests
As a first robustness check, we conduct a sensitivity analysis for the PC priors associated with the hyperparameters of the spatial field. The spatial models described in the manuscript use a threshold for the marginal standard deviation () equal to 2 and a threshold for the spatial range () equal to 0.48. To assess sensitivity to prior changes, we run four additional models that combine the following threshold values associated with : and with : .
Table A4 shows the parameter values estimated for the model presented in the manuscript and the four additional model with changes in the spatial parameter priors. To visually assess the effect of changes in the priors, Figure A6 illustrates the estimated posterior distribution of the model parameters. We observe that changes in the priors affect the estimates of the spatial range (mean values between 1.00 and 3.52) and marginal variance (mean values between 0.23 and 6.29). This matches expectations since the use of PC priors leads to important constraints on the parameters associated with the spatial field. Furthermore, given the small number of 78 observations, one cannot rule out sensitivity to prior changes. However, the estimates of the fixed effects are not affected by these changes, which provides additional support for the validity of the effects of our covariates on the risk described in the manuscript.
![Spatial Propagation Model: robustness check. Diagnostic plots for the investigated spatial models. Each colour represents the result of a model using a different combination of PC priors. Top: marginal posterior distribution of the fixed effects (covariate coefficients and intercept). Middle: marginal posterior distribution of the precision of the Gaussian unstructured error terms and the hyperparameters. Bottom: marginal posterior mean and 95% CI of the spatial random effects. [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssa/185/4/10.1111_rssa.12866/1/m_rssa_185_4_2121_fig-15.jpeg?Expires=1748057090&Signature=1YuS7l8pJmVm2IWvhzyI8j8Ydi3W~2IogAIxMjsK9octY3enQG-ChCwpE~T0daYFd3w5-Qrfj-lIB9jEk841IJABZguBn7teXA38rsoHBxERVtJDE9sDS9v~v-VGBl-MKts4Xw5lz-9VrqGUsjaf6iXjNGOWL9BCZZ9q1B5hWXILZHXqocallLGVYP5JxaHgQ4002bTdAt9HvZGBORewC88yMtgvX7ncC7Wil9xAOBPAtXH2zNz9aZcHz0o8aB5i3abdzLXwfHlD4~ryk85p4vzeyfvLPxu~ZgOzwIT2CQkptPNV6p80s78aCYnjluFEhZ699rs6~QQNW6wbWP55tw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Spatial Propagation Model: robustness check. Diagnostic plots for the investigated spatial models. Each colour represents the result of a model using a different combination of PC priors. Top: marginal posterior distribution of the fixed effects (covariate coefficients and intercept). Middle: marginal posterior distribution of the precision of the Gaussian unstructured error terms and the hyperparameters. Bottom: marginal posterior mean and 95% CI of the spatial random effects. [Colour figure can be viewed at https://academic.oup.com/]
Spatial Propagation Model: prior sensitivity analysis. This table shows (for each combination of and ) the mean, median, 95% credible intervals with lower () and upper () bounds, and the standard deviation (STD) associated with the estimation of the covariate coefficients (Fixed effects), precision of the random effects () associated with the unstructured errors, and the hyperparameters of the spatial field (range and marginal variance ). Nb. of obs.: 78 domestic airports in Indonesia
Parameters . | . | . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|---|---|
Fixed effects | |||||||
1.75 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.71 | −0.51 | −0.51 | −0.77 | −0.25 | 0.13 | |
2.00 | 0.48 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.71 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.24 | 0.46 | 0.46 | 0.21 | 0.70 | 0.12 | |
1.75 | 0.71 | 0.47 | 0.47 | 0.22 | 0.72 | 0.13 | |
2.00 | 0.48 | 0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
2.25 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.71 | 0.19 | 0.19 | 0.00 | 0.37 | 0.09 | |
1.75 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
1.75 | 0.71 | 0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
2.00 | 0.48 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.24 | −0.31 | −0.31 | −0.51 | −0.12 | 0.10 | |
2.25 | 0.71 | −0.32 | −0.32 | −0.52 | −0.12 | 0.10 | |
intercept | 1.75 | 0.24 | −3.15 | −3.15 | −4.44 | −1.91 | 0.64 |
1.75 | 0.71 | −3.16 | −3.15 | −4.46 | −1.90 | 0.65 | |
2.00 | 0.48 | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.24 | −3.16 | −3.15 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.71 | −3.15 | −3.14 | −4.47 | −1.89 | 0.66 | |
Random effects | |||||||
Spatial range | 1.75 | 0.24 | 2.84 | 2.51 | 0.67 | 6.97 | 1.66 |
1.75 | 0.71 | 1.35 | 1.10 | 0.29 | 3.88 | 0.96 | |
2.00 | 0.48 | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 | |
2.25 | 0.24 | 2.80 | 2.59 | 0.82 | 5.99 | 1.35 | |
2.25 | 0.71 | 0.63 | 0.45 | 0.03 | 2.20 | 0.60 | |
Marginal variance | 1.75 | 0.24 | 0.97 | 0.51 | 0.12 | 4.68 | 1.43 |
1.75 | 0.71 | 6.73 | 1.69 | 0.08 | 47.39 | 17.90 | |
2.00 | 0.48 | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 | |
2.25 | 0.24 | 0.63 | 0.39 | 0.12 | 2.55 | 0.72 | |
2.25 | 0.71 | 9.89 | 2.79 | 0.31 | 63.64 | 27.47 | |
Precision | 1.75 | 0.24 | 1.17 | 1.17 | 0.75 | 1.60 | 0.22 |
1.75 | 0.71 | 1.23 | 1.21 | 0.80 | 1.81 | 0.26 | |
2.00 | 0.48 | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 | |
2.25 | 0.24 | 1.19 | 1.19 | 0.75 | 1.61 | 0.22 | |
2.25 | 0.71 | 1.20 | 1.19 | 0.80 | 1.64 | 0.22 |
Parameters . | . | . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|---|---|
Fixed effects | |||||||
1.75 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.71 | −0.51 | −0.51 | −0.77 | −0.25 | 0.13 | |
2.00 | 0.48 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.71 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.24 | 0.46 | 0.46 | 0.21 | 0.70 | 0.12 | |
1.75 | 0.71 | 0.47 | 0.47 | 0.22 | 0.72 | 0.13 | |
2.00 | 0.48 | 0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
2.25 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.71 | 0.19 | 0.19 | 0.00 | 0.37 | 0.09 | |
1.75 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
1.75 | 0.71 | 0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
2.00 | 0.48 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.24 | −0.31 | −0.31 | −0.51 | −0.12 | 0.10 | |
2.25 | 0.71 | −0.32 | −0.32 | −0.52 | −0.12 | 0.10 | |
intercept | 1.75 | 0.24 | −3.15 | −3.15 | −4.44 | −1.91 | 0.64 |
1.75 | 0.71 | −3.16 | −3.15 | −4.46 | −1.90 | 0.65 | |
2.00 | 0.48 | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.24 | −3.16 | −3.15 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.71 | −3.15 | −3.14 | −4.47 | −1.89 | 0.66 | |
Random effects | |||||||
Spatial range | 1.75 | 0.24 | 2.84 | 2.51 | 0.67 | 6.97 | 1.66 |
1.75 | 0.71 | 1.35 | 1.10 | 0.29 | 3.88 | 0.96 | |
2.00 | 0.48 | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 | |
2.25 | 0.24 | 2.80 | 2.59 | 0.82 | 5.99 | 1.35 | |
2.25 | 0.71 | 0.63 | 0.45 | 0.03 | 2.20 | 0.60 | |
Marginal variance | 1.75 | 0.24 | 0.97 | 0.51 | 0.12 | 4.68 | 1.43 |
1.75 | 0.71 | 6.73 | 1.69 | 0.08 | 47.39 | 17.90 | |
2.00 | 0.48 | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 | |
2.25 | 0.24 | 0.63 | 0.39 | 0.12 | 2.55 | 0.72 | |
2.25 | 0.71 | 9.89 | 2.79 | 0.31 | 63.64 | 27.47 | |
Precision | 1.75 | 0.24 | 1.17 | 1.17 | 0.75 | 1.60 | 0.22 |
1.75 | 0.71 | 1.23 | 1.21 | 0.80 | 1.81 | 0.26 | |
2.00 | 0.48 | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 | |
2.25 | 0.24 | 1.19 | 1.19 | 0.75 | 1.61 | 0.22 | |
2.25 | 0.71 | 1.20 | 1.19 | 0.80 | 1.64 | 0.22 |
Spatial Propagation Model: prior sensitivity analysis. This table shows (for each combination of and ) the mean, median, 95% credible intervals with lower () and upper () bounds, and the standard deviation (STD) associated with the estimation of the covariate coefficients (Fixed effects), precision of the random effects () associated with the unstructured errors, and the hyperparameters of the spatial field (range and marginal variance ). Nb. of obs.: 78 domestic airports in Indonesia
Parameters . | . | . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|---|---|
Fixed effects | |||||||
1.75 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.71 | −0.51 | −0.51 | −0.77 | −0.25 | 0.13 | |
2.00 | 0.48 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.71 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.24 | 0.46 | 0.46 | 0.21 | 0.70 | 0.12 | |
1.75 | 0.71 | 0.47 | 0.47 | 0.22 | 0.72 | 0.13 | |
2.00 | 0.48 | 0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
2.25 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.71 | 0.19 | 0.19 | 0.00 | 0.37 | 0.09 | |
1.75 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
1.75 | 0.71 | 0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
2.00 | 0.48 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.24 | −0.31 | −0.31 | −0.51 | −0.12 | 0.10 | |
2.25 | 0.71 | −0.32 | −0.32 | −0.52 | −0.12 | 0.10 | |
intercept | 1.75 | 0.24 | −3.15 | −3.15 | −4.44 | −1.91 | 0.64 |
1.75 | 0.71 | −3.16 | −3.15 | −4.46 | −1.90 | 0.65 | |
2.00 | 0.48 | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.24 | −3.16 | −3.15 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.71 | −3.15 | −3.14 | −4.47 | −1.89 | 0.66 | |
Random effects | |||||||
Spatial range | 1.75 | 0.24 | 2.84 | 2.51 | 0.67 | 6.97 | 1.66 |
1.75 | 0.71 | 1.35 | 1.10 | 0.29 | 3.88 | 0.96 | |
2.00 | 0.48 | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 | |
2.25 | 0.24 | 2.80 | 2.59 | 0.82 | 5.99 | 1.35 | |
2.25 | 0.71 | 0.63 | 0.45 | 0.03 | 2.20 | 0.60 | |
Marginal variance | 1.75 | 0.24 | 0.97 | 0.51 | 0.12 | 4.68 | 1.43 |
1.75 | 0.71 | 6.73 | 1.69 | 0.08 | 47.39 | 17.90 | |
2.00 | 0.48 | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 | |
2.25 | 0.24 | 0.63 | 0.39 | 0.12 | 2.55 | 0.72 | |
2.25 | 0.71 | 9.89 | 2.79 | 0.31 | 63.64 | 27.47 | |
Precision | 1.75 | 0.24 | 1.17 | 1.17 | 0.75 | 1.60 | 0.22 |
1.75 | 0.71 | 1.23 | 1.21 | 0.80 | 1.81 | 0.26 | |
2.00 | 0.48 | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 | |
2.25 | 0.24 | 1.19 | 1.19 | 0.75 | 1.61 | 0.22 | |
2.25 | 0.71 | 1.20 | 1.19 | 0.80 | 1.64 | 0.22 |
Parameters . | . | . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|---|---|
Fixed effects | |||||||
1.75 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.71 | −0.51 | −0.51 | −0.77 | −0.25 | 0.13 | |
2.00 | 0.48 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.24 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
2.25 | 0.71 | −0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
1.75 | 0.24 | 0.46 | 0.46 | 0.21 | 0.70 | 0.12 | |
1.75 | 0.71 | 0.47 | 0.47 | 0.22 | 0.72 | 0.13 | |
2.00 | 0.48 | 0.46 | 0.46 | 0.21 | 0.71 | 0.13 | |
2.25 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.71 | 0.19 | 0.19 | 0.00 | 0.37 | 0.09 | |
1.75 | 0.24 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
1.75 | 0.71 | 0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
2.00 | 0.48 | 0.19 | 0.19 | 0.00 | 0.38 | 0.10 | |
2.25 | 0.24 | −0.31 | −0.31 | −0.51 | −0.12 | 0.10 | |
2.25 | 0.71 | −0.32 | −0.32 | −0.52 | −0.12 | 0.10 | |
intercept | 1.75 | 0.24 | −3.15 | −3.15 | −4.44 | −1.91 | 0.64 |
1.75 | 0.71 | −3.16 | −3.15 | −4.46 | −1.90 | 0.65 | |
2.00 | 0.48 | −3.16 | −3.16 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.24 | −3.16 | −3.15 | −4.45 | −1.91 | 0.65 | |
2.25 | 0.71 | −3.15 | −3.14 | −4.47 | −1.89 | 0.66 | |
Random effects | |||||||
Spatial range | 1.75 | 0.24 | 2.84 | 2.51 | 0.67 | 6.97 | 1.66 |
1.75 | 0.71 | 1.35 | 1.10 | 0.29 | 3.88 | 0.96 | |
2.00 | 0.48 | 1.54 | 1.29 | 0.26 | 4.21 | 1.05 | |
2.25 | 0.24 | 2.80 | 2.59 | 0.82 | 5.99 | 1.35 | |
2.25 | 0.71 | 0.63 | 0.45 | 0.03 | 2.20 | 0.60 | |
Marginal variance | 1.75 | 0.24 | 0.97 | 0.51 | 0.12 | 4.68 | 1.43 |
1.75 | 0.71 | 6.73 | 1.69 | 0.08 | 47.39 | 17.90 | |
2.00 | 0.48 | 2.89 | 1.04 | 0.12 | 17.34 | 6.43 | |
2.25 | 0.24 | 0.63 | 0.39 | 0.12 | 2.55 | 0.72 | |
2.25 | 0.71 | 9.89 | 2.79 | 0.31 | 63.64 | 27.47 | |
Precision | 1.75 | 0.24 | 1.17 | 1.17 | 0.75 | 1.60 | 0.22 |
1.75 | 0.71 | 1.23 | 1.21 | 0.80 | 1.81 | 0.26 | |
2.00 | 0.48 | 1.17 | 1.16 | 0.77 | 1.60 | 0.21 | |
2.25 | 0.24 | 1.19 | 1.19 | 0.75 | 1.61 | 0.22 | |
2.25 | 0.71 | 1.20 | 1.19 | 0.80 | 1.64 | 0.22 |
In addition, we assess the effects of changes in the response obtained from three Airport Network Model scenarios (I: lower, II: mean, III: upper) on the estimation of the parameters of the Spatial Propagation Model. The results of scenario II are reported in the manuscript (Table 2). The results of scenarios I and III are reported in Tables A5 and A6.
Spatial Propagation Model: Lower Bound (scenario I). Mean, median, 95% credible intervals with lower (CIl) and upper (CIu) bounds, and standard deviations (std) associated with the covariate effects (Fixed effects), precision of the unstructured errors (), and hyperparameters of the spatial field (range and marginal variance )
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.51 | −0.51 | −0.77 | −0.26 | 0.13 | |
0.48 | 0.48 | 0.22 | 0.72 | 0.13 | |
0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
intercept | −3.18 | −3.18 | −4.48 | −1.92 | 0.65 |
Random effects | |||||
Spatial range | 1.64 | 1.39 | 0.34 | 4.40 | 1.07 |
Marginal variance | 2.51 | 1.06 | 0.17 | 14.00 | 4.80 |
Precision | 1.17 | 1.16 | 0.75 | 1.65 | 0.23 |
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.51 | −0.51 | −0.77 | −0.26 | 0.13 | |
0.48 | 0.48 | 0.22 | 0.72 | 0.13 | |
0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
intercept | −3.18 | −3.18 | −4.48 | −1.92 | 0.65 |
Random effects | |||||
Spatial range | 1.64 | 1.39 | 0.34 | 4.40 | 1.07 |
Marginal variance | 2.51 | 1.06 | 0.17 | 14.00 | 4.80 |
Precision | 1.17 | 1.16 | 0.75 | 1.65 | 0.23 |
Spatial Propagation Model: Lower Bound (scenario I). Mean, median, 95% credible intervals with lower (CIl) and upper (CIu) bounds, and standard deviations (std) associated with the covariate effects (Fixed effects), precision of the unstructured errors (), and hyperparameters of the spatial field (range and marginal variance )
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.51 | −0.51 | −0.77 | −0.26 | 0.13 | |
0.48 | 0.48 | 0.22 | 0.72 | 0.13 | |
0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
intercept | −3.18 | −3.18 | −4.48 | −1.92 | 0.65 |
Random effects | |||||
Spatial range | 1.64 | 1.39 | 0.34 | 4.40 | 1.07 |
Marginal variance | 2.51 | 1.06 | 0.17 | 14.00 | 4.80 |
Precision | 1.17 | 1.16 | 0.75 | 1.65 | 0.23 |
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.51 | −0.51 | −0.77 | −0.26 | 0.13 | |
0.48 | 0.48 | 0.22 | 0.72 | 0.13 | |
0.19 | 0.19 | −0.00 | 0.38 | 0.10 | |
intercept | −3.18 | −3.18 | −4.48 | −1.92 | 0.65 |
Random effects | |||||
Spatial range | 1.64 | 1.39 | 0.34 | 4.40 | 1.07 |
Marginal variance | 2.51 | 1.06 | 0.17 | 14.00 | 4.80 |
Precision | 1.17 | 1.16 | 0.75 | 1.65 | 0.23 |
Spatial Propagation Model: Upper Bound (scenario III). Mean, median, 95% credible intervals with lower () and upper () bounds, and standard deviations (std) associated with the covariate effects (Fixed effects), precision of the unstructured errors () and hyperparameters of the spatial field (range and marginal variance )
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.45 | 0.46 | 0.21 | 0.70 | 0.12 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.09 | |
intercept | −3.13 | −3.12 | −4.41 | −1.88 | 0.64 |
Random effects | |||||
Spatial range | 2.03 | 0.96 | 0.20 | 10.67 | 3.39 |
Marginal variance | 1.55 | 1.40 | 0.37 | 3.62 | 0.86 |
Precision | 1.20 | 1.20 | 0.76 | 1.66 | 0.23 |
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.45 | 0.46 | 0.21 | 0.70 | 0.12 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.09 | |
intercept | −3.13 | −3.12 | −4.41 | −1.88 | 0.64 |
Random effects | |||||
Spatial range | 2.03 | 0.96 | 0.20 | 10.67 | 3.39 |
Marginal variance | 1.55 | 1.40 | 0.37 | 3.62 | 0.86 |
Precision | 1.20 | 1.20 | 0.76 | 1.66 | 0.23 |
Spatial Propagation Model: Upper Bound (scenario III). Mean, median, 95% credible intervals with lower () and upper () bounds, and standard deviations (std) associated with the covariate effects (Fixed effects), precision of the unstructured errors () and hyperparameters of the spatial field (range and marginal variance )
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.45 | 0.46 | 0.21 | 0.70 | 0.12 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.09 | |
intercept | −3.13 | −3.12 | −4.41 | −1.88 | 0.64 |
Random effects | |||||
Spatial range | 2.03 | 0.96 | 0.20 | 10.67 | 3.39 |
Marginal variance | 1.55 | 1.40 | 0.37 | 3.62 | 0.86 |
Precision | 1.20 | 1.20 | 0.76 | 1.66 | 0.23 |
Parameters . | Mean . | Median . | 95% . | 95% . | STD . |
---|---|---|---|---|---|
Fixed effects | |||||
−0.50 | −0.50 | −0.76 | −0.25 | 0.13 | |
0.45 | 0.46 | 0.21 | 0.70 | 0.12 | |
0.19 | 0.19 | 0.00 | 0.38 | 0.09 | |
intercept | −3.13 | −3.12 | −4.41 | −1.88 | 0.64 |
Random effects | |||||
Spatial range | 2.03 | 0.96 | 0.20 | 10.67 | 3.39 |
Marginal variance | 1.55 | 1.40 | 0.37 | 3.62 | 0.86 |
Precision | 1.20 | 1.20 | 0.76 | 1.66 | 0.23 |
All parameter estimations are consistent with the main results except the covariate coefficient associated with the variable population , which contains a range of negative values in the 95% CI.