Corona and coffee on your commute: a spatial analysis of COVID-19 mortality and commuting flows in England in 2020

Abstract Background The COVID-19 pandemic forced governments to implement lockdown policies to curb the spread of the disease. These policies explicitly encouraged homeworking, hence reducing the number of commuters with the implicit assumption that restricting peoples’ movement reduces risk of infection for travellers and other people in their areas of residence and work. Yet, the spatial interrelation of different areas has been rarely addressed both in the public discourse and in early accounts of the various consequences of COVID-19. Methods Our study proposes a spatial analysis of the association between commuting flows and COVID-19 mortality in England between March and June 2020, using a range of publicly available area-level data. To account for spatial correlation, we used a structural mobility gravity model to analyze commuting flows between Local Authority Districts. By accounting for these spatial dependencies, we temper concerns of bias and inefficiency affecting simple linear estimates. Additionally, we disentangle the direct and indirect (from other areas) influence of commuting on COVID-19 mortality. Results The results of our spatial regression models suggest that higher commuting flows—in general and particularly by public transport—are associated with higher COVID-19 mortality. Our results are consistent with a reduction in COVID-related mortality after the introduction of a national lockdown in March. The spatial-lag term is statistically significant, highlighting the importance of accounting for spatial dependencies. Conclusion We suggest that considering spatial interactions through commuting or travel motivations may offer interesting perspectives on the trade-off between health and economic activity during lockdowns.


Introduction
D uring the initial COVID-19 pandemic in 2020, the response of governments across the world was characterized by a widespread use of non-pharmaceutical interventions. 1 The strictest form of interventions aimed to restrict physical interactions as much as possible and reduce an individual's freedom of movement. 2 These policies often took the form of national and/or regional 'lockdowns', where individuals were instructed to stay in their homes wherever possible. The public health rationale behind lockdowns was the risk of disease spread associated with movement, gathering and mixing of people from different households. 3 In economic terms, these restrictions represented an example of a response to the potential negative externalities imposed by individual behaviour on broader society. 4 With slight differences in type and timing, lockdowns were implemented almost everywhere across Europe. After initial hesitancy, the UK introduced a first national lockdown on 23 March 2020, 5 when the Prime Minister Johnson instructed British citizens to 'stay at home'.
Commuting is the interaction of the housing and labour markets and typically involves employees travelling to and from work. As such, commuting was discouraged during the strictest periods of lockdown in the UK 5 and elsewhere, 6 with calls for the adoption of homeworking arrangements for everyone except essential workers. Commuting to work rather than working from home induces countless opportunities for infection from exposure to environmental factors and the interaction with other co-workers, with the related risk for commuters of transmitting the disease to members of their households and workplaces. A crucial factor exacerbating the risk of infection associated with commuting to and from work is the mode of transport. Private modes of travel (e.g. own car, motorcycle, bicycle or walking) clearly offer less opportunities for contagion compared to public transport aggregating large number of individuals, as suggested by evidence that trains and buses contributed to a large share of infections in the early phases of the pandemic. 7 Beyond commuting and intra-national mobility, the international spread of COVID-19 was also assisted by cross-border work 8 and international trade. 9 Despite the fact that lockdowns were aimed at restricting movement of people, this specific spatial dimension of infections is often overlooked in the growing body of empirical and theoretical research addressing COVID-19. Notable exceptions to this trend include Birge et al., 10 Glaeser et al. 11 and Fajgelbaum et al. 12 Gravity models provide an interesting theory-grounded approach to account for commuting flows. This concept, which has traditionally been used by economist to study international trade, inspired work by Cuñat and Zymek. 13 In a nutshell, the authors derive a structural gravity model for commuting flows based on agents' endogenous choices of work location with given characteristics. This structural gravity model offers a microfounded framework to assess the welfare implications of restrictions to mobility. Furthermore, since commuting flows are a known vector for disease spread, the resulting structural mobility gravity model offers a useful framework to study the spatial spread of disease.
We study the association between commuting and mortality related to COVID-19 between March and June 2020 allowing for spatial correlation across English Local Authority Districts (LADs). LADs are geographical areas with decentralized responsibility for the provision of public services of various types (Supplementary Appendix SA). As of 2020, England is divided into 314 LADs. The spatial interaction between LADs is modelled using weights based on the share of people commuting from one LAD to others (Supplementary Appendix SB).
Our analytical approach tests the following hypotheses: H1. The higher the percentage of employed people who live in a LAD and work in the 'same' LAD, the lower the COVID-19 mortality rate.
H2. The higher the percentage of a LAD's workforce who work in a different LAD 'and travel by public transport,' the higher the COVID-19 mortality rate.
H3. The effects tested in H1 and H2 are stronger in the earlier months compared to later months.

Data
We collated data from a range of publicly available sources. Our main dependent variable is COVID-19 mortality, which is published on a monthly basis at LAD level by the Office for National Statistics (ONS). Specifically, we used age/sex standardized monthly mortality rates (per 100 000 population) for the 4-month period from March to June 2020, as well as the monthly mortality rates. Although mortality is an undesirable endpoint of infection and overall disease spread is best represented by reported COVID-19 cases or hospitalizations, these measures are unquestionably related. We focus on COVID-19 mortality for England primarily to overcome the potentially huge heterogeneity in the accuracy of reported COVID-19 cases of resulting from wide variability in testing policies across regions and countries in the UK, but also to focus on a more informative measure of the health damage caused on communities by the disease. Notably, the ONS definition for COVID-19 mortality is consistent across areas and time and hence unaffected by differences in testing capacity. In Supplementary Appendix SC4, we also include analyses for overall mortality. Most other covariates at the level of LAD and for year 2019 (i.e. pre-COVID to avoid reverse causation) were obtained from the ONS through the online portal NOMIS. This includes information on population size, gender distribution, age structure, ethnicity, education, labour force status and mortality rates attributable to respiratory illness from 2019. The ONS also provided the vector data for LAD boundaries used to generate maps. We also obtained population density from the ONS. This indicator encapsulates valuable information about a LAD's rurality and about living arrangements of its population (i.e. areas with higher density may be assumed to offer more opportunities for infection). The health and disability domain of the Index of Multiple Deprivation (IMD), published by the Ministry of Housing, Communities and Local Government, was obtained from the Government open data portal. Finally, we obtained georeferenced data on care home beds and ratings across England from the Care Quality Commission, which we used to compute LAD level rates of care home beds per person and share of providers with different ratings. No ethical clearance was required for this analysis of secondary publicly available data. A full list of data sources is included in the Supplementary Appendix SC1.
We obtained figures on commuting flows between LADs from publicly available 2011 Census data. 13 The ONS collected detailed information on location of residence and work, as well as type, frequency and mode of commuting for the whole UK population. Changes in LAD boundaries between 2011 and 2020 resulted in the loss of 14 observations, limiting our analysis to a set of N ¼ 298 LADs in England. Our main exposure is the percentage of people who live and work in the same LAD: HW. Supplementary Appendix SB2 describes the derivation of HW in detail.
The lack of alternative data sources forced us to rely on 2011 Census data on commuting flows, which is arguably old. However, we have no reason to believe that overall commuting patterns have changed radically in the last 9 years. Figure 1 shows maps describing the distribution of the combined 4-month COVID-19 mortality at LAD level and share of people living and working in the same LAD. The maps clearly point to some degree of spatial clustering in the data, but do not reveal a clear-cut relationship.
With an analogous approach, we used 2011 Census data on mode of transport for commuting to obtain a variable indicating the share of workers in each LAD commuting to another LAD with public transport. Our definition of 'travel by public transport' included bus, train, metro and light rail. Table 1 shows summary statistics for our main outcome, exposure and covariates.

Empirical approach
Our study is primarily focused on the relationship between COVID-19 mortality and the extent to which people commute to LADs other than the one where they live, at all and specifically using public transport. Figure 2 shows the bivariate plots for COVID-19 mortality with (i) share of home workers and (ii) share of people commuting out with public transport.
In both cases, the graph seems to suggest the existence of a relationship between commuting and COVID-19 mortality between March and June 2020. On the one hand, a larger share of the workforce living and working in the same LAD is negatively associated with COVID-19 mortality in the same LAD. On the other hand, the share of people living in one LAD and commuting to another by public transport shows a positive association with COVID-19 mortality. However, the simple bivariate relationships above do not account for a range of other factors known to correlate with COVID-19 mortality. As suggested by figure 1, COVID-19 mortality and commuting flows are likely to entail spatial dependencies, which would invalidate ordinary least squares (OLS) assumptions. Spatial dependencies may be driven by people living and working close in space, or by an economic geography characterized by clustered areas attracting people for work (large urban areas, industrial clusters, etc.) and others where people predominantly live and work in the same area. The existence of spatial correlation is supported by Moran's I test, 14 which clearly rejects the null of homoscedasticity at conventional confidence levels, favouring the hypothesis of spatially correlated errors (exact test results are reported in figure 2's notes). Besides attenuating the risk of bias and inefficiency related to spatial correlation, modelling explicitly the spatial dependencies allows to disentangle direct (own LAD) and indirect (related to changes in other LADs) effects of commuting on COVID-19 mortality (Supplementary Appendix SD).
Our main specification includes spatial lags in mortality rates and in the error terms: 15 where subscript l refers to LADs, M l is COVID-19 mortality (either monthly or 4-month mortality between March and June 2020), C l is the exposure variable of interest related to commuting flows (either HW l for the percentage of workers living and working in the same LAD or PT l for the percentage of people commuting for work to a different LAD by public transport) and X l is a vector of covariates, which we describe in table 1. The error term u l includes a spatially correlated component u k and an orthogonal component l k , where q models the spatial error autocorrelation structure. The coefficients of interest can be consistently estimated using a generalized twostage least squares approach. The core of our spatial regression model is the square weight matrix W, which gives larger weights to 'nearby' LADs more likely to affect the outcomes of a given LAD l. Spatial regression models typically use weight matrices based on contiguity (only neighbouring LADs are assumed to be correlated) or inverse distance (LADs geographically closer get a higher weight). Our approach departs from standard spatial analysis, proposing a weight matrix based on an underlying structural mobility gravity à la Cuñat and Zymek, 13 such that LADs are related one to another by the share of people commuting between them. A graphical representation, derivation and definition for W are provided in Supplementary Appendix SB. W satisfies all properties required for a weight matrix in a spatial regression model: it is (semi-)positive definite and has zero on lead diagonals to avoid double weighting of own LAD characteristics. 15 Our focus in W is on the relative share of commuting flows originating from LAD l and directed to other LADs, irrespective of differences in baseline share of homeworkers in LAD l. This should mitigate concerns of endogeneity of the weight matrix, as there is no obvious reason for higher shares of homeworkers to affects the patterns of commuting across nearby LADs, beyond spatial contiguity. It is worth noting that our matrix W is not symmetric: the share of people commuting l ! k is allowed to differ from the share of people commuting k ! l, given that l 6 ¼ k. Although symmetric spatial weight matrices are often used in (and implied by) applications based on geographical distances, symmetry is not a necessary condition. 16 A non-symmetric matrix in this case allows for a more flexible modelling of the spatial interactions in the outcome.
We define the weight matrix based on commuting flows rather than distance for various reasons. First, since people are unlikely to live and work in very distant LADs, commuting flows and (inverse) distance convey at least partially the same information. Second, since COVID-19 is transmitted from person to person (rather than through broader geographic or environmental factors), commuting flows seem more relevant to the dynamics of this specific disease compared to arbitrary distance-related weights. Finally, linking back to our motivation in Methods section, this mobility gravity approach offers microeconomic theoretical underpinnings for the spatial interactions observed in the data, potentially allowing welfare evaluations. Table 1 reports the summary statistics for the key variables used. The average four-month COVID-19 mortality rate was 87.5 per 100 000 and this was highest in April (52.9 per 100 000). There was substantial variation in the two commuting variables considered. On average, 50% of the employed people who lived in an LAD worked in the same LAD. About 17% of those that did commute to another LAD did so by public transport.

Summary statistics
On average, 17% of people aged over 16 years of age have no qualifications, although there is a lot of variability in this variable. The male to female ratio is approximately parity.
The unemployment rate pre-COVID varied substantially across England, with some areas having an unemployment rate of 0.9% and some as high as 7.2%. This was linked strongly with deprivation. Around 17% of the population are aged over 65 years of age, varying from 6.1% to 28.8%. On average, 89% of the population is white, but there is substantial variation here, with a range of 29-99%. 2019 mortality rates attributable to respiratory diseases also varied greatly. Generally, LADs with higher mortality rates tended to be in more deprived areas. A similar pattern was observed for the availability and quality of care home beds across England.

The effect of commuting flows on COVID-19 mortality rates
In table 2, we present the results from three forms of Equation (1). The first column is a simple univariate association between the percentage of a LAD's workforce who live and work in the same LAD and the four-month COVID-19 mortality rate in the LAD. The raw association (figure 2) is À0.824, indicating that a 10 percentage point increase in the number of people commuting in and out of another LAD increased COVID-19 mortality by 8.2 deaths per 100 000 (P ( 0.001). In column (2), we include additional control variables known to affect the prevalanece of COVID-19. The association remains, but reduces in magnitude as expected. Here, a 10 percentage point increase in the number of people commuting in and out of another LAD increased COVID-19 mortality by 6.8 deaths per 100 000 (P ( 0.001). Notably, the results in columns (1) and (2) are likely to be biased and measured imprecisely due to spatial autocorrelation in COVID-19 mortality across LADs, as suggested visually by figure 1 and statistically by Moran's I (see exact test results in the notes to figure 2). Hence, in column (3), we present the results from the model that explicitly accounts for the spatial dependence of COVID-19 spread. Besides a sligthly improved goodness of fit, the spatial model in (3) shows a fairly large (0.64) and strongly significant spatial autocorrelation coefficient. This points to the validity of our spatial approach over OLS. Unfortunately, explicit statistical statistical testing procedures for model selection (OLS vs. spatial, for example the comparison of information criteria) are limited by the generalized two-stages least squares procedure used to estimate our spatial model. The spatial model implies direct and indirect effects of covariates, which can be averaged to obtain average marginal effects, 17 as we explain in Supplementary Appendix SD. The direct effect of a 10% increase in the share of people commuting to other LADs on COVID-19 mortality is an increase of about 2.75 deaths per 100 000 (P < 0.05). The indirect effect from increases in commuting flows across other LADS is about 4.4 additional deaths per 100 000 (P < 0.05). The corresponding overall effect is estimated to be around 7.2 deaths per 100 000 (P < 0.05). More detailed discussion of the estimated coefficients for other covariates are included in Supplementary Appendix SC5, whilst in Supplementary Appendix SC4, we discuss results of similar models for all-cause mortality.
Additionally, in Supplementary Appendix SC3 we include the results for the full spatial specification [Column (3) of table 2] but instead consider the monthly mortality rates as the outcome. As expected, and consitent with the idea that lockdown policies reduced the transmission of COVID-19, the main effect is coming through in the March mortality rates. In March, there was a very strongly statistically significant relationship between commuting flows and mortality. This effect reduced in magnitude and statistical significance once the lockdown policies were introduced.

The effect of commuting by public transport on COVID-19 mortality rates
In Supplementary Appendix SC3, we consider the effect of commuting by public transport. Similar to the main effects, there is a very strongly significant association with the March COVID-19 mortality rate. Before lockdown, areas that had higher shares of workers commuting in and out using public transport experienced higher mortality rates. However, this association reduced to close to zero when lockdown policies were implemented.

Conclusions
We found strong evidence to support the idea that areas with higher commuting flows were more susceptible to higher COVID-19 mortality rates. This is most likely due to the fact that there are more people travelling in and out and bringing in the infection from other areas with potentially higher infection rates, either falling ill themselves if vulnerable or spreading the disease in their areas of residence. Our analytical approach builds on a spatial regression model, assuming a weight matrix based on commuting flows between LADs. Overall, we found that a 10% increase increases mortality by about 7.2 deaths per 100 000. The decomposition of the effects in the spatial analysis sheds light on the interrelated effects of commuting across LADs: about two-thirds of the overall effect of commuting occurs indirectly, through commuting flows and COVID-19 mortality in different LADs. Conversely, only one-third can be attributed directly to each LAD own share of commuters.
Our study was not explicitly designed to assess the impact of lockdown policies. However, to some extent, the additional analyses in Supplementary Appendix SC2 support the notion that the lockdown may well have achieved its aim in reducing the severity of the pandemic by reducing the flows of people between geographic areas. We find a strong association between commuting flows for mortality in March, but this effect size reduces to almost zero as we consider later months. However, this assertion relies strongly on the ceteris paribus interpretation; that is we assume lockdown policies were the only thing that happened in March and there were no other changes in behaviour. Hence, we only make cautious claims about the effectiveness of lockdown policies.
In addition to all commuting flows, in Supplementary Appendix SC3 we show that LADs where there was more commuting by public transport experienced stronger mortality. Again, this is consistent with theoretical predictions, in that public transport offers more opportunities for people to be in close contact with their fellow commuters. Private transport, walking and cycling are less likely to see people coming into contact with other people. Like the main effect, the results for public transport also vary by month, with the main effect observed in March, broadly consistent with the notion that the lockdown policy has been effective in curbing the spread of disease and related mortality by limiting interactions between people.
Our study has several limitations. First, we mainly consider COVID-19 mortality as an outcome. The lockdown policies may well have affected other aspects of the lives of workers, such as mental health and employment opportunities, which calls for a Overall, our work suggests that spatial analyses can be useful to assess COVID risk factors and gauge benefits and costs of lockdown policies. Building on similar approaches to consider commuting flows and other type of motivations to travel, future research may shed light on the unexplored trade-off between health and economic activity intrinsic to lockdown policies. This clearly requires that all relevant consequences of lockdowns beyond COVID-related outcomes-including workers' physical and mental health, costs for increased preventive measures, investments for homeworking arrangements, loss of revenues for firms, etc.-can be effectively measured and considered.

Supplementary data
Supplementary data are available at EURPUB online.
Author's contribution I.F. and L.M. contributed equally to develop the analytical framework, prepare and analyze the data, as well as in writing the manuscript. Both authors approved the final version of the manuscript. Table 2 The effect of the % of workforce who live and work in the same LAD on the 4-month COVID-19 mortality rate (1) (2) a: The definition of average marginal effect (direct, indirect and total) for the spatial model is provided in Supplementary Appendix SD. b: The reference level of care home quality rating is 'Very good'. c: The weight matrix S used in the spatial regression model is defined in Data section; it links LADs based on commuting flows between LADs reporting the share of people commuting to each LAD out of all people commuting out of the home LAD (i.e. where they live). The matrix S has zeros on the leading diagonal and is asymmetric (i.e. the share of people commuting from A to B are allowed to be different from that of people commuting from B to A). Standard errors in parentheses.