-
PDF
- Split View
-
Views
-
Cite
Cite
Sebastian Meyer, Leonhard Held, Incorporating social contact data in spatio-temporal models for infectious disease spread, Biostatistics, Volume 18, Issue 2, April 2017, Pages 338–351, https://doi.org/10.1093/biostatistics/kxw051
Close -
Share
Summary
Routine public health surveillance of notifiable infectious diseases gives rise to weekly counts of reported cases—possibly stratified by region and/or age group. We investigate how an age-structured social contact matrix can be incorporated into a spatio-temporal endemic–epidemic model for infectious disease counts. To illustrate the approach, we analyze the spread of norovirus gastroenteritis over six age groups within the 12 districts of Berlin, 2011–2015, using contact data from the POLYMOD study. The proposed age-structured model outperforms alternative scenarios with homogeneous or no mixing between age groups. An extended contact model suggests a power transformation of the survey-based contact matrix toward more within-group transmission.
1. INTRODUCTION
The social phenomenon of “like seeks like” produces characteristic contact patterns between subgroups of a population. If suitably quantified, such social mixing behavior can inform models for infectious disease spread (Read and others, 2012). One of the largest social contact surveys to date was conducted as part of the EU-funded POLYMOD project, recording conversational contacts of 7 290 individuals in eight European countries (Mossong and others, 2008). Contact patterns were found to be similar across the different countries and highly assortative with respect to age, especially for school children and young adults.
The basic idea behind the combination of social contact data with epidemic models has been termed the “social contact hypothesis” (Wallinga and others, 2006): The age-specific numbers of potentially infectious contacts are proportional to age-specific numbers of social contacts. For instance, for pathogens transmitted via respiratory droplets, face-to-face conversation and/or physical contact are frequently used as proxy measures for exposure. Many studies have now made use of the POLYMOD contact data (Rohani and others, 2010; Goeyvaerts and others, 2010, 2015; Birrell and others, 2011; Baguelin and others, 2013), but none of them accounts for the spatial characteristics of disease spread. The distance of social contacts from the home location of each participant has only recently been investigated by Read and others (2014). Their finding that “most were within a kilometer of the participant’s home, while some occurred further than 500 km away” reflects the power-law distance decay of social interaction as determined by human travel behavior (Brockmann and others, 2006). Meyer and Held (2014) found such a power law to translate to the spatial spread of infectious diseases.
The purpose of this paper is to combine the social and spatial determinants of infectious disease spread in a multivariate time-series model for public health surveillance data. For notifiable diseases, such data are routinely available as weekly counts of reported cases by administrative district and further stratified by age group or gender. Social contact matrices reflect the amount of mixing between these strata. Our focus is on age-structured models, but the methods equivalently apply to other or multiple strata. We investigate if a (possibly adjusted) contact matrix captures disease spread better than simple assumptions of homogeneous or no mixing between the subgroups. The approach also allows us to estimate how much disease incidence in each group can be linked to previous cases in their own and in other groups—while adjusting for the spatial pattern of disease spread.
This paper is organized as follows. Section 2 introduces our case study on norovirus gastroenteritis, including contact data from the POLYMOD study. Section 3 outlines the spatio-temporal modeling framework and describes how to incorporate additional stratification with a contact matrix. Section 4 shows results of the case study and Section 5 concludes the paper with a discussion. The supplementary material contains additional figures, an animation of the data, as well as the |$\texttt{R}$| source package |$\texttt{hhh4contacts}$| with the data and code to reproduce the presented analysis (run |$\texttt{demo}$| (“|$\texttt{hhh4contacts}$|”) after installing and loading the package).
2. CASE STUDY: NOROVIRUS GASTROENTERITIS IN BERLIN, 2011–2015
Most of the aforementioned studies relate contact patterns to the spread of influenza, whereas here we investigate the occurrence of norovirus-associated acute gastroenteritis. Both diseases are highly infectious, have a similar temporal pattern, and similar mortality in elderly persons (van Asten and others, 2012). However, in contrast to influenza, vaccines against noroviruses have yet to be developed (Pringle and others, 2015). Absence of vaccination simplifies the analysis of infectious disease occurrence since vaccination coverage—potentially varying across age groups, regions and over time—needs not to be taken into account.
2.1. Epidemiology of norovirus gastroenteritis
Norovirus-associated acute gastroenteritis is characterized by “sudden onset of vomiting, diarrhea, and abdominal cramps lasting 2–3 days” (Pringle and others, 2015). O’Dea and others (2014) estimate an average symptomatic period of 3.35 days from outbreaks in hospitals and long-term care facilities, where vulnerable individuals live closely together and norovirus outbreaks most commonly occur. Another frequently affected subgroup are children in daycare centers. Norovirus incidence peaks during winter, where outbreaks in childcare facilities were observed to precede those in private households, hospitals, and nursing homes (Bernard and others, 2014b).
Noroviruses are highly contagious since only few viral particles are needed for an infection. Being thermally stable and particularly persistent in the environment (Marshall and Bruggink, 2011), noroviruses can also be transmitted indirectly via contaminated surfaces or food. The serial interval, i.e. the time between onset of symptoms in a primary and a secondary case, ranges from within a day to more than 1 week with a median of about 3 days (Götz and others, 2001).
2.2. Incidence data
In Germany, the national public health institute (the Robert Koch Institute, RKI) provides access to incidence data of notifiable diseases through the SurvStat@RKI 2.0 online service (http://survstat.rki.de). Since the last revision of the case definition for norovirus gastroenteritis in 2011, only laboratory-confirmed cases are reported to the RKI. The number of cases to be modeled thus excludes all asymptomatic cases as well as all those symptomatic cases, who have not found their way to laboratory testing (Gibbons and others, 2014). It is known that under-reporting of norovirus illness is most pronounced in the 20- to 29-year-old persons and substantially lower in persons aged |$<10$| years and 70 years and over (Bernard and others, 2014c). A sensitivity analysis will indicate how under-reporting may affect the interpretation of our model results.
As to the geographic region of interest, we chose the largest city of Germany, Berlin, which is divided into 12 administrative districts. This enables the analysis of disease spread on a smaller spatial scale. Furthermore, a large underlying population is required for our time-series model to be a reasonable approximation of the epidemic process (Farrington and others, 2003).
We have downloaded weekly numbers of reported cases of norovirus gastroenteritis in Berlin from SurvStat@RKI (as of the annual report 2015). These counts cover four norovirus seasons, from 2011-W27 to 2015-W26, and are stratified by the 12 city districts and six age groups: 0–4, 5–14, 15–24, 25–44, 45–64, and 65+ years of age. The age groups were condensed from 5-year intervals to reflect distinct social mixing of pre-school vs. school children, and intergenerational mixing. Similarly stratified population numbers were obtained from the Statistical Information System Berlin-Brandenburg StatIS-BBB (http://www.statistik-berlin-brandenburg.de/statis) at the reference date December 31, 2011, when Berlin had 3 501 872 inhabitants in total.
Age-stratified time series and maps of norovirus gastroenteritis incidence (per 100 000 inhabitants) in Berlin, 2011-W27 to 2015-W26. The weekly incidence plots on the left all use the same |$\sqrt{}$|-scale. The Christmas break in calendar weeks 52 and 1 is highlighted. The group-specific maps on the right show the mean yearly incidence by city district.
Age-stratified time series and maps of norovirus gastroenteritis incidence (per 100 000 inhabitants) in Berlin, 2011-W27 to 2015-W26. The weekly incidence plots on the left all use the same |$\sqrt{}$|-scale. The Christmas break in calendar weeks 52 and 1 is highlighted. The group-specific maps on the right show the mean yearly incidence by city district.
How disease incidence varies across the 12 city districts of Berlin is shown in Figure 1 (right). The south-western district Steglitz-Zehlendorf tends to be affected more and the central districts tend to be affected less than the remaining districts. This pattern is roughly consistent across age groups. An exception are the two younger age groups, which exhibit a relatively high incidence in Marzahn-Hellersdorf. District-specific seasonal shifts are not apparent (supplementary Figure S2, See supplementary material available at Biostatistics online).
Animated, age-stratified maps of the weekly counts encompass the full information from all three data dimensions. Such an animation (supplementary material) may provide additional insight into the dynamics of disease spread. However, epidemic models estimated from these data offer a more structured view and take population heterogeneity directly into account.
2.3. Contact data
Age-structured contact matrix estimated from the German POLYMOD sample using 5-year intervals (left), and aggregated to the age groups of the surveillance data (right). The entries refer to the mean number of contact persons per participant per day.
Age-structured contact matrix estimated from the German POLYMOD sample using 5-year intervals (left), and aggregated to the age groups of the surveillance data (right). The entries refer to the mean number of contact persons per participant per day.
The strong diagonal pattern in the social contact matrix reflects that people tend to mix with people of the same age. The other prominent pattern is produced by the contacts between parents and children. The matrix for physical contacts shows similar patterns (supplementary Figure S3, See supplementary material available at Biostatistics online). Aggregation of the contact matrix is done by summing over the contact groups (columns) to be joined and calculating the weighted average across the corresponding participant groups (rows), with weights equal to the group sizes. The aggregated contact matrix is asymmetric because of the different sizes of the involved age groups, but reciprocity at the population level still holds. For the models described in the next section, only the row-wise distributions will be relevant, i.e. the contact pattern of an infectious participant across the different age groups.
3. AN AGE-STRUCTURED SPATIO-TEMPORAL MODEL FOR INFECTIOUS DISEASE COUNTS
We review an endemic–epidemic modeling framework for areal time series of infectious disease counts (Meyer and Held, 2014, Section 3), into which we subsequently incorporate an additional stratification variable featuring a contact matrix.
3.1. Spatio-temporal formulation
3.2. Extension for stratified areal count time series
Extending the above spatio-temporal model to fit multivariate time series of counts |$Y_{grt}$| stratified by (age) group in addition to region enables us to relax the simple assumption of homogeneous mixing within each region. More complex strata such as the interaction of age group and gender are equally possible and can be subsumed in the single group index |$g$|.
There are two special cases of the contact structure involved in the epidemic component. First, a contact matrix with identical rows implies that the mixing pattern of the |$Y_{g',r',t-1}$| infectious cases does not depend on the group |$g'$| they belong to. An example of such homogeneous mixing is a contact matrix where each row equals the vector of group sizes (|$c_{g'g} = e_g$|). If |$\phi_{grt}$| contains group-specific effects, a simple matrix of ones (|$\boldsymbol{C} = \bf{1}$|) will induce the same contact structure. The other special case is a diagonal contact matrix |$\boldsymbol{C} = \boldsymbol{I}$|, which reflects complete absence of mixing. This is equivalent to formulating a separate spatio-temporal model (3.3) for each group. However, also in this case of no between-group mixing, the joint model formulation has the advantage of allowing for parsimonious decompositions of |$\nu_{grt}$| and |$\phi_{grt}$| into group and region effects. Borrowing strength across groups is especially useful in applications with low counts.
3.3. Parameterizing the contact matrix
Contact patterns derived from sociological studies might not fully match the characteristics of disease spread. For example, social networks are known to change during illness (van Kerckhove and others, 2013) and brief contacts are frequently not reported (Smieszek and others, 2014). We therefore suggest a parsimonious single-parameter approach to adaptively estimate the transmission weights as a function of the given contact matrix |$\boldsymbol{C}$|.
The power transformation |$\boldsymbol{C}^\kappa$| (3.5) applied to the row-normalized POLYMOD contact matrix for different values of |$\kappa$|. The right-hand plots show the values of the diagonal entry |$\lbrack 3,3\rbrack$| and the off-diagonal entry |$\lbrack 3,6\rbrack$| of |$\boldsymbol{C}^\kappa$|.
The power transformation |$\boldsymbol{C}^\kappa$| (3.5) applied to the row-normalized POLYMOD contact matrix for different values of |$\kappa$|. The right-hand plots show the values of the diagonal entry |$\lbrack 3,3\rbrack$| and the off-diagonal entry |$\lbrack 3,6\rbrack$| of |$\boldsymbol{C}^\kappa$|.
3.4. Inference
Likelihood inference for the multivariate count time-series model (3.1) has been developed by Paul and Held (2011) and Meyer and Held (2014). The log-likelihood is maximized numerically using the quasi-Newton algorithm provided by the |$\textsf{R}$| function |$\texttt{nlminb}$| (R Core Team, 2016). Supplied with analytical formulae for the score function and Fisher information, convergence is fast, even for a large number of parameters. The modeling framework is implemented in the |$\textsf{R}$| package |$\texttt{surveillance}$| (Meyer and others, 2016, Section 5) as function |$\texttt{hhh4}$|.
The age-structured model (3.4) is built on top of the existing inference framework. The power parameter |$\kappa$| of (3.5) is conveniently estimated via a profile likelihood approach (see, e.g. Held and Sabanés Bové, 2014, Section 5.3), which avoids the cumbersome implementation of additional derivatives with respect to all model parameters. We numerically maximize the log-likelihood of a model with fixed contact matrix |$\boldsymbol{C}^\kappa$| as a function of |$\kappa$|. The profile confidence interval for |$\kappa$| thus incorporates the uncertainty of all other parameter estimates (but not vice versa).
4. RESULTS
The endemic predictor allows for age- and district-specific incidence levels, fewer cases during the Christmas break (|$x_t = 1$| in calendar weeks 52 and 1, otherwise |$x_t=0$|), as well as age-specific seasonality (|$\omega = 2\pi/52$|). Transmission between age groups is modeled using the power transformation (3.5) for the row-normalized contact matrix estimated from the POLYMOD study. Transmission between districts is quantified by a power law with respect to adjacency order. The intercepts are identifiable by fixing |$\alpha_1^{(G)}=\alpha_1^{(R)}=0$|, |$\phi_1^{(G)}=\phi_1^{(R)}=1$|, where |$\phi_g^{(G)}$| and |$\phi_r^{(R)}$| are estimated on the log-scale, and including overall intercepts in both model components.
Table 1 summarizes competing models with respect to the assumed contact structure between age groups. It turns out that a superposed epidemic component improves upon a purely endemic model, and that incorporating the contact matrix from the POLYMOD study outperforms naive models with homogeneous or no mixing between age groups. Akaike’s Information Criterion (AIC) is minimal for the model with a power-adjusted contact matrix |$\boldsymbol{C}^\kappa$| (penultimate row), where the exponent is estimated to be |$\hat{\kappa} =$| 0.47 (95% CI: 0.34–0.66). This means that the epidemic part subsumes more information from cases in the own age group than suggested by the original contact matrix (cf Figure 3). The change in AIC associated with this adjustment, however, is minor compared to the improvement achieved by employing the POLYMOD contact matrix in the first place. Results are very similar for physical contacts, but the fit is slightly worse.
Model summaries for the age-stratified, areal surveillance data of norovirus gastroenteritis in Berlin. For reference, the first row represents the purely endemic model, which assumes independent counts. The remaining rows correspond to endemic–epidemic models with a spatial power law, but varying assumptions on the age-structured contact matrix |$\boldsymbol{C}$|. The columns refer to the following model characteristics: the number of parameters, the difference in AIC compared to the purely endemic model, the power |$\tau$| of the population scaling factor, the decay parameter |$\rho$| of the spatial power law, and the power adjustment |$\kappa$| of the contact matrix. The parameter columns contain the estimates and 95% Wald confidence intervals
| . | dim . | |$\Delta$|AIC . | |$\tau$| . | |$\rho$| . | |$\kappa$| . |
|---|---|---|---|---|---|
| Purely endemic model . | 36 . | 0.0 . | — . | — . | — . |
| Homogeneous mixing (|$\boldsymbol{C} = \bf{1}$|) | 55 | |$-$|415.4 | 1.19 (0.83–1.55) | 2.43 (2.04–2.88) | — |
| No mixing (|$\boldsymbol{C} = \boldsymbol{I}$|) | 55 | |$-$|602.8 | 0.61 (0.24–0.98) | 2.18 (1.89–2.53) | — |
| Original contact matrix |$\boldsymbol{C}$| | 55 | |$-$|631.9 | 0.97 (0.66–1.28) | 2.34 (2.03–2.70) | — |
| Adjusted contact matrix |$\boldsymbol{C}^\kappa$| | 56 | |$-$|659.4 | 0.86 (0.53–1.19) | 2.27 (1.98–2.61) | 0.47 (0.34–0.66) |
| Based on physical contacts only | 56 | |$-$|655.3 | 0.85 (0.52–1.19) | 2.27 (1.98–2.61) | 0.48 (0.35–0.66) |
| . | dim . | |$\Delta$|AIC . | |$\tau$| . | |$\rho$| . | |$\kappa$| . |
|---|---|---|---|---|---|
| Purely endemic model . | 36 . | 0.0 . | — . | — . | — . |
| Homogeneous mixing (|$\boldsymbol{C} = \bf{1}$|) | 55 | |$-$|415.4 | 1.19 (0.83–1.55) | 2.43 (2.04–2.88) | — |
| No mixing (|$\boldsymbol{C} = \boldsymbol{I}$|) | 55 | |$-$|602.8 | 0.61 (0.24–0.98) | 2.18 (1.89–2.53) | — |
| Original contact matrix |$\boldsymbol{C}$| | 55 | |$-$|631.9 | 0.97 (0.66–1.28) | 2.34 (2.03–2.70) | — |
| Adjusted contact matrix |$\boldsymbol{C}^\kappa$| | 56 | |$-$|659.4 | 0.86 (0.53–1.19) | 2.27 (1.98–2.61) | 0.47 (0.34–0.66) |
| Based on physical contacts only | 56 | |$-$|655.3 | 0.85 (0.52–1.19) | 2.27 (1.98–2.61) | 0.48 (0.35–0.66) |
Model summaries for the age-stratified, areal surveillance data of norovirus gastroenteritis in Berlin. For reference, the first row represents the purely endemic model, which assumes independent counts. The remaining rows correspond to endemic–epidemic models with a spatial power law, but varying assumptions on the age-structured contact matrix |$\boldsymbol{C}$|. The columns refer to the following model characteristics: the number of parameters, the difference in AIC compared to the purely endemic model, the power |$\tau$| of the population scaling factor, the decay parameter |$\rho$| of the spatial power law, and the power adjustment |$\kappa$| of the contact matrix. The parameter columns contain the estimates and 95% Wald confidence intervals
| . | dim . | |$\Delta$|AIC . | |$\tau$| . | |$\rho$| . | |$\kappa$| . |
|---|---|---|---|---|---|
| Purely endemic model . | 36 . | 0.0 . | — . | — . | — . |
| Homogeneous mixing (|$\boldsymbol{C} = \bf{1}$|) | 55 | |$-$|415.4 | 1.19 (0.83–1.55) | 2.43 (2.04–2.88) | — |
| No mixing (|$\boldsymbol{C} = \boldsymbol{I}$|) | 55 | |$-$|602.8 | 0.61 (0.24–0.98) | 2.18 (1.89–2.53) | — |
| Original contact matrix |$\boldsymbol{C}$| | 55 | |$-$|631.9 | 0.97 (0.66–1.28) | 2.34 (2.03–2.70) | — |
| Adjusted contact matrix |$\boldsymbol{C}^\kappa$| | 56 | |$-$|659.4 | 0.86 (0.53–1.19) | 2.27 (1.98–2.61) | 0.47 (0.34–0.66) |
| Based on physical contacts only | 56 | |$-$|655.3 | 0.85 (0.52–1.19) | 2.27 (1.98–2.61) | 0.48 (0.35–0.66) |
| . | dim . | |$\Delta$|AIC . | |$\tau$| . | |$\rho$| . | |$\kappa$| . |
|---|---|---|---|---|---|
| Purely endemic model . | 36 . | 0.0 . | — . | — . | — . |
| Homogeneous mixing (|$\boldsymbol{C} = \bf{1}$|) | 55 | |$-$|415.4 | 1.19 (0.83–1.55) | 2.43 (2.04–2.88) | — |
| No mixing (|$\boldsymbol{C} = \boldsymbol{I}$|) | 55 | |$-$|602.8 | 0.61 (0.24–0.98) | 2.18 (1.89–2.53) | — |
| Original contact matrix |$\boldsymbol{C}$| | 55 | |$-$|631.9 | 0.97 (0.66–1.28) | 2.34 (2.03–2.70) | — |
| Adjusted contact matrix |$\boldsymbol{C}^\kappa$| | 56 | |$-$|659.4 | 0.86 (0.53–1.19) | 2.27 (1.98–2.61) | 0.47 (0.34–0.66) |
| Based on physical contacts only | 56 | |$-$|655.3 | 0.85 (0.52–1.19) | 2.27 (1.98–2.61) | 0.48 (0.35–0.66) |
The spatial spread of the disease across city districts is estimated to have a strong distance decay with |$\hat{\rho} =$| 2.27 (95% CI: 1.98–2.61), such that the adjacency orders 0–4 have weights 1.00, 0.21, 0.08, 0.04, and 0.03. Supplementary Figure S4, See supplementary material available at Biostatistics online, shows age-dependent power laws (replacing |$\rho$| by |$\rho_{g'}$| in (4.1)), as well as unconstrained estimates of the order-specific weights, which are close to the power law. In accordance with the idea of a gravity model, we find that the epidemic part scales with the population size of the “importing” district and age group. Similar to a previous application on influenza (Meyer and Held, 2014), the corresponding estimate |$\hat{\tau} =$| 0.86 (95% CI: 0.53–1.19) is slightly below unity and provides strong evidence for such an association.
Fitted mean components from the AIC-optimal model with adjusted contact matrix, aggregated over all districts. The dots correspond to the reported numbers of cases.
Fitted mean components from the AIC-optimal model with adjusted contact matrix, aggregated over all districts. The dots correspond to the reported numbers of cases.
The estimated group-specific overdispersion parameters are 0.24 (0–4), 1.98 (5–14), 0.30 (15–24), 0.03 (25–44), 0.15 (45–64), and 0.40 (65+) in the model with adjusted contact matrix. The large overdispersion for the 5- to 14-year-old children may be partly due to the food-borne outbreak in 2012, for which the model does not explicitly account. The estimates are similar for the other epidemic models of Table 1, but slightly larger in the endemic-only model.
5. DISCUSSION
We have incorporated a social contact matrix in a regression-oriented, endemic–epidemic time-series model for stratified, area-level infectious disease counts. This three-dimensional approach provides a more detailed description of disease spread than unstratified or non-spatial models, which inherently assume homogeneous mixing within each region or subgroup, respectively.
In our application to age-stratified counts of norovirus gastroenteritis in Berlin’s city districts, the contact model was superior to homogeneous or no mixing between age groups. The model further improved when adjusting the POLYMOD contact matrix toward more within-group transmission. This could be related to biases in contact reporting (Smieszek and others, 2014) with more unreported (short) contacts along the diagonal. The two age groups involving parents were affected the most by preceding infections in other age groups. This is in accordance with the leading role of school children in influenza epidemics (Worby and others, 2015).
Furthermore, new infections predominantly depend on past cases from the same district, as suggested by the estimated spatial transmission weights. An age-dependent distance decay could not be identified from the disease counts. One could thus try to replace the parametric formulation by a social contact matrix, stratified by spatial distance in addition to age group. Separate movement data for school children and adults could then be used to quantify the strength of epidemiological coupling between regions (Kucharski and others, 2015). However, integration of movement network data does not necessarily improve predictions (Geilhufe and others, 2014).
A potentially more severe simplification of our model is the assumption of a time-constant contact matrix. Although weekday vs. weekend differences in contact patterns are not relevant for weekly time-series models, there are possibly relevant seasonal effects on larger time scales. For instance, the contact structure of school children changes considerably between regular and school holiday periods (Hens and others, 2009). Our model could be further tuned both by incorporating a time-varying contact matrix and by estimating seasonality also in the epidemic component (Held and Paul, 2012), which the |$\texttt{hhh4}$| implementation already supports.
To check the robustness of our results with respect to under-reporting, we re-estimated the models with age-specific multiplication factors applied to the reported numbers of cases. Roughly following Bernard and others (2014c, Table 1), we used factors of 1.5 (0–4), 2.5 (5–14), 3.0 (15–24), 3.0 (25–44), 2.5 (45–64), and 2.0 (65+), respectively. While the overdispersion increases, the parameters of the mean are close to the original fit and the epidemic proportion is similar (supplementary Figure S9, See supplementary material available at Biostatistics online). For small strata with a low number of cases, a drawback of this simple deterministic approach is that zero reported counts remain zero regardless of the amount of under-reporting. More sophisticated adjustments are currently being investigated within a Bayesian modeling framework. In principle, asymptomatic infections could be similarly accounted for as missing cases, but they seem to play a minor role in disease transmission (Sukhrie and others, 2012). One-week-ahead forecasts or long-term simulations of the number of (symptomatic) infections, however, are of particular relevance for public health planning. Whether the improved model with social contact data also leads to better predictions will be described elsewhere.
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://biostatistics.oxfordjournals.org.
ACKNOWLEDGMENTS
We thank the associate editor, two anonymous referees, and Michael Höhle for helpful comments on a previous version of this manuscript. Joël Mossong made the POLYMOD data available, and a KML file of Berlin’s districts Conflict of Interest: None declared.
FUNDING
Swiss National Science Foundation (project #137919).




