The limits of human mobility traces to predict the spread of COVID-19: A transfer entropy approach

Abstract Mobile phone data have been widely used to model the spread of COVID-19; however, quantifying and comparing their predictive value across different settings is challenging. Their quality is affected by various factors and their relationship with epidemiological indicators varies over time. Here, we adopt a model-free approach based on transfer entropy to quantify the relationship between mobile phone-derived mobility metrics and COVID-19 cases and deaths in more than 200 European subnational regions. Using multiple data sources over a one-year period, we found that past knowledge of mobility does not systematically provide statistically significant information on COVID-19 spread. Our approach allows us to determine the best metric for predicting disease incidence in a particular location, at different spatial scales. Additionally, we identify geographic and demographic factors, such as users’ coverage and commuting patterns, that explain the (non)observed relationship between mobility and epidemic patterns. Our work provides epidemiologists and public health officials with a general—not limited to COVID-19—framework to evaluate the usefulness of human mobility data in responding to epidemics.


Introduction
The relationship between human movements and the spatial spread of infectious diseases has been recognized for a long time [1,2,3].Human movement has been shown to play a key role in the dynamics of several pathogens, through two basic mechanisms: traveling infectious individuals may introduce a pathogen in a susceptible population, and, at the same time, human movement increase the contact rate between individuals, creating new opportunities for infection.
In the past 15 years, the increasing availability of mobility data derived from mobile phones has fueled a large body of work aimed at identifying opportunities to use them for infectious disease modeling and surveillance [4,5,6,7,8,9,10].
More recently, during the COVID-19 pandemic, mobile phone-derived data have been extensively harnessed to monitor the effect of non-pharmaceutical interventions (NPIs) across countries, understand the early dynamics of COVID-19 diffusion, and forecast its spread at different spatial scales, from countries to cities [11,12,13,14,15,16,17].By measuring human movements and combining them with phylogeography methods [18,19], several studies shed light on the cryptic spread of new variants, their persistence over time and resurgence after the relaxation of NPIs [20,21,22].
Human mobility has been shown to strongly correlate with the spread of COVID-19 during the early phase of the outbreak in China and in many other countries [23,24,25,26,27,28].
However, once COVID-19 established a foothold in a population, the relative importance of mobile phone-derived data to predict the epidemic dynamics on a local scale has been generally less understood and several studies have shown conflicting evidence about the use of mobility traces to model the spread of COVID-19 at later stages of the outbreak.For instance, it has been shown that the explanatory power of mobility metrics in relation to the case growth rate in the U.S., significantly declined in spring 2020, especially in rural areas [29,30,31].Similar trends have been observed in Europe [32].In parallel, mobile phone-derived data have been proven beneficial to model COVID-19 dynamics in largely populated urban areas of Western countries [33,34], but less so in countries of the Global South [35].
Several reasons have been proposed to explain the varying relationship between mobility metrics and epidemic indicators [29].Mobility metrics are generally derived from raw mobile positioning data through complex and customized processing pipelines that can significantly vary across data providers [36].How raw data are processed, and the specific definitions of mobility metrics can significantly impact their interpretation with respect to epidemic variables [37].Moreover, the relationship between mobility and epidemic patterns often relies on modeling assumptions, typically considering linear dependencies, that may not capture the complex interplay of these quantities [32,30].Finally, mobile phone-derived metrics are generated from a sample of users who is generally not representative of the whole population.It is therefore of paramount importance to define standardized approaches that can quantify the added value of mobility metrics for epidemiological analysis, and make different metrics, across settings, directly comparable.
Here, we extensively quantify the relationship between cell phone-derived mobility metrics and COVID-19 epidemiological indicators through a model-free approach, based on an informationtheoretic measure, transfer entropy [38], adapted for small sample sizes.Leveraging granular data provided by Meta that capture both users' movements and colocation at a fine spatial scale [39], we measure the information flow between mobility metrics and time series of COVID-19 incidence and deaths in four European countries, at a subnational scale, over a one year period.
We find that the relative information added by the past knowledge of mobility metrics to the knowledge of the current state of COVID-19 time series is often not statistically significant.
In statistically significant cases instead, we show that the relative information added by past knowledge of COVID-19 cases to the knowledge of current deaths is twice the information flow between past knowledge of mobility metrics and current deaths.We also show that the information flow of a given mobility metric to predict future COVID-19 incidence or deaths can be significant in one country but not in another, even if derived from the same original data source.
Being a general framework, our approach provides a quantitative measure of the relative added explanation brought by mobile phone data to the prediction of epidemiological time series that does not depend on the choice of a specific forecasting model.It thus helps to better identify the most appropriate mobility metrics to use among those available.Our results can thus guide epidemiologists and public health practitioners in the evaluation of mobile phone-derived mobility metrics when they are interpreted as a precursor of epidemic activity.

Results
Here, we first describe and then apply our framework to measure the information flow between human mobility traces and the time evolution of COVID-19 in four European countries.

A transfer entropy approach to link mobility behavior and COVID-19 epidemiology
With the aim of quantifying the information flow from mobility-derived data to COVID-19 data, we first gathered a set of mobility and epidemiological indicators.Fig. 1 provides an overview of the datasets used in the study.In Materials and Methods, we provide a full description of all data sources and the data processing steps.We considered four European countries -Austria, France, Italy, and Spain -and their administrative subdivisions at NUTS3 level [40]   In each country, we also collected weekly and daily time series describing movements and colocation patterns made available by Meta [41].We computed contact rates from colocation maps (see Material and Methods and the SI for details), which measure the probability that two users from two locations are found in the same location at the same time [39].Colocation maps were generated by Meta on a weekly basis, only.To study human movement patterns, we considered movement range maps provided by Meta, which report the number of users who moved between any two 16-level Bing tiles with an 8 hour frequency [42].To make colocation and movement patterns comparable in terms of scale, we focused on short-range movements, i.e. movements that occurred within the same tile, and we separately considered the mid-range movements, i.e. movements that occur between two different tiles in the same province.We then processed the three datasets, starting from their raw form, to aggregate them at the NUTS3 resolution and create the time series: M s (t) for the short-range movements, M (t) for the mid-range movements and CR(t) for the contact rates.These time series were then used as source variables in the information-theoretic analysis.In the remainder of the paper, we will generally refer to CR(t), M s (t), and M (t) as mobility time series as they are all derived from human mobility data.We will also generally refer to the NUTS3 units as provinces, although their nomenclature varies across countries.
Fig. 2 illustrates our study design based on the transfer entropy [38].Transfer entropy is a metric that measures the directed statistical dependence between a source and a target time series and it has been applied to a wide range of research domains [43].Here, our approach consists, first, in computing the transfer entropy between mobility time series, M s (t), M (t) and CR(t), and epidemiological time series such as the reported number of COVID-19 attributed deaths D(t) and cases C(t), in each administrative unit, and for different temporal lags l, using the definition of Shannon entropy, as described by the equations in Fig. 2. Intuitively, the transfer entropy between mobility and deaths, T E M s →D (resp.T E M →D ), can be interpreted as the degree of uncertainty of the reported deaths, D, at time t that is solved jointly by the time series of deaths and mobility trends M s (resp.M ) and exceeds the current degree of uncertainty of D, which can be solved by D's own past.
It is known that transfer entropy estimates suffer in case of small sample sizes and nonstationarity of the source and target time series [44].Moreover, due to the non-parametric nature of the transfer entropy, values computed between different source-target time series are not directly comparable.To address these issues, we first adopted the definition of effective transfer entropy (ETE) [44].ETE is obtained by subtracting from the original definition of TE a reference TE value using a shuffled version of the target time series (see Methods for details), thus removing spurious contributions to TE due to fluctuations observed in small sample sizes.
Also, to address biases due to small sample sizes, we applied a Kernel Density Estimation, before the time series discretization that is necessary to compute the transfer entropy.Second, we normalized the effective transfer entropy by the Shannon entropy of the target variable, defining a normalized effective transfer entropy (NETE) [45].We obtain a metric that is always positive when it is statistically significant and whose zero value indicates the absence of information transfer between time series.In the remainder of the article, we thus refer to the NETE between source X and target Y as our main quantity of interest, using the symbol N X→Y to denote it.
To better understand the cause-effect relationship between mobility and COVID-

The information flow between COVID-19 incidence and deaths
As previously mentioned, to gauge our transfer entropy analysis framework, we first looked at the causal relationship between the incidence of COVID-19 cases and reported death counts.It is clearly expected that a major source of information that provides knowledge on future deaths is encoded in the time series of past case counts.We used the NETE to quantify such information flow.
Fig. 3 shows the NETE between the weekly time series of COVID-19 cases and deaths in the four countries under study.In all countries, median values of N C→D increase from lags equal to 1 week up to a maximum of around 2-3 weeks, and then decline rapidly beyond the 3 weeks time lag.This is in line with early estimates of the median time delay between case reporting and fatality, which was estimated to range between 7 and 20 days in different countries [46,47].At lag equal to 2 weeks, the mean relative explanation added by time series of cases with respect to deaths -that is how much of D(t) can be explained only by the past knowledge C(t − l) -is 14% (SD=8) in Spain, 8% (SD=6) in Italy, 7% (SD=5) in Austria, and 6% (SD=5) in France.Boxplots computed on the distribution of administrative units in each country show a substantial heterogeneity of NETE across regions for lags shorter than 4 weeks.This may be partially explained by spatial heterogeneities in case and death reporting, and in testing strategies.Also, N C→D values appear to be higher in Spain, with respect to the other countries.
A transfer entropy analysis of daily time series of COVID-19 cases and deaths displays consistent results (see Fig. S1), with NETE values that fall within the same range measured on a weekly time scale.
These results suggest NETE estimates are robust with respect to the time scale at which source and target time series are compared.Moreover, it provides a reference value for NETE, in terms of orders of magnitude, when the existence of a causal relationship between time series is known.

The information flow between mobility traces and COVID-19 dynamics
Having defined a benchmark of information transfer using N C→D , we measured the information flow between behavioral time series of mobility indicators and COVID-19 cases and deaths.Fig. 4 summarizes the main results of our analysis.Values of N X→D , with X being either short range movements, mid-range movements or contact rates, were substantially smaller than N C→D in all countries, for any given time lag l.In particular, Fig. 4a allows to compare the distributions of N C→D , N CR→D , N M s →D , and N M →D , at the time lag l that maximized the median NETE for weekly time series, for all indicators.We found the largest median values of the normalized transfer entropy at l = 7 weeks for both contact rates and movements (short-range and midrange).The upper quartile of the NETE distributions derived from the mobility traces generally fell below 5%, in all countries, while the lower quartile of N C→D was always above 5%.Also, the distributions of normalized transfer entropy computed from movements were much narrower and often including the value N = 0 within their interquartile range.Values of N M →C , shown in Fig. 4b, display a pattern similar to the normalized transfer entropy from the mobility time series to the death time series, with generally low values of NETE in all countries.Compared to movement time series, contact rates led generally to relatively higher values of NETE with both targets, cases and deaths, as shown in Fig. 4. Our result confirms the additional value of measuring contact rates from mobile phone data, with respect to other movement metrics [48].Besides, it shows that short-range mobility within a province had often a limited predictive power to capture time trends of COVID-19 spread.
To obtain a more detailed picture of the predictive power of different mobility metrics in terms of NETE, we computed the percentage of provinces for which mobility time series provided significant relative information added, with respect to the past knowledge of epidemiological indicators only (see Tab. 1).On the one hand, our framework effectively captured the existing causal relationship between the time evolution of cases counts and the number of deaths, as the NETE between these indicators was statistically significant (p < 0.01) in about 80% of the provinces, at 2 weeks lag.On the other hand, we observed a statistically significant information transfer from mobility time series to epidemiological ones in a much smaller fraction of provinces.
Short-range movements NETE was significant in less than 20% of provinces when considered as a predictor of both cases and deaths.Mid-range movement time series and contact rates were significant in at most 27% and 40% of provinces.This means that in most provinces, mobility traces did not provide any additional information to predict future COVID-19 cases or deaths, at any lag between 2 and 8 weeks.
Measures of contact rate extracted from colocation maps were more suitable than movement As a sensitivity analysis, we also computed the NETE on a shorter time window, between September 2020 and January 2021, to exclude the confounding effect of the introduction of nationwide vaccination programs.Since in those months all countries adopted mobility restrictions to mitigate the fall COVID-19 wave, we expect a stronger relationship between mobility and COVID-19 cases.Indeed, during this time frame, the information flow between movement time series and COVID-19 cases was consistently higher than in the full study period (see Fig. S10).
This result indicates that, provided with time series of adequate size, the NETE can effectively capture the time-varying relationship between human mobility time trends and COVID-19 dynamics.

Identifying the determinants of mobility data predictive power for COVID-19
Maps of Fig. 5 highlight the spatial heterogeneity of N X→D values observed within the same country, Spain, for a given time lag and different source time series (see Figs. S11 -S13 for the maps of Austria, France, and Italy).As previously mentioned, N C→D displays higher and significant values in most of the country (Fig. 5a), with very few exceptions, while statistically Provinces in white are excluded from our sample.
significant values of N M s →D are found only in 16 provinces out of 42 (Fig. 5c).
To better understand the observed heterogeneity in NETE, and identify those features that can predict the likelihood to observe a statistically significant information transfer from mobility  to COVID-19 death counts, we resorted to a classification model.Namely, we used a random forest classifier to predict when the value N X→D is more likely to be statistically significant, using short-range movement and contact rate as source time series.We focused on these two metrics as they are quantities measured at the same spatial scale.Moreover, short-range movements represent on average 90% or more of all movements within a province (see Table S1).As input features to the model, we considered a set of attributes of the provinces in each country.In particular, we investigated the effects of population size, province area in square kilometers, the density of Facebook users, the number of total cumulative deaths, the ratio between the number of commuters traveling from or to the province, and those who live and work there, as reported by the census (commuting flow), and the coverage consistency, that is the correlation over time between the number of Facebook users sharing their location and the number of Facebook users taken into account to compute the colocation maps.
The results summarized in Tab. 3 show that the model achieves a good overall performance in terms of precision and recall, as indicated by f1-scores generally higher than 0.6.In particular, of all provinces that are classified by the model as characterized by a statistically significant value of NETE, 90% or more display a significant transfer of information, as shown by precision values.
On the other hand, the model's recall is close to 0.95 when it comes to identifying provinces characterized by a not statistically significant NETE, therefore the model correctly identifies 95% of those provinces where there is no actual transfer of information between mobility and deaths.
To explore the importance of province features in our classification model, we examined the SHAP (SHapley Additive exPlanations) values associated with each, as shown in Fig. 6.SHAP is a method based on a game theoretic approach to explaining the output of classification models [49].As expected, the choice of the time lag to compute the NETE is crucial in determining the presence of a significant information transfer between mobility metrics and epidemiological indicators.Indeed, lag is ranked as the most and second most important feature explaining the classification, for contact rate and short-range movement, respectively.Commuting flow is the most important predictor of the statistical significance of NETE between short-range movements and deaths: when the number of commuters leaving or entering a province represents an important fraction with respect to those who remain within the province, the relationship between short-range mobility and COVID-19 dynamics gets weaker.However, the same feature has only a marginal impact on the NETE between contact rates and deaths, which suggests contact rate should be preferred over short-range movements to predict epidemic outcomes when a province is characterized by large population inflows/outflows.Province area and population size have also a significant impact on the information transfer between short-range movement and COVID-19 deaths.Indeed, a larger area and population size correspond to a higher likelihood of NETE significance for short-range movements.
This effect may partly explain why we observed NETE values that were statistically significant only in a few provinces of Austria, where spatial units were particularly small.When looking at the information flow between contact rates and time series of deaths, the total cumulative deaths represent an important explanatory variable for the classification model.Besides the analysis presented in Fig. 6 suggests that the coverage consistency needs to be sufficiently high in order to get a statistically significant transfer entropy from contact rate to deaths.In France, where in most provinces the coverage consistency is low and the commuting inflow and outflow are higher than in other countries (see Table S2), mid-range movements seem to provide a better alternative to contact rates and short-range movements to partially explain time trends of COVID-19 cases and deaths (see Fig. S14 of the SI).
From our analysis, we thus conclude that NETE values computed using contact rates as source time series are less sensitive to the province's geographic or demographic features, rather than to the noise of the target time series.Given good coverage, and consistency over time, contact rates thus represent a better epidemiological predictor of future COVID-19 deaths than short-range movements.

Discussion
In this work, we have introduced a novel framework based on transfer entropy to quantify the amount of information that is transferred from mobile phone-derived mobility metrics to epidemiological time series.Given the important role that mobility indicators have played in the COVID-19 pandemic, we tested our approach on mobility and epidemic time series collected in four European countries, between 2020 and 2021, at a subnational scale.We found that, in general, the relative explanation added by mobility time series to predict future epidemic trends, whether new cases or deaths, was relatively small, ranging between 4% and 6% on average, and not statistically significant in the large majority of the provinces we considered, for any mobility metric.As a comparison, these values were about half of the relative explanation added by past knowledge of COVID-19 incidence to predict future deaths.Our method allowed us to directly compare the relative explanation added by different mobile phone-derived metrics of mobility: short-and mid-range mobility, and contact rates.We generally found a higher information transfer from contact rates than movement, in line with previous studies [48], however, we also observed significant heterogeneities within the same country and between countries.With a classification model, we identified spatial features that may explain such heterogeneities.In provinces characterized by large populations, good coverage consistency over time, and small commuting in-and outflows, short-range movements can represent a useful metric to predict disease dynamics.Where commuting flows are large, such as in France, and Austria, mid-range movements, which represent less than 10% of the total movements, provided a better alternative to short-range ones.Our results suggest the choice of the best mobility metric to inform epidemic predictions can depend on a number of different factors, even when using one single data provider.Moreover, our findings show that cell phone mobility metrics do not always capture epidemiologically-relevant behaviors and alternative data sources could be more effective for this aim, as, for instance, the collection of survey data [50].
There is an emerging common understanding that mobility indicators measured from mobile phone data present significant gaps and do not provide a consistent picture of mobility across countries, and data providers [51,52].Previous studies have also highlighted the fact that coupling between mobility indicators and COVID-19 epidemiology is often weak, and it changes over time [29].The approach we introduced here addresses the above challenges by providing a general framework to evaluate the quality of metrics derived from passively collected mobility traces as a predictor of epidemic outcomes.Our framework has the advantage of being model-free, meaning that it does not depend on modeling assumptions regarding the expected relationship between mobility and epidemic dynamics, nor it requires any parametrization.The normalized effective transfer entropy we adopted is a general method.It allows us to rigorously compare different mobility indicators, across epidemiological settings, by measuring the relative information added by mobility time series to the prediction of future disease incidence.To this end, we release the code to reproduce our analysis between any two source and target time series (see Data and Code Availability).Researchers can use this tool in any epidemiological context to gauge the added value of a specific mobile phone-derived behavioral measure for epidemic intelligence.
Our study comes with a number of limitations and opens new directions for future work.We considered mobility metrics derived from one data provider, Meta, whose user base is not representative of the population in the countries we considered.However, alternative data sources of mobility indicators in Europe with a similar breadth, such as Google or Apple, do not reach the same spatial granularity and provide their data only as relative changes with respect to a pre-pandemic baseline, thus limiting their use in a study like ours.On the other hand, movement and colocation maps by Meta have been extensively used in several studies, including European countries [53,54,55,56,57].Here, we considered four countries with different public health systems, and that adopted different testing strategies.Observed differences in the predictive power of mobility metrics across countries may depend on the varying quality of their reporting systems, especially at the province level.However, all four countries belong to the European Union and we expect very similar standards of surveillance during the pandemic.Overall, it will be important to assess our findings on mobility data from other providers, and, most importantly, in countries of the non-Western world.Finally, it is important to note that transfer entropy measurements become more accurate as the length of the source and target time series increases [44].We worked with a relatively short time series, addressing the bias due to the small sample by adopting the effective transfer entropy.However, we could not systematically investigate how the information transfer changed over time, performing our analysis over different time windows and comparing them.Future work could benefit from longer epidemic time series, over several years, to identify temporal changes in the information flow between human movements and COVID-19 dynamics.
Measures of human mobility inferred from mobile phone data have been a critical ingredient to inform the public health response during the COVID-19 pandemic [58] and they will be an important asset in the fight against future pandemics.At the same time, their widespread use raises some relevant ethical concerns due to re-identification risks [59], therefore, it is fundamental to assess the added value of using cell phone mobility data in a given epidemic scenario and whether the benefits outweigh the risks.Our work provides a practical guide to identifying when and where mobile phone mobility metrics truly capture behavioral patterns that are relevant to predict disease dynamics.

Epidemiological indicators
We collected epidemiological time series in the 4 countries under study from 2 data sources.
Daily reported cumulative COVID-19 cases were collected from the COVID-19 Data Hub [60], an open source aggregator of up-to-date COVID-19 statistics, at the NUTS3 level in Austria, France, Italy, and Spain.
Daily reported cumulative deaths in Austria, France, and Spain were also collected from the COVID-19 Data Hub.For Italy, death statistics were only available on a weekly time scale from the public platform CovidStat (https://covid19.infn.it/iss/).
For the analysis, we generated daily incidence time series from cumulative data by computing day-to-day differences.Then, we further aggregated the daily time series of deaths and cases into weekly ones, to perform the transfer entropy analysis on a weekly scale.

Mobility derived indicators
In our study, we computed daily and weekly movement and contact rates from data provided by Meta through its Data for Good program [41].Here, we first describe the raw data sources provided by Meta and then the data processing we applied to compute the time series for the transfer entropy analysis.

Raw data sources
We collected the following datasets that were publicly released by Meta since the beginning of the COVID-19 pandemic, in Austria, France, Italy, and Spain: • Movement range maps.It reports the number of users who moved between any two 16-level Bing tiles, with an 8-hour frequency.
• Users' population.It reports the number of active users in each tile with an 8-hour frequency.The tile resolution is 4800 x 4800 m 2 .
• Colocation maps.It estimates the probability that, given any two administrative regions, p 1 and p 2 , a randomly chosen user from p 1 and a randomly chosen user from p 2 are simultaneously located in the same place during a randomly chosen minute in a given week [39].The dataset also reports the number of users in p 1 and p 2 .
• Stay put.It reports for a given administrative region the daily percentage of users staying put within a single location, defined at the 16-level Bing tile.
We formalize the description of the above datasets with the notation described in province population users within tile province movement between tiles province movement S r,d ∀p ∈ r r = p mean (d ∈ w) S p,w province stay put Table 5: Aggregation of data sources described in Table 4, to generate our metrics of interest.

Aggregation of raw data
We then processed the raw data sources of Table 4 to obtain a set of time series having the same spatiotemporal resolution, that is weekly, at the NUTS3 scale.Results of the aggregation process are described in Table 5.More in detail: • Province users population.(3) we performed a temporal aggregation by averaging in each province, the 8h population records within a week.
• Within tile province movement (1) we first performed a temporal aggregation by averaging M (t1,t2),h for each pair (t1, t2) over a week and obtaining M (t1,t2),w (2) we then performed a spatial feature joining and assigned each pair (t1, t2) to the corresponding provinces (p1, p2) (3) from M (t1,t2),w we obtained a within tile province movement , that is the sum of movements which occurred in the same province p and within the same tile.
• Between tiles province movement in the pipeline above, from step (3) we obtain a between tile province movement M (between) p,w , that is the sum of movements which occurred in the same province p and between two different tiles, (t1, t2).By definition, the sum M (between) p,w + M (within) p,w represents the total volume of movements in a province, in a Given two discrete temporal signals represented as time series X and Y the Transfer Entropy (TE) [38] is a measure of the amount of information delivered from X to Y , defined as: where X (l) , Y (l) are respectively the l-lagged time series of X and Y and T E XY is formulated as a difference between two conditional entropy terms, where conditional entropy is expressed as , and H(•) is the Shannon Entropy.Given a discrete time series S, its observations can be expressed as the sample {s i ; i = 1, .., n}, and we obtain the discrete probability distribution p(s j ).We compute the Shannon Entropy as: Thus T E XY can be expressed as: The time series that we consider in our experiments are continuous, therefore they need to be discretized before computing T E XY .We employ the Kernel Density Estimation (KDE) for Transfer Entropy estimation.KDE method evaluates the entropy terms of Eq.5 from the discretized density estimated from each of the four features sets: {(Y, Y (l) ), Y (l) , (Y, Y (l) , X (l) ), (Y (l) , X (l) )}.
KDE employs a Gaussian kernel for density estimation.Performing tests on synthetic datasets of different sizes, we checked this was the method the most adapted to small samples.For the selection of the kernel's bandwidth, we use the Scott method [61].The continuous density is then discretized with a grid obtained by an equal-width discretization of each feature's density domain.We select 20 as the number of bins for each feature's domain discretization.The discretized density is computed with the integral of the continuous probability density functions over each grid cell.Concerning the implementation, for TE estimation we use the PyCausality Python package (https://github.com/ZacKeskin/PyCausality).
Effective Transfer Entropy.We introduce the Effective Transfer Entropy (ETE) as a correction to TE for small sample time series, as originally proposed by [44]: where the correction term is obtained by performing N s iterations of Y shuffling, obtaining Ŷj and computing the average of {T E X Ŷj ; j = 1, .., N s }.In our experiments, we performed 500 shuffling iterations.
Normalized Transfer Entropy.We would like to employ TE in order to compare a set ) as in [62]: In this way, the NETE accounts both for bias in small sample time series and it ensures comparability between different input sources {X j } in terms of information transfer to different targets.
Besides, it enables estimating the percentage of explanation value added with respect to only knowing the past of the time series used as a target.

Classification model
The introduction of the ETE allows associating a p-value, a metric of statistical significance, to each NETE value computed between any pair of time series.
In our study, we investigated a number of explanatory features to better understand why in some provinces the NETE could not identify a significant transfer of information between mobility time series and epidemiological indicators.More specifically, we trained a Random Forest classification model to predict the significance of N X→Y at the threshold of p < 0.01, in each province under study.The random forest was performed with 100 decision tree classifiers on various sub-samples of the dataset and used averaging to improve the predictive accuracy and control for over-fitting.The function to measure the quality of a split was the Gini impurity.
Before applying the random forest, the data were split between training and test sets (30%).To compensate for the imbalance of the datasets, we applied a Synthetic Minority Oversampling Technique [63] on the test set.
As input to the classification model we used a set of features that characterize each province: 1. population size (as reported by the latest available census); 2. area (in km 2 ); 3. density of Facebook users (measured as N p,w divided by area); 4. total cumulative number of reported COVID-19 deaths during the study period; 5. commuting flow;

coverage consistency;
The commuting flow is defined as the ratio between the total number of daily commuters who travel from or to a province and the total number of commuters who work and live in that province.Commuting data were collected from the latest available census statistics in each country.The coverage consistency is the correlation over time between the users' populations and N (coloc) p,w .
To quantify the importance of different features in our classification model, we used their SHAP (SHapley Additive exPlanations) values [49].SHAP is a method to explain model predictions based on Shapley Values from game theory.In particular, we use TreeSHAP [64], an algorithm to compute SHAP values for tree ensemble models, such as the random forest classifier of our study.

Data and code availability
The data and code to reproduce our analysis are available at: https://zenodo.org/record/7464949#.Y6L0CfxKhNg 6 Funding

Correction to the colocation probability
Colocation maps provided by Meta is defined as the number of colocation events over the number of possible events.This, by design, includes interactions between users staying within the same tile but not having actual contact with other users.For this reason, we estimate the contact rate in each province by removing the contribution due to the users staying put.We explain our approach to estimating such contribution in the following.Let us start by writing the original colocation probability P as: where: • E is the number of colocation events within the province • N is the number of province colocation users.
The exact formula should be P = E N (N −1) but as N is large we approximate it to 8. Let us denote R (c) the number of measured colocation events that are due to users who stay put only, then the corrected colocation probability should be written in the following way: We estimate R (c) by using the stay-put probability S, which is the probability of a user staying put.Let us call the tile population ratio probability distribution {f t l ; t = 1, .., T l } where T is the number of tiles in a province.This gives us an estimate of the contribution of the users who stay put to the colocation probability, as: So we rewrite: Pp,w = P − S 2 We do not have access to the population of the tiles used for the colocation so we make an approximation using the population distribution given for each tile with dimensions 4800 m × 4800 m.As there are by definition 64 colocation tiles within a single population tile, the expression Eq.11 can be formulated as: • N t,w : population at (tile,week) resolution.It is obtained through mean temporal aggregation of N t,h over the week interval denoted by w.
• N p,w : population at (province, week) resolution.It is obtained through sum spatial aggregation of N t,w over the tiles belonging to province p.
• T is the number of tiles 4800 m × 4800 m We can introduce the quantity Q p,w as the sum of squared frequencies of the province tile t,w ) 2 , so that, finally: which is the lowest, i.e. the most granular, level of the standard hierarchy of administrative regions in Europe (Fig.1, leftmost column).In all administrative regions, we collected indicators of the COVID-19 epidemic dynamics, namely, the weekly and daily numbers of new COVID-19 cases and deaths over the period, from September 2020 until July 2021.During this period, the dynamics of COVID-19, exemplified by the incidence of new cases (Fig.1, rightmost column), displayed subsequent waves, as a result of the complex interaction between the spread of new variants, the adoption of non-pharmaceutical interventions, the introduction of vaccines.

Figure 1 :
Figure 1: Summary of behavioral and epidemiological indicators.In each country under study (from top to bottom: Italy, France, Austria and Spain), we consider three different types of indicators: contact rates, movements (here for the sake of simplicity we only show the short-range movements), and COVID-19 cases.In each plot, the blue shaded area highlights the within-country variability, corresponding to time series in every administrative subdivision.The blue solid line represents the average value.All curves are normalized between 0 and 1, corresponding to their maximum value.

Figure 2 :
Figure 2: Illustration of study design.We computed the transfer entropy T E X→Y to measure the information flow between source X (on the left) and target time series Y (right), for a given time lag l.In the figure example, as target time series we consider the number of COVID-19 deaths, D(t).As source time series, we consider either mobility indicators, M s (t), M (t), CR(t), or COVID-19 cases C(t).Transfer entropy quantifies the amount of information that is added by past knowledge of mobility or cases (green and cyan bars, respectively) to current knowledge of deaths, with respect to the knowledge of past deaths only (blue bar).After correcting the TE for small sample sizes, and normalizing by the reference value represented by the blue bar, we finally compare the Normalized Effective Transfer Entropy of mobility and cases (rightmost box).

Figure 3 :
Figure 3: Information flow between COVID-19 incidence and deaths.Normalized Effective Transfer Entropy (NETE) between COVID-19 weekly reported cases and deaths in the NUTS3 administrative subdivisions (provinces) of Austria, France, Italy and Spain.NETE is computed for lags ranging from 1 to 8 weeks, on the x-axis.Boxplots are computed on the distribution of NETE values of all the administrative subdivisions in each country.The horizontal red line marks the value N C→D = 0.

Figure 4 :
Figure 4: Information flow from mobility data to COVID-19 incidence and deaths.Comparison between the normalized effective transfer entropy computed from source time series X and target time series of reported COVID-19 deaths D (a) and cases C (b). Source time series are COVID-19 cases (only for deaths), contact rates, short range and mid-range movement.Boxplots are computed from the distribution of NETE values for a given time delay, l.In panel a: l= 2 weeks for cases, 7 weeks for contact rates and movement.In panel b: l= 6 weeks for short range and mid-range movement.The horizontal red line marks the value N X→D = 0.

Figure 5 :
Figure 5: Spatial variations of normalized effective transfer entropy.Maps of NETE values computed for different source time series and weekly COVID-19 deaths, in the provinces of Spain: (a) source is COVID-19 cases at lag l=2 weeks, (b) source is contact rate at lag l=7 weeks, (c) source is short-range movement at lag l=7 weeks.(d) source is mid-range movement at lag l=7 weeks.Dark grey indicates provinces with non-significant values of NETE (p > 0.01).

Figure 6 :
Figure 6: SHAP plots of feature importance to predict the statistical significance of the NETE for all selected provinces.Color represents the feature value (blue is low and red is high).Panel a describes the results for N M s →D , panel b for N CR→D .The SHAP value, on the horizontal axis, indicates the feature importance on the model output, with larger values corresponding to higher relevance.Each dot represents a single observation.Features are ranked by importance.

( 1 )
we performed a spatial aggregation by summing the population of tiles belonging to province p, thus obtaining a population at a (province, hour) level: N (pop) p,h .(2) we performed a linear interpolation of the temporal gaps that were present in N (pop) p,h of input signals {X j ; j = 1, .., N } in terms of their Transfer Entropy T E Xj Y towards a specific output Y .From equation 4 we have that T E Xj Y is evaluated as a difference of conditional entropy where the first term H(Y |Y (l) ) depends only on target Y .In order to ensure comparability over the set {T E Xj Y ; j = 1, .., N }, we reformulate the difference as a relative difference dividing by H(Y |Y (l) ).Thus the set of inputs are compared according to {T E Xj Y /H(Y |Y (l) ); j = 1, .., N } and we refer to T E XY /H(Y |Y (l) ) as Normalized Transfer Entropy (NTE).Normalized Effective Transfer Entropy.By combining the ETE and the NTE we can finally introduce the Normalized Effective Transfer Entropy (NETE), which is obtained by dividing the ETE by the first conditional entropy term H(Y |Y (l)

F
.D. gratefully acknowledges support from the CRT Lagrange Fellowships in Data Science for Social Impact of the ISI Foundation, where this work was conducted.M.T. and L.G. acknowledge the Lagrange Project of the ISI Foundation funded by CRT Foundation.The funders had no role in the study design, decision to publish, or preparation of the manuscript.

Figure S1 :
Figure S1: Comparison of NETE values computed on weekly and daily time series.N C→D computed between time series data collected on a weekly time scale (bottom row) and a daily one (top row).Daily time series were available only for Austria, France and Spain.

Figure S2 :
Figure S2: NETE values from contact rates to deaths in Austria.Only statistically significant values are shown (p-value< 0.01).

Figure S3 :
Figure S3: NETE values from contact rates to deaths in France.Only statistically significant values are shown (p-value¿0.01).

Figure S4 :
Figure S4: NETE values from contact rates to deaths in Italy.Only statistically significant values are shown (p-value< 0.01).

Figure S5 :
Figure S5: NETE values from contact rates to deaths in Spain.Only statistically significant values are shown (p-value< 0.01).

Figure S6 :
Figure S6: NETE values from movements to deaths in Austria.Only statistically significant values are shown (p-value< 0.01).

Figure S7 :
Figure S7: NETE values from movements to deaths in France.Only statistically significant values are shown (p-value< 0.01).

Figure S8 :
Figure S8: NETE values from movements to deaths in Italy.Only statistically significant values are shown (p-value< 0.01).

Figure S9 :
Figure S9: NETE values from movements to deaths in Spain.Only statistically significant values are shown (p-value< 0.01).

Figure S10 :
Figure S10: Comparison of NETE values computed on full time series and reduced time series.N M →C computed between time series data collected including the vaccination campaign (full) and not (reduced).The reduced study period ranges from September 1, 2020 to January 31, 2021.The full study period extends up to July 31, 2021.We consider daily time series only to address biases due to small samples.

Figure S11 :
Figure S11: Spatial variations of normalized effective transfer entropy.Maps of NETE values computed for different source time series and weekly COVID-19 deaths, in the provinces of Austria: (a) source is COVID-19 cases at lag l=2 weeks, (b) source is contact rate at lag l=7 weeks, (c) source is short-range movement at lag l=7 weeks.(d) source is mid-range movement at lag l=7 weeks.Dark grey indicates provinces with non-significant values of NETE (p > 0.01).Provinces in white are excluded from our sample.

Figure S12 :
Figure S12: Spatial variations of normalized effective transfer entropy.Maps of NETE values computed for different source time series and weekly COVID-19 deaths, in the provinces of France: (a) source is COVID-19 cases at lag l=2 weeks, (b) source is contact rate at lag l=7 weeks, (c) source is short-range movement at lag l=7 weeks.(d) source is mid-range movement at lag l=7 weeks.Dark grey indicates provinces with non-significant values of NETE (p > 0.01).Provinces in white are excluded from our sample.

Figure S13 :
Figure S13: Spatial variations of normalized effective transfer entropy.Maps of NETE values computed for different source time series and weekly COVID-19 deaths, in the provinces of Italy: (a) source is COVID-19 cases at lag l=2 weeks, (b) source is contact rate at lag l=7 weeks, (c) source is short-range movement at lag l=7 weeks.(d) source is mid-range movement at lag l=7 weeks.Dark grey indicates provinces with non-significant values of NETE (p > 0.01).Provinces in white are excluded from our sample.

Figure S14 :
Figure S14: Percentage of statistically significant NETE values, disaggregated by country and by mobility metric used as source variable.Target variables are: weekly COVID-19 cases (panel a) and weekly COVID-19 deaths (panel b).
19 deaths, which are encoded in the value of N M →D ,N M s →D and N CR→D , we compared them against the transfer entropy N C→D , where C is the time series of new COVID-19 cases.As the causal relationship between the number of cases and deaths is established by definition, we used the transfer entropy N C→D as a benchmark to evaluate the added value of mobility indicators to predict COVID-19 deaths.As an example, similar values of N M s →D and N C→D would suggest

Table 1 :
Percentage of statistically significant NETE values across provinces in all the countries studied.This table shows the percentage of provinces, in all countries, in which the NETE is statistically significant (p < 0.01) for lags (l) from 2 to 8 weeks.

Table 2 :
NETE results across provinces in all the countries studied.The table shows the average relative explanation added by source time series, with respect to past knowledge of the target only.Only provinces having a statistically significant NETE are considered.Numbers in parenthesis report the standard deviation computed over all provinces for which the NETE was statistically significant.datato capture behavioral patterns relevant to predict COVID-19 spread.By focusing only on those provinces where we could identify a significant information flow between mobility traces and COVID-19 indicators, we observe that the averaged relative explanation added by mobility data with respect to the epidemiological data ranges between 4 − 6%, which is about half of the averaged relative explanation added by past knowledge of cases to the prediction of future deaths (see Tab. 2 and Figs.S2-S9 in the SI).

Table 3 :
Classification performance metrics.Summary of model's classification performance to predict the statistical significance of NETE at the p < 0.01 threshold when the input source is short-range movement (a) or contact rate (b) and target variable are COVID-19 deaths.

Table 4 :
Summary of raw data sources as time series records X s,t , where s denotes the spatial resolution and t the temporal resolution.

Table S6 :
Relative proportion of mobility components in each country.Each row displays the proportion of movements, as a percentage of the total movements within each province, that are represented by the short-range mobility (M s (t)) and the mid-range mobility (M (t)).Each table entry reports the median value and the IQR, computed over all provinces, and all weeks of the study period.Short-range mobility represents the large majority of movements within a province, in all countries.

Table S7 :
Coverage consistency and commuting flow distributions by country.Each table entry reports the median value and the IQR computed over all provinces, in each country, considered in the study.
Nt,wNp,w ; t ∈ p : tile t population frequency in province p.