Predictability of real temporal networks

Abstract Links in most real networks often change over time. Such temporality of links encodes the ordering and causality of interactions between nodes and has a profound effect on network dynamics and function. Empirical evidence has shown that the temporal nature of links in many real-world networks is not random. Nonetheless, it is challenging to predict temporal link patterns while considering the entanglement between topological and temporal link patterns. Here, we propose an entropy-rate-based framework, built on combined topological–temporal regularities, for quantifying the predictability of any temporal network. We apply our framework to various model networks, demonstrating that it indeed captures the intrinsic topological–temporal regularities, whereas previous methods consider only temporal aspects. We also apply our framework to 18 real networks of different types and determine their predictability. Interestingly, we find that, for most real temporal networks, despite the greater complexity introduced by the added dimension, the combined topological–temporal predictability is higher than the temporal predictability. Our results demonstrate the necessity of incorporating both the temporal and the topological aspects of networks in order to improve predictions of dynamical processes.


INTRODUCTION
Link temporality describes the time-varying nature of couplings and interactions between nodes in real networks [1][2][3][4][5][6][7][8][9][10][11][12], and has been found to significantly affect network dynamics. Examples include the diffusion of innovations or epidemics [13], information aggregation [14], the emergence of cooperation [15] and the achievability of control [16]. Hence, in order to alter network dynamical states in a desirable way, it is essential to quantitatively understand both topological and temporal patterns. This raises a fundamental question: how predictable are real temporal networks? This question is much broader than, and distinct from, time-series forecasting [17][18][19], which aims to predict the future evolution of single variables, and link prediction [20][21][22][23], the goal of which is to uncover missing or future links in static networks. Here we offer an entropy-rate-based framework that considers the combined topological–temporal patterns and apply it to a wide range of model and real weighted and unweighted temporal networks, uncovering the prediction limits of real temporal networks.

ANALYTICAL FRAMEWORK FOR PREDICTABILITY
A temporal network with n nodes consists of a series of snapshots (Fig. 1A), which can be described as a 2D expanded matrix M (Fig. 1B). Each column in M represents one snapshot and each row represents the temporality of a possible link, i.e. whether this link is present in a specific snapshot and, if so, its weight. Since all pairs of nodes must be taken into account, the number of rows in M is n^2. The full information of the temporal network is encoded by this matrix M, which can be viewed as a stochastic vector process, i.e. a sequence of random vectors. To quantify the predictability of this vector process, we use the entropy rate, H, i.e. the asymptotic lower bound on the per-symbol description length [24], which is a rigorous measure of the level of randomness in the process. As illustrated in Fig. 1C, H can be calculated using a generalized Lempel–Ziv algorithm [25], the essence of which is to compute the recurrence times of different patterns within a square: a 2D square with side k is defined as M_C(k), where C(k) = {v = (t, s) ∈ Z^2 : 0 ≤ t ≤ k, 0 ≤ s ≤ k} and M_v denotes the matrix entry at site v. This estimate converges to the true entropy rate (Eq. 1) when the temporal network has a large number of snapshots (see Supplementary Material, Section II). The predictability Π_M of a temporal network is the probability that a predictive algorithm can correctly forecast the future evolution of this network based on its history. Once we have the entropy rate H(M), the upper bound of predictability Π_M^max can be obtained by solving

H(M) = −Π_M^max log_2 Π_M^max − (1 − Π_M^max) log_2(1 − Π_M^max) + (1 − Π_M^max) log_2(N − 1),   (2)

where N is the number of unique values in matrix M (see the 'Methods' section and Supplementary Material, Section III). Here, Π_M^max is the fundamental limit of predictability, i.e. in principle, no algorithm can predict the temporal network with an accuracy higher than Π_M^max. It is worth stressing that the entropy rate obtained with the generalized Lempel–Ziv algorithm is an asymptotic measure of randomness and Eq. (1) becomes more accurate when the number of time steps is larger.
Hence, given the finite number of snapshots in real temporal network data sets, the calculated value of Π_M^max should be interpreted as an asymptotic estimate of the upper bound of predictability. Moreover, Π_M^max is an intrinsic property of the temporal network and does not depend on any specific predictive algorithm.
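The Fano-type relation in Eq. (2) has no closed-form solution, but its right-hand side decreases monotonically in the predictability, so the upper bound can be recovered numerically by bisection. A minimal sketch under these definitions (the function name is ours):

```python
from math import log2

def fano_upper_bound(H, N, tol=1e-10):
    """Solve H = -p*log2(p) - (1-p)*log2(1-p) + (1-p)*log2(N-1)
    for p in [1/N, 1], i.e. the upper bound of predictability given
    an entropy rate H (bits/symbol) and an alphabet of N unique values."""
    def fano(p):
        if p >= 1.0:
            return 0.0
        if p <= 0.0:
            return log2(N)
        return -p * log2(p) - (1 - p) * log2(1 - p) + (1 - p) * log2(N - 1)
    # fano(p) decreases monotonically from log2(N) at p = 1/N to 0 at p = 1,
    # so bisection on [1/N, 1] finds the unique root.
    lo, hi = 1.0 / N, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > H:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, an entropy rate equal to log_2 N returns the uniform-guessing bound 1/N, while H = 0 returns a fully predictable network (bound 1).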
The snapshots of real temporal networks are usually very sparse, so most rows in M consist mostly of zeros. Thus, we sort the rows, i.e. all potential links, in descending order according to the number of snapshots in which they occur, and remove those links that are present in <10% of the snapshots, obtaining a new matrix M′ (see the 'Methods' section). Our analyses of both model and real networks show that the filtering process and the ordering of rows in M have a negligible effect on the predictability (see Fig. 1D, and also Supplementary Material, Sections IV and V); therefore, we use Π_M′^max to quantify the predictability of temporal networks hereafter. Note that the original entropy-rate estimator (Eq. 1) applies to square matrices only, whereas the matrix M′ of a temporal network can be non-square. To overcome this issue, we split the original matrix into smaller squares with shorter histories and find a linear relationship between the predictability and the number of squares, implying that a longer history leads to higher predictability; this allows us to calculate the predictability of any temporal network (see the 'Methods' section and Supplementary Figs 1 and 2 in Supplementary Material, Section III).
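The construction of the 2D expanded matrix from snapshots, as described above, can be sketched as follows (a minimal illustration; names are ours):

```python
import numpy as np

def expanded_matrix(snapshots):
    """Stack T weighted adjacency matrices (each n x n) into the
    n^2 x T expanded matrix M: each row is one ordered node pair
    (a possible link), each column is one snapshot."""
    T = len(snapshots)
    n = snapshots[0].shape[0]
    M = np.zeros((n * n, T))
    for t, A in enumerate(snapshots):
        # Row index l = i*n + j encodes the node pair (i, j).
        M[:, t] = A.reshape(n * n)
    return M
```

For a network of 3 nodes and 4 snapshots, the result is a 9 × 4 matrix whose entry (l, t) holds the weight of link l in snapshot t.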

VALIDATION ON MODEL NETWORKS
Next, we test and validate our measure Π_M^max, i.e. the topological–temporal predictability (TTP), on synthetic weighted temporal networks (Fig. 2A). The initial snapshot is a network with communities generated by a stochastic block model [26], with links assigned random weights. In each subsequent snapshot, to generate nearest-neighbor correlations, we activate for each link either the structural parameter β or the temporal parameter γ. If the structural aspect is activated, with probability β the link changes its weight to that of an adjacent link; if the temporal aspect is activated, with probability γ the link weight stays the same as in the last snapshot; otherwise, the link is assigned a random value (see the 'Methods' section). Long-range correlations are generated through 2D fractional Gaussian noise (FGN) [27], with a power-law correlation function C(r, φ) = r^(−γ_x) cos^2 φ + r^(−γ_y) sin^2 φ, where (r, φ) are polar coordinates and γ_x is the decay parameter in the temporal dimension, while γ_y is that of the topological dimension. In Fig. 2B and C, we compare our TTP with an existing measure, the temporal predictability (TeP) [28], which treats the links of a temporal network as merely a set of uncorrelated time series and thus captures only temporal regularity (see the 'Methods' section), and also with three commonly used predictive algorithms, namely Markov [29], ConvLSTM [30] and PredNet [31], which forecast the future evolution of networks. Markov considers a temporal network as a set of uncorrelated time series, ConvLSTM takes link correlations into consideration, and PredNet is a dynamic matrix-prediction algorithm based on ConvLSTM (see Supplementary Material, Section IX for details). As shown in Fig.
2B, TeP is significantly smaller than TTP and, as β and γ increase, TeP remains nearly independent of the structural parameter β, because TeP only partially characterizes the regularity of temporal networks. Note that, since β and γ are not completely independent of each other, TTP is still slightly higher than TeP even when β and γ approach zero (highest correlations). The higher accuracy of PredNet relative to the upper bound of predictability provided by TeP, as well as the poor performance of Markov, both indicate the significance of topological information. The unexpectedly poor performance of ConvLSTM is caused by its deconvolution layer, which introduces errors. We further find similar results for 2D FGN, although the range of predictabilities is much smaller owing to the smaller number of possible values.

RESEARCH ARTICLE

Figure 1D. The fluctuations of topological–temporal predictability (TTP) for different orders of rows in matrix M′. All matrices are extracted from a synthetic temporal network in Fig. 2A with rewiring probability p = 0.5. We also change p and observe that it has no effect on the results (see Supplementary Material, Section V).
We also introduce synthetic unweighted temporal networks (Fig. 2D) to validate our measure (TTP). The initial snapshot is a ring and each subsequent snapshot is generated by randomly rewiring a fraction p of the links in the most recent snapshot. Obviously, as p increases, the network becomes more random and hence less predictable (see Supplementary Material, Section VI). However, the structural consistency of the aggregated network, a measure that captures only topological regularity in static networks [32], shows the opposite trend of increasing predictability (Fig. 2E), demonstrating again the necessity of considering link temporality.
In contrast, our measure, TTP, decreases monotonically as p increases. Yet, due to the high sparsity of these temporal networks, TTP remains high for all values of p. To remove the impact of sparsity, we define and calculate the normalized topological–temporal predictability NTTP = (TTP − TTP_bl)/(1.0 − TTP_bl) for TTP_bl < 1.0 (see the 'Methods' section), where TTP_bl is the TTP of the shuffled network, which can be viewed as the lower bound of the predictability of a temporal network. For comparison with NTTP, we also normalize TeP over shuffled links, obtaining the normalized temporal predictability (NTeP) (see the 'Methods' section). As shown in Fig. 2F, for p = 0, the network is fully predictable (NTTP ≈ 1.0) and, for p = 1.0, the network becomes totally random and unpredictable (NTTP vanishes). Although NTeP shows an analogous decreasing behavior, it is usually lower than NTTP owing to the lack of topological information. Therefore, NTTP indeed captures the intrinsic regularity of temporal networks.
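The evolving small-world dynamics described above can be sketched as follows (a minimal illustration with our own helper names; the exact rewiring constraints used in the paper may differ):

```python
import random

def ring_edges(n):
    """Ring network: each node linked to its successor (average degree 2)."""
    return {(i, (i + 1) % n) for i in range(n)}

def rewire_snapshot(edges, n, p, rng=random):
    """Rewire a fraction p of links from the previous snapshot,
    avoiding self-loops and duplicate links."""
    edges = set(edges)
    for e in list(edges):
        if rng.random() < p:
            edges.discard(e)
            while True:
                u, v = rng.randrange(n), rng.randrange(n)
                if u != v and (u, v) not in edges and (v, u) not in edges:
                    edges.add((u, v))
                    break
    return edges

def evolve(n=50, p=0.5, T=10, seed=0):
    """Generate T snapshots of the evolving small-world model."""
    rng = random.Random(seed)
    snaps = [ring_edges(n)]
    for _ in range(T - 1):
        snaps.append(rewire_snapshot(snaps[-1], n, p, rng))
    return snaps
```

With p = 0 every snapshot is identical (fully predictable); with p = 1 every link is rewired at every step while the total number of links is preserved.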

PREDICTABILITY AND PREDICTIVE ALGORITHMS ON REAL NETWORKS
We apply our framework to 18 real temporal networks in diverse scenarios, including animal interactions, human contacts, online communications, political events and transportation (see Supplementary Material, Section I for descriptions of these network datasets). We group these networks into five categories and reveal the intrinsic predictability profile, consisting of NTTP and NTeP, for each network (Fig. 3A). We find that human contacts have the highest averaged NTTP, probably resulting from their synchronized bursty nature, while temporal regularities dominate the overall predictability of transportation networks due to the periodicity of each link (see Supplementary Material, Section VIII for details). Since the baselines for the normalizations in NTTP and NTeP are different, NTeP can be higher than NTTP. However, we find another interesting phenomenon in most networks (excluding Enron-Email (EE), Levant-Event (LE), Aviation-Network (AN) and Britain-Transportation (BT)): the intrinsic combined predictability is higher than TeP despite the greater complexity of capturing 2D rather than 1D regularity. This implies the significance of the topological information as well as the correlation between the temporal and topological patterns. Surprisingly, we also find strong correlations between topological regularity (characterized by the Hamming distance between each link pair) and the difference between TTP and NPIL (normalized predictability of individual links) (see Fig. 3B and C), suggesting that the intrinsic predictability of real networks mostly originates from temporal and topological regularity, rather than from the interdependence between them.

Figure 2. (A) To test the impact of link weights on predictability, we develop a temporal stochastic block model with nearest-neighbor correlations (see Supplementary Material, Section VII for details) and long-range correlations [27]. The initial snapshot is generated by the stochastic block model [26], consisting of nodes uniformly assigned to specific communities. There are four communities, 100 nodes and 300 snapshots in each network, and the degree of each node is 3. The link weights in subsequent snapshots are generated according to topological and temporal parameters for the two models, respectively, without changing the network topology. (B) Two predictability measures (TTP and TeP) and three predictive algorithms (Markov [29], ConvLSTM [30] and PredNet [31]) on nearest-neighbor correlations with topological parameter β and temporal parameter γ. The maximum of β or γ corresponds to the strongest memory in the topological or temporal dimension. TeP is obtained by averaging PIL. (C) Two predictability measures with three predictive algorithms on long-range correlations with a power-law correlation function C(r, φ) = r^(−γ_x) cos^2 φ + r^(−γ_y) sin^2 φ, where (r, φ) are polar coordinates and γ_x is regarded as the temporal parameter, while γ_y is the topological one. Results are averaged over 10 independent realizations of the networks. (D) To test the impact of network topology on predictability, we develop an evolving small-world network model. The first snapshot is a ring network; subsequent topologies are generated by randomly rewiring a fraction p of links in the previous snapshot. (E, F) Predictabilities of evolving small-world networks against rewiring probability. The networks are generated by the model in (D) with 50 nodes and average degree 2. Structural consistency (SC) is an existing predictability measure for static undirected and unweighted networks [32]. We normalize TTP over TTP_bl to eliminate the impact of link sparsity (see the 'Methods' section), obtaining the intrinsic predictability of a temporal network, and also obtain the normalized TeP for comparison (see the 'Methods' section).
Next, we compare our measure to the predictive power of existing algorithms. We find that these algorithms mostly fall short in prediction (see Fig. 4). For a few networks (Ant-Colony (AC): p-value = 7.1 × 10^−15; College-Message (CM): p-value = 6.1 × 10^−7), their accuracy is even higher than the maximum predictability given by TeP, probably because TeP fails to incorporate the topological aspects. However, the predictability given by our TTP measure always remains out of reach of the current algorithms, indicating again that TeP alone cannot characterize the regularities in temporal networks.

Figure 4. Predictive power of existing algorithms. Markov considers a temporal network as a set of uncorrelated time series [29], ConvLSTM takes link correlations into consideration [30] and PredNet is a dynamic matrix-prediction algorithm based on ConvLSTM [31] (see Supplementary Material, Section IX for details). Error bars are the standard deviation of each algorithm over 10 different runs. Note that no algorithm reaches our topological–temporal predictabilities of real temporal networks. The accuracy of at least one algorithm is higher than TeP on the AC, Marseilles-Contact (MC), Workplace-Contact (WC), College-Forum (CF) and CM networks.
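Because the Markov baseline treats each link's time series independently, it can be illustrated with a simple per-link first-order chain (a simplified stand-in, not the authors' exact implementation of [29]):

```python
from collections import Counter, defaultdict

def markov_predict(series):
    """Predict the next value of one link's time series with a first-order
    Markov chain: return the most frequent successor of the last observed
    state, falling back to the overall mode for states with no history."""
    successors = defaultdict(Counter)
    for prev, nxt in zip(series, series[1:]):
        successors[prev][nxt] += 1
    last = series[-1]
    if successors[last]:
        return successors[last].most_common(1)[0][0]
    return Counter(series).most_common(1)[0][0]
```

By construction this predictor ignores all correlations between different links, which is exactly why it underperforms on networks with strong topological regularity.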

DISCUSSION AND OUTLOOK
We developed a 2D framework, based on combined topological–temporal features, for quantifying the intrinsic predictability and uncovering the predictability profile of any temporal network. Importantly, we find that, for some real temporal networks, the accuracy of current algorithms can exceed the upper bound given by temporal-only predictability measures, yet it never exceeds our TTP measure. Given that predictability is an essential property of temporal networks, our findings suggest that more accurate predictive algorithms are needed to capture the regularities of real temporal networks, i.e. there is room for researchers to continue improving their predictive algorithms. In addition, applying our predictability measure to detect the change points of temporal networks, and systematically investigating the impact of predictability on dynamical processes and control of temporal networks, are worthwhile future pursuits.

Matrix filtering
As mentioned above, most rows in the matrix M of a real temporal network remain zero throughout. Such link sparsity leads to artificially high predictability. We sort the links ℓ_1, ℓ_2, . . . , ℓ_n in the matrix M by their activation rates a_1, a_2, . . . , a_n in descending order, so that a_1 ≥ a_2 ≥ . . . ≥ a_n, and then obtain M′ by removing links that are present in less than 10% of the snapshots. Since the estimate of the entropy rate converges to the true entropy when the size of the matrix goes to infinity [25], we include at least the m_θ most active links or 60% of the non-zero elements in the matrix to diminish errors. Due to computational restrictions, we set m_θ = 1000, although, in principle, it could be higher with sufficient resources. All calculations are performed on M′. We show in Supplementary Material, Section IV that reducing the matrix M to M′ has no influence on the results.
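One possible reading of the filtering rules can be sketched as follows (thresholds taken from the text; the helper name and the exact interplay of the two retention criteria are our assumptions):

```python
import numpy as np

def filter_matrix(M, min_rate=0.1, m_theta=1000, nz_frac=0.6):
    """Sort rows (links) of M by activation rate (fraction of snapshots in
    which the link is present) in descending order, keep rows active in at
    least min_rate of snapshots, and extend the cut until either m_theta
    rows or nz_frac of all non-zero elements are retained."""
    rates = (M != 0).mean(axis=1)
    order = np.argsort(-rates)          # descending activation rate
    M_sorted = M[order]
    total_nz = (M != 0).sum()
    keep = int((rates >= min_rate).sum())
    while keep < len(order) and keep < m_theta and \
            (M_sorted[:keep] != 0).sum() < nz_frac * total_nz:
        keep += 1
    return M_sorted[:keep]
```

The returned matrix M′ has its most active links in the top rows, which, as shown above, does not affect the measured predictability.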

Derivation of predictability and its normalization
Although a detailed derivation is provided in Supplementary Material, Section II, here we outline the main steps used to derive the upper bound of predictability. The entropy rate of a temporal network, which is characterized as a random field, is defined as

H(M) = lim_{mT→∞} (1/(mT)) Σ_{l=1}^{m} Σ_{t=1}^{T} H(M_t^l | τ_t^l),

where m is the number of rows, T is the number of snapshots and the history τ_t^l ≡ {M_j^i : (j < t) or (j = t and i < l)} collects all entries that precede M_t^l in column-major order.

Tang et al. 935

Suppose P(M̂_t^l = M_t^l | τ_t^l) is the probability that, given the history τ_t^l, the actual value of M_t^l agrees with our estimate M̂_t^l, and λ(τ_t^l) is the probability that M_t^l takes the most likely value given τ_t^l; thus

P(M̂_t^l = M_t^l | τ_t^l) ≤ λ(τ_t^l).

Let P(τ_t^l) be the probability of observing a specific history. It can be demonstrated that the best prediction strategy based on a given history is to adopt the most likely value [28]; thus, the predictability of M_t^l is

π_t^l = Σ_{τ_t^l} P(τ_t^l) λ(τ_t^l).

Then the overall predictability of the random field is

Π_M = lim_{mT→∞} (1/(mT)) Σ_{l=1}^{m} Σ_{t=1}^{T} π_t^l.

Because entropy increases as a distribution becomes more uniform, the distribution created by preserving the most likely value λ(τ_t^l) = p_max and setting the remaining probabilities equal has an entropy no less than that of the original distribution. Note that M_v ∈ A, where A is the finite set of values taken by the matrix entries, and denote N ≡ |A|. The entropy of the new distribution is

S_F(p_max) = −p_max log_2 p_max − (1 − p_max) log_2 [(1 − p_max)/(N − 1)].

The solution Π_M^max of H(M) = S_F(Π_M^max) is then the upper bound of the predictability Π_M. We adopt the entropy estimator of [25] as the entropy rate H(M) for the calculation of the predictability's upper bound Π_M^max.

The link sparsity of a temporal network largely determines its predictability even after matrix filtering. To remove the impact of sparsity and obtain the intrinsic predictability of a temporal network, we normalize Π_M^max over the baseline (i.e. the predictability of the shuffled network), which captures only the regularity in the link-weight distribution. Therefore, we have

NTTP = (p − b)/(1.0 − b),

where p is the original predictability and b is the baseline. It is worth noting that NTeP is the average of NPIL, which is obtained by normalizing PIL over its own baseline PIL_bl.

Generalization of predictability using predictive congruency
Since our entropy estimator applies only to square matrices [25], we explore the relation between the weighted-average predictability and the number of squares by gradually splitting the non-square matrix into a set of squares and units, i.e. 1 × 1 squares, and computing the predictability of each square. We find a linear relationship between the weighted-average predictability and the number of squares, including units. Assume that the matrix M′ is split into Q squares s_1, s_2, . . . , s_Q in the first splitting stage, along with u units, and let e_{s_1}, e_{s_2}, . . . , e_{s_Q} be the side lengths of the squares, so that their areas are e_{s_1}^2, e_{s_2}^2, . . . , e_{s_Q}^2. It is worth noting that we define the predictability of a unit as 1/|A|, where A is the finite value set of link weights in the temporal network. The weighted-average predictability at stage i, in which the weight of each square equals the fraction of the matrix it covers, is then

p_i = Σ_q (e_{s_q}^2/E) Π_{s_q} + (u_i/E)(1/|A|),

where E is the total number of matrix elements, Π_{s_q} is the predictability of square s_q, u_i is the number of units at stage i and N_i denotes the number of squares (including units) at splitting stage i. These quantities obey the linear relationship

p_i = k N_i + b,

where 1 ≤ i ≤ Q − 1 and k and b are constants. Since we observe k < 0, the negative linear relationship between p_i and N_i indicates a positive correlation between the length of memory and predictability, because a smaller N_i means larger squares with more memory. We define this relationship as predictive congruency and use it to obtain the TTP of each temporal network with a non-square matrix.
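Under this splitting scheme, the weighted-average predictability at a given stage can be computed as in the following sketch (the weighting follows the description above; variable names are ours):

```python
def weighted_avg_predictability(squares, n_units, alphabet_size, total_elems):
    """Weighted-average predictability at one splitting stage.
    squares: list of (side_length e, predictability Pi) pairs; each square
    contributes Pi with weight e^2 / total_elems, and each 1x1 unit
    contributes 1/alphabet_size with weight 1/total_elems."""
    p = sum((e * e / total_elems) * pi for e, pi in squares)
    p += (n_units / total_elems) * (1.0 / alphabet_size)
    return p
```

Fitting p_i against the number of squares N_i across stages then yields the slope k and intercept b, and extrapolation along this line gives the TTP of the full non-square matrix.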

Synthetic networks
The temporal stochastic block model is used to test the impact of link weights while the topology remains invariant. When generating nearest-neighbor correlations in the temporal stochastic block model, we determine the weight of each link individually according to the parameters β and γ. For each link, there is a probability of modifying the link weight based on the structural or the temporal aspect: if the structural parameter is selected, the probability that the link adopts the same weight as a neighboring link is β; if the temporal parameter is activated, the probability that the link weight remains the same as in the previous snapshot is γ. Otherwise, the link is assigned a random value. Suppose there are n links ℓ_1, ℓ_2, . . . , ℓ_n in the matrix; then the probability density function of a link weight at a certain time is

f(ℓ_i(t)) = p_β β δ_{ℓ_i(t), ℓ_j(t)} + p_γ γ δ_{ℓ_i(t), ℓ_i(t−1)} + (1 − p_β β − p_γ γ) r,

where i, j ∈ [1, n], i ≠ j, p_β and p_γ are the probabilities of choosing the structural parameter β and the temporal parameter γ, respectively, with p_β + p_γ = 1, δ_{xy} is the Kronecker delta function and r is a random number. Specifically, we set p_β = p_γ = 0.5 and generate the matrix column by column, from top to bottom within each column. If a link is determined to adopt the weight of a neighboring link at a certain time, we assign it the weight of the prior link, i.e. its adjacent element in the matrix. The initial snapshot of the evolving small-world model is a ring network; each subsequent snapshot is obtained by rewiring a fraction of the links in the previous snapshot.
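The per-link update rule can be sketched as follows (assuming p_β = p_γ = 0.5 as in the text; helper names are ours and the 'neighboring link' is simplified to a single adjacent element):

```python
import random

def next_weight(prev_weight, neighbor_weight, beta, gamma,
                values, rng=random):
    """One link-weight update in the temporal stochastic block model.
    With probability 0.5 the structural branch is chosen and the adjacent
    link's weight is copied with probability beta; with probability 0.5 the
    temporal branch is chosen and last snapshot's weight is kept with
    probability gamma. Otherwise a random weight is drawn from the finite
    value set."""
    if rng.random() < 0.5:                 # structural branch (p_beta = 0.5)
        if rng.random() < beta:
            return neighbor_weight
    else:                                  # temporal branch (p_gamma = 0.5)
        if rng.random() < gamma:
            return prev_weight
    return rng.choice(values)
```

With β = γ = 1 the weight is always inherited (maximal memory), while with β = γ = 0 it is always random, matching the two extremes in Fig. 2B.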

TeP and NTeP
TeP is the average PIL over all links in the network (Fig. 1). To eliminate the influence of sparsity, we also normalize PIL over its own baseline to obtain NPIL, and define NTeP as the average of NPIL:

NPIL = (PIL − PIL_bl)/(1.0 − PIL_bl) for PIL_bl < 1.0,

where PIL_bl is the PIL of the shuffled links.
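The per-link quantities can be illustrated with the standard recipe behind PIL: estimate the entropy rate of one link's time series with a Lempel–Ziv estimator and solve the Fano relation for the predictability bound. A minimal sketch (names are ours; the authors' estimator details may differ):

```python
from math import log2

def lz_entropy_rate(seq):
    """Lempel-Ziv entropy-rate estimate (bits/symbol):
    H ~ n*log2(n) / sum(L_i), where L_i is the length of the shortest
    substring starting at i that does not start anywhere earlier."""
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and any(tuple(seq[j:j + k]) == tuple(seq[i:i + k])
                                 for j in range(i)):
            k += 1
        total += k
    return n * log2(n) / total

def pil(seq, n_values):
    """Upper bound on the predictability of one link's time series:
    solve H = -p*log2(p) - (1-p)*log2(1-p) + (1-p)*log2(N-1) by bisection."""
    H = lz_entropy_rate(seq)
    def fano(p):
        if p >= 1.0:
            return 0.0
        return (-p * log2(p) - (1 - p) * log2(1 - p)
                + (1 - p) * log2(n_values - 1))
    lo, hi = 1.0 / n_values, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if fano(mid) > H else (lo, mid)
    return (lo + hi) / 2
```

PIL_bl is then obtained by applying the same routine to a randomly shuffled copy of the series, and NPIL follows from the normalization above.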

SUPPLEMENTARY DATA
Supplementary data are available at NSR online.