The friendship paradox in real and model networks

The friendship paradox is the observation that the degrees of the neighbors of a node in any network will, on average, be greater than the degree of the node itself. In common parlance, your friends have more friends than you do. In this paper we develop the mathematical theory of the friendship paradox, both in general as well as for specific model networks, focusing not only on average behavior but also on variation about the average and using generating function methods to calculate full distributions of quantities of interest. We compare the predictions of our theory with measurements on a large number of real-world network data sets and find remarkably good agreement. We also develop equivalent theory for the generalized friendship paradox, which compares characteristics of nodes other than degree to those of their neighbors.


I. INTRODUCTION
It may appear to you that your friends are more popular than you are, and if so, you may well be right. It is, however, not necessarily your fault. It is, rather, a natural consequence of network structure. Feld [1] has shown that in any network the average degree (i.e., the number of neighbors) of a node's neighbors is greater than the average degree of nodes in the network as a whole, strictly so unless every node has the same degree. Applied to networks of friendship, this implies that on average your friends have more friends than you do. This phenomenon is known as the friendship paradox.
A related phenomenon, the generalized friendship paradox, describes similar behavior with respect to other attributes of network nodes [2]. Are your friends richer than you, for instance, or smarter, or more attractive? Generalized friendship paradoxes arise when such attributes are correlated with node degree. If richer people are on average also more popular, then wealth and popularity will be positively correlated and hence the tendency for your friends to be more popular than you could mean they are also richer. Examples of generalized friendship paradoxes occur for instance with citation counts in collaboration networks (your collaborators have more citations than you do) [3] and viral content dissemination in online social networks (your online friends receive more viral content than you do) [4].
Understood as a mathematical statement about network averages, the friendship paradox occurs in all networks. How the effect is manifested, however, depends on the details of a network's structure. For any given person we can measure the difference between how popular their friends are and how popular they are. The results of Feld [1] tell us that this difference must be positive on average, but to accurately characterize the effect we should consider the entire distribution of differences. How large will the differences be? Is their average driven by a handful of outliers? How much variation is there? For how many people will the difference be positive? The answers to such questions all depend on the specific details of the network under study.

* Email address: gcant@umich.edu
In this paper we develop the theory of the friendship paradox in both its original and generalized versions. We derive expressions for the full distribution of differences in three progressively more complex network models and find that the observed distributions in real-world networks are in good agreement with our theoretical results.

II. THE FRIENDSHIP PARADOX
Informally, the friendship paradox states that people's friends tend to be more popular than they themselves are. Stated a little more precisely, nodes in a network tend to have lower degree than their neighbors do. Consider an undirected network of $n$ nodes, labeled by integers $i = 1, \dots, n$. For any given node $i$ we can compute the difference $\Delta_i$ between the average of its neighbors' degrees and its own degree:

$$ \Delta_i = \frac{\sum_j A_{ij} k_j}{k_i} - k_i, \tag{1} $$

where $A_{ij}$ is an element of the adjacency matrix and $k_i = \sum_j A_{ij}$ is the degree of node $i$. (The value of $\Delta_i$ is undefined for nodes with degree zero, since they have no neighbors. We will assume that there are no such nodes in the network, or that all of them have been removed before the analysis.) One statement of the friendship paradox is that the average of $\Delta_i$ across all nodes is greater than zero, which can be proven as follows. Using $\sum_i k_i = \sum_{ij} A_{ij}$, we write the average as

$$ \frac{1}{n}\sum_i \Delta_i = \frac{1}{n}\sum_{ij} A_{ij}\frac{k_j}{k_i} - \frac{1}{n}\sum_{ij} A_{ij} = \frac{1}{n}\sum_{ij} A_{ij}\left(\frac{k_j}{k_i} - 1\right). \tag{2} $$

Exchanging the summation variables $i$ and $j$ and adding the result to Eq. (2), this result can also be written

$$ \frac{1}{n}\sum_i \Delta_i = \frac{1}{2n}\sum_{ij} A_{ij}\left(\frac{k_j}{k_i} + \frac{k_i}{k_j} - 2\right) = \frac{1}{2n}\sum_{ij} A_{ij}\,\frac{(k_i - k_j)^2}{k_i k_j} \ge 0. \tag{3} $$

The exact equality holds only when $k_i = k_j$ for all pairs of neighboring nodes, i.e., when the network is a regular graph (or, technically, when every component is a regular graph). In all other cases, the average difference between the mean degree of a node's neighbors and its own degree is strictly greater than zero. For the generalized friendship paradox, which considers attributes other than degree, one can define an analogous quantity $\Delta^{(x)}_i$ for an attribute $x$ according to

$$ \Delta^{(x)}_i = \frac{\sum_j A_{ij} x_j}{k_i} - x_i, \tag{4} $$

which measures the difference between the average of the attribute for node $i$'s neighbors and the value for $i$ itself.

arXiv:2012.03991v1 [cs.SI] 7 Dec 2020
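When the average of this quantity over all nodes is positive one may say that the generalized friendship paradox holds. In contrast to the case of degree, this is not always true (the average of $\Delta^{(x)}_i$ can be zero or negative), but we can write the average as

$$ \frac{1}{n}\sum_i \Delta^{(x)}_i = \frac{1}{n}\sum_{ij} A_{ij}\frac{x_j}{k_i} - \frac{1}{n}\sum_i x_i = \frac{1}{n}\sum_i x_i\left(\sum_j \frac{A_{ij}}{k_j} - 1\right), \tag{5} $$

where the second equality again follows from interchanging summation indices. Defining the new quantity

$$ \kappa_i = \sum_j \frac{A_{ij}}{k_j} \tag{6} $$

and noting that

$$ \langle\kappa\rangle = \frac{1}{n}\sum_{ij}\frac{A_{ij}}{k_j} = \frac{1}{n}\sum_j \frac{k_j}{k_j} = 1, \tag{7} $$

where angle brackets denote averages over all nodes, we can then write

$$ \frac{1}{n}\sum_i \Delta^{(x)}_i = \frac{1}{n}\sum_i x_i(\kappa_i - 1) = \frac{1}{n}\sum_i x_i\kappa_i - \langle x\rangle\langle\kappa\rangle = \mathrm{Cov}(x, \kappa). \tag{8} $$

Thus we will have a generalized friendship paradox in the sense defined here if (and only if) $x$ and $\kappa$ are positively correlated. This is perhaps not exactly the result one might have anticipated when, in the introduction, we argued that generalized friendship paradoxes arise when an attribute $x$ is positively correlated with degree; it turns out that correlation with the more complex quantity $\kappa$ is the crucial behavior. Degree, however, is never negatively correlated with $\kappa$ since, combining Eqs. (3) and (8), we have $\mathrm{Cov}(k, \kappa) = (1/n)\sum_i \Delta_i \ge 0$. While it is not mathematically guaranteed, in practice we therefore expect properties that correlate with degree to also correlate with $\kappa$, and hence in such situations we can reasonably expect to observe a generalized friendship paradox.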
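The identities in Eqs. (1) to (8) are easy to check by brute force on a small graph. The sketch below uses a made-up five-node network (a star with hub 0 plus an extra edge between two leaves) and a made-up attribute `x`; all function names are hypothetical. Exact fractions are used so the equalities hold exactly rather than to floating-point tolerance.

```python
from fractions import Fraction

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def deltas(adj, x=None):
    """Delta_i of Eq. (1), or Delta^(x)_i of Eq. (4) when an attribute dict x is given."""
    k = {i: len(nbrs) for i, nbrs in adj.items()}
    x = k if x is None else x
    return {i: Fraction(sum(x[j] for j in adj[i]), k[i]) - x[i] for i in adj}

def eq3_average(adj):
    """Right-hand side of Eq. (3): (1/2n) sum_ij A_ij (k_i - k_j)^2 / (k_i k_j)."""
    k = {i: len(nbrs) for i, nbrs in adj.items()}
    total = sum(Fraction((k[i] - k[j]) ** 2, k[i] * k[j]) for i in adj for j in adj[i])
    return total / (2 * len(adj))

def kappa(adj):
    """kappa_i = sum_j A_ij / k_j, Eq. (6)."""
    k = {i: len(nbrs) for i, nbrs in adj.items()}
    return {i: sum(Fraction(1, k[j]) for j in adj[i]) for i in adj}

def mean(a):
    """Average of a node attribute over all nodes."""
    return Fraction(sum(a.values()), len(a))

def cov(a, b):
    """Covariance over nodes with 1/n normalization, as in Eq. (8)."""
    ma, mb = mean(a), mean(b)
    return Fraction(sum((a[i] - ma) * (b[i] - mb) for i in a), len(a))

# Hypothetical toy network and attribute values
adj = build_adj([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)])
x = {0: 5, 1: 1, 2: 4, 3: 2, 4: 7}
k = {i: len(nbrs) for i, nbrs in adj.items()}

d = deltas(adj)
print(mean(d), eq3_average(adj))                  # 11/10 11/10
print(mean(deltas(adj, x)), cov(x, kappa(adj)))   # 2/5 2/5
```

Note that the hub has $\Delta_0 = -5/2$ while the other four nodes have positive $\Delta_i$, a small instance of the point that the average hides a spread of values.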

III. MODELS
While these results are illuminating, there is more to be said on this topic. We have shown that the network average of $\Delta_i$ will always be greater than zero, but we have not said by how much or what the value depends on. Nor have we considered how $\Delta_i$ varies across a network. As an example (albeit a somewhat contrived one), consider a 1000-node network consisting of a complete graph with one edge removed. In such a network almost all of the nodes (998 of them) have a degree larger than the average of their neighbors, each with $\Delta_i = -2/999$, but the two endpoints of the missing edge are outliers with $\Delta_i = 1$ that substantially skew the distribution, so that over the whole network nodes are still less popular than their neighbors on average. In this case, therefore, the average is not very informative. To understand the friendship paradox fully we need to move beyond statements about averages.
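The contrived example can be verified directly. This sketch (function names hypothetical) builds the complete graph on $n = 1000$ nodes, removes the edge between nodes 0 and 1, and computes every $\Delta_i$ exactly.

```python
from fractions import Fraction

def complete_minus_edge(n):
    """Adjacency sets of the complete graph on n nodes with the edge (0, 1) removed."""
    adj = {i: set(range(n)) - {i} for i in range(n)}
    adj[0].discard(1)
    adj[1].discard(0)
    return adj

def delta(adj):
    """Delta_i of Eq. (1): mean neighbor degree minus own degree, as exact fractions."""
    k = {i: len(nbrs) for i, nbrs in adj.items()}
    return {i: Fraction(sum(k[j] for j in adj[i]), k[i]) - k[i] for i in adj}

n = 1000
d = delta(complete_minus_edge(n))
negative = sum(1 for v in d.values() if v < 0)
mean = Fraction(sum(d.values()), n)
print(negative, d[0], d[2], mean)   # 998 1 -2/999 1/499500
```

The average, $1/499500$, is positive as Eq. (3) requires, yet it describes none of the nodes well.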
In doing so, however, the specific structure of the network becomes important. A common way to study the effects of structure is to examine the behavior of model networks and this is the approach we take in the remainder of this paper, considering the friendship paradox for three successively more complex network models, the Poisson random graph (sometimes called the Erdős-Rényi model) [5][6][7], the configuration model [8,9], and a more sophisticated model that incorporates degree correlations.

A. Poisson random graph
The Poisson random graph is the simplest of random graph models and one of the most widely studied. In this model one takes $n$ nodes and connects each pair independently with some probability $p$. When the network is large and sufficiently sparse (when $n \to \infty$ and $p = \lambda/(n-1)$ with $\lambda$ growing slower than $n$, so that $p \to 0$), the degrees are Poisson distributed with mean $\lambda$, meaning that the probability $p_k$ of a node having degree $k$ is

$$ p_k = e^{-\lambda}\frac{\lambda^k}{k!}. \tag{9} $$

Let us compute the complete distribution within this model of the quantity $\Delta_i$ defined in Eq. (1). To compute this distribution, we note that

$$ P(\Delta) = \sum_k p_k\, P(\Delta\,|\,k), \tag{10} $$

where $P(\Delta|k)$ denotes the probability that a node has value $\Delta$ given that it has degree $k$. For given $\Delta$ and $k$, Eq. (1) tells us that the sum of the neighboring degrees of the node is $\sum_j A_{ij} k_j = k\Delta + k^2$. Let us denote this sum by $K$. Then we have

$$ P(\Delta\,|\,k) = P(K = k\Delta + k^2\,|\,k). \tag{11} $$

We know that $K$ is an integer with $K \ge k$, since each of the node's $k$ neighbors necessarily has degree at least 1.
Hence the allowed values of $\Delta$ that satisfy $K = k\Delta + k^2$ must be rational numbers of the form $\Delta = 1 - k + m/k$, where $m = K - k$ is a non-negative integer.
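A quick simulation illustrates both points: every observed $\Delta$ has the form $1 - k + m/k$, and a sizable fraction of nodes have $\Delta < 0$. This is a minimal sketch assuming a finite graph with $n = 1000$ and $\lambda = 8$ (both arbitrary choices), sampled by testing every pair, so degrees are binomial rather than exactly Poisson; names are hypothetical.

```python
import random
from fractions import Fraction

def poisson_random_graph(n, lam, seed=0):
    """G(n, p) with p = lam/(n-1), built by testing every pair: O(n^2), fine for n ~ 1000."""
    rng = random.Random(seed)
    p = lam / (n - 1)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

adj = poisson_random_graph(1000, 8.0, seed=1)
k = {i: len(nbrs) for i, nbrs in adj.items()}
# Delta_i from Eq. (1), skipping degree-zero nodes where it is undefined
delta = {i: Fraction(sum(k[j] for j in adj[i]), k[i]) - k[i] for i in adj if k[i] > 0}
# Each Delta should equal 1 - k + m/k with m = k*Delta + k^2 - k a non-negative integer
m_values = [k[i] * d + k[i] ** 2 - k[i] for i, d in delta.items()]
frac_negative = sum(1 for d in delta.values() if d < 0) / len(delta)
print(all(m.denominator == 1 and m >= 0 for m in m_values), round(frac_negative, 3))
```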
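To evaluate Eq. (11) we need to know the distribution of the sum $K = \sum_j A_{ij} k_j$ of neighbor degrees. While nodes in general follow the degree distribution $p_k$, the degrees of neighbors follow a modified distribution. A neighbor, by definition, is a node arrived at by following an edge and each node of degree $k$ is at the end of $k$ edges, so the degree distribution for nodes at the ends of edges goes not as $p_k$ but as $kp_k$, which after appropriate normalization gives a probability distribution $q_k$ for neighbor degrees of the form

$$ q_k = \frac{k p_k}{\sum_j j p_j} = \frac{k p_k}{\langle k\rangle}. \tag{12} $$

For the Poisson degree distribution of Eq. (9), we then find that

$$ q_k = \frac{k}{\lambda}\, e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!}. \tag{13} $$

In other words $q_k$ is also a Poisson distribution, but shifted by one, meaning that $k-1$ is a Poisson variable with mean $\lambda$. We can write the sum $K$ as

$$ K = \sum_j A_{ij} k_j = k_i + \sum_j A_{ij}(k_j - 1). \tag{14} $$

But $\sum_j A_{ij}(k_j - 1) \sim \mathrm{Poisson}(k_i\lambda)$, since the sum of the $k_i$ independent Poisson variables $(k_j - 1)$ is itself a Poisson random variable with mean a factor of $k_i$ greater. Hence

$$ P(\Delta\,|\,k) = e^{-k\lambda}\frac{(k\lambda)^m}{m!}, \qquad m = k\Delta + k^2 - k, \tag{15} $$

when $m$ is a non-negative integer, and zero otherwise. Combining Eqs. (10) and (15) gives the full distribution $P(\Delta)$, shown in Fig. 1 for mean degree $\lambda = 8$. Each degree $k$ contributes a series of equally spaced peaks, with spacing $1/k$, and the overall resulting distribution is quite complicated: both widely dispersed and jagged, even for this simple network model. Note also that a significant fraction of nodes have $\Delta < 0$, meaning that they do not satisfy the traditional definition of the friendship paradox; they have more friends than their average neighbor does. It is also possible to calculate the mean and variance of $\Delta$ for the Poisson random graph. We find that

$$ \mathrm{E}[\Delta] = 1, \qquad \mathrm{Var}[\Delta] = \lambda\left(1 + \mathrm{E}[k^{-1}]\right). \tag{16} $$

(Note that again we remove any nodes of degree zero before computing $\mathrm{E}[k^{-1}]$.) While the expected value of $\Delta$ is always 1, the variance is larger than the mean degree $\lambda$. Since the mean stays fixed while the standard deviation grows roughly as $\sqrt{\lambda}$, more and more nodes have $\Delta < 0$ as $\lambda$ becomes large, with the fraction tending to a half. For example, in Fig. 1, where the mean degree is 8, around 35% of nodes have degree larger than that of their average neighbor. This rises to 44% when the mean degree is 64, and 49% for a mean degree of 1024. For large $\lambda$, therefore, no meaningful "friendship paradox" applies.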
Here, as is often the case, looking only at the average value of the distribution is misleading when there is large variation.
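The fraction of nodes with $\Delta < 0$ follows from Eqs. (10) and (15) without enumerating the distribution: for a node of degree $k$, $\Delta < 0$ exactly when $S = \sum_j A_{ij}(k_j - 1) < k(k-1)$, with $S \sim \mathrm{Poisson}(k\lambda)$. The sketch below sums these probabilities. The degree cutoff `kmax` and the renormalization over $k \ge 1$ (to drop degree-zero nodes) are our own numerical choices, and the function names are hypothetical.

```python
import math

def poisson_pmf(s, mu):
    """Poisson probability computed in log space so that large means do not overflow."""
    return math.exp(s * math.log(mu) - mu - math.lgamma(s + 1))

def fraction_negative_delta(lam, kmax=None):
    """P(Delta < 0) in the Poisson random graph, from Eqs. (10) and (15).

    For a node of degree k, Delta < 0 exactly when S < k(k - 1), where
    S = sum_j (k_j - 1) ~ Poisson(k * lam). Degree-zero nodes are excluded by
    renormalizing over k >= 1; the cutoff kmax is a numerical convenience.
    """
    if kmax is None:
        kmax = int(lam + 12 * math.sqrt(lam) + 20)
    total = 0.0
    for k in range(1, kmax + 1):
        mu = k * lam
        p_below = sum(poisson_pmf(s, mu) for s in range(k * (k - 1)))
        total += poisson_pmf(k, lam) * p_below
    return total / (1.0 - math.exp(-lam))

print(round(fraction_negative_delta(8.0), 3))
```

For $\lambda = 8$ the result should land close to the 35% quoted above. Larger $\lambda$ also works, but this naive term-by-term summation slows down roughly as $\lambda^3$.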

B. The configuration model
The random graph of the previous section is in many respects not a realistic model. In particular, as we have noted, it has a Poisson degree distribution, which is very different from the broad degree distributions seen in typical real-world networks [10,11]. We can address this shortcoming by using a more sophisticated random graph model that allows for arbitrary degree distributions, the so-called configuration model [8,9]. In this model one fixes the degree of each of the nodes and then draws a network at random from the set of all networks with the given degrees.
Calculations on networks such as the configuration model can be greatly simplified by using generating function methods [12,13]. For instance, many properties of the model can be expressed in terms of the generating function for the degree distribution $p_k$, defined by

$$ g(z) = \sum_{k=0}^{\infty} p_k z^k. \tag{17} $$

We employ this approach here too, but there is a catch: generating functions are normally applied to distributions over integer quantities, like the degree, but the quantity $\Delta$, whose distribution is our main focus here, can take non-integer values. To allow for this, we make use of the (two-sided) Laplace transform

$$ F(s) = \int_{-\infty}^{\infty} p(x)\, e^{-sx}\, dx, \tag{18} $$

which is the standard extension of the generating function to a variable $x$ on the real line. However, the distribution of $\Delta$ is not continuous-valued either. It is nonzero on a dense set of rational values but zero everywhere else (see Fig. 1). To allow for this, we consider functions $p(x)$ that are equal to a discrete sum of Dirac delta functions. For the degree distribution $p_k$, for instance, we would define

$$ p(x) = \sum_k p_k\, \delta(x - k). \tag{19} $$

With this definition Eqs. (17) and (18) are essentially equivalent, since

$$ F(s) = \sum_k p_k\, e^{-sk} = g(e^{-s}), \tag{20} $$

but Eq. (18) also allows for quantities like $\Delta$ that have non-integer values. The Laplace transform has several properties that will be useful for our purposes. First, if $x$ is a random variable whose distribution has Laplace transform $F_x(s)$, then the Laplace transform for the distribution of $ax + b$ is

$$ F_{ax+b}(s) = e^{-bs} F_x(as). \tag{21} $$
Second, if $x$ and $y$ are independent random variables, then the Laplace transform for their sum is the product of the Laplace transforms for $x$ and $y$ alone:

$$ F_{x+y}(s) = F_x(s)\, F_y(s). \tag{22} $$

These two results now allow us to calculate $F_\Delta(s)$, the Laplace transform for $P(\Delta)$. We follow essentially the same logic as we did for the Poisson random graph: we consider the Laplace transform for the distribution of $\Delta$ for fixed degree, then we average over degree.
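Both properties can be checked numerically on small discrete distributions. In this sketch the distributions `x`, `y` and the values of `s`, `a`, `b` are made up; the transform of a discrete distribution is evaluated as $\sum_v P(v)\, e^{-sv}$, in line with Eqs. (19) and (20).

```python
import math

def laplace(dist, s):
    """Two-sided Laplace transform of a discrete distribution {value: probability}."""
    return sum(pr * math.exp(-s * v) for v, pr in dist.items())

def affine(dist, a, b):
    """Distribution of a*x + b for x ~ dist."""
    out = {}
    for v, pr in dist.items():
        out[a * v + b] = out.get(a * v + b, 0.0) + pr
    return out

def add_independent(dx, dy):
    """Distribution of x + y for independent x ~ dx and y ~ dy (a discrete convolution)."""
    out = {}
    for u, pu in dx.items():
        for v, pv in dy.items():
            out[u + v] = out.get(u + v, 0.0) + pu * pv
    return out

# Made-up example distributions and parameters
x = {1: 0.2, 2: 0.5, 5: 0.3}
y = {0: 0.6, 3: 0.4}
s, a, b = 0.7, 0.25, -1.0

print(laplace(affine(x, a, b), s), math.exp(-b * s) * laplace(x, a * s))  # Eq. (21)
print(laplace(add_independent(x, y), s), laplace(x, s) * laplace(y, s))   # Eq. (22)
```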
If node $i$ has degree $k_i$, then from Eq. (1) $\Delta_i$ is

$$ \Delta_i = \sum_j A_{ij}\left(\frac{k_j}{k_i} - 1\right). \tag{23} $$

Each $k_j$ is the degree of a neighbor node which, as before, is a random quantity distributed according to $q_k \propto k p_k$. Let $G(s)$ be the Laplace transform for this distribution:

$$ G(s) = \sum_k q_k\, e^{-sk}. \tag{24} $$

Then from Eq. (21), with $a = 1/k_i$ and $b = -1$, the Laplace transform for $k_j/k_i - 1$ is $e^{s} G(s/k_i)$ and, since each of the $k_i$ nonzero terms in the sum of Eq. (23) is independent, the Laplace transform for the full sum is $[e^{s} G(s/k_i)]^{k_i}$ by Eq. (22). This is for a node of degree $k_i$. Since the Laplace transform is linear, we can now simply average over degree to compute the transform for the full distribution of $\Delta$:

$$ F_\Delta(s) = \sum_k p_k \left[e^{s} G(s/k)\right]^k. \tag{25} $$

Given any degree distribution $p_k$ we can use this equation to compute $F_\Delta(s)$. Inverting the Laplace transform then gives us the density function $\rho(x)$ of $\Delta$ itself:

$$ \rho(x) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} F_\Delta(s)\, e^{sx}\, ds = \sum_\Delta P(\Delta)\, \delta(x - \Delta), \tag{26} $$

where the sum over $\Delta$ is a sum over all rational numbers, i.e., all possible values of $\Delta$. Since $P(\Delta)$ is a rather complicated object, it is in practice simpler to integrate $\rho(x)$ to compute the probability that $\Delta$ falls between any two values, in other words a histogram of $\Delta$. In fact, we can do something more sophisticated: we can use any kernel we like to calculate a kernel density estimate of the distribution of $\Delta$. For a general kernel function $\kappa(x)$ with Laplace transform $F_\kappa(s)$ we have

$$ \hat\rho(x) = \int_{-\infty}^{\infty} \kappa(x - y)\, \rho(y)\, dy = \sum_\Delta P(\Delta)\, \kappa(x - \Delta) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} F_\kappa(s)\, F_\Delta(s)\, e^{sx}\, ds. \tag{27} $$

A conventional histogram is equivalent to using a rectangular ("top hat") kernel, but for our figures we use a smoother double-exponential kernel (also known as a Laplace distribution):

$$ \kappa(x) = \frac{1}{2b}\, e^{-|x|/b}, \tag{28} $$

where the parameter $b$ sets the width of the distribution.
(We arbitrarily pick $b = 1/3$.) The Laplace transform for this choice of $\kappa(x)$ is

$$ F_\kappa(s) = \frac{1}{1 - b^2 s^2}, \qquad |\mathrm{Re}\, s| < 1/b. \tag{29} $$

Figure 2 shows the distribution of $\Delta$ computed in this way for configuration models with the truncated power-law degree distribution

$$ p_k \propto k^{-\alpha} e^{-\beta k}, \qquad k \ge 1, \tag{30} $$

and three different choices of the parameters $\alpha$ and $\beta$. Each example has the same mean degree but, as the figure shows, the distributions of $\Delta$ are quite different.
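A minimal numerical sketch of Eqs. (24) and (25), assuming a truncated power law cut off at `kmax = 60` with made-up $\alpha = 2.5$, $\beta = 0.05$ (not the parameters behind Fig. 2). Instead of inverting Eq. (27), it checks the transform against closed-form moments that follow from conditioning on $k$ ($\mathrm{E}[\Delta] = \sigma_k^2/\langle k\rangle$ and $\mathrm{Var}[\Delta] = \sigma_q^2\,\mathrm{E}[k^{-1}] + \sigma_k^2$, which for a Poisson $p_k$ reduce to Eq. (16)). As a stand-in for Laplace inversion, it then forms a Monte Carlo kernel density estimate with the double-exponential kernel of Eq. (28) and $b = 1/3$. All function names are hypothetical.

```python
import math
import random

def truncated_power_law(alpha, beta, kmax):
    """Illustrative p_k ∝ k^-alpha e^(-beta k) on k = 1..kmax (the cutoff kmax is an assumption)."""
    w = {k: k ** -alpha * math.exp(-beta * k) for k in range(1, kmax + 1)}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

def neighbor_dist(p):
    """q_k = k p_k / <k>, Eq. (12)."""
    mean_k = sum(k * pk for k, pk in p.items())
    return {k: k * pk / mean_k for k, pk in p.items()}

def F_delta(s, p, q):
    """Eq. (25): sum_k p_k [e^s G(s/k)]^k, with G(t) = sum_k q_k e^(-t k) from Eq. (24)."""
    G = lambda t: sum(qk * math.exp(-t * k) for k, qk in q.items())
    return sum(pk * (math.exp(s) * G(s / k)) ** k for k, pk in p.items())

def moments_from_transform(p, h=1e-4):
    """Mean = -F'(0) and variance = F''(0) - F'(0)^2, by central differences."""
    q = neighbor_dist(p)
    fp, f0, fm = F_delta(h, p, q), F_delta(0.0, p, q), F_delta(-h, p, q)
    m = -(fp - fm) / (2 * h)
    return m, (fp - 2 * f0 + fm) / h ** 2 - m ** 2

def moments_direct(p):
    """Closed forms from conditioning on k: E = var_k/<k>, Var = var_q E[1/k] + var_k."""
    q = neighbor_dist(p)
    mk = sum(k * pk for k, pk in p.items())
    vk = sum(k * k * pk for k, pk in p.items()) - mk ** 2
    mq = sum(k * qk for k, qk in q.items())
    vq = sum(k * k * qk for k, qk in q.items()) - mq ** 2
    return vk / mk, vq * sum(pk / k for k, pk in p.items()) + vk

def sample_deltas(p, n, seed=0):
    """Monte Carlo draws of Delta: k ~ p_k, then k independent neighbor degrees ~ q_k."""
    rng = random.Random(seed)
    degs = sorted(p)
    pw = [p[k] for k in degs]
    qw = [k * p[k] for k in degs]  # choices() accepts unnormalized weights
    return [sum(rng.choices(degs, weights=qw, k=k)) / k - k
            for k in rng.choices(degs, weights=pw, k=n)]

def laplace_kde(samples, xs, b=1/3):
    """Kernel density estimate using the double-exponential kernel of Eq. (28)."""
    return [sum(math.exp(-abs(x - d) / b) for d in samples) / (2 * b * len(samples))
            for x in xs]

p = truncated_power_law(alpha=2.5, beta=0.05, kmax=60)
print(moments_from_transform(p))
print(moments_direct(p))

samples = sample_deltas(p, 2000)
grid = [i / 4 for i in range(-40, 41)]
density = laplace_kde(samples, grid)
```

The Monte Carlo route is only a cross-check; it converges slowly in the tails, which is one reason to prefer working with $F_\Delta(s)$ directly.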

C. Random graphs with degree correlations
The configuration model of the previous section improves on the Poisson random graph by allowing arbitrary distributions of node degrees, but like the random graph it lacks any correlation or assortativity between the degrees of adjacent nodes. Such correlation is common in real-world networks [14][15][16] and will clearly impact friendship paradox phenomena.
One can create a model network with degree correlations by fixing not only the degree distribution $p_k$, as in the configuration model, but also the joint distribution of adjacent degrees $Q_{jk}$, which is the fraction of edges that join nodes of degrees $j$ and $k$ [14]. Note that the distribution of degrees at the end of an edge is then given by $q_k = \sum_j Q_{jk}$, so that fixing $Q_{jk}$ also fixes the degree distribution (since $p_k \propto q_k/k$).
The calculation of the distribution of $\Delta$ proceeds in a similar manner to that for the configuration model. If we follow an edge that begins at a node of degree $k$, it will end up at a node of degree $j$ with probability $Q_{jk}/q_k$, meaning that the Laplace transform for the degrees of neighbors is

$$ G_k(s) = \sum_j \frac{Q_{jk}}{q_k}\, e^{-sj}. \tag{31} $$

Note that this function depends on the degree $k$ of the node at which we started. Nevertheless, the calculation proceeds essentially as before. The equivalent of Eq. (25) is

$$ F_\Delta(s) = \sum_k p_k \left[e^{s} G_k(s/k)\right]^k, \tag{32} $$

and the distribution of $\Delta$ can be calculated from $F_\Delta(s)$ using Eq. (27). Degree correlations are commonly quantified using an assortativity coefficient $r$, defined as the Pearson correlation coefficient of degrees across edges [14]. In terms of the quantities defined here,

$$ r = \frac{\sum_{jk} jk\,(Q_{jk} - q_j q_k)}{\sigma_q^2}, \tag{33} $$

where $\sigma_q^2$ is the variance of the distribution $q_k$. To study how the effects of the friendship paradox vary with $r$ it is convenient to define a model network that allows us to adjust $r$: in effect a correlated version of the configuration model, parameterized by its degree distribution and a single extra parameter controlling the assortativity. This means choosing suitable values of $Q_{jk}$, which we do by maximizing the entropy $-\sum_{jk} Q_{jk}\ln Q_{jk}$ subject to the constraints $\sum_j Q_{jk} = q_k = k p_k/\sum_j j p_j$ for all $k$ and $\sum_{jk} jk\,(Q_{jk} - q_j q_k)/\sigma_q^2 = r$. The maximum entropy distribution is often considered to be the least biased choice for a given set of constraints, meaning that it makes no assumptions other than those implied by the constraints themselves.
The maximum entropy solution for $Q_{jk}$ in this case is

$$ Q_{jk} = \frac{e^{\gamma jk}}{Z_j Z_k}, \tag{34} $$

where $\gamma$ is a Lagrange multiplier whose value controls the assortativity and the $Z_k$ are normalizing constants that satisfy the equations

$$ Z_k = \frac{1}{q_k}\sum_j \frac{e^{\gamma jk}}{Z_j}, \tag{35} $$

which we solve numerically by iteration. This model defines an ensemble of random networks with a desired level of assortativity and allows us to study the generic effects of assortativity on network properties, including the friendship paradox. Figure 3, for example, shows the probability distribution of $\Delta$ for fixed degree distribution and three different choices of the assortativity coefficient $r$. Note how the average value of $\Delta$ decreases as the networks become more assortative, but so too does the variance, leading to a more complex picture. This is a good example of why we should be wary of conclusions based on the average value of $\Delta$ alone. Figure 4 sheds more light on this point, showing the average value along with the expected fraction $P(\Delta > 0)$ of nodes with positive $\Delta$, both as a function of $r$. Both of these quantities can be viewed as measures of the strength of the friendship paradox, but they behave in different ways. While $\mathrm{E}[\Delta]$ does indeed decrease monotonically with assortativity, the fraction of nodes with positive $\Delta$ (in effect, the fraction of nodes that display the classic friendship paradox behavior) peaks at small negative $r$, and has lower values for both large positive and large negative $r$.

D. Comparison with real-world networks
How good a guide are these model calculations to the behavior of real-world networks? To shed light on this question we compare the mean and variance of $\Delta$ in 32 real-world social networks with values calculated from the assortative network model of the previous section with the same degree distribution and assortativity coefficient $r$. (The mean and variance in the model can be computed from the first and second derivatives of Eq. (32) at $s = 0$.) The results are shown in Fig. 5 and, as we can see, there is remarkably good agreement between theoretical and empirical results: we find $R^2$ values of 0.93 and 0.99 between theory and experiment for the mean and standard deviation of $\Delta$, respectively. By comparison, the standard configuration model, which fixes the degree distribution only, gives $R^2$ values of 0.77 and 0.95. Thus it would be fair to say that the distribution of $\Delta$ is fairly accurately captured by the degree distribution alone, but that the inclusion of assortativity results in a significant improvement.

E. Generalized friendship paradox
Before finishing let us return to the generalized friendship paradox. Recall that for any quantity $x$ defined on the nodes of a network the quantity $\Delta^{(x)}_i$, Eq. (4), measures the difference between the average value of $x$ at node $i$'s neighbors and $i$'s own value. We can compute the distribution of $\Delta^{(x)}$, for instance for the degree-correlated model of Section III C, using the Laplace transform formalism again. The argument differs from previous developments in some details but remains conceptually similar. One first writes the Laplace transform for the distribution of a node's value of $x$ given its degree $k$:

$$ F_{x|k}(s) = \int_{-\infty}^{\infty} P(x\,|\,k)\, e^{-sx}\, dx, \tag{36} $$

where $P(x|k)$ is the probability (density) that a node of degree $k$ has value $x$. Since the neighbors of a degree-$k$ node have degree distributed as $Q_{jk}/q_k$, they have a distribution of $x$ values with Laplace transform

$$ H_k(s) = \sum_j \frac{Q_{jk}}{q_k}\, F_{x|j}(s). \tag{37} $$

Applying the key properties in Eqs. (21) and (22), the transform of $\Delta^{(x)}$ for a node of degree $k$ is

$$ \left[H_k(s/k)\right]^k F_{x|k}(-s), \tag{38} $$

and averaging over degree we arrive at

$$ F_{\Delta^{(x)}}(s) = \sum_k p_k \left[H_k(s/k)\right]^k F_{x|k}(-s). \tag{39} $$

Inverting $F_{\Delta^{(x)}}(s)$ gives the distribution for $\Delta^{(x)}$.
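For normally distributed $x$ the transform in Eq. (36) has the closed form $F_{x|k}(s) = \exp(-s\mu_k + s^2\sigma^2/2)$, so Eq. (39) can be evaluated directly. The sketch below does this under made-up assumptions: $p_k \propto k^{-2.5}$ on $k = 1, \dots, 20$, no degree correlations ($Q_{jk} = q_j q_k$, the $\gamma = 0$ case), and $x \,|\, k \sim N(a + ck, \sigma^2)$ with arbitrary `a`, `c`, `sigma`. It compares the mean and variance read off from derivatives at $s = 0$ with closed forms obtained by conditioning on $k$. Names are hypothetical.

```python
import math

def normal_transform(s, mu, sigma):
    """E[exp(-s x)] for x ~ N(mu, sigma^2)."""
    return math.exp(-s * mu + 0.5 * (s * sigma) ** 2)

def F_delta_x(s, p, Q, mu, sigma):
    """Eq. (39) with normal x: sum_k p_k [H_k(s/k)]^k F_{x|k}(-s), with H_k from Eq. (37)."""
    degs = sorted(p)
    q = {k: sum(Q[(j, k)] for j in degs) for k in degs}
    total = 0.0
    for k in degs:
        H = sum(Q[(j, k)] / q[k] * normal_transform(s / k, mu(j), sigma) for j in degs)
        total += p[k] * H ** k * normal_transform(-s, mu(k), sigma)
    return total

# Hypothetical setup: uncorrelated degrees and x | k ~ N(a + c k, sigma^2)
w = {k: k ** -2.5 for k in range(1, 21)}
p = {k: wk / sum(w.values()) for k, wk in w.items()}
mk = sum(k * pk for k, pk in p.items())
q = {k: k * pk / mk for k, pk in p.items()}
Q = {(j, k): q[j] * q[k] for j in q for k in q}
a, c, sigma = 0.5, 0.3, 1.0
mu = lambda k: a + c * k

# Mean and variance from -F'(0) and F''(0) - F'(0)^2 via central differences
h = 1e-4
fp, f0, fm = (F_delta_x(t, p, Q, mu, sigma) for t in (h, 0.0, -h))
mean_x = -(fp - fm) / (2 * h)
var_x = (fp - 2 * f0 + fm) / h ** 2 - mean_x ** 2

# Closed forms for this special case, obtained by conditioning on k
vk = sum(k * k * pk for k, pk in p.items()) - mk ** 2
mq = sum(k * qk for k, qk in q.items())
vq = sum(k * k * qk for k, qk in q.items()) - mq ** 2
e_inv = sum(pk / k for k, pk in p.items())
print(mean_x, c * vk / mk)
print(var_x, (c * c * vq + sigma ** 2) * e_inv + sigma ** 2 + c * c * vk)
```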
As an example, we have tested Eq. (39) for normally distributed x linearly correlated with degree and find behavior closely similar to that of Fig. 3. Behavior like this could have an impact in any situation where the generalized friendship paradox has practical consequences. For example, it has been found that people's individual well-being can be substantially affected by the behavior of their network neighbors. People whose acquaintances smoke are more likely to smoke themselves [33]. People whose friends appear to be better off than they are may develop a lower sense of self-worth [34,35]. If people with more acquaintances tend to smoke more, or if well-off people with exciting lives have a lot of friends or followers on social media, then we may have a generalized friendship paradox in which you are most likely to have contact with precisely those people who would adversely affect you.
As another example, it has been shown that polling forecasts of election outcomes can be significantly improved by focusing not on how study participants say they will vote but on how they expect their acquaintances to vote [36], in part because this reduces variance in the estimates of outcomes. If voting intention is subject to the generalized friendship paradox, however (if, for instance, partisan inclination is correlated with degree), then the tendency for one's friends to have high degree will cause the resulting sample of the population to be biased and introduce systematic errors [37]. The formalism developed here allows us to quantify these effects not only in terms of the average individual but in terms of the complete distribution of outcomes over the entire population.

IV. CONCLUSIONS
In this paper we have quantified the friendship and generalized friendship paradoxes in terms of the difference ∆ between the characteristics of a node and the average of the same characteristics for the node's neighbors. Previous studies have examined the mean of this difference but, as we have argued here, to get a full picture one must examine the complete distribution of values. We have performed theoretical calculations of this distribution for three classes of model networks, the Poisson random graph, the configuration model, and a model of a random degree-assortative network. Among other things, our results indicate that the friendship paradox will tend to be strongest in networks with very heterogeneous degree distributions and negative assortativity. Conversely, the effects will tend to be muted when degrees are fairly homogeneous and the network is degree assortative. On the other hand, we have also seen that even in simple network models the distribution for ∆ can be widely dispersed, meaning that the average value offers an incomplete description of the behavior.
We have also compared our results with a selection of real-world networks, finding remarkably good agreement between theoretical predictions and empirical measurements, particularly in the case of the model that incorporates assortativity.