Nonassortative relationships between groups of nodes are typical in complex networks

Abstract Decomposing a graph into groups of nodes that share similar connectivity properties is essential to understand the organization and function of complex networks. Previous work has focused on groups with specific relationships between group members, such as assortative communities or core–periphery structures, developing computational methods to find these mesoscale structures within a network. Here, we go beyond these two traditional cases and introduce a methodology that is able to identify and systematically classify all possible community types in directed multigraphs, based on the pairwise relationship between groups. We apply our approach to 53 different networks and find that assortative communities are the most common structures, but that previously unexplored types appear in almost every network. A particularly prevalent new type of relationship, which we call a source–basin structure, has information flowing from a sparsely connected group of nodes (source) to a densely connected group (basin). We look in detail at two online social networks, a new network of Twitter users and a well-studied network of political blogs, and find that source–basin structures play an important role in both of them. This confirms not only the widespread appearance of nonassortative structures but also the potential of hitherto unidentified relationships to explain the organization of complex networks.


S1 Data
We consider all networks from the Netzschleuder repository (Ref. [30] in the main manuscript, accessed Aug. 2022) that belong to one of five common domains (Online Social, Economic, Technological, Biological, Informational) and satisfy the following constraints: 1) the number of nodes in the largest connected component is between 50 and 20,000; 2) the network is not bipartite; and 3) the network is not multilayer and has no multiple edge types. For the list of all networks, see Table S1.
To allow a systematic application of the different community-detection methods, we removed all self-loops and considered only the largest connected component of each network. We found 52 networks: 26 undirected graphs and 26 directed graphs.
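The preprocessing described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in our repository: graphs are given as edge lists, connectivity of directed graphs is taken to be weak connectivity (edge direction ignored), and the function names are hypothetical. Constraints 2) and 3) (bipartite, multilayer) are not checked here.

```python
from collections import defaultdict, deque


def preprocess(edges):
    """Remove self-loops and keep only the largest connected component.

    `edges` is a list of (u, v) pairs; multi-edges are kept as they are.
    Connectivity ignores edge direction (weak connectivity), an assumption
    for directed graphs.
    """
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops
    adj = defaultdict(set)
    for u, v in edges:  # undirected view of the graph
        adj[u].add(v)
        adj[v].add(u)
    seen, largest = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])  # breadth-first search
        seen.add(start)
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    queue.append(y)
        if len(component) > len(largest):
            largest = component
    kept = [(u, v) for u, v in edges if u in largest]
    return kept, largest


def passes_size_filter(nodes, lo=50, hi=20_000):
    """Constraint 1) of S1: 50 <= size of the largest component <= 20,000."""
    return lo <= len(nodes) <= hi
```

Nodes whose only edges are self-loops become isolated after the first step and therefore never belong to the largest component.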
The data for our first case study in Sec. 4 is from Ref. [27] and corresponds to polblogs in the Netzschleuder repository. The data for the second case study in Sec. 4 was obtained from the online platform Twitter by (i) collecting all tweets containing the word 'climate' posted between November 2019 and June 2020 by accounts with Australia-specific locations in the user-location metadata; (ii) identifying the 100 most retweeted tweets of each day during this period; (iii) capturing the user timelines of the authors of these daily top-100 most retweeted tweets; and (iv) keeping only those retweets in these top-user timelines that retweet other top users. The resulting retweet network consists of 4,029 nodes representing the top users and 429,235 directed edges, each representing a retweet of one top user by another top user (multi-edges are kept). The classification of these users is based on the manual classification of individual tweets performed in Ref. [28], where the classification of a tweet is attributed to its author. This data is available in our repository Ref. [33].
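Steps (ii)-(iv) of the retweet network construction can be illustrated by the sketch below. The record layout (keys 'day', 'author', 'retweet_count'), the function names, and the orientation of edges from the retweeted user to the retweeter are assumptions made for illustration only; data collection (step i) is not shown.

```python
from collections import defaultdict


def top_users(tweets, per_day=100):
    """Steps (ii)-(iii): authors of each day's `per_day` most retweeted tweets.

    `tweets` is an iterable of dicts with the hypothetical keys
    'day', 'author' and 'retweet_count'.
    """
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["day"]].append(tweet)
    users = set()
    for day_tweets in by_day.values():
        ranked = sorted(day_tweets, key=lambda t: t["retweet_count"], reverse=True)
        users.update(t["author"] for t in ranked[:per_day])
    return users


def retweet_edges(retweets, users):
    """Step (iv): keep only retweets between two top users.

    `retweets` is an iterable of (retweeted_user, retweeting_user) pairs taken
    from the top users' timelines. Orienting edges from the retweeted user to
    the retweeter is an assumed convention. Multi-edges are kept, so a list
    (not a set) is returned.
    """
    return [(src, dst) for src, dst in retweets if src in users and dst in users]
```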

S2 Community detection methods
The five community-detection methods we apply to each network g were chosen to represent different approaches used in this area: (i) Modularity-based methods search for partitions of g that maximize a quality function (the modularity) which favours groups of densely connected nodes. Here we use Louvain, with the implementation available in Refs. [9,35].
(ii) Spectral approaches are based on the spectra of matrices that represent the network and can often be seen as minimizing the number of edges between groups. Here we use the version of spectral clustering based on the Bethe-Hessian matrix for heterogeneous graphs available in Refs. [31,38].
(iii) A probabilistic approach based on statistical inference can be obtained by using Stochastic Block Models (SBM) as the random-graph generative process Refs. [24,25,26,34]. Here we use the degree-corrected SBM available in Ref. [36].
(iv) Dynamical processes that take place on networks provide an alternative approach to partitioning nodes into groups. We use the most popular such approach, Infomap Ref. [11], with the implementation available in Ref. [37].
(v) Deep-learning methods are increasingly being applied to the community-detection problem, in combination with a variety of approaches Ref. [5]. Here we use the popular DNGR method for learning graph representations, as proposed in Ref. [32] and implemented in Ref. [39]. This method can be used to partition nodes into groups, as considered here, by applying an additional clustering method. As in Ref. [32], we use k-means with a specified number of communities B. In the main text we show results for B = 10; similar results are obtained for other values of B (see Sec. S5 below).
The implementations of these five methods are also available in our repository Ref. [33].
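For method (i), the quality function that Louvain maximizes can be evaluated directly for a given partition. The sketch below computes the standard Newman-Girvan modularity of an undirected graph; it only illustrates the objective and is neither the Louvain algorithm nor the implementation of Refs. [9,35]. The function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity of a partition of an undirected graph.

    Q = sum_c [ L_c / m - (d_c / (2 m))**2 ], where m is the number of edges,
    L_c the number of edges inside community c, and d_c the summed degree of
    the nodes in c. `partition` maps each node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] += 1  # each edge adds 1 to the degree of both endpoints
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two disjoint triangles split into their natural communities, each community contributes 3/6 - (6/12)² = 0.25, so Q = 0.5; putting all nodes in one community gives Q = 0.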

S3 Community Structure Statistics
Consider a B × B edge density matrix ω obtained by applying a community-detection method to a network g. The B × B interaction classification matrix M of ω contains the community structure type τ of all pairs of communities r, s ∈ {0, ..., B − 1}, with τ ∈ {"Assortative", "Core-Periphery", "Disassortative", "Source-Basin"}, obtained using the classification introduced in Fig. 2 and Eqs. (3)-(4). The fraction of type τ in a network g is then given by counting how often τ appears in M as

$$P_\tau(g) = \frac{1}{B(B-1)} \sum_{r \neq s} \delta_{M_{rs},\tau}, \tag{S1}$$

where δ_{x,y} = 1 if x = y and δ_{x,y} = 0 if x ≠ y, and the sum runs over pairs of distinct communities. Knowing P_τ for all structure types τ in a network g, we define the dominant type τ_d as the τ with the largest P_τ, and the occurring types τ_o as the set of τ's with P_τ > 0. We then compute the dominance of each type τ over a set of networks 𝐠 as the fraction of networks whose dominant type is τ_d = τ,

$$\mathrm{Dominance}(\tau) = \frac{1}{|\mathbf{g}|} \sum_{g \in \mathbf{g}} \delta_{\tau_d(g),\tau}, \tag{S2}$$

where |𝐠| is the number of networks in 𝐠. Similarly, the occurrence of type τ is computed as the fraction of networks in which τ occurs (τ ∈ τ_o):

$$\mathrm{Occurrence}(\tau) = \frac{1}{|\mathbf{g}|} \sum_{g \in \mathbf{g}} \mathbb{1}\left[\tau \in \tau_o(g)\right]. \tag{S3}$$

In the results reported in this paper, we applied the analysis above only to communities with more than 5 nodes, N_r > 5, because small communities play a minor role in explaining the properties of the network and the classification of their structure types τ is often not robust (when applying the tests reported in Sec. S4 below).
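The statistics of Eqs. (S1)-(S3) reduce to simple counting, as in the sketch below. It assumes that each network is summarized by its classification matrix M as a list of lists of type labels, that diagonal entries are ignored (only pairs of distinct communities are classified), and that ties for the dominant type are broken by the order of the types; these choices and all function names are illustrative assumptions.

```python
from collections import Counter

TYPES = ("Assortative", "Core-Periphery", "Disassortative", "Source-Basin")


def drop_small(M, sizes, min_size=5):
    """Keep only communities with more than `min_size` nodes (N_r > 5)."""
    keep = [r for r, n in enumerate(sizes) if n > min_size]
    return [[M[r][s] for s in keep] for r in keep]


def type_fractions(M):
    """Eq. (S1): fraction of pairs r != s whose type M[r][s] equals tau."""
    B = len(M)
    labels = [M[r][s] for r in range(B) for s in range(B) if r != s]
    counts = Counter(labels)
    return {t: counts[t] / len(labels) for t in TYPES}


def dominance_and_occurrence(matrices):
    """Eqs. (S2)-(S3) over a set of networks, one matrix M per network."""
    dominant, occurring = Counter(), Counter()
    for M in matrices:
        P = type_fractions(M)
        dominant[max(TYPES, key=P.get)] += 1  # ties: first type in TYPES
        occurring.update(t for t in TYPES if P[t] > 0)
    n = len(matrices)
    return ({t: dominant[t] / n for t in TYPES},
            {t: occurring[t] / n for t in TYPES})
```

Filtering with `drop_small` before `type_fractions` makes B in Eq. (S1) the number of retained communities.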

S4 Robustness Estimation
We test the statistical robustness of our classification of communities in types by (1) comparing to a null model and (2) considering small perturbations of the underlying network.

S4.1 Null model
We consider a null model with a random density matrix ω^null to assess the significance of the results obtained by applying our pairwise classification of community types to ω. Since our pairwise classification depends only on the ranking order of the entries of ω, we build ω^null as a random B × B matrix whose entries are the ranks 1 to B², each used once. We then apply our classification (defined in Fig. 2 of the manuscript) to obtain the interaction classification matrix M^null, the structure type fractions P^null_τ from Eq. (5), and the dominant type τ^null_d and occurring types τ^null_o from Eqs. (6) and (7) of the manuscript, for the null model. The result of this analysis represents our random expectation for an arbitrary partition of the network g into B communities. We further compute statistics over many networks g by generating null models for each g with the same B obtained in the original analysis. This method is summarized in Algorithm 1. The dominance and occurrence obtained using this null model are reported in Fig. 4 of the paper.
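A minimal sketch of the null model follows. Because the pairwise classification of Fig. 2 and Eqs. (3)-(4) is not restated in this supplement, it enters as a hypothetical function `classify_pair(omega, r, s)`; any toy classifier plugged in for testing is a placeholder, not our classification. Function names are illustrative.

```python
import random


def null_density(B, rng=random):
    """omega_null: a B x B matrix holding a random permutation of 1..B**2.

    The classification only uses the ranking of the entries, so random
    ranks are enough to sample every ordering with equal probability.
    """
    ranks = list(range(1, B * B + 1))
    rng.shuffle(ranks)
    return [ranks[r * B:(r + 1) * B] for r in range(B)]


def null_classification(B, classify_pair, rng=random):
    """M_null: classify every pair r != s of a fresh omega_null.

    `classify_pair(omega, r, s)` is a hypothetical stand-in for the pairwise
    classification of Fig. 2 and Eqs. (3)-(4), which is not reproduced here.
    Diagonal entries are left as None.
    """
    omega = null_density(B, rng)
    return [[classify_pair(omega, r, s) if r != s else None for s in range(B)]
            for r in range(B)]
```

Computing P_τ, the dominant type, and the occurring types from each M_null, for every network g with its own B, then yields Dominance(τ)^null and Occurrence(τ)^null via Eqs. (S2) and (S3).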

S4.2 Robustness of categorical classification
Our community structure classification is computed for a given partition of the network.In practice, small variations of the network or algorithms (e.g., the initialization of optimization steps) may result in (slightly) different partitions.
To better understand the robustness of our classification and results to such variations, we introduce a bootstrapping method for estimating the uncertainty of the results obtained from our classification. Given a graph g (with N nodes and E edges) and a partition p with its corresponding pairwise community classification matrix M, we first resample the graph by randomly choosing E edges (with replacement) from the edge list of g. We denote this generated graph by g′. We then compute the density matrix ω′ (based on g′ and p) and obtain a new interaction classification matrix M′. We repeat this procedure k times and compute the certainty P_rs for each pair of communities r and s as the fraction of the k interaction classifications that are equal to the original type,

$$P_{rs} = \frac{1}{k} \sum_{i=1}^{k} \delta_{M'^{(i)}_{rs},\, M_{rs}},$$

where M′^(i) is the classification matrix obtained in the i-th repetition. We applied this analysis in several cases and confirmed that our main conclusions remain unchanged. In particular, for the two case studies reported in Sec. 4, we find that all classifications for B = 5 lead to P_rs > 0.99, except for one core-periphery classification in the blogs network, which has P_rs = 0.86 (see Fig. S2).
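The bootstrap estimate of P_rs can be sketched as below. Two assumptions are made explicit: the density is taken as ω_rs = E_rs/(N_r N_s), which may differ from the normalization used in the manuscript, and the classification of Fig. 2 is passed in as a hypothetical function `classify_pair`. Community labels in `partition` are assumed to be the integers 0, ..., B − 1, and all function names are illustrative.

```python
import random
from collections import Counter


def density_matrix(edges, partition, B):
    """Assumed definition: omega_rs = E_rs / (N_r * N_s).

    E_rs counts directed edges from community r to community s and N_r is the
    size of community r. The manuscript's exact normalisation may differ.
    """
    sizes = Counter(partition.values())
    E = [[0] * B for _ in range(B)]
    for u, v in edges:
        E[partition[u]][partition[v]] += 1
    return [[E[r][s] / (sizes[r] * sizes[s]) if sizes[r] and sizes[s] else 0.0
             for s in range(B)] for r in range(B)]


def classify_all(omega, classify_pair):
    """Interaction classification matrix; diagonal entries are None."""
    B = len(omega)
    return [[classify_pair(omega, r, s) if r != s else None for s in range(B)]
            for r in range(B)]


def bootstrap_certainty(edges, partition, B, classify_pair, k=100, rng=random):
    """P_rs: fraction of k bootstrap classifications equal to the original M_rs."""
    M = classify_all(density_matrix(edges, partition, B), classify_pair)
    agree = [[0] * B for _ in range(B)]
    for _ in range(k):
        sample = rng.choices(edges, k=len(edges))  # E edges, with replacement
        M_boot = classify_all(density_matrix(sample, partition, B), classify_pair)
        for r in range(B):
            for s in range(B):
                if r != s and M_boot[r][s] == M[r][s]:
                    agree[r][s] += 1
    return [[agree[r][s] / k if r != s else None for s in range(B)]
            for r in range(B)]
```

The partition p stays fixed while edges are resampled, so this estimate captures uncertainty due to the network itself; variability of the community-detection algorithm would require re-running it on each g′, which this sketch does not do.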

S5 Survey
Apart from the dominance and occurrence shown in the main manuscript, we also consider an intermediate quantity, denoted proportional occurrence, which is shown in the middle panels of Fig. S4 as the weighted fraction of networks in which each community type appears.

Algorithm 1 Density null model
Require: number of communities B of network g
1: Randomly assign the ranks 1 to B² to the entries of the B × B matrix ω^null
2: Compute M^null from the classification in Fig. 2
3: Calculate P^null_τ for each community type τ from Eq. (S1)
4: Get the dominant type τ^null_d and the occurring types τ^null_o
5: Repeat the steps above for many networks g to compute Dominance(τ)^null and Occurrence(τ)^null from Eqs. (S2) and (S3)

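Algorithm 2 Bootstrap certainty of the classification
Require: graph g with E edges, partition p, classification matrix M, number of repetitions k
1: for i = 1, ..., k do
2:   Sample E edges with replacement from the edge list of g to obtain g′
3:   Compute ω′ from g′ and p
4:   Compute M′^(i) from the classification in Fig. 2
5: end for
6: Compute P_rs for each pair of communities r, s as the fraction of the k matrices M′^(i) with M′^(i)_rs = M_rs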
Fig. S1: Fraction of non-Assortative relationships (y-axis) as a function of the number of communities (x-axis) in empirical graphs. Each symbol represents a partition of a network obtained using a specific community-detection method (see legend).

Fig. S2: Density matrices for the partitions obtained by SBM with B = 5 for political blogs (left) and B = 4 for bushfire retweets (right). Colours indicate the community densities ω_r,s. The blocks are ranked by community size, N_r = {354, 291, 291, 210, 76} (left) and N_r = {1473, 1031, 956, 563} (right), from community 0 to community 4 (left) or 3 (right). As noted in the main text, the edge direction of the political blogs network is reversed with respect to the original dataset to fit the convention used in our text. The label in each entry of the matrix gives the structure type τ and the robustness score obtained from Eq. (8) (see Sec. S4, Algorithm 2), where "A" stands for "Assortative", "CP" for "Core-Periphery", "D" for "Disassortative", and "SB" for "Source-Basin". Left: communities 0 and 3 are in a Source-Basin relationship. Communities 4 and 2 are in a Core-Periphery relationship, and so are communities 2 and 1 (i.e., community 2 is a periphery with respect to 4 and a core with respect to 1). Right: communities 0, 1, and 2 are climate change supporter groups. Communities 0 and 2 are in a Source-Basin relationship.

Fig. S4: Community structure types in empirical undirected graphs (left) and directed graphs (right). Top: fraction of networks (y-axis) in which each community type (x-axis) is dominant. Middle: weighted fraction of networks in which each community type appears. Bottom: fraction of networks in which each community type appears.

Fig. S5: Community structure types obtained by applying the DNGR method to the empirical graphs in our survey. Results are shown for three predefined numbers of communities, B = 5, 10, and 20. Overall, as B grows, the results suggest an increasing occurrence of non-assortative structures together with an increasing dominance (and proportional occurrence) of Assortative structures.
Table S1: Columns include network type (Type), network name (Name), number of nodes (N), number of edges (E), and the consistency between information flow and edge direction (Information/Edge Direction).