Non-linear consensus dynamics on temporal hypergraphs with random noisy higher-order interactions

Complex networks encoding the topological architecture of real-world complex systems have recently been undergoing a fundamental transition beyond pairwise interactions described by dyadic connections among nodes. Higher-order structures such as hypergraphs and simplicial complexes have been utilized to model group interactions in varied networked systems, ranging from the brain and society to biological and physical systems. In this article, we investigate consensus dynamics over temporal hypergraphs featuring non-linear modulating functions, time-dependent topology and random perturbations. Based upon analytical tools from matrix analysis, hypergraph theory, stochastic processes and real analysis, we establish sufficient conditions for all nodes in the network to reach consensus in the sense of almost sure convergence and L² convergence. The rate of consensus and the moments of the equilibrium are determined. Our results offer a theoretical foundation for the recent series of numerical studies and physical observations in multi-body non-linear dynamical systems.


Introduction
Network theory has gained considerable prominence over the past decades in the physics community. This is partly due to its capability to represent and solve real-world problems, offering substantial insights into the topological and dynamical aspects of networked complex systems [1]. By investigating the system-level characterization of network interactions between nodes and edges, universal structural features such as heavy-tailed distributions [2] and the small-world effect [3], as well as emergent collective behaviours including synchronization and consensus in myriad complex biological, physical and social systems, have been revealed [1,4].
The network architecture underlying a complex system is naturally represented by graphs, with interactions between nodes limited to dyadic connections. This traditional representation falls short of capturing group interactions involving three or more nodes simultaneously, which has very recently triggered a significant transition towards higher-order networks [5][6][7][8]. In fact, higher-order interactions beyond the pairwise scheme are ubiquitous in many real-world complex systems, ranging from neuronal activity patterns in brains [9,10] to cooperation evolution in social networks [11], and from multi-protein complex prediction [12] to species coexistence in ecological systems [13,14]. Taking this into account, group interactions lend essential descriptive power to network theory, not only extracting the higher-order organization of complex networks [15] but also helping to clarify the intrinsic non-linearity in system dynamics. For example, some information diffusion mechanisms such as opinion dynamics models [16] and threshold models in social contagions [17] rely on the group decision among neighbours of an

Model
Let N be the set of non-negative integers. At any time step t ∈ N, we consider n dynamical agents encoded in the temporal hypergraph structure H(t) = (V, E(t), A(t)), where V = {1, 2, . . . , n} represents the node set and E(t) is the set of hyperedges, consisting of triples of the form {i, j, k} with i, j, k ∈ V. The hypergraph H(t) is called a 3-uniform hypergraph in hypergraph theory [34] since each hyperedge is a subset of V with cardinality 3. Given t ∈ N, the topology of H(t) is captured by the adjacency tensor A(t) = (a_ijk(t)) ∈ R^{n×n×n} with a_ijk(t) = 1 if {i, j, k} ∈ E(t) and a_ijk(t) = 0 otherwise. By convention, a_ijk(t) = 0 if the edge becomes degenerate, that is, i = j, i = k or j = k.
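Since a_ijk(t) is invariant under permutations of {i, j, k} and vanishes on degenerate index triples, the adjacency tensor of a snapshot can be assembled directly from the hyperedge list. A minimal numpy sketch (the function name and the set-based edge input are our own illustration):

```python
from itertools import permutations

import numpy as np

def adjacency_tensor(n, hyperedges):
    """Adjacency tensor A = (a_ijk) of a 3-uniform hypergraph on n nodes.

    a_ijk = 1 whenever {i, j, k} is a hyperedge (for every ordering of the
    three distinct indices); degenerate entries with repeated indices stay 0.
    """
    A = np.zeros((n, n, n))
    for edge in hyperedges:
        i, j, k = tuple(edge)
        assert len({i, j, k}) == 3, "a hyperedge needs 3 distinct nodes"
        for p in permutations((i, j, k)):
            A[p] = 1.0
    return A

# Snapshot H(t) on n = 4 nodes with hyperedges {0,1,2} and {1,2,3}.
A = adjacency_tensor(4, [{0, 1, 2}, {1, 2, 3}])
```

For instance, A[0, 1, 2] = A[2, 1, 0] = 1 for the hyperedge {0, 1, 2}, while every entry with a repeated index remains 0.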
Two nodes i and j in H(t) are adjacent if they appear in the same hyperedge of E(t). Nodes i and j are said to be connected by a path if there exists a sequence of nodes starting from i and ending at j such that every two contiguous nodes in the sequence are adjacent. The hypergraph H(t) is called connected if any two nodes in V are connected by a path [34]. In our temporal hypergraph framework, we only assume that certain piecewise unions of hypergraphs over the time axis are connected. Formally, we call H(t) jointly uniformly connected with bound Δ > 0 if there is an infinite sequence of time steps 0 = t_0 < t_1 < t_2 < ··· < t_l < ··· satisfying t_l − t_{l−1} ≤ Δ and ∪_{s=t_l}^{t_{l+1}−1} H(s) being connected for any l ∈ N. The dynamics of the n nodes are governed by the set of equations (2.1), where γ > 0 is a constant gain, x_i(t) ∈ R is the state of node i and u_i(t) ∈ R is the interaction function delineating the influence of other nodes over node i at time t. We consider the interaction function (2.2), where the two modulating functions f, g : R → R satisfy Assumption 1 and the random noise w_ijk(t) satisfies Assumption 2 below.
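Joint uniform connectivity only asks that the union of snapshots over each window be connected; individual snapshots may well be disconnected. A small sketch of the check (our own helper, using the pairwise adjacency induced by shared hyperedges):

```python
from itertools import combinations

def union_is_connected(n, snapshots):
    """Connectivity of the union of 3-uniform hypergraph snapshots.

    Two nodes are adjacent if some hyperedge in the union contains both;
    the union hypergraph is connected iff this pairwise graph is connected.
    """
    adj = {v: set() for v in range(n)}
    for edges in snapshots:
        for e in edges:
            for u, v in combinations(sorted(e), 2):
                adj[u].add(v)
                adj[v].add(u)
    seen, stack = {0}, [0]          # breadth-first search from node 0
    while stack:
        for v in adj[stack.pop()] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == n

# Neither snapshot is connected on its own (n = 5), but the union over
# the two-step window is, which is all joint connectivity requires.
s0 = [{0, 1, 2}]          # hyperedges of H(t_l)
s1 = [{2, 3, 4}]          # hyperedges of H(t_l + 1)
```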
Assumption 2 The random variables {w_ij(t)}_{i,j∈V, t∈N} are independent and identically distributed across i, j and t. Moreover, E(w_ij(t)) = 0 and Var(w_ij(t)) = 1, where E and Var denote expectation and variance, respectively. We assume that w_ijk(t) is an independent copy of w_ij(t) (or, equivalently, of w_ik(t)).

The perturbation w_ijk(t) represents a random noise associated with the hyperedge {i, j, k} ∈ E(t). A schematic of the interaction and evolution of H(t) is shown in Fig. 1. The modulating function f is bounded and captures the influence of the state difference between nodes j and k on node i. To model the group reinforcement effect on consensus dynamics, f is naturally chosen as a monotonic function, such as an exponential function or the Heaviside function in [18,19,31]. If f is decreasing, this is in line with the bounded confidence opinion model [35] and flocking models [36] in biology, where closer states exert a stronger cohesive force on each other. Our assumption on the modulating function g is mild; a typical choice is the linear function g(x) = βx for some constant β > 0. The noises are assumed to be standardized random variables for ease of presentation.
It is also worth noting that in the above interaction function (2.2), for simplicity, we deliberately do not include a modulating function h(|x_j(t) − x_k(t)|) in the deterministic coupling part, since our main focus here is on dealing with random noisy higher-order interactions. However, as will be seen below, if such a function h is bounded and positive, the results of this article still hold; cf. Corollaries 2 and 3 in Section 4. The decreasing exponential function considered in [18], for instance, is a special candidate for h. Since random noises are factored into the dynamics, we are interested in the consensus behaviour of the system (2.1) with interaction function (2.2) in the following two senses.
• Almost sure consensus: There exists some random variable x̄ such that P(lim_{t→∞} x_i(t) = x̄) = 1 holds for any i ∈ V and any initial configuration {x_i(0)}_{i∈V}.
• L² consensus: There exists some random variable x̄ such that lim_{t→∞} E((x_i(t) − x̄)²) = 0 for any i ∈ V and any initial configuration {x_i(0)}_{i∈V}.
It is well known that almost sure convergence does not imply L² convergence, and vice versa; see, for example, [37].
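A classic measure-theory counterexample (not taken from this article) illustrates one direction: let X_t = t with probability 1/t² and X_t = 0 otherwise. By Borel–Cantelli, since Σ_t 1/t² < ∞, X_t → 0 almost surely, yet E(X_t²) = t²·(1/t²) = 1 for every t, so X_t does not converge to 0 in L². The two quantities can be checked symbolically:

```python
def prob_nonzero(t):
    # P(X_t = t) = 1/t^2; otherwise X_t = 0
    return 1.0 / t ** 2

def second_moment(t):
    # E[X_t^2] = t^2 * P(X_t = t), which stays equal to 1 for all t
    return t ** 2 * prob_nonzero(t)

# Partial sums of 1/t^2 stay below pi^2/6, so Borel-Cantelli applies,
# while the second moment never decays.
partial_sum = sum(prob_nonzero(t) for t in range(1, 10001))
moments = [second_moment(t) for t in range(1, 10001)]
```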

Results
Here, we present the main theoretical results of this article, with detailed proofs deferred to Section 4. For i ∈ V, it follows from (2.1), (2.2) and Assumption 2 that the system takes the form (3.1). The system (3.1) can be further rewritten as (3.2), where diag(g_1, g_2, . . . , g_n) represents the diagonal matrix with diagonal components g_1, g_2, . . . , g_n, and ⊤ denotes the matrix transpose. Note that the dimension of the diagonal matrix depends on the number of diagonal components.
Let x(t) = (x_1(t), . . . , x_n(t))⊤ ∈ R^n be the state vector. Define the Laplacian matrix L(t) := (l_ij(t)) ∈ R^{n×n} with l_ii(t) = Σ_{j∈V} b_ij(t) for i ∈ V and l_ij(t) = −b_ij(t) for i ≠ j. Furthermore, define the two system matrices B(t) := I_n − 2γL(t) and C(t) := diag(c_1(t), c_2(t), . . . , c_n(t)) ∈ R^{n×n²}, where I_n ∈ R^{n×n} is the identity matrix and c_i(t) is as given in (3.2). With these notations, the system of equations in (3.2) can be recast as (3.3) for t ∈ N. Our first result concerns the system behaviour with degenerate noises.
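For a concrete feel for B(t), the following sketch contracts the adjacency tensor into pairwise weights under the simplifying assumption f ≡ 1 (our own illustration; the paper's b_ij(t) also carries the modulating function):

```python
from itertools import permutations

import numpy as np

def system_matrices(A, gamma):
    """L(t) and B(t) = I_n - 2*gamma*L(t) from the adjacency tensor A(t).

    Illustrative assumption: the modulating function f is the constant 1,
    so the pairwise weight is simply the contraction b_ij = sum_k a_ijk.
    """
    b = A.sum(axis=2)                   # b_ij; diagonal is 0 by convention
    L = np.diag(b.sum(axis=1)) - b      # l_ii = sum_j b_ij, l_ij = -b_ij
    B = np.eye(A.shape[0]) - 2 * gamma * L
    return L, B

# Small snapshot: n = 4 nodes, hyperedges {0,1,2} and {1,2,3}.
A = np.zeros((4, 4, 4))
for e in [(0, 1, 2), (1, 2, 3)]:
    for p in permutations(e):
        A[p] = 1.0

L, B = system_matrices(A, gamma=0.05)
```

For γ below the threshold of Theorem 1, the resulting B(t) is symmetric and stochastic: every row sums to 1 and all entries are non-negative.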
Theorem 1 Consider the system (3.3) with degenerate noises w_ij(t) ≡ 0 for all i, j ∈ V and t ∈ N. Assume that H(t) is jointly uniformly connected and that γ < (2 sup_{t∈N} max_{i∈V} Σ_{j,k∈V} a_ijk(t))^{−1}. Then, the following hold: (i) There exists a probability vector sequence α(t) = (α_1(t), α_2(t), . . . , α_n(t))⊤ ∈ R^n for t ∈ N such that Σ_{i∈V} α_i(t) = 1, α_i(t) ≥ η for all i ∈ V, and (3.4) holds. (ii) The quadratic Lyapunov function Φ(t) defined in (3.5) satisfies Φ(t_{l+1}) ≤ (1 − η′γ)Φ(t_l) for all l ∈ N. Both η and η′ are absolute constants that do not depend on the gain γ.
Several quick remarks are in order. Firstly, it is easy to see that the upper bound for γ in Theorem 1 always holds if we choose γ < 1/((n − 1)(n − 2)). Secondly, an important implication of Theorem 1 is that the system (3.3) reaches consensus exponentially fast when noises are absent; see Corollary 1 in Section 4 below for a precise statement. Moreover, {B(t)}_{t∈N} is a sequence of symmetric stochastic matrices. For any s ∈ N, we define F(s, s) = I_n ∈ R^{n×n} and F(s + l + 1, s) = B(s + l)F(s + l, s) for l ∈ N; see (3.6). Clearly, {F(t, s)}_{t,s∈N} is also a collection of stochastic matrices. Theorem 1(ii) will contribute to the convergence of the general system via the quadratic Lyapunov function Φ(t) defined in (3.5).
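The backward products F(t, s) can be explored numerically. Since symmetric stochastic matrices are doubly stochastic, the limiting probability vector in this case is simply α = (1/n)1_n, so the product converges to the rank-one averaging matrix. A sketch with randomly generated 0/1 pairwise weights (a stand-in for the model's b_ij(t)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 0.05

def random_B():
    """One B(t) = I - 2*gamma*L(t) built from random symmetric 0/1 weights.

    The weights are only a stand-in; all that matters here is that each
    B(t) is a symmetric stochastic matrix.
    """
    w = np.triu(rng.integers(0, 2, size=(n, n)) * 1.0, 1)
    w = w + w.T
    L = np.diag(w.sum(axis=1)) - w
    return np.eye(n) - 2 * gamma * L

# Backward product F(t, 0) = B(t-1) ... B(1) B(0), as in (3.6).
F = np.eye(n)
for _ in range(300):
    F = random_B() @ F

# F(t, 0) approaches the rank-one averaging matrix (1/n) * ones((n, n)).
```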
Our main random consensus result is the following.
Theorem 2 Consider the system (3.3) under Assumptions 1 and 2. Suppose that γ < min{(2 sup_{t∈N} max_{i∈V} Σ_{j,k∈V} a_ijk(t))^{−1}, η′(4ξ)^{−1}}, where η′ > 0 is given in Theorem 1 and ξ > 0 is a constant defined in (4.40). Assume that H(t) is jointly uniformly connected with bound Δ and time sequence {t_l}_{l∈N}. Then, the following hold: (i) There exists a random variable x̄ such that almost sure consensus and L² consensus are achieved.
(ii) The mean and variance of x̄ are given by (3.7) and (3.8), respectively.

Theorem 2 establishes random consensus for the system (3.3) and identifies the final equilibrium x̄. The expectation of the equilibrium is determined by the initial configuration of the agents, while the variance expressed in (3.8) is a convergent series, as will become clear in the proof below. When considering the generalized system with an additional modulating function h(·) as in (2.3), we can readily extend Theorem 2 to Corollary 3 (see Section 4 below), which is the most flexible result of this article. Under joint uniform connectedness of the hypergraph, Corollary 3 guarantees stochastic consensus when the non-linear system is governed by three types of modulating functions over the hypergraph structure: a positive bounded function h governing the 3-way coupling strength, a bounded function f governing the gap among agents in a hyperedge, and a function g with bounded derivative governing the strength of the random noises. Many of the previous frameworks [18,19,31] can be regarded as special cases.
Remark 1 Before proceeding to the technical proofs of these results, we point out that although we have stressed the random noise interpretation, our framework is actually capable of accommodating more general random phenomena such as random hypergraphs [38,39]. For instance, consider the interaction function of (2.1) given by (3.9). The function (3.9) can be recast in the form of the interaction function (2.2) with a suitable noise term w_ijk(t), and clearly E(w_ijk(t)) = 0. Hence, this system fits into the current framework and we can apply Corollary 3 to conclude the random consensus result.
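The random hypergraph reading can be simulated directly. The sketch below runs the noise-free special case with f ≡ 1 (our simplifying assumptions), resampling every hyperedge independently at each time step; because the update matrices are symmetric stochastic, the states contract to the preserved average:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)
n, gamma, p = 6, 0.01, 0.3          # p: per-step hyperedge probability

x = rng.normal(size=n)
target = x.mean()                   # symmetric B(t) preserves the average

for _ in range(2000):
    b = np.zeros((n, n))
    for i, j, k in combinations(range(n), 3):
        if rng.random() < p:        # hyperedge {i,j,k} drawn afresh
            for u, v in ((i, j), (i, k), (j, k)):
                b[u, v] += 1.0
                b[v, u] += 1.0
    L = np.diag(b.sum(axis=1)) - b
    x = (np.eye(n) - 2 * gamma * L) @ x   # noise-free update x <- B(t) x
```

With γ = 0.01 every update matrix is stochastic (row sums of L are at most 20 here), and after a few thousand steps the state vector is numerically indistinguishable from the common value `target`.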

Proofs of main results and corollaries
In this section, we present the proofs of the main results, Theorems 1 and 2, together with some corollaries and remarks. The mathematical methods are based upon hypergraph theory, matrix analysis, real analysis, probability theory, martingale convergence and Lyapunov stability techniques.
Proof of Theorem 1. (i) Using the assumption that the hypergraph H(t) is jointly uniformly connected, we have an infinite sequence of time steps 0 = t_0 < t_1 < t_2 < ··· as in (4.1). Since w(t) = 0_{n²} ∈ R^{n²}, by (3.3) we are dealing with the deterministic system in Theorem 1. By the choice of γ, {B(t)}_{t∈N} is a sequence of stochastic matrices whose diagonal elements have a positive lower bound; this lower bound is an absolute constant that does not depend on γ. In view of the definition (3.6) and the comment above regarding connectivity, the same holds for the matrix sequence {F(t_{l+1}, t_l)}_{l∈N}. By our choice of γ, using (3.6) and (4.1), we arrive at (4.2), where 1_n ∈ R^n is the vector with all elements equal to 1 and the Landau notation is understood as γ → 0. In view of the connectivity of the union hypergraphs ∪_{s=t_l}^{t_{l+1}−1} H(s), and since {B(t)}_{t∈N} is a sequence of symmetric stochastic matrices, so is {F(t_{l+1}, t_l)}_{l∈N} by (3.6). With the above discussion, according to Lemmas 8 and 9 in [41], there exists a probability vector sequence satisfying (4.3) for i ∈ V and l ∈ N, where the constant η ∈ (0, 1) does not depend on γ. To conclude the proof of (i), we need to extend the definition of the probability vector α(·) to all time steps. By (4.3), the vector in (4.4) is still a probability vector. As the diagonal elements of any matrix in {B(t)}_{t=t_l}^{t_{l+1}−1} have a positive lower bound independent of γ, and all these matrices are symmetric and stochastic, the property α_i(t) ≥ η is preserved. Taking l = 1 in (4.4), we can analogously define the probability vectors α(t) for t_1 + 1 ≤ t ≤ t_2 − 1, one by one as above. Repeating this procedure with l = 2, 3, 4, . . ., the probability vector α(t) is defined for every time step t ∈ N and the properties specified in (i) hold. This concludes the proof of (i).

(ii) To establish the inequality for the quadratic Lyapunov function Φ(t) given in (3.5), we first derive (4.5), where we have used the fact that α(t_l) is a probability vector. On the other hand, by the results of (i) and (3.5), we derive (4.6), where the third inequality is an application of the inequality between quadratic and arithmetic means. Combining (4.5) and (4.6), we obtain (4.7), where the Landau notation is again understood as γ → 0, together with the matrix D̄(t) for t ∈ N. Note that D̄(t) is a non-negative matrix since B(t) is a stochastic matrix. Based on (3.5), (4.3) and the previous analysis, the roles of B(t), ᾱ(t), Φ̄(t) and t ∈ N can be played by F(t_{l+1}, t_l), α(t_l), Φ(t_l) and l ∈ N, respectively, under the current setting. Therefore, we obtain (4.9) with D(t_l) given by (4.10), where (4.3) is used in the first equality and (4.2) in the second. The matrix D(t_l) in (4.10) can be further written as (4.11). Apparently, we can always choose two indices a, b ∈ N with a < b satisfying the above arrangement. [In fact, if a = b, we obtain Φ(t_{l+1}) = Φ(t_l) by (4.9), and the result in (ii) holds trivially.] It follows from (4.11) that d_{i_r, i_{r+1}}(t_l) = d_{i_{r+1}, i_r}(t_l) is of order γ for a ≤ r < b. By the relationship between D(t_l) and D′(t_l), we further obtain the bound (4.12), where the second inequality is again an application of the inequality between quadratic and arithmetic means. Combining (4.7) and (4.12), we have (4.13) for l ∈ N, for some constant η′ with 0 < η′ < γ^{−1} that does not depend on γ. Using (4.9) and (4.13), we finally derive (4.14), which completes the Proof of Theorem 1.
An immediate corollary of Theorem 1 is that all node states {x i (t)} i∈V of the system (3.3) achieve consensus at an exponential rate (in a deterministic way) when the noises are absent.
Corollary 1 Under the conditions of Theorem 1, the estimate (4.15) holds for t ∈ N, where all notations are as in Theorem 1.
Proof. By (4.14), we have Φ(t_l) ≤ (1 − η′γ)^l Φ(0) for l ∈ N since t_0 = 0. For any t ∈ N, suppose that t ∈ [t_l, t_{l+1}) for some l ∈ N. We obtain (4.16), where the first inequality comes from the monotonicity in (4.8) and the last one is based on the joint uniform connectivity of H(t).
Multiplying α_i(t)α_j(t) on both sides of the preceding elementary inequality and taking sums over all i, j ∈ V give rise to (4.17), where we used Theorem 1(i) and the definition (3.5). The result (4.15) follows from (4.16) and (4.17).
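The exponential rate in Corollary 1 is easy to observe numerically. The sketch below iterates the noise-free update on a fixed connected 3-uniform hypergraph with f ≡ 1 (our simplification) and tracks the state variance as a simple disagreement measure:

```python
from itertools import permutations

import numpy as np

n, gamma = 5, 0.05

# Fixed connected hypergraph: hyperedges {0,1,2}, {1,2,3}, {2,3,4}.
b = np.zeros((n, n))
for e in [(0, 1, 2), (1, 2, 3), (2, 3, 4)]:
    for u, v in permutations(e, 2):
        b[u, v] += 1.0
L = np.diag(b.sum(axis=1)) - b
B = np.eye(n) - 2 * gamma * L           # symmetric stochastic for this gamma

x = np.arange(n, dtype=float)
var = []
for _ in range(80):
    var.append(x.var())                 # disagreement before each update
    x = B @ x

# The one-step ratio of disagreements stays below 1 and approaches the
# squared subdominant eigenvalue of B: geometric decay, i.e. consensus
# at an exponential rate.
ratios = [var[t + 1] / var[t] for t in range(60, 79)]
```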
If we consider the generalized system (2.3) with the additional modulating function h(·), we can obtain analogous results.
Proof. (i) Note that {B′(t)}_{t∈N} is a sequence of stochastic matrices, although they are no longer symmetric. In analogy to (3.6), for any s ∈ N, we define F′(s, s) = I_n ∈ R^{n×n} and F′(s + l + 1, s) = B′(s + l)F′(s + l, s) for l ∈ N. Therefore, {F′(t, s)}_{t,s∈N} also forms a family of stochastic matrices. The same line of proof as in Theorem 1(i) applies here, since we did not actually rely on the symmetry of these matrices.
(ii) This part follows along the same lines as the Proof of Theorem 1(ii): we replace D(t) with D′(t) := B′(t)⊤ diag(α(t + 1))B′(t) for t ∈ N, and the same proof applies.
Clearly, Corollary 1 also holds verbatim for the system (2.3). As a by-product, our results provide solid theoretical support for the consensus phenomena observed in the recent works [18,19,31]. We will see in Corollary 3 that the consensus results still hold even with random perturbations. The proof of Theorem 2, our main contribution, is given below.
Proof of Theorem 2. (i) To show the almost sure and L² convergence, we will again examine the asymptotic behaviour of the quadratic Lyapunov function Φ(t). However, unlike in Theorem 1, we need to cope with the random noises.
Since F(t_{l+1}, t_l) is a stochastic matrix, using (4.3) we obtain (4.25), where 0_n ∈ R^n is the zero vector. Recalling the definition of D(t_l) ∈ R^{n×n} in (4.10), we see that the matrix D(t_l) − diag(α(t_l)) has each row summing to 0, as implied by (4.25). Hence, (4.26) holds since D(t_l) is a non-negative matrix. Combining (4.14), (4.24) and (4.26), we arrive at (4.27). Let F_t be the σ-algebra generated by the noises up to time t, and consider the terms F(t_{l+1}, s + 1)C(s)G(s)w(s) in (4.28). With the above comments, taking expectation over (4.27) yields (4.29), where the matrices Ψ(t_l, s) are symmetric for l ∈ N. In the following, we aim to bound the second term on the right-hand side of (4.29) in terms of the expected Lyapunov function E Φ(t_l), similarly to the first term on the right-hand side of (4.29).
To this end, let ξ_1 be the maximal spectral norm of the matrices Ψ(t_l, s) over l ∈ N and s ∈ {t_l, t_l + 1, . . . , t_{l+1} − 1}, as in (4.30), where σ(·) denotes the largest singular value of a matrix. Using the condition t_{l+1} − t_l ≤ Δ, Assumption 1, and the construction and symmetry of Ψ(t_l, s), we know that ξ_1 < ∞ by applying the Gershgorin circle theorem. Clearly, the constant ξ_1 does not depend on γ. It then follows from Assumptions 1 and 2 and the Rayleigh–Ritz theorem that (4.31) holds. Define the matrices P_1 = [1, 0_{n−1}⊤] ∈ R^{1×n} and P_2 = [1_{n−1}, −I_{n−1}] ∈ R^{(n−1)×n}, and let P be the matrix obtained by stacking P_1 on top of P_2. For t ∈ N, we define the vector z(t) ∈ R^{n−1} through the transformation (4.32), which further implies (4.33),
where L_{P_2}(t) ∈ R^{(n−1)×(n−1)} is defined as above. Let B_{P_2}(t) := I_{n−1} − 2γL_{P_2}(t) ∈ R^{(n−1)×(n−1)}, and define the two constants ξ_2 and ξ_3 as in (4.34). Similarly to (4.30), we have ξ_2 < ∞ and ξ_3 < ∞ by using the construction and symmetry of the two matrices involved together with the Gershgorin circle theorem. It is easy to see that ξ_2 = O(γ) and ξ_3 does not depend on γ. In view of Assumptions 1 and 2, we obtain (4.35) for t ∈ N, where we have applied (4.33), (4.34) and the Rayleigh–Ritz theorem. Noting (4.36), we can rewrite (4.35) as (4.37). By an argument similar to (4.6), we derive (4.38) for any i, j ∈ V and t ∈ N. With (4.31), (4.36), (4.37) and (4.38) at hand, the second term on the right-hand side of (4.29) can finally be estimated as in (4.39), that is, bounded by ξ E Φ(t_l) with ξ as defined in (4.40); here we used the condition t_{l+1} − t_l ≤ Δ for l ∈ N, and the O(1) constant does not depend on γ. Feeding (4.39) into (4.29) yields the estimate (4.41) for l ∈ N. By Theorem 1(ii), 1 − η′γ > 0 and hence the coefficient 1 − η′γ + 4ξγ² in (4.41) is positive. Moreover, we have (4.42) since 4ξγ < η′ by assumption. It is easy to see that the inequality (4.42), or equivalently 4ξγ < η′, is feasible when γ > 0 is sufficiently small. Since t_0 = 0, it follows from (4.41) that (4.43) holds; namely, the expectation E Φ(t_l) converges exponentially fast as l → ∞.
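The repeated finiteness arguments rest on an elementary fact: for a symmetric matrix, the largest singular value equals the spectral radius, which the Gershgorin circle theorem bounds by the maximal absolute row sum. A generic numerical check (a random symmetric matrix, not the paper's specific matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.normal(size=(6, 6))
S = (S + S.T) / 2                       # symmetric, as in the proof

# sigma(S) equals the spectral radius for symmetric S, and every
# eigenvalue lies in a Gershgorin disc, so the maximal absolute row
# sum is an upper bound on the largest singular value.
sigma = np.linalg.norm(S, 2)            # largest singular value
row_bound = np.max(np.abs(S).sum(axis=1))
```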
To show the consensus of the system (3.3) in the sense of the definitions given at the end of Section 2, several steps remain; in particular, we need to remove the subscript l and the expectation operator E in (4.43).
By repeatedly applying the update rule of (3.3), in a similar way to (4.22), we obtain (4.44), where y(t) := 2γ Σ_{s=0}^{t−1} F(t, s + 1)C(s)G(s)w(s) ∈ R^n. For any t ∈ N, let l̄ := max{l ∈ N : t_l < t}. Thanks to Assumption 2, we have (4.45), where in the last step we applied an argument similar to (4.39) to each term, for some positive constant. In view of (4.42), (4.43) and E Φ(0) = Φ(0), the upper bound in (4.45) is the sum of a geometric sequence, and hence we obtain (4.46). It can be seen from (3.5) and (4.8) that Φ(t) is a non-negative (random) non-increasing function of time. Therefore, there exists a random variable v ≥ 0 such that lim_{t→∞} Φ(t) = v almost surely. Clearly, (4.47) holds almost surely for all i, j ∈ V. Taking expectation on both sides of (3.5) leads to (4.48). For the first part (i) of Theorem 2, what remains to show is the existence of a limit random variable. The key tool is the following martingale convergence theorem; cf. [37, Theorem 4.4.6]: Suppose that {X(t)}_{t∈N} is a martingale with sup_{t∈N} E‖X(t)‖² < ∞. Then, there is a random variable X such that X(t) → X almost surely and in L².
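The theorem is easy to illustrate on a toy L²-bounded martingale of the same flavour as y(t): partial sums of independent zero-mean noises with summable variances (our own example, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(3)

c = 0.9 ** np.arange(200)                  # sum of c_s^2 is finite
w = rng.choice([-1.0, 1.0], size=200)      # i.i.d., mean 0, variance 1
y = np.cumsum(c * w)                       # y(t) = sum_{s <= t} c_s w_s

# sup_t E[y(t)^2] = sum_s c_s^2 <= 1/(1 - 0.81), so the martingale
# convergence theorem applies; numerically the path indeed settles down,
# with |y(t) - y(199)| bounded by the geometric tail sum_{s > t} c_s.
tail = np.abs(y[100:] - y[-1])
```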
It follows from (3.3) and (4.44) that x(t) and y(t) are F_{t−1}-measurable. Accordingly, (4.49) holds. Let ‖·‖_1 be the 1-norm of a vector. By (4.46) and the Cauchy–Schwarz inequality, we have (4.50), where the second inequality is due to Hölder's inequality. Therefore, {y(t)}_{t∈N} is a martingale (with respect to {F_{t−1}}_{t≥1}). It follows from the martingale convergence theorem above that there exists a random vector ŷ ∈ R^n such that y(t) → ŷ almost surely and in L² as t → ∞. Capitalizing on (4.44), (4.47), (4.48) and lim_{t→∞} F(t, s) = 1_n α(s)⊤, we conclude that there exists a random variable x̄ such that x_i(t) → x̄ almost surely and in L² as t → ∞ for any i ∈ V. This concludes the proof of (i).
(ii) The random matrix sequence {M(s)}_{s∈N} appearing below will play a central role in deriving the properties of the random variable x̄. Since E(ŷ) = 0_n by Assumption 2 and the fact that y(t) → ŷ almost surely, we obtain by employing (4.44) the expression (4.51), where Cov(·) represents the covariance matrix of a random vector. The largest singular value of E(M(s)) can be estimated as in (4.52) by Assumption 1 and (4.38). Applying (4.43) and the uniform bound Δ for the hypergraph connectivity, we can rewrite (4.52) as (4.53). In view of (4.53) and the fact that σ(F(t, s + 1)) = σ(F(t, s + 1)⊤) < ∞, we deduce that the relevant series converges as t → ∞. Since the series is convergent, for any ε > 0 there is s_1 ∈ N such that (4.54) holds for any t > s_1. Likewise, since σ(1_n α(s′)⊤) = σ(α(s′)1_n⊤) < ∞ for any s′ ∈ N, we can choose s_1 sufficiently large such that (4.55) also holds for any s′ ∈ N. Recall that lim_{t→∞} F(t, s) = 1_n α(s)⊤ for any s ∈ N; therefore, there must exist s_2 ≥ s_1 such that σ(F(t, s) − 1_n α(s)⊤) = σ(F(t, s)⊤ − α(s)1_n⊤) ≤ ε for any t ≥ s_2. We can further choose s_3 ≥ s_2 such that (4.56) holds for any t ≥ s_3, where we have employed properties of singular values and applied (4.53) in the last step. Taking t ≥ s_3 as in (4.57), we therefore obtain (4.58). Invoking (4.44), we derive (4.59), where we have applied the definition of y(t) and Assumption 2. This yields (3.7). On the other hand, we expand the second moment as in (4.61); the four terms on the right-hand side of (4.61) can be calculated as follows. The first term tends to 1_n α(0)⊤ x(0)x(0)⊤ α(0)1_n⊤ = (E x̄)² 1_n 1_n⊤ as t → ∞. The second and third terms are zero by Assumption 2. Hence, we obtain the limit (4.62) as t → ∞, which gives rise to (3.8). The Proof of Theorem 2 is complete.

Concluding remarks
Higher-order interactions and models have emerged as a new frontier in modelling complex networked systems, as they represent more realistic backbones encoded in the group interactions of real-world systems [42]. This article takes an initial step towards establishing an analytical theory of consensus dynamics over networks with higher-order interactions, accommodating non-linear modulating functions, temporal topology and random perturbations. We show that almost sure consensus and L² consensus can be achieved when the network topology, represented by a temporal hypergraph, is jointly uniformly connected. To model group interactions, three types of modulating functions have been introduced, governing the 3-way coupling strength, the in-group discrepancy and the noise intensity. Random hypergraphs can also be viewed as a special case of our framework. Building upon the mathematical tools of real analysis, matrix analysis, hypergraph theory, martingale convergence and Lyapunov stability theory, we are able to characterize the statistics of the random consensus process and its equilibrium.
In the noise-free case, the convergence is shown to be exponentially fast. It would be interesting to further investigate the consensus rate under random noises. From a mathematical point of view, an analysis of tail probability estimates and large deviation bounds for the equilibrium state would also be desirable. Since we have only considered 3-uniform hypergraphs, understanding how the topology of more general uniform and non-uniform hypergraphs affects the consensus dynamics is also a worthwhile direction for future research.