Information synergy maximizes the growth rate of heterogeneous groups

Abstract Collective action and group formation are fundamental behaviors among both organisms cooperating to maximize their fitness and people forming socioeconomic organizations. Researchers have extensively explored social interaction structures via game theory and homophilic linkages, such as kin selection and scalar stress, to understand emergent cooperation in complex systems. However, we still lack a general theory capable of predicting how agents benefit from heterogeneous preferences, joint information, or skill complementarities in statistical environments. Here, we derive general statistical dynamics for the origin of cooperation based on the management of resources and pooled information. Specifically, we show how groups that optimally combine complementary agent knowledge about resources in statistical environments maximize their growth rate. We show that these advantages are quantified by the information synergy embedded in the conditional probability of environmental states given agents’ signals, such that groups with a greater diversity of signals maximize their collective information. It follows that, when constraints are placed on group formation, agents must intelligently select with whom they cooperate to maximize the synergy available to their own signal. Our results show how the general properties of information underlie the optimal collective formation and dynamics of groups of heterogeneous agents across social and biological phenomena.


I. INTRODUCTION
Collective behavior is a general feature of biological and social systems. It mediates the survival and evolution of populations under resource constraints, competition, or predation in natural systems [1] and the formation and persistence of social organizations in human societies [2]. Much past work has modeled collective dynamics using homogeneous interaction rules, common to all agents, that are often also phenomenological. While these models have produced diverse insights, they typically lack a theoretical foundation to explain how specific social behavior emerges among individual agents with heterogeneous information and behavior. Thus, there remain significant knowledge gaps in most realistic situations, where agents with distinct but potentially complementary traits act collectively to maximize their joint growth (fitness, wealth) in knowable, noisy environments.
Some examples help illustrate the present situation. Game theorists and ecologists have considered many different cooperative interaction schemes [3] and explored evolutionarily stable behavior [4], particularly on networks [5][6][7], where optimal behavior is identifiable under given interaction rules. Elaborating these schemes by introducing higher-order interactions has broadened our understanding of more complex social networks [8][9][10] and of their dynamical phase stability under varying interaction strengths [11]. Researchers have also studied, both theoretically and in the laboratory, how memory of previous interactions influences agents' preferences for future encounters [12][13][14][15], the spread of social crises across distance [16], and the formation and scaling properties of social collectives [17,18], such as cities [19,20].
In addition to interaction rules and associated payoffs, collective dynamics are predicated on maximum principles, which specify agents' preferences in view of a goal and thus render their behavior intelligent (optimal). For example, inclusive fitness theory, which assumes a reproductive benefit to cooperation because of shared genes [21,22], has been studied in mixing populations and over networks [23], where it predicts population benefits to cooperation through several forms of reciprocity [24].
More recently, researchers have studied resource pooling in models of growth as a means to minimize environmental uncertainty and the associated loss of fitness among agents experiencing independent fluctuations with shared statistics [25,26]. Such approaches remain limited by the association between collective behavior and (genetic) homophily, but they can help explain the existence of phase transitions in cooperation networks [11,18] and specify agents' plausible behavioral patterns [14], even if doubts remain about the predictive power of inclusive fitness [27]. Generally, however, most current quantitative frameworks fail to address collective dynamics when agents remain heterogeneous across skills, knowledge, and behavior [28,29]. Developing more general approaches to collective behavior that include adaptation and learning along with heterogeneity is a crucial step towards understanding how agents self-organize in more complex and dynamical environments, where specialization and the division of labor and knowledge become key.
Adaptive behavior requires agents to acquire and process information over time [30,31] in response to their environments and to each other. In any realistic situation, limited experience, specialization costs, and physical limitations of effort, energy, and time all prevent agents from perfecting their knowledge of complex environments [32,33]. A natural way to mitigate these individual limitations is to pool knowledge across agents, leading to the formation of social organizations [34] and the division and coordination of their labor [35]. This is widely observed in human organizations, but also in animal social behavior, starting with the division of labor by age and sex.
By working jointly to predict characteristics of their environment [29] and to gather resources, groups of agents can maximize their collective fitness even when each individual has very limited knowledge. In a setting where successful prediction and behavior yield resource returns, information about the state of a statistical environment determines the fitness of the population [36,37], though questions remain about how such benefits emerge quantitatively [38]. Here we formalize the calculation of these social benefits in terms of the properties of information and show how maximizing knowledge complementarities (synergy) maximizes the long-term growth rate of collectives. Specifically, we derive an expression for the additional payoff to cooperative behavior in terms of the joint information synergy about the agents' dynamical environment.
These results lead us to introduce the principle of maximum synergy, which maps the maximization of collective resource growth rates into optimal social interaction structures.This work adds new dimensions to the study of collective dynamics by connecting the structure of groups to that of information in complex environments mediated by agents' diverse subjective characteristics, such as their present knowledge and their life histories.

II. THEORY OF COLLECTIVE GROWTH
We start by demonstrating how the benefits of collective action emerge from pooling information in synergistic situations. Synergy means the combination of behavior, knowledge, and skills that complement each other towards a goal. This concept is necessary for creating effective organizations that embody complex information [29], but it usually remains informal, as in discussions of innovation [39] or firm structure.
Here, we refer to synergy as an explicit information-theoretic quantity that measures the additional predictive power a group gains by pooling its agents' information, relative to the knowledge of each individual separately. This quantity was introduced some time ago in the study of circuits in information processing systems [40,41], and it has provided a framework for studying higher-order neuron interactions in the brain [42], as well as causality and information in complex systems [43,44]. As we will show, synergy results formally from the conditional dependence between the probability of predictive signals distributed in a population and events in a shared environment. The gain in predictive power from pooling information allows collectives to obtain additional resources from a knowable environment beyond what agents can obtain alone, thus boosting their relative fitness or productivity.
It follows that collectives that seek to maximize their resources over long times must combine the information from their agents' individual models of the world in a way that accesses the most synergy. Groups that do not know a priori how to realize their synergies must adjust their collective knowledge and interaction structure by observing outcomes of their environment in an iterative learning process. After developing the general framework for group formation and collective growth across group sizes, we demonstrate a model environment that exhibits synergy, using logic gates that take signals as inputs and output probabilistic events. We will demonstrate how synergy scales with the number of unique signals in a collective, and how specific combinations of signals affect the average growth of resources for the group.

A. Collective Growth in Synergistic Environments
We consider a population of $N$ agents, each with initial resources $r_i$ that can be (re)invested into the set of outcomes of their environment to generate returns. Each agent has access to a private signal, $s_j \in S_j$, which it uses to predict the state of the environment and to allocate resources to possible outcomes $e \in E$. This signal may represent a number of different processes, such as sensory input or a lead retrieved from memory. With an accurate parameterization of a model of the environment, $P(E|S_j)$, an agent's optimal investment strategy leads to a resource growth rate $\gamma = I(E; S_j)$ [36]. Agents with better models (and better statistical estimations) experience higher average growth rates, given by the information rate of the agent's signal about the environment.
We now define our environment by a set of $l$ signals with unique statistics, $\mathbf{S} \equiv \{S_1, \ldots, S_l\}$, through the conditional distribution $P(E|\mathbf{S})$, with marginals of events $P(E)$ and signals $P(\mathbf{S})$. The joint information that $\mathbf{S}$ has on $E$ is at least equal to that of any single signal, $I(\mathbf{S}; E) \geq I(S_j; E)$. Generally, this inequality is strict if the mutual information among the signals conditioned on events exceeds their unconditioned mutual information, $I(\mathbf{S}|E) > I(\mathbf{S})$ [40,41], in which case the joint information even exceeds the sum of the individual contributions. We compute the total information by summing the mutual information of each signal independently and subtracting an interaction term across signals,

$$I(E; \mathbf{S}) = \sum_{j=1}^{l} I(E; S_j) - R_P(E; \mathbf{S}). \qquad (1)$$

The quantity $R_P$, called the coefficient of redundancy, measures the strength of the conditional dependence between the signals and environmental states across progressively larger sets of signals (two-way, three-way, etc.). It is defined in App. V A. When $R_P > 0$, the signals share information irrespective of environmental events. This means that the signals are redundant, and consequently there are diminishing returns to pooling information, as $I(E; \mathbf{S}) < \sum_j I(E; S_j)$.
Conversely, when $R_P(E; \mathbf{S}) = 0$, as for signals that are independent both unconditionally and conditionally on $E$, the benefits of pooling information are additive, $I(E; \mathbf{S}) = \sum_j I(E; S_j)$, growing linearly with the information of each signal about the environment, but there is no synergy. Finally, when $R_P(E; \mathbf{S}) < 0$, the signals become more dependent once conditioned on the environment. This is called synergy, and it corresponds to a superlinear benefit to pooling information in the number of agents, above and beyond the information contributed by each signal individually.
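The sign convention can be checked on two toy environments. The Python sketch below is an illustration under our own assumptions (the helper names `entropy` and `info_terms` are hypothetical). It builds explicit joint tables $P(e, s_1, s_2)$ for a deterministic two-input XOR, where neither signal alone is informative, and for a "copy" environment in which both signals equal the event. It then evaluates the two-signal decomposition with $R_P = I(S_1;S_2) - I(S_1;S_2|E)$, in bits: the XOR gives $R_P = -1$ (pure synergy) and the copy gives $R_P = +1$ (pure redundancy).

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def info_terms(joint):
    """joint[e, s1, s2] = P(e, s1, s2). Returns I(E;S1), I(E;S2), I(E;S1,S2), R_P."""
    H = entropy
    pE, pS1, pS2 = joint.sum(axis=(1, 2)), joint.sum(axis=(0, 2)), joint.sum(axis=(0, 1))
    pES1, pES2, pS1S2 = joint.sum(axis=2), joint.sum(axis=1), joint.sum(axis=0)
    I_E_S1 = H(pE) + H(pS1) - H(pES1)
    I_E_S2 = H(pE) + H(pS2) - H(pES2)
    I_E_S = H(pE) + H(pS1S2) - H(joint)
    I_S1_S2 = H(pS1) + H(pS2) - H(pS1S2)
    # I(S1;S2|E) = H(S1,E) + H(S2,E) - H(S1,S2,E) - H(E)
    I_S1_S2_given_E = H(pES1) + H(pES2) - H(joint) - H(pE)
    R = I_S1_S2 - I_S1_S2_given_E          # coefficient of redundancy
    return I_E_S1, I_E_S2, I_E_S, R

# Deterministic XOR: uniform independent bits, e = s1 XOR s2.
xor = np.zeros((2, 2, 2))
for s1 in (0, 1):
    for s2 in (0, 1):
        xor[s1 ^ s2, s1, s2] = 0.25

# Redundant "copy" environment: s1 = s2 = e, with e uniform.
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

for name, joint in [("XOR", xor), ("copy", copy)]:
    i1, i2, i12, R = info_terms(joint)
    print(f"{name}: I(E;S1)={i1:.2f} I(E;S2)={i2:.2f} I(E;S)={i12:.2f} R_P={R:+.2f}")
```

In both cases the pooled information is 1 bit, but it arises from opposite sides of the decomposition: the individual terms are zero and the negative $R_P$ supplies everything for the XOR, while the individual terms double-count and the positive $R_P$ removes the excess for the copy.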

Group formation and collective decision making
We have now defined individual resource growth as a quantity of information and discussed how information can be aggregated across signals to express their synergy relative to states of the environment.Now we can explore how agents with different signals can pool information together as coordinated groups, and access the synergy in their environment through collective decision-making.
Consider the undirected hypergraph $H = (A, G)$ of vertices $A$ and hyperedges $G$. We consider a discrete number of vertices, $A = \{a_1, a_2, \ldots, a_N\}$, where $a_i$ identifies agent $i$. The hyperedges, $g \in G = \{1, 2, \ldots\}$, called groups, define the cooperating collectives. A hyperedge connects $1 \leq N_g \leq N$ agents, and we assume that agents can belong to only a single group. Therefore, by construction, $\sum_g N_g = N$: summing the number of nodes over every hyperedge yields the number of agents in the population. There are two extremes of cooperation. At one extreme, a single hyperedge spans every node, meaning that all agents pool information in one group. At the other, the limit of no cooperation, $N_g = 1$ for all $g$ and no agents pool information; in this case, the dynamics of the model are similar to those of previous work [36].
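A minimal sketch of this bookkeeping in Python, assuming (hypothetically) that each agent holds exactly one of $l$ signal types and that a partition into groups is stored as a list of agent-index lists. It checks that every agent belongs to exactly one hyperedge, so that $\sum_g N_g = N$, and counts each group's unique signals, the quantity $k_g$ used in what follows. The function `random_partition` and the group-size cap are illustrative choices, not part of the model.

```python
import random

def random_partition(n_agents, max_group_size, rng):
    """Split agents 0..n_agents-1 into disjoint groups (hyperedges) of 1..max_group_size agents."""
    agents = list(range(n_agents))
    rng.shuffle(agents)
    groups, i = [], 0
    while i < n_agents:
        size = rng.randint(1, max_group_size)      # inclusive bounds
        groups.append(agents[i:i + size])          # last group may be smaller
        i += size
    return groups

rng = random.Random(1)
N, l = 50, 4
signal_of = [rng.randrange(l) for _ in range(N)]   # hypothetical signal assignment
groups = random_partition(N, max_group_size=6, rng=rng)

sizes = [len(g) for g in groups]                    # N_g for each hyperedge
k = [len({signal_of[a] for a in g}) for g in groups]  # unique signals k_g per group
print(len(groups), "groups; sum of N_g =", sum(sizes))
print("k_g values:", k)
```

Because each agent carries a single signal here, $k_g \leq \min(l, N_g)$: a group can only gain a new signal type by admitting a new member.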
Let $S_g$ be the set of unique signals held by the agents of group $g$ to be pooled, such that $S_g \subseteq \mathbf{S}$. The number of cooperants is defined as the number of unique signals, $|S_g| = k_g$, bounded by $1 \leq k_g \leq l$. When $k_g = l$, the group has a complete signal and the collective can make maximally informed decisions. Conversely, when $k_g < l$, the signal is considered incomplete, and the collective can interpret and act on only a subset of signals. As we will see, the number of unique signals a collective can observe determines the amount of information it can access.

Having defined how agents organize into groups of various sizes, we can now discuss how agents pool their information to make collective decisions and grow their resources in dynamic environments. At every time step, a collective with access to all signal types observes a private signal $\mathbf{s} = \{s_1, \ldots, s_l\} \in \mathbf{S}$. Each agent then allocates its resources $r_i$ over events according to collective $g$'s allocation matrix $B(E|\mathbf{s})$. When event $e$ is observed, the agent receives returns $w_e$ on the fraction of resources invested in $e$, $B(e|\mathbf{s})$. In the limit of many sequential investments $n$, the average growth rate of resources converges to

$$\gamma = \lim_{n \to \infty} \frac{1}{n} \log \frac{r_i(n)}{r_i(0)} = \sum_{e, \mathbf{s}} P(e, \mathbf{s}) \log\left[ w_e\, B(e|\mathbf{s}) \right].$$

The optimal allocation in the large-$n$ limit is the conditional probability of the event given the signals, $B(e|\mathbf{s}) = P(e|\mathbf{s})$. When the rewards are "fair", $w_e = 1/P(e)$, the optimal growth rate is given by the mutual information [45] defined in equation 1, $\gamma = I(E; \mathbf{S})$. The typical collective may not have a complete signal, and may instead observe and interpret only a subset of all unique signals, $S_g$. Its optimal allocation, given by $P(E|S_g)$, then has mutual information $I(E; S_g) \leq I(E; \mathbf{S})$, with equality only if the omitted signals are completely redundant with the present ones. Unless there are redundant signals, an incomplete group is guaranteed to have suboptimal information and growth rate.
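The following Python sketch checks the growth-rate result numerically under stated assumptions: a hypothetical two-signal environment in which $e = s_1 \oplus s_2$ with probability $q = 0.9$ (and is flipped otherwise), uniform signals, and fair odds $w_e = 1/P(e) = 2$. Because all resources are reinvested each round, the per-round log growth is $\log[w_e B(e|\mathbf{s})]$. A complete group betting $B = P(e|\mathbf{s})$ should grow at $I(E;\mathbf{S}) = \ln 2 - h(q) \approx 0.368$ nats per round, where $h$ is the binary entropy, while a group seeing only $s_1$, for which $P(e|s_1) = 1/2$, cannot grow at all. This is the pure-synergy case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000            # number of sequential investments
q = 0.9                # hypothetical probability that e follows the XOR rule

s = rng.integers(0, 2, size=(n, 2))                  # two uniform binary signals
xor = s[:, 0] ^ s[:, 1]
e = np.where(rng.random(n) < q, xor, 1 - xor)         # noisy XOR events

w = 2.0                                              # fair odds: P(e) = 1/2 for both events

# Complete group: optimal allocation B(e|s) = P(e|s); log growth per round.
B_full = np.where(e == xor, q, 1 - q)
gamma_full = np.mean(np.log(w * B_full))             # empirical growth rate (nats/round)

# Incomplete group seeing only s_1: P(e|s_1) = 1/2, so it can only hedge.
B_one = np.full(n, 0.5)
gamma_one = np.mean(np.log(w * B_one))

h = -q * np.log(q) - (1 - q) * np.log(1 - q)         # binary entropy h(q)
I_full = np.log(2) - h                               # I(E;S) in nats
print(f"empirical {gamma_full:.4f} vs I(E;S) = {I_full:.4f}; single signal: {gamma_one}")
```

The finite-$n$ estimate fluctuates around the mutual information with a standard error of roughly $10^{-3}$ nats for this $n$.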
Furthermore, agents may not start out with perfect knowledge and must invest using their best estimate, $X(E|S_g)$, of the true environmental probability, $P(E|S_g)$. In this case, the collective's average growth is reduced both by missing signals and by imperfect knowledge of the signals it has, and is described by

$$\gamma_g = I(E; S_g) - \mathbb{E}_{s_g}\left[ D_{KL}\big( P(E|s_g) \,\|\, X(E|s_g) \big) \right], \qquad (4)$$

where $\mathbb{E}_{s_g}$ is the expectation value over the states of the group's signals, and $D_{KL}\big(P(E|s_g) \,\|\, X(E|s_g)\big) = \sum_e P(e|s_g) \log\left[ P(e|s_g)/X(e|s_g) \right] \geq 0$ is the Kullback-Leibler divergence, an information measure of how far the group's model departs from the true conditional distribution. This result shows that collectives with a more complete and synergistic signal, reflected by the first term, and a more accurate model of the environment and its synergies, reflected by the second, will experience higher growth rates. Furthermore, $\gamma_g < \gamma$ unless $S_g$ contains all non-redundant signals and the model is exact, so it is typically valuable to add more signals to the group. This setup is illustrated in Figure 1.
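For fair odds, equation 4 is an exact identity, which can be checked numerically. In the sketch below (hypothetical alphabet sizes, a randomly drawn joint table, natural logarithms), a group bets an imperfect model $X(e|s)$ at odds $w_e = 1/P(e)$. Its expected log return per round equals $I(E;S)$ minus the signal-averaged Kullback-Leibler divergence.

```python
import numpy as np

rng = np.random.default_rng(42)
n_e, n_s = 3, 8                                   # hypothetical event and joint-signal alphabets
P = rng.random((n_e, n_s)); P /= P.sum()          # true joint P(e, s)
P_s = P.sum(axis=0)                               # P(s)
P_e = P.sum(axis=1)                               # P(e)
P_e_given_s = P / P_s                             # P(e|s); each column sums to 1

X = rng.random((n_e, n_s)); X /= X.sum(axis=0)    # imperfect model X(e|s)

w = 1.0 / P_e                                     # fair odds w_e = 1/P(e)
growth = np.sum(P * np.log(w[:, None] * X))       # expected log return per round

mutual_info = np.sum(P * np.log(P_e_given_s / P_e[:, None]))
kl_per_s = np.sum(P_e_given_s * np.log(P_e_given_s / X), axis=0)
expected_kl = np.sum(P_s * kl_per_s)              # E_s[D_KL(P(E|s) || X(E|s))]

print(growth, mutual_info - expected_kl)
```

Since the divergence is nonnegative, no model can beat betting the true conditional distribution, and the gap is exactly the expected divergence.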

B. Maximum synergy principle and optimal growth
These results introduce important considerations for how collective innovation and growth determine strategies for group formation. In theories of cooperation such as kin selection [46] and scalar stress [47], group formation is favored by member relatedness and disfavored by unfamiliarity. This is intuitive in many situations, as agents are more likely to cooperate when they are more certain that others will reciprocate [48], and cooperating with similar agents minimizes this uncertainty. Equation 4 counters this intuition by defining an explicit benefit to cooperating with dissimilar agents across heterogeneous, complementary skills and information. Specifically, a group with more synergistic signals, as defined through the conditional dependence of its signals on states of the environment, will experience higher growth. So, even if there are additional coordination costs for more heterogeneous agents, cooperation can still emerge because there are also greater informational benefits, formalizing intuitive ideas about the value of diversity [49].
The beneficial contribution of synergy to the growth rate of resources provides an important input to models of random multiplicative growth, such as those commonly used to study wealth dynamics and mathematical finance. In its simplest form, the stochastic growth rate in such models is characterized by its first two temporal moments. The time average of the growth rate, $\eta$, and its temporal standard deviation (volatility), $\sigma$, combine under Itô integration to give the actual growth rate $\gamma = \eta - \sigma^2/2$. Maximizing this growth rate (as a positive quantity) entails maximizing $\eta$ and minimizing $\sigma$, which at the individual agent level can be achieved by (Bayesian) learning over time [36].
At the population level, it has been proposed that pooling resources in groups would naturally emerge as a means to reduce σ, when growth rate fluctuations are independent across agents, and thus maximize γ [25,50].
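A simulation sketch of both effects, assuming a discretized multiplicative process $r_{t+dt} = r_t\,(1 + \eta\,dt + \sigma\sqrt{dt}\,\xi)$ with standard normal $\xi$ and hypothetical parameters $\eta = 0.1$, $\sigma = 0.5$. A single agent's time-averaged log growth approaches $\eta - \sigma^2/2$. If $N$ agents with independent shocks pool their resources and re-split them equally after every step, the group's shock is the mean of $N$ independent terms, so its variance falls to $\sigma^2/N$ and the growth rate approaches $\eta - \sigma^2/(2N)$. The function name `pooled_growth_rate` is illustrative.

```python
import numpy as np

def pooled_growth_rate(eta, sigma, n_pool, n_groups=400, T=50.0, dt=0.01, seed=0):
    """Time-averaged log growth rate of groups of n_pool agents that share
    resources equally after every step (n_pool=1 means no pooling)."""
    rng = np.random.default_rng(seed)
    log_r = np.zeros(n_groups)
    for _ in range(int(round(T / dt))):
        # independent shocks for each agent, averaged within each group
        xi = rng.standard_normal((n_groups, n_pool)).mean(axis=1)
        log_r += np.log1p(eta * dt + sigma * np.sqrt(dt) * xi)
    return log_r.mean() / T

eta, sigma = 0.1, 0.5
for n_pool in (1, 10):
    sim = pooled_growth_rate(eta, sigma, n_pool)
    theory = eta - sigma**2 / (2 * n_pool)
    print(f"N={n_pool:2d}: simulated {sim:.3f}, eta - sigma^2/(2N) = {theory:.4f}")
```

With these values an isolated agent shrinks on average ($\gamma \approx -0.025$) even though $\eta > 0$, whereas a pool of ten grows ($\gamma \approx 0.0875$): pooling acts only on $\sigma$ and leaves $\eta$ unchanged.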
Our results introduce a different route to cooperation, through pooling information in structured groups, that maximizes $\eta$ (and thus $\gamma$) through synergy effects. Thus, to maximize $\gamma$, agents should pool information with the most diverse set of collaborators possible, to access the most mutual synergy with respect to the environment. This maximum synergy principle defines the benefit of intelligent collective behavior in complex environments where agent-level limitations prevent full knowledge of the environment and where mechanisms of the division of labor and knowledge are favored. This principle is general and applies across levels of cooperation, whether it be individuals matching skills to form groups, or specialized groups organizing into more complex collectives [38], all the way to large-scale societies.
Generally, these two strategies, information synergy and resource pooling under independence, are distinct modes of cooperation by which groups can maximize $\gamma$, as demonstrated in Figure 2.
As we will see later, the choice of whom to cooperate with is not trivial, as different combinations of signals may yield different synergies. This means that under constraints on group size, such as cooperation costs per connection, groups satisfying the maximum synergy principle must intelligently select which signals and agents to integrate, and which to exclude as redundant. Furthermore, collectives may not know a priori the optimal allocation strategy that leverages the synergy available to their signals, meaning that intelligent collective behavior must itself be learned over time, by exploring the best possible matchings. We will now develop the dynamics of how a group can organize itself optimally so as to maximize its synergy.
Figure 2. Complementary strategies for increasing the long-term growth rate of resources from the environment in stochastic growth models. Pooling resources can reduce the volatility through a hedging strategy, while pooling information creates synergy to increase average growth rates. The lines represent contours of constant average growth rate $\gamma$.

Synergy maximization through Bayesian inference
Bayesian learning is the optimal strategy to incorporate new information from observed events into estimates of conditional probabilities, such as those of environmental states given agents' signals [36]. Agents in groups can also learn the synergy embedded in their environment by collectively weighing their conditional observations across their individual signals. A group seeking to maximize its synergy must then update its estimated conditional relationship through Bayesian inference,

$$X(e|\mathbf{s}) = \frac{X(\mathbf{s}|e)\, X(e)}{A}, \qquad A = \sum_{e'} X(\mathbf{s}|e')\, X(e'),$$

where $A$ is the normalization. We take the prior probability $X(e)$ to be fixed, because we assume that the environment is stationary, or at least slowly changing relative to groups' learning rates. Bayesian inference converges $X(E|\mathbf{S}) \to P(E|\mathbf{S})$ over time, decreasing the information divergence and maximizing synergy and average growth. For groups with incomplete signals, the information acquired through learning is still bounded by what is available in the incomplete signal space.
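A minimal Python sketch of such learning, assuming a conjugate Dirichlet prior with pseudocount $\alpha = 1$ on each conditional distribution $X(E|\mathbf{s})$, so that the posterior-mean estimate after observing counts $n_{e,\mathbf{s}}$ is $(n_{e,\mathbf{s}} + \alpha)/(n_{\mathbf{s}} + \alpha|E|)$. This count-based update is a common textbook choice swapped in for illustration, not necessarily the scheme of Ref. [36]. Observations come from a hypothetical noisy two-signal XOR ($q = 0.9$), and the sketch tracks the signal-averaged divergence $\mathbb{E}_{\mathbf{s}}[D_{KL}(P(E|\mathbf{s}) \,\|\, X(E|\mathbf{s}))]$ as data accumulate.

```python
import numpy as np

rng = np.random.default_rng(3)
q = 0.9                                    # hypothetical reliability of the XOR rule
n_events, n_signals = 2, 4                 # joint signal s = (s1, s2) indexed 0..3
parity = np.array([0, 1, 1, 0])            # s1 XOR s2 for s = 00, 01, 10, 11
# True conditional P(e | s): e equals the parity with probability q.
P_e_given_s = np.where(np.arange(n_events)[:, None] == parity, q, 1 - q)
P_s = np.full(n_signals, 1 / n_signals)

alpha = 1.0                                # Dirichlet pseudocount (assumption)
counts = np.zeros((n_events, n_signals))   # sufficient statistics n_{e,s}

def expected_kl(counts):
    """E_s[D_KL(P(E|s) || X(E|s))] in nats for the posterior-mean model X."""
    X = (counts + alpha) / (counts.sum(axis=0) + alpha * n_events)
    return float(np.sum(P_s * np.sum(P_e_given_s * np.log(P_e_given_s / X), axis=0)))

history = []
for t in range(1, 5001):
    s = rng.integers(n_signals)                  # observe a joint signal
    e = int(rng.random() > P_e_given_s[0, s])    # e = 1 with probability P(1 | s)
    counts[e, s] += 1                            # conjugate Bayesian update
    if t in (10, 100, 1000, 5000):
        history.append((t, expected_kl(counts)))

for t, d in history:
    print(f"after {t:5d} observations: E_s[D_KL] = {d:.5f} nats")
```

Because the estimate is conditioned on the joint signal $\mathbf{s}$ rather than on each $s_j$ separately, it can learn the XOR structure that no single signal reveals; before any data, the uniform estimate forfeits the full $\ln 2 - h(q)$ of available information.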
We have thus far defined collective growth in terms of information synergy, and shown how agents can learn as a collective to increase their growth rate over time.We will now illustrate these general results using a model based on logic circuits.

III. MODELING SYNERGY WITH LOGIC CIRCUITS
Logic circuits have been used extensively as models for synergistic interactions [38,40,41].This is because their outputs are predicted by combinations of inputs, much like events are predicted by combinations of signals.Among other logic circuits (like AND or OR), the XOR gate is unique in that information between inputs and outputs only exists as synergy across all inputs [51]; no individual input has mutual information with the output.
In the following, we show how modifying the XOR gate relaxes this condition, such that individual inputs also carry information about the output and information scales on average with the number of cooperating signals. As in [36], while this model is used to study synergy in a simplified setting, the theory is defined for general dynamical environments.

A. The Uniform XOR Gate
Consider the space of statistically independent binary signals $s_j \in \{0, 1\}$, such that a sample set $\mathbf{s}$ has uniform probability $P(\mathbf{s}) = 2^{-l}$. We assign each input $\mathbf{s}$ a binary event, $e \in \{0, 1\}$, using the generalized XOR rule, $e = M_2(\mathbf{s}) \equiv \sum_{j=1}^{l} s_j \pmod 2$, which is applied with probability $p_{\mathbf{s}}$ (otherwise the opposite event occurs). From the sampled signal sets $\mathbf{s}$ and the probabilities $\mathbf{p} = \{p_{\mathbf{s}}\}$, we can define this generalized XOR circuit as a joint distribution on signals and events,

$$P(e, \mathbf{s}) = 2^{-l} \left[ p_{\mathbf{s}}\, \delta_{e, M_2(\mathbf{s})} + (1 - p_{\mathbf{s}})\, \delta_{e, 1 - M_2(\mathbf{s})} \right]. \qquad (6)$$

This distribution is called the uniform XOR (UXOR). It performs a unique, $l$-dimensional XOR gate on each input $\mathbf{s}$ with probability $p_{\mathbf{s}}$. When $p_{\mathbf{s}} = 1$ for all inputs, the circuit behaves deterministically like an XOR gate, and the complete group has 1 bit of information. When $p_{\mathbf{s}} = 0.5$ for all inputs, the circuit no longer models a logic gate, as the output is uncorrelated with the inputs. The truth table of this circuit is shown in Figure 3A for an environment with two signals.
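A Python sketch of equation 6, with hypothetical helper names `uxor_joint` and `mutual_info`. It builds the table $P(e, \mathbf{s})$ for $l = 3$ signals from a dictionary of $p_{\mathbf{s}}$ values and computes $I(E; S_g)$ in nats after summing out the signals a group lacks. It reproduces the two limits described above: 1 bit for the complete deterministic gate (with zero information in any single signal), and no information when every $p_{\mathbf{s}} = 0.5$.

```python
import itertools
import numpy as np

def uxor_joint(p):
    """Joint table P(e, s_1, ..., s_l) of the uniform XOR (equation 6).
    p maps each input tuple s to p_s, the probability that e = sum(s) mod 2."""
    l = len(next(iter(p)))
    joint = np.zeros((2,) + (2,) * l)
    for s, ps in p.items():
        parity = sum(s) % 2
        joint[(parity,) + s] += ps / 2**l
        joint[(1 - parity,) + s] += (1 - ps) / 2**l
    return joint

def mutual_info(joint, keep):
    """I(E; S_keep) in nats; signal j sits on axis j + 1 and is summed out if not kept."""
    l = joint.ndim - 1
    drop = tuple(j + 1 for j in range(l) if j not in keep)
    pe_s = (joint.sum(axis=drop) if drop else joint).reshape(2, -1)   # P(e, s_keep)
    pe = pe_s.sum(axis=1, keepdims=True)                               # P(e)
    ps = pe_s.sum(axis=0, keepdims=True)                               # P(s_keep)
    mask = pe_s > 0                                                    # 0 log 0 = 0
    ratio = pe_s / (pe * ps)
    return float(np.sum(pe_s[mask] * np.log(ratio[mask])))

l = 3
inputs = list(itertools.product((0, 1), repeat=l))
deterministic = uxor_joint({s: 1.0 for s in inputs})
uninformative = uxor_joint({s: 0.5 for s in inputs})
rng = np.random.default_rng(0)
noisy = uxor_joint({s: rng.random() for s in inputs})

print(mutual_info(deterministic, keep=range(l)) / np.log(2))   # complete group, in bits
print(mutual_info(deterministic, keep=[0]))                     # single signal, nats
print(mutual_info(uninformative, keep=range(l)))                # p_s = 0.5 everywhere
print([mutual_info(noisy, keep=range(k)) for k in (1, 2, 3)])   # noisy UXOR, k = 1..3
```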

Information scaling in the UXOR environment
With this explicit choice of distribution, we can explore the quantities of information that define a group's growth process. For simplicity, we choose a uniform prior for the distribution of $\mathbf{p}$, but in principle any prior distribution is admissible. The information available in the environment measures the maximum average growth rate a group with a complete signal can experience. When averaged over all configurations of $\mathbf{p}$, the information is approximately $I(E; \mathbf{S}) \approx \ln 2 - 1/2 \approx 0.19$ nats, or about 0.28 bits (App. V C).
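The averaged value can be checked by Monte Carlo. For a complete signal, $P(\mathbf{s})$ is uniform and $H(E|\mathbf{s}) = h(p_{\mathbf{s}})$, the binary entropy, so $I(E;\mathbf{S}) = H(E) - 2^{-l}\sum_{\mathbf{s}} h(p_{\mathbf{s}})$. Under the uniform prior, $\langle h(p) \rangle = 1/2$ nat, and $H(E) \to \ln 2$ as $l$ grows because $P(e)$ averages many independent $p_{\mathbf{s}}$; hence the value $\ln 2 - 1/2$. The sketch below (hypothetical function names, natural logarithms) estimates the average for several $l$.

```python
import numpy as np

def binary_entropy(x):
    """Binary entropy in nats (inputs clipped away from 0 and 1 to avoid log(0))."""
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def average_complete_information(l, n_samples=2000, seed=0):
    """Monte Carlo average of I(E;S) for a complete group, with p_s ~ Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    parity = np.array([bin(i).count("1") % 2 for i in range(2**l)])   # M_2(s) per input
    p = rng.random((n_samples, 2**l))                # one row of p_s values per circuit
    p_e1 = np.where(parity == 1, p, 1 - p).mean(axis=1)   # P(e = 1) for each circuit
    info = binary_entropy(p_e1) - binary_entropy(p).mean(axis=1)
    return info.mean()

for l in (2, 4, 10):
    print(l, round(average_complete_information(l), 4), round(np.log(2) - 0.5, 4))
```

For small $l$ the average falls somewhat below $\ln 2 - 1/2$, because $P(e)$ then fluctuates noticeably away from $1/2$ and $H(E) < \ln 2$.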
For groups with incomplete signals ($k_g < l$), we compute the information by marginalizing equation 6 over the $\lambda_g = l - k_g$ signals unavailable to the group. The procedure for marginalization is defined in App. V C; in general, marginalizing one signal halves the size of the parameter space $\mathbf{p}$ that describes the distribution. The average information for an incomplete signal can be approximated in closed form (App. V D): it increases exponentially, $\sim 2^{k}$, as more signals are included. Because the mutual information of the complete signal is independent of the number of signals, the information of a single signal must converge to zero in the limit of large $l$.
The exponential scaling of the information with the number of cooperants is demonstrated in Figure 3B, where it appears as straight lines on a logarithmic scale for environments of increasing $l$. The curves are computed by Monte Carlo sampling of circuits with $l$ signals and measuring the information after $\lambda = l - k$ marginalizations.
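The scaling can be reproduced with a short Monte Carlo sketch (Python, nats; function names are hypothetical). Marginalizing the hidden signals leaves, for each visible input $s_g$, an effective probability $q_{s_g}$ that $e = M_2(s_g)$. It equals the average over hidden inputs of $p_{\mathbf{s}}$ (even hidden parity) or $1 - p_{\mathbf{s}}$ (odd hidden parity). Expanding the binary entropy around $1/2$ gives the rough estimate $\langle I(E;S_g) \rangle \approx (2^{k} - 1)/(6 \cdot 2^{l})$ nats for $k$ well below $l$. This back-of-the-envelope approximation is ours, not the expression of App. V D, and it makes the $\sim 2^k$ scaling explicit.

```python
import numpy as np

def binary_entropy(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def parity_table(n_bits):
    """Parity (sum mod 2) of every n_bits-bit input, indexed by its integer value."""
    return np.array([bin(i).count("1") % 2 for i in range(2**n_bits)])

def average_group_information(l, k, n_samples=4000, seed=0):
    """Monte Carlo average of I(E; S_g) in nats for a group holding k of l UXOR
    signals, with p_s ~ Uniform(0, 1). Inputs are split as (visible, hidden)."""
    lam = l - k
    rng = np.random.default_rng(seed)
    p = rng.random((n_samples, 2**k, 2**lam))
    # q[v] = P(e = parity of visible bits | v), with the hidden bits marginalized
    q = np.where(parity_table(lam) == 0, p, 1 - p).mean(axis=2)
    p_e1 = np.where(parity_table(k) == 1, q, 1 - q).mean(axis=1)   # P(e = 1)
    info = binary_entropy(p_e1) - binary_entropy(q).mean(axis=1)
    return info.mean()

l = 8
for k in range(1, l + 1):
    mc = average_group_information(l, k)
    approx = (2**k - 1) / (6 * 2**l)   # small-deviation estimate, valid for k well below l
    print(k, f"{mc:.5f}", f"{approx:.5f}")
```

Each added signal roughly doubles the average information while $k$ is small; near $k = l$ the expansion breaks down and the Monte Carlo value approaches the complete-signal average of roughly $0.19$ nats.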

Growth and group learning
Until now we have explored the mean behavior of this environment subject to a uniform prior. In general, collectives do not know the true parameters of their environment. In this case, their inaccurate guess for the set of probabilities is parameterized by $\mathbf{x}_g \equiv \{x_{s_g}\}$, indexed by the signals available to the group, $s_g \in S_g$, and the collective's likelihood model becomes $X(e|s_g) = f(\mathbf{x}_g, k_g)$. The information divergence term of equation 4 becomes the divergence between $f(\mathbf{x}_g, k_g)$ and $f(\mathbf{p}_g, k_g)$, where $\mathbf{p}$ has been projected into the subspace spanned by $S_g$. Averaged over all signals,

$$\mathbb{E}_{s_g}[D_{KL}] = \left\langle p_{s_g} \log\frac{p_{s_g}}{x_{s_g}} + (1 - p_{s_g}) \log\frac{1 - p_{s_g}}{1 - x_{s_g}} \right\rangle,$$

where angle brackets denote sample averages over the probability values. Subtracting this term from the mutual information yields the growth rate under imperfect, incomplete group information.
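For the UXOR, this divergence term is a plain mean of Bernoulli divergences, since the signals are uniform. A Python sketch with hypothetical values for a $k_g = 2$ group (the name `avg_bernoulli_kl` is illustrative): the term vanishes for a perfect guess $\mathbf{x}_g = \mathbf{p}_g$, and for an uninformed guess $x_{s_g} = 1/2$ it equals $\ln 2 - \langle h(p_{s_g}) \rangle$, where $h$ is the binary entropy.

```python
import numpy as np

def avg_bernoulli_kl(p_g, x_g):
    """E_{s_g}[D_KL] in nats for true probabilities p_g[s_g] and guesses x_g[s_g]
    that e follows the XOR rule; uniform signals make this a plain mean over s_g."""
    p_g, x_g = np.asarray(p_g, float), np.asarray(x_g, float)
    kl = p_g * np.log(p_g / x_g) + (1 - p_g) * np.log((1 - p_g) / (1 - x_g))
    return float(kl.mean())

p_g = np.array([0.8, 0.3, 0.6, 0.95])   # hypothetical true values, one per s_g (k_g = 2)
x_g = np.full(4, 0.5)                  # an uninformed group guess
print(avg_bernoulli_kl(p_g, p_g))      # perfect model: no growth penalty
print(avg_bernoulli_kl(p_g, x_g))      # uninformed model: penalty ln 2 - <h(p)>
```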
We have so far described growth rate dynamics under a stationary $\mathbf{x}_g$. To illustrate growth dynamics under group learning, we turn to the Latent Dirichlet Allocation (LDA) model. Through a categorical description of pairs of events and signals, agents' estimates $\mathbf{x}_g$ follow average dynamics in the limit of high sampling rate, $\omega = n/t \gg 1$, where $\kappa$ defines the Bayesian update time. The details of LDA are given in Ref. [36]; it provides parametric dynamics that converge to full information as a power law in time in stationary environments.
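To illustrate power-law convergence, the sketch below computes the exact expected divergence after $n$ observations of a single Bernoulli parameter $q$ (hypothetical value $0.9$), using the add-one (Laplace) posterior mean $x = (c + 1)/(n + 2)$ with $c \sim \mathrm{Binomial}(n, q)$. This simple estimator stands in for the LDA dynamics of Ref. [36] and is an assumption of the example. The product $n\,\mathbb{E}[D_{KL}]$ levels off near $1/2$, i.e. $\mathbb{E}[D_{KL}] \approx 1/(2n)$, so the growth-rate deficit decays as the power law $n^{-1}$.

```python
import math
import numpy as np

def bernoulli_kl(q, x):
    """D_KL(Bern(q) || Bern(x)) in nats."""
    return q * np.log(q / x) + (1 - q) * np.log((1 - q) / (1 - x))

def expected_kl_after(n, q, alpha=1.0):
    """Exact expectation over c ~ Binomial(n, q) of the divergence of the
    posterior-mean estimate x = (c + alpha) / (n + 2 alpha) from q."""
    c = np.arange(n + 1)
    log_comb = np.array([math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                         for j in range(n + 1)])          # log binomial coefficients
    log_pmf = log_comb + c * math.log(q) + (n - c) * math.log(1 - q)
    x = (c + alpha) / (n + 2 * alpha)
    return float(np.sum(np.exp(log_pmf) * bernoulli_kl(q, x)))

q = 0.9   # hypothetical reliability for one signal configuration
for n in (10, 100, 1000, 10000):
    d = expected_kl_after(n, q)
    print(n, f"E[D_KL] = {d:.2e}", f"n * E[D_KL] = {n * d:.3f}")
```

Working in log space avoids overflowing the binomial coefficients at large $n$.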
To study the dynamics of resources in the UXOR environment, we simulated agent investments in a Monte Carlo sampled environment. We randomly assigned signals to $N = 2000$ agents in an $l = 4$ environment, then randomly assigned the agents to groups of size $l \leq N_g \leq 11$. This results in an ensemble of groups with $1 \leq k_g \leq 4$ cooperants. We reveal Bernoulli-sampled signals to the groups, whose agents make collective decisions on which events to allocate resources to. For each group, we track the resources of a representative agent, informed by the group, investing its individual resources through time.
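A scaled-down Python sketch of this kind of simulation, under simplifying assumptions that differ from the one reported here: $N = 60$ agents, group sizes drawn from 1 to 6, a single random draw of $\mathbf{p}$, and groups that already know $P(E|S_g)$ exactly (no learning, so the divergence term of equation 4 is zero). Each group bets its conditional distribution at fair odds on a shared stream of signals and events. Its empirical log growth per round should then match $I(E;S_g)$ for its particular set of signals.

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
l, n_agents, n_rounds = 4, 60, 100_000

# Hypothetical UXOR environment: p[s] ~ Uniform(0, 1) = P(e = parity(s) | s).
p = rng.random((2,) * l)
joint = np.zeros((2,) + (2,) * l)                 # joint[e, s_1, ..., s_l]
for s in itertools.product((0, 1), repeat=l):
    par = sum(s) % 2
    joint[(par,) + s] = p[s] / 2**l
    joint[(1 - par,) + s] = (1 - p[s]) / 2**l
P_e = joint.sum(axis=tuple(range(1, l + 1)))

def marginal(S_g):
    """P(e, s_g) indexed [e, *s_g]: sum out the signals the group lacks."""
    drop = tuple(j + 1 for j in range(l) if j not in S_g)
    return joint.sum(axis=drop) if drop else joint

def information(S_g):
    """I(E; S_g) in nats."""
    m = marginal(S_g)
    cond = m / m.sum(axis=0, keepdims=True)       # P(e | s_g)
    return float(np.sum(m * np.log(cond / P_e.reshape((2,) + (1,) * len(S_g)))))

# Random agents with one signal each, partitioned into groups of 1..6 agents.
signal_of = rng.integers(0, l, size=n_agents)
perm, groups, i = rng.permutation(n_agents), [], 0
while i < n_agents:
    size = int(rng.integers(1, 7))
    groups.append(perm[i:i + size])
    i += size
signal_sets = [tuple(sorted(set(signal_of[g].tolist()))) for g in groups]

# Shared environment: uniform signals, events drawn from the UXOR rule.
s = rng.integers(0, 2, size=(n_rounds, l))
parity = s.sum(axis=1) % 2
e = np.where(rng.random(n_rounds) < p[tuple(s.T)], parity, 1 - parity)

# Each group bets its (known) P(e | s_g) at fair odds 1/P(e); mean log growth per round.
growth, theory = [], []
for S_g in signal_sets:
    m = marginal(S_g)
    cond = m / m.sum(axis=0, keepdims=True)
    bet_on_outcome = cond[(e,) + tuple(s[:, list(S_g)].T)]
    growth.append(float(np.mean(np.log(bet_on_outcome / P_e[e]))))
    theory.append(information(S_g))

for S_g, g, t in sorted(zip(signal_sets, growth, theory), key=lambda r: len(r[0])):
    print(f"k_g={len(S_g)} signals={S_g}: growth {g:.4f}, I(E;S_g) {t:.4f}")
```

Groups with the same $k_g$ but different signal sets can grow at visibly different rates, which is the variance across combinations discussed next.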
Figure 4 illustrates the results of this simulation. In panels A and B, the Monte Carlo simulated means are shown as solid lines, with 95% confidence interval (CI) shaded regions. Theoretical means, computed from the initial population configuration using equation 9, are plotted as dashed lines with hash-filled uncertainty regions. The more unique signals a group can access, the more it can learn and the more resources it acquires over time. A low signal-to-noise ratio when $k_g = 1, 2$ causes growth rates lower than the theoretical mean, and cumulatively fewer resources over time.

Figure 4C. Top: for $k_g = 2, 3$, the synergy benefits of a parameter configuration are given by the difference between the information when averaged (small dot) and pooled (large dot). Bottom: parameter values exist where no signal combinations hold synergy (left) and where synergy is equivalent across signal combinations (right).

Constrained intelligent group formation
For groups with $k_g < 4$ (incomplete signals), there is significantly higher variance in both information and resources than for $k_g = 4$. This is attributable to differences in synergy between groups with different combinations of $k$ signals. It illustrates a general feature of the maximum synergy principle: signal combinations with higher conditional dependence on the environment have higher synergy and experience higher growth rates than other combinations. Figure 4C demonstrates the synergy effects across different combinations of signals. For each group of size $k$, the left, smaller dot indicates the information each signal carries, averaged over the signals present. The right, larger dot indicates the total information of the combination of signals when pooled. The difference between the two dots gives the amount of synergy. We see, for example, that even though signals 0 and 3 each carry less information than signal 2, both have higher synergy effects than signal 2 when each is pooled with signal 1, as indicated by their crossover with the {1, 2} line. For a group aggregator, this means not only that signal choice is nontrivial, but also that individual information is generally not a good indicator of the synergy benefits that can be realized when pooled. This result reinforces the complexity that fulfilling the maximum synergy principle entails, as one must understand signal complementarities for a given model of the environment to all orders, a likely costly process.
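A sketch of the comparison in Figure 4C for one randomly drawn $\mathbf{p}$ with $l = 4$ (not the parameter configuration used in the figure; the seed and helper names are hypothetical). For every pair and triple of signals, it reports the average single-signal information (the small dot), the pooled information (the large dot), and their difference, the synergy, all in nats. Rankings by single-signal information and by synergy need not agree.

```python
import itertools
import numpy as np

rng = np.random.default_rng(11)
l = 4
p = rng.random((2,) * l)                          # hypothetical UXOR parameters p_s
joint = np.zeros((2,) + (2,) * l)                 # joint[e, s_1, ..., s_l]
for s in itertools.product((0, 1), repeat=l):
    par = sum(s) % 2
    joint[(par,) + s] = p[s] / 2**l
    joint[(1 - par,) + s] = (1 - p[s]) / 2**l

def information(S_g):
    """I(E; S_g) in nats for the signals indexed by the tuple S_g."""
    drop = tuple(j + 1 for j in range(l) if j not in S_g)
    m = joint.sum(axis=drop) if drop else joint                 # P(e, s_g)
    P_e = m.sum(axis=tuple(range(1, m.ndim)), keepdims=True)    # P(e)
    P_s = m.sum(axis=0, keepdims=True)                          # P(s_g)
    return float(np.sum(m * np.log(m / (P_e * P_s))))

single = {j: information((j,)) for j in range(l)}
rows = []
for k in (2, 3):
    for S_g in itertools.combinations(range(l), k):
        pooled = information(S_g)                                # "large dot"
        averaged = float(np.mean([single[j] for j in S_g]))      # "small dot"
        rows.append((S_g, averaged, pooled, pooled - averaged))

print("single-signal information:", {j: round(v, 4) for j, v in single.items()})
for S_g, avg, pooled, syn in sorted(rows, key=lambda r: -r[3]):
    print(S_g, f"avg={avg:.4f} pooled={pooled:.4f} synergy={syn:.4f}")
```

A group limited to two members would pick the pair at the top of this ranking, which is not necessarily the pair containing the most informative single signal.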
As demonstrated by the bottom plots in Figure 4C, through a judicious choice of $\mathbf{p}$, we can also design special environments, such as those where no synergy is present or where there are uniform benefits of synergy across combinations of signals. The procedure for constructing environments with specific synergy profiles will be developed in future work.

IV. DISCUSSION
In this paper, we developed a novel mechanism of cooperation among heterogeneous agents that use shared information to grow resources in noisy environments. We derived the benefits of cooperation in terms of the synergy gained by pooling information across agents' unique signals. This motivates the principle of maximum synergy, whereby a group's aggregate growth is optimized when that group maximizes the synergy of its members relative to a statistical environment. We proposed this principle as an avenue to cooperation complementary to the reduction of volatility through resource pooling in multiplicative growth models. We then showed that a group with no a priori knowledge of its potential synergy can learn it through Bayesian inference. We illustrated these principles using a model of a high-dimensional probabilistic logic gate and showed that, on average, group synergy scales superlinearly (exponentially) with the number of unique signals in the group. We also illustrated the challenge faced by size-constrained groups, which must not only pick unique signals but also admit those new members whose signals maximize their potential collective synergy.
These results formalize several insights into the causes and benefits of cooperation.First, the properties of information allow us to consider how the limits to human effort and ability motivate group formation.Specialization through learning or adaptation is costly in terms of time and resources, motivating a division of labor to fully learn and maximize productivity across disparate but synergistic agents [34].
Second, these results motivate analyses of how information and resource pooling strategies affect different levels of selection within an organizational hierarchy.Effective resource pooling relies on uncorrelated fluctuations across participants, which is not possible when agents are making coordinated decisions across correlated signals.We therefore expect information and resource pooling strategies to create tradeoffs in group formation, and apply to different environmental features and levels of selection.
Groups lacking informational complementarities (because they are homogeneous) operating in variable environments should pool resources.This may apply to people in insurance pools, or independent economic sectors within a common population, such as a city or nation.Conversely, groups in complex environments made up of agents with complementary knowledge, such as within a firm or innovation ecology, should engage in information pooling and skills specialization to maximize their production whenever the variability of the environment and costs of cooperation are sufficiently low.
Parsing out these modes of cooperation becomes more important when considering how groups respond to changing environmental or social conditions. As new environmental conditions emerge, such as new industries, the distribution of synergy across different group configurations will also change, selecting for different group compositions and skill combinations. This has the interesting implication that new knowledge (science, technology, institutional change) should disrupt established social and economic structures, because it enables new synergies. This also has implications for natural ecosystems [52], where changing environmental conditions and variability, such as via climate change, may alter their structures.
Third, the framework developed here offers a general approach to describing interaction dynamics in many fields. The conditional probabilities P(e|s) capture the general structure of information between populations and their environment. Through synergy, that information becomes encoded in how groups form and are structured, and in which sets of coordinated behaviors produce beneficial or detrimental outcomes across agents. By tracing over states of E, that is, averaging over (stationary) environments, we can produce a set of rules for the (average) rewards associated with agents' perceptions and actions. This shows how general conditional probabilities of choices and behaviors in given environments may underlie particular games and other phenomenological agent interaction rules [53]. Furthermore, because conditional distributions are general and multi-dimensional, they also provide natural models of higher-order interactions expressing the synergy of large groups, such as reciprocal cooperation and the emergence of culture as shared knowledge and behavior [24]. In summary, the formal properties of information, made explicit over group structures and time, provide the theoretical basis for a broad class of agent interaction models found throughout the social and ecological sciences. This includes the formation of complex societies made up of diverse cooperating agents in situations where large-scale synergy becomes possible.
This work is supported by the Mansueto Institute for Urban Innovation and the Department of Physics at the University of Chicago, by a National Science Foundation Graduate Research Fellowship (Grant No. DGE 1746045 to JTK), and by the National Science Foundation, through the Center for the Physics of Biological Function (PHY-1734030), as well as the National Institutes of Health BRAIN Initiative (R01EB026943) to AGK.

V. APPENDIX
A. Information Aggregation
Consider a target statistical variable E (environment) that we wish to predict using l other variables (signals) $\mathbf{S} = \{S_1, \ldots, S_l\}$. The mutual information between each signal $S_i$ separately and E is given by [54]
$$I(E; S_i) = H(E) - H(E|S_i),$$
where H(E) is the Shannon entropy of E, and the difference measures the reduction in entropy of the event when conditioned on the signal. From the rules of information aggregation, this expression generalizes to information across every added signal [55]. The mutual information between the event and a set of several signals expands into a sum over the mutual information of each individual signal with the environment, corrected by redundancy terms. The goal of this section is to show that the inclusion of each new signal introduces a coefficient of redundancy of progressively higher order. For two signals, using the identity H(A, B) = H(A|B) + H(B), we find
$$I(E; S_i, S_j) = I(E; S_i) + I(E; S_j) - R(S_i; S_j; E), \qquad R(S_i; S_j; E) = I(S_i; S_j) - I(S_i; S_j|E).$$
We denote R as the coefficient of redundancy, which measures the difference between the mutual information among the signals, $I(\mathbf{S}) \equiv I(S_1; \ldots; S_k)$, and the mutual information of the signals conditioned on E, $I(\mathbf{S}|E)$. When $I(S_i; S_j) < I(S_i; S_j|E)$, the signals share less information in the absence of the event (we gain information by considering the event), and $R(S_i; S_j; E) < 0$. In this case agents experience a positive benefit from pooling information, which we call synergy. To demonstrate this effect at higher orders in groups of signals, we perform a similar calculation for a three-signal interaction.
We see that an analogous redundancy coefficient arises in three dimensions. It can be retrieved for an arbitrary number of dimensions through a similar iterative procedure. We refer to the sum of these moments collectively as the redundancy of the joint distribution, denoted $R_P$ [55]. Note that redundancies of lower order than the cardinality of the signal space must be computed over every combination of signals. For example, when l = 3, there are three second-order redundancy terms. This expansion defines the benefits to cooperation over increasingly higher orders of cooperation (number of signals). It can be used to compute the relative strengths of the various orders of interaction for any set of signals and environmental variables, given their conditional distributions.
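The expansion can be made concrete on small toy distributions. The Python sketch below is an illustration under assumptions of our own, not the authors' code: signals are independent uniform bits and the environment is their parity (E = S_1 XOR S_2 for l = 2, E = S_1 XOR S_2 XOR S_3 for l = 3), and the helper names `H`, `mi`, `cmi` and `redundancy` are hypothetical. It computes the second-order coefficient R(S_i; S_j; E) = I(S_i; S_j) − I(S_i; S_j|E). For l = 2, each signal alone carries no information about E but the pair determines it, so R = −1 bit and pooling gains exactly the synergy, I(E; S_1, S_2) = I(E; S_1) + I(E; S_2) − R. For l = 3, all three second-order terms vanish and no pair carries any information, while the full triple carries one bit, so the benefit of cooperation appears only at third order.

```python
import itertools
import math
from collections import defaultdict


def H(joint, idx):
    """Joint Shannon entropy (bits) of the variables at positions idx."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def mi(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for lists of variable positions a, b."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)


def cmi(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)


def redundancy(joint, i, j, e):
    """Second-order coefficient R(S_i; S_j; E) = I(S_i; S_j) - I(S_i; S_j | E)."""
    return mi(joint, [i], [j]) - cmi(joint, [i], [j], [e])


# l = 2 (assumed toy environment): outcomes ordered (S1, S2, E), E = S1 XOR S2.
xor2 = {(a, b, a ^ b): 1 / 4 for a, b in itertools.product((0, 1), repeat=2)}
R12 = redundancy(xor2, 0, 1, 2)
pooled = mi(xor2, [2], [0, 1])
separate = mi(xor2, [2], [0]) + mi(xor2, [2], [1])
print(f"l=2: R = {R12:.3f}, pooled = {pooled:.3f}, separate = {separate:.3f}")
# -> l=2: R = -1.000, pooled = 1.000, separate = 0.000

# l = 3 (assumed toy environment): outcomes ordered (S1, S2, S3, E), E = parity.
xor3 = {(a, b, c, a ^ b ^ c): 1 / 8 for a, b, c in itertools.product((0, 1), repeat=3)}
pairs = list(itertools.combinations(range(3), 2))
pair_R = [redundancy(xor3, i, j, 3) for i, j in pairs]   # three second-order terms
pair_info = [mi(xor3, [3], [i, j]) for i, j in pairs]    # I(E; S_i, S_j)
full_info = mi(xor3, [3], [0, 1, 2])                     # I(E; S1, S2, S3)
print("l=3: R terms", pair_R, "pair info", pair_info, "full info", full_info)
```

The same helpers apply to any finite joint distribution given as a dictionary from outcome tuples to probabilities.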

B. Kelly Growth rate
Consider an environment with events conditionally dependent on signals, characterized by a joint distribution $P(E, \mathbf{S})$ for event E and l signals $\mathbf{S}$. Consider a cooperative Kelly investment scheme whereby each participant, agent i, witnesses signal $s_i \in S_i$ and informs the collective how to invest their shared resources r. The mechanics of pooling resources and collectively investing are discussed below. Kelly's formalism can be adapted by expanding the environmental probability to contain l signals, $P(E, S) \to P(E, \mathbf{S})$, as can the betting matrix, $X(E|S) \to X(E|\mathbf{S})$, where $\mathbf{S} = \{S_1, \ldots, S_l\}$. When odds are fair, the Kelly growth rate is given by the returns to each investment, averaged over the probability of each signal-event pair:
$$G = \sum_{e, \mathbf{s}} P(e, \mathbf{s}) \log \frac{X(e|\mathbf{s})}{P(e)}.$$
Then, using the fact that the output E is also a Bernoulli variable, $P(e=1|\mathbf{s}) = 1 - P(e=0|\mathbf{s})$, the mutual information can be written as
$$I(E; \mathbf{S}) = H(E) + \frac{1}{2^l} \sum_{\mathbf{s}} g\big(P(e=0|\mathbf{s})\big), \qquad (17)$$
where g is a function representing application of the UXOR gate and is defined over binomial parameters x as
$$g(x) = x \log x + (1 - x) \log(1 - x).$$
The number of terms in (17) grows exponentially with l and quickly becomes large. When it is sufficiently large, the sum can be approximated by an average. In particular, for uniformly distributed $P(e=0|\mathbf{s})$,
$$\frac{1}{2^l} \sum_{\mathbf{s}} g\big(P(e=0|\mathbf{s})\big) \approx \langle g(x) \rangle_{x \sim U(0,1)}.$$
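This expectation can be evaluated analytically: with natural logarithms, $\langle g(x) \rangle_{x \sim U(0,1)} = 2\int_0^1 x \ln x \, dx = -\tfrac{1}{2}$. With this, (17) gives $I(E; \mathbf{S}) \approx H(E) - \tfrac{1}{2}$ nat.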
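A quick numerical check of this expectation, as a sketch: we take g(x) = x ln x + (1 − x) ln(1 − x) with natural logarithms (so values are in nats; divide by ln 2 for bits) and the convention 0 ln 0 = 0. The first function is a midpoint-rule quadrature of g over (0, 1); the second replaces the average by the finite sum over 2^l signal configurations, assuming each P(e = 0|s) is drawn independently from U(0, 1). The function names are ours. The exact value is 2∫_0^1 x ln x dx = −1/2.

```python
import math
import random


def g(x):
    """g(x) = x ln x + (1 - x) ln(1 - x), using the convention 0 ln 0 = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x)
    if x < 1:
        total += (1 - x) * math.log(1 - x)
    return total


def uniform_average(n=200_000):
    """Midpoint-rule estimate of <g(x)> for x ~ U(0, 1)."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n


def signal_sum(l, seed=0):
    """(1 / 2**l) * sum over s of g(P(e=0|s)), with each P(e=0|s) drawn from U(0, 1)."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(2 ** l)) / 2 ** l


print(uniform_average())  # close to -0.5 (nats)
print(signal_sum(16))     # finite-sum version, also close to -0.5 for large l
```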

D. Mutual Information for incomplete signal sets
Here, we demonstrate that the information the collective has about the gate output scales exponentially with the number of cooperants, k, as depicted in Figure 3. Following the previous section, introducing the function $g(x) = x \log x + (1 - x) \log(1 - x)$ simplifies the expression (18) for the mutual information. As before, this sum can be interpreted as an average over the uniform distribution when the number of terms is large.
Here, the removal of parts of the signal changes the distribution of parameters, so the measure that approximates this sum also changes. We call this new measure $P_k$. Furthermore, whereas in the main text the subscript of $S_g$ denoted the signals of group g, here the subscript of $\mathbf{S}_k$ denotes the signal set of cardinality k retained under marginalization. With this new notation, the first term in (18) may be written approximately as the average $\langle g(x) \rangle_{x \sim P_k}$. To compute $P_k$ for k < l, we need to calculate how the probability of E conditional on the remaining signals $\mathbf{s}_k$ changes under the removal of the k-th signal. For this model, since each signal takes either value with probability 1/2,
$$P(e|\mathbf{s}_{k-1}) = \tfrac{1}{2}\left[P(e|\mathbf{s}_{k-1}, s_k = 0) + P(e|\mathbf{s}_{k-1}, s_k = 1)\right]. \qquad (19)$$
By iterating this sum, we reduce the number of parameters required to describe $P_k(x)$, which in the main text are given by the set of binomial parameters p. Additionally, the distribution $P_k(x)$ becomes increasingly narrow, centered around 1/2, which is the mean of all probabilities $P(e|\mathbf{s})$. Parameterizing $P_k(x)$ by its moments $m^{(a)}_k$ allows us to compute the mutual information directly through a moment expansion of the distribution. Using standard arguments, which we provide in the following section, these moments approximately obey the scaling relation (20), which is related to the onset of central-limit-theorem behavior. This provides a heuristic explanation for why $I_k$ scales as $2^{-\lambda}$. After only a few cooperants are removed, higher-order terms in the expansion (with order denoted by a) quickly die away, leaving only the second-order term,
$$I_k \to 2 m^{(2)}_k + O\big((m^{(2)}_k)^2\big).$$
Once this occurs, we can see plainly that each marginalization reduces the mutual information by half, $I_{k-1} \approx I_k / 2$, since the second moment of $P_k$ halves with each removed signal. While this explanation gives approximately the correct scaling behavior, it does not provide a good estimate of $I_k$ near full cooperation, where the higher-order terms in the expansion are not small. To include these terms explicitly, we need all derivatives of f evaluated at x = 1/2:
$$f^{(a)}(1/2) = (-1)^a \frac{2^a \, a!}{a(a-1)}.$$
Inserting these derivatives and the approximation (20) for moments of $P_k$ gives an approximation of $I_k$ as a series. Evaluating this series analytically then yields a closed-form expression.
This expression gives a good approximation of $I_k$ in the small-λ regime and also captures the scaling in the intermediate regime. For λ → ∞, an estimate of the asymptotic behavior is obtained by setting $z = 2^{-\lambda/2}$ and Taylor expanding around z = 0. Because $z^2 = 2^{-\lambda}$, the quadratic leading term agrees with the observed exponential scaling $I_k \propto 2^{-\lambda}$. Although the scaling prediction is correct, there is no regime in which this high-λ expression consistently estimates the actual value of $I_k$. The essential reason is that as k decreases, the number of terms in the sum over remaining signal states decreases, and the approximation of that sum as an average over $P_k(x)$ begins to break down. This manifests as an error in the overall scale of the estimate, but not in its exponential dependence on k.
Instead, using the facts that $I_l$ is known to a very good approximation and that the information is approximately exponential in k, we obtain a good estimate of $I_k$ at large to intermediate k. This is the quantity quoted in (7) in the main text.
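The halving of the information under marginalization can be seen in a small simulation. This is a sketch under assumptions that go beyond the text: the l signals are independent uniform bits, and each of the 2^l values of P(e = 0|s) is drawn independently from U(0, 1), standing in for the authors' UXOR parameterization; the function names are ours. Removing a signal averages P(e|s) over its two equally likely values, and $I_k$ is computed as H(E) minus the average binary entropy of the remaining conditionals, in nats. By concavity of the entropy, $I_k$ can only decrease as signals are removed; after the first few removals the ratio $I_{k-1}/I_k$ settles near 1/2.

```python
import math
import random


def h(p):
    """Binary entropy in nats, with 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


def info_by_group_size(l, seed=0):
    """Return {k: I_k} in nats for k = l, ..., 0 under an assumed toy model:
    2**l equiprobable signal configurations with P(e=0|s) ~ U(0, 1) i.i.d."""
    rng = random.Random(seed)
    p = [rng.random() for _ in range(2 ** l)]  # P(e=0|s) for every configuration
    H_E = h(sum(p) / len(p))                   # entropy of the marginal P(e=0)
    info = {}
    for k in range(l, -1, -1):
        info[k] = H_E - sum(h(q) for q in p) / len(p)  # I_k = H(E) - H(E|S_k)
        # Remove one signal: average P(e|s) over its two equally likely values.
        p = [(p[2 * i] + p[2 * i + 1]) / 2 for i in range(len(p) // 2)]
    return info


l = 16
info = info_by_group_size(l)
for k in range(l, l - 8, -1):
    print(f"k={k:2d}  I_k={info[k]:.5f}  I_(k-1)/I_k={info[k - 1] / info[k]:.3f}")
```

The pairing of adjacent entries always removes the same signal (the lowest bit of the configuration index), which is one valid choice of marginalization order in this symmetric toy model.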

Moment scaling with k
The following argument justifies (20) and is standard; we reproduce it here for completeness. Upon moving from k to k − 1 cooperants, the new conditional distribution is given by (19). This means that $P_{k-1}$ is given by a convolution of $P_k$ with itself, rescaled by the factor 1/2 in (19). The characteristic function of a distribution over a continuous variable is its Fourier transform. Since $P_{k-1}$ is a convolution, its characteristic function is the product of the characteristic function of $P_k$ with itself, evaluated at half the argument. A slightly more convenient object to work with is therefore the logarithm of the characteristic function, $\phi_k(z)$, which turns this product into a sum. This leads directly to the central limit theorem, since when the second cumulant is rescaled to remain constant with respect to the number of convolutions n, all higher-order cumulants scale to zero. Here, by using the fact that the a-th moment $m^{(a)}_k$ can be expressed through the cumulants $c^{(b)}_k$ with b ≤ a, we can see that the second-order cumulant dominates all of these expressions once k is sufficiently small. For example, the fourth moment quickly scales like $(m^{(2)}_k)^2$ as k is decreased from l, because the second cumulant begins much larger than the fourth cumulant and remains dominant.
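The cumulant scaling can be checked exactly on a small example. The Python sketch below uses exact rational arithmetic and a made-up, skewed discrete distribution standing in for $P_k$; it forms the distribution of the average of two independent draws, which is the rescaled self-convolution described above, and confirms that the n-th cumulant is multiplied by $2^{1-n}$: a factor 1/2 for the variance, 1/4 for the third cumulant and 1/8 for the fourth. Because the variance halves while the fourth cumulant shrinks by 1/8, the ratio $c^{(4)}/(c^{(2)})^2$ halves at every step, which is why the second-order term comes to dominate. The function names are ours.

```python
from fractions import Fraction
from itertools import product


def cumulants(dist):
    """Cumulants c2, c3, c4 of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    mu = {n: sum((x - mean) ** n * p for x, p in dist.items()) for n in (2, 3, 4)}
    # c2 and c3 equal the central moments; c4 = mu4 - 3 * mu2**2.
    return {2: mu[2], 3: mu[3], 4: mu[4] - 3 * mu[2] ** 2}


def average_of_two(dist):
    """Distribution of (x + y) / 2 for independent x, y drawn from dist."""
    out = {}
    for (x, p), (y, q) in product(dist.items(), repeat=2):
        z = (x + y) / 2
        out[z] = out.get(z, 0) + p * q
    return out


# Made-up, skewed stand-in for P_k (values are conditional probabilities in [0, 1]).
P_k = {Fraction(1, 10): Fraction(1, 2), Fraction(1, 2): Fraction(1, 5), Fraction(7, 10): Fraction(3, 10)}
P_k_minus_1 = average_of_two(P_k)

before, after = cumulants(P_k), cumulants(P_k_minus_1)
for n in (2, 3, 4):
    print(n, after[n] / before[n])  # exactly 2**(1 - n): 1/2, 1/4, 1/8
```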

FIG. 4. Groups learning an l = 4 environment using more unique signals acquire more resources and information, but different combinations of signals carry different amounts of information. A. Temporal resource trajectories, grouped by the number of unique signals in the corresponding group, show that growth increases with the number of signals. B. Groups with more signals can gather more information from the environment. There is high variability when $k_g < l$, as different combinations of signals access different amounts of information. C. Top: For $k_g = 2, 3$, the synergy benefits of a parameter configuration are given by the difference between the information when averaged (small dot) and pooled (large dot). Bottom: Parameter values exist where no signal combinations hold synergy (left) and where synergy is equivalent across signal combinations (right).

$$\phi_k(z) = \log \int_{\mathbb{R}} dx \, P_k(x) \, e^{izx}.$$
The sum rule above gives a recursion relation for $\phi_k$:
$$\phi_{k-1}(z) = 2\phi_k(z/2) + \log 2.$$
Now, the cumulants of $P_k$ can be calculated from $\phi_k$,
$$c^{(n)}_k = (-i)^n \left. \frac{d^n \phi_k}{dz^n} \right|_{z=0},$$
meaning there are also recursion relations for the cumulants:
$$c^{(n)}_{k-1} = 2^{1-n} \, c^{(n)}_k, \qquad n \geq 1.$$

FIG. 1. Groups of agents with different signals grow resources based on the information between their signals and states of the environment. A. Groups, denoted g, are composed of an arbitrary number of agents. Each agent belongs to only one group and can observe and contribute one signal to the group. A group contains $k_g$ unique signals. B. At each time step, (a) the group's private channel outputs a signal s ∈ S with probability P(s). (b) Each member of the group observes their signal $s_j$, and (c) the group consults their collective belief for the conditional outcome probability of the environment, X(E|s). (d) The agents make proportional resource allocations on all possible outcomes B(E|s). (e and f) The true event e ∈ E is observed in the environment with probability P(e), and (g) the agents receive payouts proportional to the marginal probability of e.
These two terms can be simply expressed as $G = I(E; \mathbf{S}) - \mathbb{E}_{\mathbf{s}}\left[ D_{KL}\big(P(E|\mathbf{s}) \,\|\, X(E|\mathbf{s})\big) \right]$, similar to previous work; here, however, we decompose this expression in terms of redundant information across the signals using equations 11 and 14.

C. Information for UXOR circuits
Here we compute $I(E; \mathbf{S})$ for the UXOR logic circuit. This represents the information that a group of agents with l distinct signals has about the output E of the probabilistic gate, averaged over all configurations of the gate for a uniformly distributed prior. Because the signals $s_i$ are independent Bernoulli trials with probability 1/2, each of the $2^l$ signal configurations occurs with probability $P(\mathbf{s}) = 2^{-l}$.
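The growth-rate decomposition G = I(E; S) − E_s D_KL(P(E|s) || X(E|s)) quoted above can be checked numerically. The Python sketch below assumes fair odds 1/P(e), so that G = Σ_{e,s} P(e,s) log2[X(e|s)/P(e)], and uses a small made-up joint distribution over a binary event and two binary signals; the probabilities and function names are illustrative only. Betting the true conditional distribution recovers the mutual information, while a betting matrix that ignores the signals loses exactly the expected KL divergence.

```python
import math

# Hypothetical P(e=0 | s) for the four configurations of two binary signals;
# each configuration s = (s1, s2) is assumed to occur with probability 1/4.
p_e0_given_s = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.2}
P = {}
for s, q in p_e0_given_s.items():
    P[(0, s)] = 0.25 * q
    P[(1, s)] = 0.25 * (1 - q)

P_e = {e: sum(p for (ee, _), p in P.items() if ee == e) for e in (0, 1)}
P_s = {s: sum(p for (_, ss), p in P.items() if ss == s) for s in p_e0_given_s}


def growth_rate(X):
    """Kelly growth rate in bits per round at fair odds 1/P(e)."""
    return sum(p * math.log2(X[(e, s)] / P_e[e]) for (e, s), p in P.items())


def mutual_information():
    """I(E; S) in bits."""
    return sum(p * math.log2(p / (P_e[e] * P_s[s])) for (e, s), p in P.items())


def expected_kl(X):
    """E_s D_KL(P(E|s) || X(E|s)) in bits."""
    return sum(p * math.log2((p / P_s[s]) / X[(e, s)]) for (e, s), p in P.items())


X_bayes = {(e, s): P[(e, s)] / P_s[s] for (e, s) in P}  # bet the true conditional
X_flat = {(e, s): 0.5 for (e, s) in P}                  # ignore the signals

for name, X in (("Bayes", X_bayes), ("flat", X_flat)):
    G = growth_rate(X)
    print(f"{name}: G = {G:.4f}, I - E[KL] = {mutual_information() - expected_kl(X):.4f}")
```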