Markets as ecological networks: inferring interactions and identifying communities

Financial markets are paradigmatic examples of complex systems and have been compared to ecological networks in which different species (ﬁrms) interact and co-evolve. A central object governing species dynamics in ecology is the community matrix, whose elements are closely related to pairwise interspeciﬁc interaction coefﬁcients. Using this ecological analogy we propose a method, based on the Maximum Entropy (MaxEnt) principle, that allows us to infer candidates for an economic community matrix from time series data of market values. To assess the usefulness of this picture, we construct community matrices for a set of companies belonging to the Fortune 500 list and perform a community analysis on the resultant networks. This analysis shows these networks to strongly reﬂect the known industry groupings of the ﬁrms. We conclude therefore that our community matrices capture non-trivial information about the interaction of ﬁrms, not immediately apparent from the covariance of market values. We anticipate our approach being useful in elucidating further aspects of market structure, as well as forming the basis of forecasting market dynamics.


Introduction
Financial markets are paradigmatic examples of complex systems, comprising a large number of interacting agents (traders and algorithms) whose decisions dictate their ongoing evolution. Correspondingly, markets have been regarded as being similar to ecosystems in which species interact and co-evolve [1][2][3][4][5], and population dynamics has been proposed as a model of the dynamics of companies regarded as species in the 'business ecosystem' [3,4]. This ecological paradigm has been applied to different industrial sectors, such as the newspaper industry [6], the airline and oil-drilling industries [7], internet [8] and software firms [9].
Ecosystems comprise a biotic component (communities of living organisms) and an abiotic one (the non-living chemical and physical parts of the environment that affect living organisms), interacting as a single system [10]. A central piece of the biotic component is the ecological interactions between organisms. Such biotic interactions, which can be regarded as operating either between individual organisms or entire species, are often difficult to define and measure [11,12]. The most direct procedure to estimate the MARKETS AS ECOLOGICAL NETWORKS 3 be modelled by a probability distribution for the different possible observable states of that system. This probability distribution will typically not be known, and in fact there are many possible choices for it that are compatible with observations. Of all such distributions, the recipe of MaxEnt is to choose the probability distribution that maximizes the information entropy subject to the constraints of the available data. We consider a system of N random variables, {v i | 1 ≤ i ≤ N}, arranged into the vector v. Then, if we assume that the only information we possess about these variables is their mean v and their covariance matrix , the MaxEnt principle posits the following joint probability distribution This distribution has previously been interpreted in terms of the equilibrium statistical mechanics of a generalised Ising model [18] in which J = − −1 is the matrix of interaction strengths between 'spins' of length v i . In this work, however, we consider an interpretation of Eq. (1) in terms of ecosystem interactions where, as we now show, the matrix J can be related to a community matrix. Let = v(t) − v be the vector of deviations from the mean, and let F(t) be a random force vector with zero mean and temporal correlations described by were B is a symmetric matrix. In this light, the MaxEnt distribution of Eq. (1) can then be interpreted as the stationary probability distribution of the Langevin equation [40] d dt = C · + F(t), provided that matrix C satisfies with T denoting the matrix transpose. In a population-dynamics setting, Eq. (3) describes behaviour of populations v near stationary point v with F(t) describing external (typically abiotic) forcing [41] and with matrix C the community matrix, describing interactions between the species 1 . Further progress can be made by splitting C into two components: C = K + , the first of which gives the symmetric part of C · , and the second, the antisymmetric part. From Eq. (4), we see that the symmetric part obeys This equation shows that although −1 does not determine K uniquely by itself, with the addition of a noise model (B) it does. In principle, the noise affecting different species could be correlated, but here, we consider all correlations to come from interactions, such that the noise for each firm is 4 C. EMARY AND H. FORT independent. We will consider two particular cases. Firstly, we posit that the noise is of constant strength for each firm with amplitude scaled to one, that is, B ij = δ ij with δ ij the Kronecker delta. This gives us the community-matrix contribution which is the same interaction matrix implied by the Ising-model interpretation. Ignoring for a second interactions between firms (i.e. just looking at the diagonal elements of K), in this model the relative size of the fluctuations in v i is determined by the size of the 'restoring rate' |K ii |, since the noise driving each market value is the same. In our second model, we assume that the restoring rate |K ii | is constant for all firms, and that it is the differences in noise amplitude that drives the difference in the size of fluctuations. Thus, we set B ij = |J ii | −1 δ ij in Eq. (5) such that the community-matrix contribution reads This has diagonal elements I ii = −1; ∀i. Thought of in term of Ising-model interaction strengths, matrix element I ij is the strength of the interaction between firms i and j relative to the self-interaction of i. Note that whereas matrix J is symmetric, matrix I is not. The MaxEnt procedure with fixed mean and covariance tells us nothing about the antisymmetric part of the product C · . For want of additional information, we therefore assume this contribution is zero and base our subsequent analysis exclusively on the part K-a point we will return to in the discussions.

Market data
The NYSE market has 2800 listed firms and because we do not have time series data for all 2800 firms and working with 2800 × 2800 matrices would increase the difficulty of the analysis considerably, we consider a subset of these firms, roughly corresponding to the "largest" firms in our data set. This is similar to the situation in ecological networks, which typically only include the most relevant species, i.e. the most abundant ones or those species that are a priory expected to exhibit stronger effects over other species [18]. In detail, then, the firms we consider in our analysis were selected according to the following criteria: (1) They are all amongst companies with the largest revenues in the Fortune 500 list as of March 29, 2018 [42], coinciding with day 1000 of the time series we have.
(2) They simultaneously were in the list for 5 years in a row, from 2014 to 2018 2 .
(3) The market values of these firms were available in the 2014-2018 Fortune 500 lists 3 .
This results in a list of 39 firms. However, inspecting the market value times series, we found that those corresponding to the two home mortgage companies created by the U.S. Congress, Fannie Mae and Freddie Mac, are almost exactly linearly dependent. This poses a problem for inferring the effective interaction matrix (see below) since this requires inversion of the covariance matrix which becomes singular with the above linear dependence. To overcome this difficulty we treat these two firms as a single firm, FNMA+FMCC, by summing their market values. In this way, we obtain the set of 38 firms in Table 1. For each firm we also list its ticker symbol and its Industry Group based on Global Industry Classification Standard (GICS) [39,43]. Taken together they represent always at least 25% of the total NYSE market value along this period [44], and thus this set constitutes a significant and important sample of the market. For each of the firms in our set we consider a 1000-day time series of market values from 4 October 2014 to 29 March 2018. Let us define v i (t) to be the market value of firm 1 ≤ i ≤ 38 in day 1 ≤ t ≤ 1000 of this time series. In our subsequent analysis, we consider a coarse-graining of the data over different time scales. To this end, we divide the 1000-day data into time slices, each of length T and labelled by 1 ≤ n ≤ n max = 1000/T . Within time slice n the mean market value of firm i is v (n,T ) and market-value covariance matrix has elements

Community and adjacency matrix construction
In this section, we use matrices (n,T ) to construct community matrices and the interaction networks they imply. Our first step is to combine the matrices from different time slices. For the covariance and J matrices, we simply take the mean of the individual matrices over the time slices We could similarly define a total I matrix as the average over the individual I-matrices defined as in Eq. (7) for each time slice. Empirically, however, we find that for the following analysis a preferable procedure is to construct a total I matrix by obtaining mean values of on-and off-diagonal elements of J separately, and then combining. Thus we define the matrix I tot elementwise as We then use the matrices tot , J tot and I tot as the bases of adjacency matrices of the interaction networks. We specifically want to maintain information about interaction strengths in these networks and so the networks will be weighted. Interpreting them directly as adjacent matrices brings up a number of issues, particularly in the context of community detection. The first is that the matrices are signed. Although community analysis can be carried out on signed networks [45], it is not clear that the sign here is relevant for defining communities. The second issue is the diagonal elements. Again, whilst self loops can in principle be included in community analysis, the significance of them here is unclear. Thus, 6 C. EMARY AND H. FORT Plots of the adjacency matrices A , A J and A I for T = 1000 with firms ordered by mean market value as in Table 1. Each matrix has been normalized to have a maximum element of one.
in defining adjacency matrices, we take the absolute value of the corresponding interaction matrix and drop the diagonal elements, and thus the adjacency matrices we consider are A plot of the matrices A , A J and A I is given in Fig. 1. With firms ranked by mean market value, this representation shows that both A and A J are quite nested [46], albeit with the direction of the nesting occurring in opposite directions. The matrix A I also has a concentration of large matrix elements, but this occurs in the upper right quadrant. The adjacency matrix A I presents an additional problem for community detection as it is nonsymmmetric and thus generates a directed network. A number of approaches have been proposed to address community detection in directed networks [47], with the simplest being to consider standard (undirected) community detection algorithms applied to an appropriately symmetrized version of the original matrix [48]. The most obvious symmetrization is to take the arithmetic mean of the adjacency matrix and its transpose: 1 2 A I + (A I ) T . However, in this work we employ that the geometric mean symmetrization A I sym = √ I I T , where denotes the Hadamard product, or in terms of matrix elements The significant difference between the two symmetrizations is that, in the arithmetic case, significant values can be obtained when only one of I ij or I ji is large, whereas in the geometric case, this requires both I ij and I ji to be significant. Thus, two-way links are picked out as stronger that those in a single direction. In the following, we will only give results for the geometric symmetrization A I sym and discuss briefly those from the arithmetic symmetrization at the end.

Thresholding
From Fig. 1, it is clear that these adjacency matrices contain a large proportion of very small values, and it might be wondered whether these represent genuine interactions or are simply a product of data imperfections or some random process not significant to the properties under consideration. To investigate how this potential "noise" effects our results, we introduce a threshold τ below which values in the 8 C. EMARY AND H. FORT adjacency matrix are set to zero. Specifically, for a weighted adjacency matrix with elements A ij ≥ 0 of which A max ij is the maximum value, we define a thresholded version as where 0 ≤ τ ≤ 1 is the (fractional) threshold and θ[x] is the unit-step function. Increasing the threshold causes the initially fully connected networks to first become more sparse and eventually break up into disconnected pieces. Due to the different distributions of matrix element magnitudes, comparing different matrices with the same value of the threshold is not a like comparison. We thus define the "weight removed" in the thresholding procedure as such that the impact of thresholding two different matrices to the same value of weight w is roughly comparable. In the following, we thus plot our results as a function of this weight.

Null model
To interpret community analysis results, it is important to compare with a null model [49]. The null model we consider here results from randomizing the temporal order of original market value data for each firm separately. This preserves mean and variance for the individual firms but randomizes their correlations. Adjacency matrices are constructed exactly as above, and we present results here obtained from 200 such randomizations.

Community structure
The modularity Q(g, C) is a measure of the extent to which network g is connected along the lines of community structure C [50]. We will not repeat its definition here but note that it is applicable to both weighted and unweighted networks [51]. Modularity is bounded |Q| ≤ 1, with a value of Q → 1 indicating that g possesses exactly structure C. A value of Q = 0 indicates agreement no better than random, and a value Q < 0 indicates a tendency for the graph to clustered in a fashion 'opposite' to the way defined by C (i.e. more links between the communities of C than within them). We will look at the modularity of the network defined above in two ways. First we consider the modularity with respect to the communities C ind defined by the industry groups of Table 1. This value of modularity we denote as Q ind = Q(g, C ind ). Second, we consider the modularity for a particular network maximized over all community structures Q max = Q(g, C max ) = max C Q(g, C), where C max is the maximizing community structure. Given the small size of the network, it is possible to find exactly the community structure that maximizes the modularity. This procedure is slow and, given that we will sweep over threshold and consider 200 random instances in the null model, to consistently use this exact optimization is impractical. We therefore consider instead the 'greedy' algorithm of Ref. [52] to obtain approximations to Q max and C max . We ran both the greedy and full optimization algorithms 4 for a test case of matrix A I sym with T = 100. In this case, the 'greedy communities' were found to have a modularity Q max to within 4% of (a) the exact maximum across the whole τ range. For τ > 0.25 (w > 0.75), when the networks become sufficiently sparse, the optimal communities found by both algorithms were exactly the same. From this we conclude that the greedy algorithm gives a close enough approximation to the optimal communities for our purposes here, and we use this algorithm to generate the results that follow. Figure 2 shows the modularities Q max and Q ind as a function of weight removed for the three matrices A , A J and A I sym after thresholding. Results are given for time-slices of T = 1000 (top row) and T = 100 (bottom) and are compared with those obtained from the null-model randomizations. 5 We begin by considering the results for network derived from the covariance matrix. With T = 1000, the maximum modularity Q max remains close to zero across the full range of thresholding (Fig. 2a), a result which, compared with the randomizations, is anomalously low. This can be explained by noting that the full A matrix (at τ = 0) is far more nested that is typical for the randomized ensemble and, at high connectances, there is an inverse relation between modularity and nestedness [35] because the dominant firms (i.e. those with the largest mean market value) connect strongly to most other firms in the network. Moreover, as 10 C. EMARY AND H. FORT the threshold increases, the high nestedness means that it is single nodes that become disconnected from the network (rather than more sizeable subgraphs), and these individual nodes contribute nothing to the modularity. For T = 100, the Q max modularity of the covariance network is increased slightly with a maximum value Q max ≈ 0.2 (Fig. 2d). This is consistent with the randomization results, and thus not indicative of any particular structure in the network. Concerning the industry-group modularity Q ind in the covariance case, the T = 1000 results match with randomizations, whereas for T = 100, Q ind is slightly higher than expected from the randomizations, but still negative. Overall, then, the covariance adjacency matrix (both values of T ) shows little trace of any particular community structure, and certainly none relating to the industry groups.
Turning now to the results for the A J matrix, we first comment that the randomizations have similar modularity properties to those for the A matrix. The properties of the actual A J matrices are significantly different, however. For both T = 100 and T = 1000, the maximum modularity Q max lies at or above the upper end of expectations from randomizations, and this is most pronounced in the T = 100 case (panel 2e). The industry-group modularity Q ind is now positive across the range of threshold (compared with the negative values from the randomizations) and for T = 100 lies around the upper end of the randomization range for the maximum Q max .
This trend towards increasing modularity (both Q max and Q ind ) is continued by the symmetricinteraction matrix A I sym . The strong increase in Q max as a function of the threshold is not in itself especially significant, as this is also demonstrated by the randomizations. However, for T = 1000, the actual Q max lies slightly above expectation from randomizations (panel 2c), and for T = 100, it lies significantly above the random results (panel 2f). As for A J , the industry group modularity Q max is positive across the range, and in the T = 100 case, it is large and lies around or above the upper end of expectation for the optimal modularity Q max found from the randomization. These results mean that the matrices A J and A I both show a more modular structure that would otherwise be anticipated for the null model. Moreover, Q ind being high for both these networks, especially in the T = 100 case, hints that the industry-group communities, whilst not the optimal partitioning of these networks, are playing a significant role in structuring them.
We can investigate this latter point quantitatively by comparing the community structures C ind and C max using the 'adjusted Rand index' (ARI) [54]. This assumes a value of 1 when the communities are identical, and 0 when they are only as closely related as chance would suggest. The ARI can take negative values for anti-correlation. Again, we interpret these results through comparison with the randomization null model [49]. Figure 3 shows the ARI comparing C max and C ind as a function of the weight for the same adjacency matrices as in Fig. 2. The first general trend is that the ARI for the T = 100 case (bottom row) is higher and more distinct from the randomizations than is the T = 1000 case (top row). Moreover, for T = 100 results as we go from A to A J to A I sym , the ARI values increase markedly. In particular, the results for A I sym with T = 100 stand out with ARI 0.35 for most of the threshold range, well in excess of the randomization, and with a peak value of 0.61 (Fig. 3f). This peak value occurs for large thresholds (large weight) where the network is heavily disconnected. At the other end of the spectrum, the ARI for the A matrix with T = 1000 shows that the optimal communities bear no significant relation to the industry groups (Fig. 3a).
From these results, we therefore expect the A I sym network for T = 100 to strongly reflect the structure of the industry groups, and this is apparent when we visualize the network. The main panel of Fig. 4 shows the network for A I sym with T = 100 and no thresholding applied. The nodes are colour coded according to industry group, and grouped together into the optimal communities C max . There are six modules in the optimal structure. The only module to match perfectly with a community in C ind is the telecoms community consisting of T and VZ. Nevertheless, there a number of modules with clear connections to Fig. 3. Plot of the ARI measure of similarity between maximum-modularity communities C max and the industry groups C ind as a function of weight removed by thresholding. Layout of the panels is as in Fig. 2. Continuous black lines show the results for the adjacency matrix in question; coloured lines show results for the randomized null model including 10:90 quantiles. The degree to the ARI exceeds the randomization results indicates how the communities found by community detection reflect the externally determined industry groups. industry groups. All five energy firms are bunched together, along with the two auto companies F and GM. All five banks are bunched together, along with the other finance-sector firm BRK.B, IBM, and the capital goods firms GE and BA. One module contains most of the healthcare firms plus the only biotech-pharms firm in the network, plus the seemingly unrelated firms LOW, HD and CMCSA. Then there are two modules which, although the firms within them belong to different industry groups, are clearly closely related. There is a 'tech community' consisting or AMZN, GOOGL, MSFT and AAPL, and one consisting of COST, KR, WMT and WBA (CS retail) together with PG (CS products) and CVS and TGT. Figure 4b shows the network diagram for A I sym with T = 100 again but this time with a threshold (corresponding to a weight removed of 0.79 obtained with a threshold τ = 0.271) chosen to give the maximum ARI value for this network in Fig. 3. With this high a threshold, the network is highly disconnected, and only strongly-connected modules persist. These include a healthcare-services module (5 firms, all HC Svcs minus CVS), an energy module (5 firms), a finance module (4 banks pls BRK.B, minus FNMA+FMCC which is perhaps an exception), a 6-firm module that groups together four CS retail firms COST-KR-WBA-WMT with CVS and TGT and tech module consisting of AMZN, GOOGL and MSFT. There are also several pairs. Within industry groups we have T-VZ, HD-LOW and F-GM. We also have the AAPL-CMCSA pair, a relationship which seems plausible given Apple's role as a media provider.

13
In strong contrast to these networks, Fig. 4c shows the network based on covariance matrix A with T = 100 and zero threshold. Here, community detection finds three large modules, each of which contains firms from various industry groups. This division into modules is not particularly robust with respect to thresholding, as when the threshold is changed, a number of firms switch modules. At high levels of thresholding, we obtain mostly disconnected single nodes. The only robust non-trivial modules that appear are the single pair CVX-XOM and one including the all financial firms except FNMA+FMCC. All of which reinforces what we found by studying the modularity and the ARI index, namely that the traces of the industry groups in the covariance network are extremely faint.
Finally, we address in more detail the impact of the time-slice length. Figure 5 shows the ARI for A I sym constructed using a range of values of T from 50 to 1000. Whilst there is considerable variation with threshold, it is clear from this figure that overall the ARI shows a non-linear dependence on T . In particular, the results for T = 50 and for T ≥ 250 are generally lower than those obtained for T = 100 and T = 200. Although the results differ in detail, the overall character of the T = 100 and T = 200 results is similar. From this, we see that a choice around T = 100 to T = 200 gives the greatest similarity with the industry groups.

Discussion
In this article, we have presented a method that uses the principle of MaxEnt along with time-series data of market values to infer community matrices and networks that describe the interactions between firms in a fashion similar to how theoretical ecology pictures the interaction of species in an ecosystem.
We then considered the question of whether the networks of interacting firms so-derived exhibit a significant community structure. If we were to base our answer to this question directly on the covariance matrix itself, we would conclude that the answer is no, as the corresponding adjacency matrix shows a modularity no greater than we would expect by chance and no particular trace of the GICS industry groups. A very different story emerges, however, when we look at the interaction networks derived from the MaxEnt community matrices. Here, we see modularities above what chance would suggest and, more importantly, we see a strong overlap between the optimal communities of networks and the externallydetermined GICS industry groups. And, although the match between industry groups and the optimal communities of the A I sym network is not perfect, some of the optimal communities have a logical coherence of their own, for example, the tech stocks grouped together. For some purposes these communities might conceivably be preferable to GICS groups. A related question that this community analysis enables us to answer is which of the community matrices derived from MaxEnt is the best. If we take overlap with the GICS classification as the metric, then it is clear that, although the 'Ising' matrix J shows some positive features, it is the matrix I, with adjacency matrix symmetrized as discussed, that shows the strongest signatures of the industry group structure. This analysis therefore clearly picks out matrix I as the best candidate for interaction matrix, at least in the present context.
Our second main observation in this regard is that we obtain far greater overlap with the industry groups when we split the time-sequence data into 100-day chunks and then average, rather than considering the 1000-day series as a whole. We take this as a sign that the interactions between firms change over the course of the 1000 days, and that building the community matrices over this complete time period tends to average out some of these interactions. Dividing the data up into 100-day time-slices and then constructing community matrices preserves more of this interaction information. This implies that the Langevin interpretation of Eq. (1) should be thought of as holding quasi-statically, with matrix J and stationary vector v evolving on a scale longer than 100 days. The results presented here represent a selection from a number of possibilities for constructing adjacency matrices that we considered. When set against a metric of reflecting the GICS industrial groups, these alternative approaches were found to be inferior to, or at least no better than the approaches reported above. For example, in building the adjacency matrices, we looked at taking the positive elements of the matrix rather than the absolute value, and we looked at exchanging the order of taking the mean with taking the absolute value or positive part. Concerning the matrix I tot we considered constructing the individual I matrices for each time slice and then averaging. We also considered the arithmetic symmetrization A I + (A I ) T , which gave modularity results roughly comparable with those obtained from the A J matrix, as well as the 'bibliometric symmetrization' A I (A I ) T + (A I ) T A I [47], which gave no evidence of community structure. Finally, we note that we also used the classification in terms of 11 GICS industry sectors, rather than industry groups, and obtained very similar results. This can be appreciated in Fig. 4a, where firms in the same industry sector but different industry group are grouped in the same optimal community, for example, financial and health care sectors.
Inspection of the community matrices [55] here shows that for every pair of species i, j matrix elements K ij and K ji are always of the same sign. In ecological terms, these interactions represent therefore either competition (−−) or mutualism (++) [56]. For the purposes here, this difference was not important, as we used the absolute value of interaction strengths as basis for the adjacency matrix weights. However, for further understanding the relationship between firms, this information would seem highly significant. The third major class of ecological interaction is that of antagonism, where diagonally opposite elements in the community matrix have opposite sign (+−). In ecological terms, this might represent a predator-prey or a parasite-host interaction, and here it would indicate that one firm profits at another's expense. These interactions are absent in the community matrices here, as their appearance requires that the antisymmetric component be finite, and indeed that it dominates the symmetric component for the antagonisticallyinteracting firms. However, due to its asymmetry, matrix I can model situations approaching either commensalism (+0) or amensalism (0−), and actually we found many instances in which |I ij | and |I ji | are very different in size. Providing an estimate for is outside the current analysis, and will be the subject of further investigations. At any event, the fact that we obtain sensible community detection without an explicit consideration implies that the effect of this component on the overall network structure is perhaps small. Based on its ability to reproduce non-trivial community structure of interacting firms, we hypothesis that the MaxEnt network approach described here may prove to be useful in elucidating further aspects of the structure and behaviour of economic systems. As Foster [57] explains, a network formulation is crucial to understand market dynamics, driven by positive feedbacks [58], from a complex systems perspective upon the economy. The approach outlined here provides an empirical way of putting this perspective on a quantitative footing. A future application is the detailed modelling of the dynamics of the community of firms with an eye to the forecasting of future stock prices.