Immunity-induced criticality of the genotype network of influenza A (H3N2) hemagglutinin

Abstract Seasonal influenza kills hundreds of thousands every year, with multiple constantly changing strains in circulation at any given time. A high mutation rate enables the influenza virus to evade recognition by the human immune system, including immunity acquired through past infection and vaccination. Here, we capture the genetic similarity of influenza strains and their evolutionary dynamics with genotype networks. We show that the genotype networks of influenza A (H3N2) hemagglutinin are characterized by heavy-tailed distributions of module sizes and connectivity indicative of critical behavior. We argue that (i) genotype networks are driven by mutation and host immunity to explore a subspace of networks predictable in structure and (ii) genotype networks provide an underlying structure necessary to capture the rich dynamics of multistrain epidemic models. In particular, inclusion of strain-transcending immunity in epidemic models is dependent upon the structure of an underlying genotype network. This interplay is consistent with self-organized criticality where the epidemic dynamics of influenza locates critical regions of its genotype network. We conclude that this interplay between disease dynamics and network structure might be key for future network analysis of pathogen evolution and realistic multistrain epidemic models.

INFV epidemiology has benefited from decades of research using phylogenetics and molecular evolution to carefully interrogate features of INFV evolution (13)(14)(15). Exercises in applied evolutionary theory have served as validations for the use of molecular methods toward meaningful predictive evolution (16,17). These methods, in combination with larger data sets, offer increasingly accurate probabilistic models for INFV evolution. As effective as they have been, these approaches are based on particular population genetic assumptions and limitations. For example, tree-based methods are necessarily acyclic and as such do not fully capture the relatedness of strains.
Phylodynamic approaches have features of neutral networks, defined by genotypes that evolve via drifting through epochal evolution (18,19). Genotype networks constitute another approach used to study INFV evolution, and are built on different assumptions and constraints than other approaches (20)(21)(22)(23)(24). Previous networks have been constructed from the highly antigenic hemagglutinin (HA) protein sequences of INFV (20). The networks revealed features not well represented in phylogenetic trees, such as identical trait evolution in separate lineages (convergent evolution). More importantly, dynamical systems describing the spread of pathogens are often parameterized through genotype networks rather than phylogenetic trees to better capture strain-transcending immunity (25)(26)(27)(28)(29)(30). Unfortunately, these previous studies use toy networks as genotype networks are prone to fragmentation in the presence of low sampling rates, reducing the number of observed plausible evolutionary pathways. Sampling has increased dramatically in the last decade, which now allows for a more accurate account of the evolution of INFV genotype networks.
In this study, we utilize a large modern data set of INFV H3N2 sequences (over 28,000) and a genotype network approach to capture the genetic relationship between the 2010 and 2020 INFV H3N2 strains and their evolutionary dynamics. Sequences of the highly antigenic HA protein of INFV A (H3N2) are used to analyze the structure and temporal evolution of the genotype network and its exploration of genotype space. Finally, a multistrain epidemic model is implemented to explore how the density and distribution of edges (or mutation pathways) determine epidemic potential in the context of strain-transcending immunity. We demonstrate the existence of a fundamental structure underlying INFV genotype space, one that captures temporal features of virus evolution and suggests underlying predictability. In doing so, we fortify the relevance of genotype networks as a meaningful approach to the study of virus evolution, one that can complement mathematical and phylodynamic approaches in future efforts to study and predict the dynamics of evolution of INFV and other RNA viruses.

Network generation
Protein sequences were obtained for complete INFV A (H3N2) HA samples from the Influenza Research Database (31). Samples acquired from the Influenza Research Database are sourced from databases that include NCBI GenBank and RefSeq. Samples were obtained on 2020 January 16 and restricted to a collection date of 1999 January 4 through 2019 October 1 and collected from human hosts only. A 3-month delay between final sample collection date and data retrieval date was implemented to account for delays in data reporting.
A total of 30,175 sequenced samples for HA were obtained. Sequences were further restricted to allow for the precise genetic se-quence comparison required for network edge construction. Samples with missing or uncertain residues (n = 1,278) and sequences with more or less than 566 amino acids (n = 17) were removed. The remaining 28,880 samples were condensed into set V of 9,714 unique sequences.
The number of differing amino acids across all sites for sequences v and w, d v,w , was found for all pairs of sequences of length l = 566: An edge e v,w is formed if d v,w = 1. Each edge indicates a plausible, but not definitive, mutation pathway between two viable strains that requires one point mutation, thus no intermediate strains nor multimutation events. The resulting genotype network is defined as G = (V, E), where E is the set of all edges e v,w . Temporal analyses restricted data by year using seasonal trends of the Northern Hemisphere, given its dominance of the data set. Sequences were binned according to a 5-y window, where each year consisted of July 1 through June 30 of the following year. For example, a 5-y window centered on 2010 would contain sequences from 2007 July through 2012 June.

Multistrain epidemic model
Building on previous work (25)(26)(27)(28)(29)(30), we assume that the epidemiological dynamics of INFV follow the classic Susceptible-Infectious-Recovered-Susceptible (SIRS) model, and introduce an underlying, data-driven, genotype network that defines potential mutations and allows strain-transcending immunity. An individual infected with strain i ∈ [1, N] can cause a mutation at a rate μ to a strain j ∈ N i , where N i is the set of first network neighbors of strain i.
All strains spread concurrently in a well-mixed host population. Individuals are susceptible (S) if they possess no previous immunity. Each susceptible individual progresses to infectious state I i , corresponding to strain i, at a rate βI i . The basic transmission rate β is held constant for all strains, as we focus on neutral evolution (antigenic drift) as a first approximation.
Infectious individuals in I i will either (i) recover at rate γ to state R i and acquire full immunity for strain i and partial immunity to other strains j = i or (ii) undergo a mutation to strain j at a rate μ for all strains j in N i . Recovered individuals in R i will either (i) lose immunity and progress back to S at rate α or (ii) get infected with strain j = i and progress to I j at a reduced rate β * due to their partial immunity. Specifically, β * is an exponentially decaying function of genetic distance between strains i and j, where x ij is the network distance between strains i and j (shortest path between strains i and j in the genotype network, different from d v,w used above) and is the characteristic length of immunity (0 < < ∞) as it transcends specific strains over the genotype network. Note that we make the assumption that an individual's immune response is set by the most recent infection as accounting for a full immune history would result in N! possible immune states. The model assumes that (i) an individual may be infected by at most one strain at a time, (ii) an individual's immune response is determined by the strain responsible for their last infection, and (iii) transcendence of immunity decays exponentially as a function of the distance between strains. The model was implemented  Number of nodes n and edges m as well as average degree k , maximum degree kmax, diameter D, clustering coefficient C Global , and assortativity (degree correlations) coefficient r. Numbers in parentheses correspond to the average values obtained under 100 realizations of two null models, respectively: Erdős-Rényi random graphs parameterized by density only and a configuration model parameterized by the full degree distribution.
with a system of differential equations containing one susceptible state and an infected and recovered state for each strain. The dynamical system describing this model is presented in the "Materials and methods" section, and its dynamics were studied in ref. (30). The model itself can run over any genotype network defined as a number of strains i ∈ [1, N] and a set of neighboring strains j ∈ N i for each strain. In what follows, we therefore couple the model with known generative models of networks that can help explain some key network features found in the genotype data.

INFV A (H3N2) HA genotype network
The INFV A (H3N2) HA genotype network represents 28,880 samples of HA, resulting in 9,714 nodes (unique strains), 7,599 edges (possible point mutations between strains), and 3,262 connected components, of which 384 consist of more than one node. With 29.6% of nodes of degree k = 0 and 44.0% of k = 1, the network features a skewed degree distribution, stretching up to a maximum degree of k = 256. The tail of the complementary cumulative distribution function (CCDF) of degree, P(K ≥ k), exhibits power-law behavior: P(K ≥ k) ∝ k −α k with an estimated scale exponent α k = 2.29, Fig. 1 (left panel). This is in agreement with the heavy-tailed degree distribution found by Wagner in the largest connected component of a smaller data set from 2002 to 2007 (20). In growing networks, this degree distribution points to generative models with approximately linear preferential attachment underlying the dynamics of the observed genotype network (32)(33)(34). Linear attachment is a critical mechanism such that a growing network produces power-law degree distributions, at a transition between exponential distributions under sublinear attachment and condensation to a star graph under super-linear attachment (35,36).
The distribution of component sizes of the genotype network is similarly skewed. The tail of the CCDF of component sizes P(C ≥ c) follows a power-law distribution, where P(C ≥ c) ∝ c −αc with scale exponent α c = 1.66, Fig. 1 (center panel). This scaling is also be suggestive of another critical process in the formation of the genotype network, as this distribution of component sizes with scale exponent α c = 1.5 is a well-known result for the critical point of percolation processes and random graphs (37).
The degree of a node and the number of times its corresponding sequence was sampled are highly correlated, Fig. 1 (right panel). Structurally important nodes of high degree (hubs) are therefore robust to reduced sampling, given that the duplicate sample count of a strain may be a proxy for its population prevalence. The network also contains numerous cycles amidst its heterogeneous tree-like structure. Its 500 triangles indicate mutations at the same site between three sequences, while sparse squares indicate potential convergent evolution (20). These structures are clearly displayed in genotype networks, while phylogenetic tree construction do not include convergent evolution structurally. The tree-like topology of the network prevents longer cycles from forming. Further network summary statistics are shown in Table 1 for the entire network G and the giant component GC. The triangles are captured by global clustering C global , which is equivalent to the proportion of triplets (three connected nodes) that form a closed triangle. Despite the biological relevance of triangles (20), we find that they are neither overrepresented nor underpresented when compared to random networks with a fixed degree distribution; as shown in Table 1.
The degree assortativity r represents the correlation between the degree of a node and that of its neighbors (Table 1). A negative value for both the entire network G and the giant component GC indicate that high degree nodes tend to attach to low degree nodes. In fact, these negative degree correlations are the only feature that appears statistically significant when compared to random networks with fixed-degree distributions; again, this is consistent with growing random networks under positive attachment kernel (32,(34)(35)(36).

Network topology in time
The genotype network grows in time as new strains emerge and are sampled. For example, the growth of the second largest component is shown in Fig. 2, with each node colored by the first sample date for each strain. This component is large enough to span several years while remaining small enough to qualitatively observe network growth in time. The blue-shifted nodes represent the earliest observed strains among those belonging to this component, the first of which was sampled in late 2010. The major-ity of unique strains were sampled from 2012 to 2015, including multiple high-degree strains and their neighbors. The most recent strains from this network component are red-shifted, clearly depicting the tree-like growth process.
Numerous hubs are seen throughout the network, with the largest hubs existing around the 2012 and 2013 flu season that contributed numerous strains to this component (Fig. 2, bottom). Seasonality is reflected in the sample date distribution of this component, with multiple peaks around the start of the calendar year during flu season.
Features of the genotype network remain fairly stable in time, even in the presence of a constantly increasing sampling rate. Genotype networks were constructed using samples within a 5-y window, sweeping across the entire sample set. These temporally restricted genotype networks display the structure of the network local in time-an important consideration given that strains emerge and fall out of circulation. These networks display the increased availability of sequenced samples with each successive year, with notable increases in sampling since 2008 (Fig. 3, left panel). The number of nodes and edges has grown steadily in the past two decades across both the entire network of the 5-y windows and its giant component.
Scaling of both degree distribution and component size distribution tails remain fairly constant in time. The scale exponent for degree averaged 2.34 across these networks, varying from 2.10 < α k < 3.06. We find over a decade of consistency near its mean (about 2.2 or 2.5) even as the network grew several times larger, Fig. 3 (center panel). Similarly, the power-law exponent for component size averaged 1.63 and varied within 1.44 < α c < 1.73, demonstrating consistency in time and a comparable independence from sample rate as the network grew, Fig. 3

(center panel).
Here c min and k min were fixed at 7, enabling a direct comparison with the entire network.
Local cycles continue to remain prevalent in the network through time. The global clustering coefficient varied within 6.78 × 10 −3 < C global < 1.27 × 10 −2 , showing greater variability than scaling factors, Fig. 3 (right panel). Similarly, degree assortativity varied within −0.365 < r < −0.124, demonstrating variability but preserving the disassortative structure of the network. The above features demonstrate that in the presence of variable sequence sampling rates, genotype networks possess fairly consistent topological features that are highly predictable from recent years.

Multistrain epidemics with underlying genotype networks
Before running our dynamical system for a multistrain epidemics on empirical and synthetic genotype networks, it is useful to clarify what structure is represented by these networks. In theory, there is a true, fixed, full genotype network, which represents all possible sequences of INFV H3N2 regardless of viability and fitness. In practice however, only a subset of these sequences actually emerge and are viable, leaving us with a subgraph corresponding to the realized genotype network. To make their way into our dataset, this network is further sampled by the sequencing process, leaving us with a subgraph for the observed genotype network. Because our dynamical system is represented by a set of continuous and deterministic equations, the model itself blurs the line between the realized and the observed genotype network. This assumption is meaningful since sampling of complex networks can often alter their structure in nontrivial ways (38). However, sequences in our dataset are sampled an average of 2.97 times each and structurally important nodes are generally sampled proportionally to their degree as shown in Fig. 1 (right). Furthermore, Fig. 3 as already shown that key features are relatively fixed in time even when the size of the temporal samples vary by an order of magnitude.
To illustrate the output of our multistrain model, we this directly run the equations on one of the largest components in our dataset in Fig. 4. These results reproduce some of the main results known from the study of toy genotype networks (27)(28)(29)(30)39). First, increasing the depth of strain-transcending immunity in the genotype network does not alter the epidemic threshold of the system but does lower endemic burden and change its composition between currently infectious individuals [I(t)] and recovered individuals [R(t), recently infectious]. Second, we find in Fig. 4 (center) that depending on the interplay between the depth of immunity ( ) and its waning rate (α) there can exist a regime of localization where the fraction of infectious individuals in the endemic state (I * ) grows very slowly with the transmission rate (β) before a second transition [previously theorized as an immune invasion threshold (30)]. Finally, in Fig. 4 (right) we show a representative time series that illustrate the rich strain-specific dynamics that emerge even in a deterministic model, with heterogeneous time of emergence and cyclical dynamics that eventually settle at an endemic state (28).

Epidemics with random genotype networks
To investigate how the observed genotype network structure may be influenced by the spread of disease and learned host immunity, we ran our multistrain SIRS model with varied sytnthethic genotype networks. The incorporation of a genetic strain structure allows for both mutation between neighboring strains and crossprotective immune effects, defined as a function of network distance. Generative networks models then allow us to better study how network features affect the overall disease prevalence. Our hypothesis being that the evolutionary dynamics of INFV A (H3N2) would localize observed strains preferentially in regions of its full genotype network that have a structure consistent with high disease prevalence; the local network structure itself acting as an selection pressure on the realized and observed genotype networks.
The connectivity or edge density of a genotype network may influence its endemic infection capacity, as suggested by crossprotective immune effects and the observed criticality within the genotype network structure. Here, the effects of connectivity were investigated with the implementation of the multistrain model on fully random networks, namely G(n, P) Erdős-Rényi random graphs (40), with a given number of nodes n and edge probability P controlling connectivity for an average degree of k = P(N − 1). We measure the endemic disease burden I * + R * , summed over all strains, once the epidemic dynamics has reached an equilibrium (i.e. after a long period of transient dynamics). This disease burden was observed across varying edge densities and levels of immunity transcendence in Fig. 5 (top left) to determine the relationship between connectivity and endemic infections for a genotype network of a given size.
Nontrivial dynamics are revealed by the multistrain epidemic model with an underlying genotype network structure of Erdős-Rényi random networks. Endemic disease burdens are lowered in random genotype networks in the presence of high connectivity and nonzero transcending-immunity parameter , producing cross-protective immune effects that outweigh the increase in mutations. On the opposite end, extremely low connectivity also The network is now generated by a nonlinear preferential attachment scheme with a fixed density (corresponding to the empirical INFV genotype network) and varying attachment kernel ν. In this scheme, ν = 0 corresponds to uniform attachment, ν = 1 to scale-free networks, and ν = 2 to star-like networks. Other parameters: Network size n = 250, mutation rate μ = 1/50, transmission rate β = 1/2, recovery rate γ = 1/6, and immune loss rate α = 1/100. (Bottom left) Component size of random nodes found in the networks with highest endemic burden in the top left panel, i.e. k = 1. (Bottom right) Degree distribution of nodes found in the networks with strongest preferential attachment before endemic burden decreases due to condensation in the genotype network, i.e. ν = 1. lowers disease burden through increased network fragmentation, resulting in numerous components that restrict mutation pathways between all strains. Together these dynamics produce an optimal connectivity that maximizes disease burden. While slightly affected by parameters, we find that the optimal average degree is observed increasingly close to k = 1 as the pervasiveness of immunity increases. This density is a critical point of the network structure where a giant component emerges. Around this critical transition, we find a power-law distribution of component sizes with exponent 1.5. This distribution shown in Fig. 5 (bottom left) is similar to that empirically observed in Fig. 1 (center).
This critical component size distribution is not found at an average degree k = 1 in the INFV genotype network since its structure is far from that of Erdős-Rényi random networks. Most notably, the degree distribution of the real network is not homogeneous: The power-law degree distribution shown in Fig. 1 is radically different from the Poisson degree distributions of Erdős-Rényi networks. As previously stated, the observed degree distributions and negative correlations are both consistent with preferential attachment models. To explore degree heterogeneity, we therefore turn to a nonlinear preferential attachment model where networks are grown according to a rich-get-richer process where new strains are a mutation of existing strains chosen randomly but proportionally to their current degree to some power ν, controlling the network heterogeneity (35,36). In Fig. 5 (top right), we find that the strongest rich-get-richer effect that a genotype network can support before decreases in disease burden is a lin-ear attachment effect, reminiscent of the relationship observed in Fig. 1 (right). Under this linear preferential attachment, we find a power-law degree distribution with scale exponent 2.43, close to the exponent of 2.3 observed in the INFV genotype network in Fig. 1.
The results of these two experiments are consistent with our hypothesis. Namely, regions of a full genotype network that are at critical points in terms of density and rich-get-richer processes lead to higher disease burden such that there is a selection pressure for the realized genotype network to localize around these regions. Importantly, these experiments do not test actual mechanism for the growth of the realized and observed genotype networks. While growing networks with a power-law degree distribution can imply a preferential attachment statistic (33,34), other mechanisms can produce similar networks (41). We can, however, venture two hypotheses for how a rich-get-richer comes into play in the observed genotype network. It can emerge from either (i) mutation patterns, as strains with more neighbors are more likely to re-emerge and re-explore their neighborhood or (ii) prevalence patterns, as the individual strain fitness and reproduction rate can be estimated structurally from strain degree.

Discussion
In this study, we utilize a large data set and a genotype network to examine INFV evolution from 1999 to 2020. In doing so, we reveal features suggestive of a fundamental structure underlying INFV genotypic space, and by extension, virus evolution. The INFV genotype networks explore a subspace of all networks that is predictable in structure as they grow in time. Features such as scalefree degree distributions and component size distributions, both related to underlying critical phenomena (42), remained present and consistent in networks generated using temporal subsets of strain samples.
It may be hypothesized that selecting vaccine strains near to hubs can provide a set of candidate strains for vaccine selection. It may also be hypothesized that well-selected strains would not be near observed hubs, were they to effectively neutralize spread of strains near to it in the network space. Strains selected for vaccines are recommended in part by the antigenic similarity between candidate strains and strains circulating during the targeted INFV season. Interestingly, we see the A/Texas/50/2012(H3N2)-like virus (WHO vaccine recommendation from 2014 to 2015) and A/Victoria/361/2011(H3N2)-like virus (2012 to 2014) as edgeless nodes, respectively, two and four mutations from their nearest strains in the network. The 2010-2012 H3N2 recommendation, A/Perth/16/2009(H3N2)-like virus, has four neighboring strains, including a hub two strains away. Future work will include a thorough analysis of the relationship between vaccine strains and temporal network structure. This relationship introduces further dynamical interplay as vaccines often include strains already spreading successfully but then, in turn, affect the epidemic dynamics and therefore limit further spread and mutations around these strains.
Given the numerous mutations possible, it may not be realistic to use genotype networks to predict new strains with meaningful accuracy. However, it may be possible to predict their genetic relationship to strains existing in the network structure. Any such efforts would effectively create a map of the genotype space currently occupied by INFV, and suggestive of where in that space it may evolve (see Fig. 6). Here, we find that INFV evolutionary dynamics never returns to regions of the network left more than a year or two in the past, in line with descriptions based on We also note that components are never rediscovered after more than a few months without new strains emerging therein. Altogether, this analysis shows that the sampling of strains might be better than originally expected, but also that higher-order networks structures (paths of multiple mutations) might eventually help us better understand the global patterns of INFV immune evasion.
travelling wave models (43) but adding a descriptive layer for the growth and local structure of the observed network. Assuming the genetic distance is proportional to antigenic distance (22,44,45), this is a consequential development with regards to our understanding of cross-protective immune effects and vaccination strain selection. That is, the outlined approaches may offer perspective on which specific genotypes of a given INFV strain might offer the best cross-protective immunity. Future work may include development of models further able to predict and refine the space of plausible future strains. The predictable statistics of the genotype network topology indicates that the INFV A (H3N2) HA genotype network is influenced by strain-transcending immunity. This is consistent with the dynamics of a multistrain epidemic model. As the pervasiveness of learned immunity increases, the peak endemic burden expected in our multistrain model shifts closer to, and becomes narrower around, a critical network density corresponding to the emergence of a giant component and a power-law distribution of component sizes. In the future, knowing what mechanisms help shape the genotype network could allow network inference frameworks to identify critical new strains as they emerge (46,47).
The strong positive relationship between degree and sample count implies preferential attachment based on degree; however, node age implements a consequential maximum age at which a node may acquire new neighbors. This corresponds to the point at which the strain is not widely circulating or extinct in the host population. Furthermore, the multistrain model indicates that strain-transcending immunity drives this strain extinction process as cross-protective effects increase population immunity toward strains in time. As shown by the model, any stronger preferential attachment mechanisms would also decrease the expected epidemic burden.
This study introduces the use of network growth processes that could be used in parallel to other methods used to study pathogen evolution. These include phylodynamics (43), genealogical trees (48), antigenic cartography (23,24), and other network approaches. With regard to phylodynamics, our approach requires few of the population genetic (and other) assumptions that are embedded in phylodynamic approaches. Moreover, our results offer improvements over existing network models through the additional insights: the identification of critical properties in INFV genotype networks and the offering of mechanisms for its underlying structure. Our observations are consistent with our hypothesis that the observed INFV genotype network explores a subspace predictable in structure, influenced by the effects of strain-transcending immunity. A more realistic network growth process [involving, for example, convergent evolution, correlated mutations, and epistatic effects (49)] would be necessary to better fit the observed genotype network structure. Likewise, this observed structure is also impacted by the imperfect sampling of INFV strains. Future efforts may utilize more densely sampled populations.
In summary, we stress that increased genomic surveillance of multistrain pathogens will allow for similar analyses of other diseases with variable antigenic properties. As the evolutionary forces acting on multistrain pathogens differ, we may expect differing network structures from pathogen to pathogen. For instance, HIV has unique pressures from lifetime infection and pathogen evolution, highly active antiretroviral therapy used in its management, as well as bottleneck transmission events and selection biases (50)-all mechanisms that could lead to unique network features. Rapidly changing pathogenicity and virulence in emergent viruses, such as SARS-CoV-2, could yield dynamic network features. As the COVID-19 pandemic has generated data at an unprecedented pace and level of granularity, it may offer the opportunity for an analogous comparison (51).
More broadly, our findings support the importance of multiple methods-utilizing both existing phylodynamic approaches and network and graph theoretical methods-toward a comprehensive picture of virus evolutionary dynamics. The use of multiple methods can be complementary, as standard canon from evolutionary theory and methods from complex systems can each offer useful information about pathogen evolution.
In the future, we might be able to characterize the underlying physics of RNA virus infection networks that can be used to predict long-term patterns, toward improved public health interventions: vaccine strain selection, analysis of evolutionary trajectories, and refinement of the understanding of cross-protective immunity.

Statistical methods
Distribution tails were fitted with power laws using the "pow-eRlaw" package (52,53). For the full network generated from all years of the data set, we fit power-law distribution tails for observed values (degree, component size), where tail implies the distribution of observed values greater than some minimum. Minimum values (k min , c min ) of 5 or greater were considered, and the best goodness of fit was observed at k min = c min = 7 for both degree and component size. These minimum values were then constrained to 7 for networks consisting of 5 y of data in our temporal analysis.

Multistrain epidemic model
The dynamics of the model described in the main text and ref. (30) can be tracked with the following set of ordinary differential equations:

Experiment on Erdős-Rényi networks
In the left column of Fig. 5, we present the endemic disease burden i I i + R i (i.e. recent infections) of our multistrain epidemic model on Erdős-Rényi networks (40). The endemic state is defined as the fixed point where all derivatives of the system are equal to zero. Erdős-Rényi networks are obtained by generating a set of n = 250 nodes and connecting each possible pair of nodes with probability P = k /(N − 1) such that the expected degree of all nodes (number of first network neighbors) is set by k .

Experiment on preferential attachment networks
In the right column of Fig. 5, we present the endemic disease burden i I i + R i of our multistrain epidemic model on networks grown through preferential attachment (35,36). These networks are obtained by starting from a pair of connected nodes and growing the network until we reach a network of size n = 250 nodes.
The networks are grown through the following discrete stochastic process. At each time step, we either connect an existing pair of nodes with probability P or connect a new node to an existing node with complementary probability 1 − P. The probability P sets the expected density of the network, since after t time steps we expect t edges and (1 − P)t nodes for an average degree k = 2/(1 − P). In our experiment, P is chosen to fix the average degree to that observed in the giant component of our empirical data, i.e. k = 2.73.
At every time step, we therefore need to pick either two existing nodes (probability P) or one existing node (complementary probability 1 − P). These existing nodes are chosen proportionally to their degree k proportionally to the kernel k ν . Meaning a given node i of degree k i will be chosen with probability k ν i / j k ν j . A kernel with ν = 0 corresponds to uniform attachment, whereas the linear kernel ν = 1 corresponds to the much studied linear attachment model of Barabási