Many options, few solutions: over 60 million years snakes converged on a few optimal venom formulations

Gene expression changes contribute to complex trait variations in both individuals and populations. However, how gene expression influences changes of complex traits over macroevolutionary timescales remains poorly understood. Being comprised of proteinaceous cocktails, snake venoms are unique in that the expression of each toxin can be quantified and mapped to a distinct genomic locus and traced for millions of years. Using a phylogenetic generalized linear mixed model, we analysed expression data of toxin genes from 52 snake species spanning the three venomous snake families, and estimated phylogenetic covariance, which acts as a measure of evolutionary constraint. We find that evolution of toxin combinations is not constrained. However, while all combinations are in principle possible, the actual dimensionality of phylomorphic space is low, with envenomation strategies focused around only four major toxins: metalloproteases, three-finger toxins, serine proteases, and phospholipases A2. While most extant snakes prioritize either a single or a combination of major toxins, they are repeatedly recruited and lost. We find that over macroevolutionary timescales the venom phenotypes were not shaped by phylogenetic constraints, which include important microevolutionary constraints such as epistasis and pleiotropy, but more likely by ecological filtering that permits a few optimal solutions. As a result, phenotypic optima were repeatedly attained by distantly related species. These results indicate that venoms evolve by selection on biochemistry of prey envenomation, which permit diversity though parallelism and impose strong limits, since only a few of the theoretically possible strategies seem to work well and are observed in extant snakes.


Introduction
Single genes underlying major traits are the exception rather than the rule, and the dissection 48 of polygenic trait variation has been at the forefront of biological research (1)(2)(3). Much of the 49 complexity resulting from interactions between genes is mediated through their expression, 50 which plays a central role in determining phenotypic variation between individuals and 51 populations (4)(5)(6)(7)(8). In particular, levels of gene expression account for substantial sources of 52 variation in natural populations, acting as potential targets of natural selection (8-10).

53
Although population-level differences in expression may contribute to the onset of local 54 adaptation and perhaps even eventual adaptive divergence (6,11,12), how changes in gene 55 expression levels lead to evolution of complex traits over the course of millions of years 56 remains largely unknown.

58
Interactions between genes and their effect in channelling of adaptive responses have been 59 the focus of the field of quantitative genetics. How evolution results from the combined effects 60 of the adaptive landscape, and the pattern of genetic variances and covariance among genes 61 (the G matrix), is one of the key questions in this field (13,14). The covariance between genes 62 plays a vital role in shaping complex traits by determining the evolutionary trajectory through 63 natural selection (15), and the occurrence of parallelism (16). While most quantitative genetics 64 studies deal with populations, their conclusions can translate to macroevolutionary processes 65 as well. For example, estimates of divergence between populations show that the direction of 66 greatest phenotypic divergence can be predicted by the multivariate direction of greatest 67 additive genetic variance within populations (17). Unfortunately, the G matrix cannot be 68 extrapolated across macroevolutionary timescales, as it itself evolves (18). Fortunately, it is 69 possible to compute a phylogenetic covariance matrix for multivariate traits, which can serve 70 as a useful analogy to the G matrix, but over much larger timescales, and incorporating a 71 broader range of constraints (19,20). We can then examine whether the structure of the 72 phylogenetic covariance matrix corresponds to evolutionary trajectories of complex traits.

74 75
Here we use the analogy between the G matrix and the phylogenetic covariance matrix to 76 understand how gene expression evolves in a complex trait, namely snake venom. Being 77 composed of proteinaceous cocktails, snake venoms are unique in that the expression of each 78 toxin type can be quantified and traced to a distinct genomic locus (21)(22)(23). Variations in gene 79 expression alter the abundance of proteins in the venom, thereby influencing venom efficacy 80 (24)(25)(26). Thus, toxin expression levels constitute the polygenic phenotype that is the venom, 81 allowing us to examine how selection affects gene expression over tens of millions of years.

82
To examine the features of complex trait evolution at the level of gene expression, we 83 estimated phylogenetic covariance of 10 toxins using data from 52 snake species covering 84 the three venomous snake families (Elapidae, Viperidae, and Colubridae) and asked the 85 extent to which it matched observed patterns of evolutionary change across taxa.

87 88
Although we find that extant snake venoms occupy a limited area of phenotypic space, largely   Table 1). For inclusion, each study had to provide quantitative data on toxin 102 component abundance and had species for which phylogenetic data were available. We 103 restricted our dataset to include components that are found in at least 50% of snakes to focus 104 on generally important toxins, and because sample sizes for the other components would be 105 too low for accurate and phylogenetically unbiased inference. Overall 10 out of 27 toxins we 106 retained. For comparative analyses, we used a published time-calibrated phylogeny of 107 squamates, which estimated the most recent common ancestor (root) of the three snake 108 families to about 60 million years ago (27).

111
By limiting the range of responses to natural selection, the covariances between genes reflects 112 constraints that shape a phenotype. The phylogenetic covariance matrix (PCOV) accounts for 113 the effect of phylogeny on the interrelationships between genes coding for the snake venom

117
PGLMM was devised in the early 90s as a method to infer evolutionary constraints of 118 characters using only phylogeny and measures of phenotypes (19). As an extension of 119 maximum likelihood based techniques widely used in quantitative genetics, PGLMM was 120 notable for its versatility as a comparative method (28,29). We use a modern rendition of the 121 PGLMM devised by Hadfield and Nakagawa, which was optimized for faster and better 122 performance (29,30). The PGLMM estimated changes in gene expression against a 123 presumed change in diet with the effect of phylogeny being modelled as a random effect. Life 124 history characteristics and diets for snakes are difficult to obtain, particularly in a consistent 125 manner. However, a snake's potential diet is largely affected by its body size, with smaller 126 individuals consuming smaller prey, while larger individuals tend to prefer larger prey (31). Therefore, we used adult snake length as a proxy for diet. The mean effective sample size for 128 all parameters was greater than 11,000 (Supplementary figure 4). The diagnostics revealed 129 suitable convergence of the chains with negligible autocorrelation in the MCMC 130 (Supplementary Fig. 1-3). Significant values in the PCOV matrix denote the presence of 131 phylogenetic constraint, while non-significant values denote its absence. We observed a lack 132 of significant values in the PCOV (Fig. 1) for all the venom components that we modelled. In

161
It is important to note that PLA2s in elapids (group I) and vipers (group II) are produced by 162 different loci and have apparently evolved independently (33,34). In order to account for any   individual components would reflect constraints on evolutionary change and the total phenotypic space attainable by selection (38). Thus, traits that are constituted by genes under 231 high constraint would not be able to diversify as freely as traits with no constraint. Genetic 232 constraints also determine convergence and parallel evolution, where high constraint reduces 233 the likelihood of genes contributing to different convergent regimes (16). Yet, for snake venom 234 genes we see no such constraints in gene expression, suggesting that all toxin combinations, 235 in principle, are possible (Fig. 1).

237
While the lack of constraint between components implies that venom has the potential to 238 diversify freely and fully fill the possible phenotype space, this is far from what we observe.

239
Rather, the total phenotypic space has surprisingly low dimensionality, with two principal 240 components explaining 73% of the variance. Venoms form three distinct clusters around the 241 major toxin components in the phylomorphospace, indicating the possible presence of three 242 distinct adaptive optima (Fig. 2). Similar toxin-specific strategies have been observed between 243 populations of snakes, but we show that the trend extends phylogenetically to different species 244 as well as different families (39,40). While individual venom components do exhibit significant 245 phylogenetic inertia (Fig. 1), the phylomorphospace clusters often include unrelated taxa,  (41,45,46). This relaxed selective constraint could allow the duplicated genes to diversify 256 freely, as long as one of the copies performs the essential function, and the presence or 257 absence of another copy does not affect fitness. Therefore, a system that comprises of many 258 duplicated gene families would also likely have the ability to diversify freely. Snake venom fits this characteristic since it consists of gene families that have undergone varying degrees of 260 duplications throughout their history (47). We hypothesize that the lack of constraint observed 261 between expression levels of genes encoding for snake venom could be due to the fact that 262 snake venom comprises of duplicated genes.

264
One of the most prevalent theories about the origins of venom composition suggests that they 265 originated after ancestral physiological genes underwent duplication and neofunctionalization 266 (48). Since venom phenotypes need to be flexible and to adapt quickly, duplicated genes 267 make ideal toxin candidates as they are under lower selective constraints (49)(50)(51). In addition 268 to sequence-level changes, changes in gene expression also contribute to microevolution in 269 snake venom (52). To get a complete picture of the evolution of the snake venom phenotype,

282
Temporal patterns in venom evolution 283 Ancestral snake venom composition has received considerable attention, but until now the 284 analyses have been qualitative in nature (39). While the confidence intervals for ancestral 285 state reconstruction (ASR) are large ( Supplementary Fig. 14-34) owing to the remarkable evolutionary lability of venom, we can nonetheless make a number of observations about the 287 course of evolution of major components. Among the major components, the ancestral venom 288 most likely contained only appreciable amount of SVMP (Fig. 3). This finding is consistent with 289 previous estimates of a likely recruitment of SVMP into the venom at the split of vipers and 290 colubrids (~62 million years ago) (24,54). Furthermore, the SVMP-focused strategy is the only 291 convergent selective regime identified by the SURFACE analysis in all three families (Fig. 3),

292
suggesting that the machinery to produce this toxin exists in all of them. While we could not 293 detect PLA2, TFTx, and SVSP with confidence in the most recent common ancestor, they 294 could have been present at lower levels in the ancestral venom, or as ancestral precursor 295 molecules (33,34,55). This is especially likely given that all three families have some level of 296 each of the major toxin classes (Fig. 3).

297
Being present in the ancestral venom, SVMP continued to be used as a major toxin by viperids 298 and is still the dominant toxin in some genera (Echis and Bitis), as well as some colubrids.

299
However, other toxins were recruited (or increased in quantity) later in venomous snake

330
The extent to which traits are constrained by their history, vs reaching their fitness optima has 331 been a major debate in evolutionary biology. Numerous studies have relied on phylogenetic 332 regression to estimate morphological covariation between traits while accounting for 333 phylogenetic non-independence (20,(57)(58)(59)(60). In our approach we analyse more than one 334 response variable simultaneously and incorporate effects on trait relationships that arise 335 through shared ancestry (61). We show that the structure of the gene expression PCOV can 336 give insights into how traits evolve, by providing a conceptual bridge between micro and 337 macroevolutionary forces. By showing that the phenotypic space is inherently unconstrained, we are able to highlight the existence of fitness optima, and explain the existence of 339 widespread parallelism seen in snake venoms. These findings show that in the long-term

371
If the length was reported as a range, midpoint value was recorded in the dataset.   (36,67). SURFACE is unique because unlike previous methods that utilize 405 Hansen models, the placements of regime shifts is guided by trait data as opposed to some a 406 priori hypothesis regarding the location of convergence (36). The SURFACE method is divided 407 into two phases. The forwards phase adds successive regimes to a basic Hansen model using 408 input from continuous trait measurements, which in our study were normalized measurements 409 of gene expression for the four major toxins. The performance of each successive model was shifts, and k' distinct regimes, in addition to the extent of convergence which is defined as the 417 difference of these terms (Δk), c is used to represent the the shifts towards different 418 convergent regimes in multiple lineages (36). We used all standard parameters as mentioned 419 in the SURFACE vignette (37). To obtain a null distribution we ran 500 iterations of the in-built    Although ancestral states were reconstructed at each node, for clarity only the ones where substantial changes in toxin levels took place are labelled. The overall trend is that starting from an undifferentiated ancestor, snakes have increasingly focused on specific toxin families, occasionally investing into new toxin categories for their arsenals (e.g., PLA2s and SVSPs).
The increased concentration of specific venom components, relative to the ancestors, has most likely happened by increases in copy number of the specific gene families.