Selection on non-antigenic gene segments of seasonal influenza A virus and its impact on adaptive evolution

Abstract Most studies on seasonal influenza A/H3N2 virus adaptation have focused on the main antigenic gene, hemagglutinin. However, there is increasing evidence that the genome-wide genetic background of novel antigenic variants can influence these variants’ emergence probabilities and impact their patterns of dominance in the population. This suggests that non-antigenic genes may be important in shaping the viral evolutionary dynamics. To better understand the role of selection on non-antigenic genes in the adaptive evolution of seasonal influenza viruses, we have developed a simple population genetic model that considers a virus with one antigenic and one non-antigenic gene segment. By simulating this model under different regimes of selection and reassortment, we find that the empirical patterns of lineage turnover for the antigenic and non-antigenic gene segments are best captured when there is both limited viral coinfection and selection operating on both gene segments. In contrast, under a scenario of only neutral evolution in the non-antigenic gene segment, we see persistence of multiple lineages for long periods of time in that segment, which is not compatible with observed molecular evolutionary patterns. Further, we find that reassortment, occurring in coinfected individuals, can increase the speed of viral adaptive evolution by primarily reducing selective interference and genetic linkage effects. Together, these findings suggest that, for influenza, with six internal or non-antigenic gene segments, the evolutionary dynamics of novel antigenic variants are likely to be influenced by the genome-wide genetic background as a result of linked selection among both beneficial and deleterious mutations.

Seasonal influenza is a major infectious disease that causes 3 to 5 million worldwide 53 cases of severe illness and 250,000 to 500,000 deaths each year in humans (1). Of the 54 currently circulating flu viruses, influenza A subtype H3N2 is the predominant virus 55 contributing to these morbidity and mortality estimates. This virus is known to rapidly 56 evolve, particularly antigenically (2), enabling it to perpetually evade herd immunity 57 and re-infect individuals in the population. Consequently, there has been great interest 58 in understanding how this virus evolves antigenically, especially with respect to its 59 main antigenic gene, haemagglutinin (HA). In particular, these investigations have 60 focused on identifying key sites involved in viral antigenicity (3-6), which has 61 provided compelling evidence of immune-mediated selection acting upon HA. 62 63 However, the limited standing genetic diversity observed for HA has been difficult to 64 reconcile based on recurrent positive selection alone, since the high virus mutation 65 rate and the presence of strong diversifying selection predicts a large antigenic 66 repertoire over time (7). The observed low-level genetic diversity of the HA is 67 reflected in its spindly, ladder-like phylogeny, which indicates that only a single viral 68 lineage persists over time. Genetic variants belonging to this persisting lineage have 69 been characterized antigenically, indicating that every two to eight years a major 70 antigenic change occurs that necessitates the updating of components of the seasonal 71 influenza vaccine (6,8,9). Phylodynamic models have proven to be invaluable to 72 understanding how host immunity and viral evolution can lead to these interesting 73 phenomena of a spindly phylogeny and a single major circulating antigenic variant 74 patterns of influenza A/H3N2 virus genome. Furthermore, we find that the rate of 126 adaptive evolution of the virus increases under this evolutionary regime, which is 127 predominantly a result of reassortment reducing interference effects contributed by 128 the non-antigenic gene segment. Two independent chains of 200 million steps were executed for each of the eight gene 139 segments to ensure that adequate mixing and stationarity had been achieved. The 140 posterior tree distribution for each segment was further examined with PACT (32), 141 which infers the times to the most recent common ancestor (TMRCAs) across the 142 entire evolutionary history at regular intervals. To quantify and visualize patterns of 143 genetic diversity in each segment, mean TMRCAs over time were plotted using the R 144 package ggplot2 (33) and genealogical trees were plotted with ggtree (34). 145

B) Phylodynamic model of infection and coinfection 147
To explore the evolutionary processes underlying the empirical patterns of TMRCA 148 observed for the influenza A/H3N2 virus genome, we formulated a simple population 149 genetic model with a constant number of N = 1000 infected individuals. In the model, individuals were either infected with a single virus (I s ) or coinfected with two viruses 151 (I co ). We did not consider coinfection with more than two viruses.   Figure 1B). Coinfection events occurred from singly infected 172 individuals at a per capita rate of β = 0.0125 per day. This corresponds to a 173 coinfection level of approximately 5% of the total infected population at equilibrium 174 (see Text S1). Ascertaining an empirical coinfection rate for influenza A/H3N2 viruses in general, or at the within-subtype level, is very difficult, since the low 176 circulating viral diversity is likely to limit our ability to distinguish between 177 independent infecting viral strains. Nevertheless, the number of influenza coinfections 178 can be estimated when viral strains involved belong to either different subtypes or 179 types (e.g. A/H3N2 and A/H1N1 or influenza A and B viruses, respectively) (36, 37). 180 These types of coinfection have been known to occur between 1-2% in sampled 181 influenza A viral infections (36-38). We set the level of coinfection in our model 182 slightly higher than these empirical estimates, at ~5%, to reflect that these empirical 183 estimates between different subtypes or types are likely underestimates. 184 being equally likely to land on the antigenic or the non-antigenic gene segment. We 206 allow the distribution of mutational fitness effects to differ between the two gene 207 segments. Specifically, we assume that 30% of mutations are beneficial and 70% of 208 mutations are deleterious on the antigenic gene segment. On the non-antigenic gene 209 segment, we assume that 5% of mutations are beneficial, 30% of mutations are 210 deleterious, and the remaining 65% of mutations are neutral. A higher proportion of 211 beneficial mutations are assumed in the antigenic gene segment to capture the 212 selective advantage that antigenic mutations are likely to have through evasion of herd 213 immunity. The non-antigenic gene segment is assumed to have a greater proportion of 214 neutral mutations to reflect the observation that internal genes undergo greater neutral 215 evolution than external genes (25). We assume that the fitness effects for beneficial 216 mutations are exponentially distributed with mean 0.03 and that the fitness effects for 217 deleterious mutations are exponentially distributed with mean 0.09. We do not 218 consider lethal mutations. Importantly, the distributions of mutational fitness effects 219 on the antigenic and non-antigenic gene segment capture the salient features of 220 recently determined mutational fitness effects for seasonal influenza A virus (39) (see 221 Figure S1). 222 223 Viral fitness is calculated by multiplying fitness values at each site across the genome. 224 Multinomial sampling based on viral fitness is applied at each transmission event to 225 determine which individual will infect (or coinfect) next. For coinfected individuals, 226 we initially determine which virus is transmitted from the two infecting parental viral 227 strains (see Figure 1A) and compute the viral fitness accordingly. 228 229

Tracking lineages over time 230
The model is implemented in Java using a Gillespie tau-leap algorithm (40)  neutrally evolving antigenic gene segment. As a consequence, the non-antigenic gene 308 segment is able to explore more genetic backgrounds, which leads to an increase in its 309 genetic diversity. Expectedly, the rate of adaptive evolution is unaffected by changes 310 in the rate of coinfection ( Figure 4B Figure 5C). Strikingly, the rate of virus adaptation is significantly greater in the presence of coinfection than when it is absent ( Figure 5A). 329 This phenomenon appears to be primarily driven by the non-antigenic gene segment, 330 which also experiences a notably higher rate of adaptive evolution when coinfection 331 occurs in the population ( Figure 5C). In contrast, although coinfection increases the 332 fitness variation of both gene segments ( Figure 5B and C), the difference in the rate of 333 adaptive evolution of the antigenic gene segment in the absence versus in the presence 334 of coinfection appears to be slight ( Figure 5B). 335

397
When comparing TMRCA patterns between the antigenic gene segment and the non-398 antigenic gene segment, it is notable that when the non-antigenic gene segment 399 evolves neutrally, the common ancestor of the non-antigenic gene segment is 400 consistently older than the antigenic gene segment ( Figure 7F). However, when selection affects both gene segments, we note a closer correspondence with the 402 empirical TMRCA dynamics ( Figure 7C, compared to Figure 7I)). Specifically, in 403 addition to the antigenic gene segment undergoing more frequent fluctuations in the 404 TMRCA over time compared to the non-antigenic gene segment, the TMRCA of both 405 gene segments can occasionally coincide, which likely indicates a shared common 406 ancestor, perhaps as a result of a genome-wide selective sweep. We further examined 407 this observation by comparing the differences in TMRCA between the gene segments 408 ( Figure S4). The higher density around zero years of difference in the TMRCA 409 suggests that the likelihood of sharing a common ancestor is greater when selective 410 effects occur on both gene segments ( Figure S4). 411

Sensitivity of results to model parameters 413 a) Infected population size 414
While it is well established that human influenza A/H3N2 virus has a strong seasonal 415 transmission pattern in some populations, we decided to model a constant infected 416 population. This decision was motivated largely by undertaking a simple and standard 417 approach to examine the patterns of viral diversity due to selection, mutation, and 418 reassortment alone. However, given that regions with low-level, constant disease 419 transmission (e.g. the tropics) frequently seed seasonal outbreaks in temperate locales 420 (21,(41)(42)(43), the effective population size of global influenza A/H3N2 viruses is 421 expected to be relatively small and constant over time (21,42). Consequently, the 422 assumption of a constant infected population size is not unreasonable since regions 423 with year-round influenza infection are expected to ultimately shape the overall 424 evolutionary dynamics of the virus. We tested the effects of population size on the 425 evolutionary behavior of the model ( Figure S5). Specifically, we ran 100 simulations 426 at each of three population sizes (N=1000, N = 5000, and N=10000), under the model 427 parameterization with both gene segments experiencing positive and negative 428 selection. Notably, similar TMRCA patterns were observed regardless of population 429 size, such that the antigenic gene segment typically had a younger TMRCA compared 430 to the non-antigenic gene segment ( Figure S5). However, as we increase the 431 population size, the TMRCA of both gene segments increases, indicating greater 432 lineage persistence in the population. This corroborates a standard expectation from 433 coalescent theory: smaller populations have comparatively more recent common 434 ancestors than larger populations due to stronger effects of genetic drift. Thus, when 435 the infected population size is fixed at N=1000, deterministic and stochastic forces 436 will shape the population's genetic diversity, both of which have been implicated in 437 the evolutionary dynamics of seasonal influenza A viruses (21). At larger population 438 sizes, we can, however, recover lower TMRCAs when we increase the mean effect 439 size of mutations (results not shown). 440

b) Mutation and coinfection rates 442
As it is difficult to ascertain the per-genome, per transmission, mutation rate for a 443 two-segment virus, we varied the per-genome per-transmission mutation rate U 444 between 0.05 and 0.2 ( Figure S6). Overall, these simulations yielded qualitatively 445 similar results: the antigenic gene segment had a younger TMRCA than the non-446 antigenic gene segment. Interestingly, at higher mutation rates, mean TMRCAs for 447 the non-antigenic gene segment were appreciably smaller and mean TMRCAs for the 448 antigenic gene segment were slightly smaller. Further, the difference in the mean 449 TMRCAs for the antigenic and non-antigenic gene segment was smaller at higher mutations, most likely reflecting a concomitant increase in interference effects 451 between the gene segments. 452

453
We also looked at the sensitivity of the coinfection rate by varying β from 0.0025 to 454 0.25 per day ( Figure S7). When the frequency of the coinfected individuals was set at 455 1% in the total infected population (i.e. β = 0.0025 per day) the TMRCA of the two 456 gene segments was found to be very similar ( Figure  Given that only two segments are modeled in this study, it would be interesting to see 479 if these results still hold when additional non-antigenic gene segments are considered. 480 One prediction is that since linkage effects are expected to increase with additional 481 gene segments, we are more likely to see the cumulative effect of selection acting on 482 the non-antigenic segments on the antigenic gene segment. Furthermore, the fitness 483 variation of each gene segment (and the overall virus) is also likely to increase, thus 484 enabling selection to be more efficacious. Consequently, in light of this hypothesis, 485 our finding that non-antigenic gene segment has minimal impact on the antigenic gene 486 segment is likely to be overly conservative. 487

488
Since the coinfection level assumed in our model is ~5%, around 2.5% of the infected 489 population is expected to carry a first-generation reassortant virus. Interestingly, this 490 low level of reassortment is consistent with a recently estimated frequency of 491 reassortment events observed among sampled virus genomes over time, at around 492 3.35% (25). Further evidence that the intrasubtypic reassortment is restricted at the 493 between-host level comes from a recent finding that even at the within-host scale the 494 effective reassortment rate is very limited (45). This indicates that the difference in 495 the TMRCA across the seasonal influenza A virus genome is likely to arise from a 496 low-level of reassortment in the virus population. Importantly, this has strong 497 implications for the adaptive evolution of the virus, since it suggests that selective 498 interference among gene segments has the potential to influence the fate of beneficial 499 mutations in the genome. 500 501 Although reassortment is notoriously associated with pandemic influenza (46), there 502 are several historical events in both seasonal influenza A/H3N2 and in seasonal 503 influenza A/H1N1 where intrasubtypic reassortment has been implicated in antigenic 504 cluster transitions (16,22,47). Furthermore, given that these instances are often 505 associated with greater disease severity and incidence, akin to pandemic influenza, it 506 also indicates that intrasubtypic reassortment can facilitate significant improvements 507 in viral fitness. Consequently, this suggests that reassortment predominantly increases 508 the rate of virus adaptive evolution by reducing selective interference effects across 509 the genome. 510

511
We did not explicitly consider epistasis in our simulation model. There is evidence 512 that epistatic interactions both within and between gene segments can drive the 513 adaptive evolution of seasonal influenza A viruses. For example, T-cell immune 514 escape mutations in NP have been enabled by stability-mediated epistasis (19,20) and 515 functional mismatches between the activities of the HA and the NA are known to 516 decrease viral fitness considerably (48). However, to effectively model epistasis, a 517 detailed knowledge about the fitness landscape of the virus genome, which is 518 currently lacking, is necessary. Elucidating the epistatic interactions in influenza A 519 viruses should be a focus of future work, since it could help explain the role that 520 intrasubtypic reassortment plays in contributing to the adaptive evolution of seasonal 521 influenza (49), and more broadly, it could help understand the epidemic (and even 522 pandemic) potential of reassortant viruses. 523 524 Our findings that selection is likely to act upon both antigenic and non-antigenic gene 525 segments and that reassortment can influence the rate of virus adaptive evolution have 526 important implications for predicting future influenza strains. In particular, our study 527 indicates that viral mutations are subjected to linkage effects within and to a 528 somewhat lesser extent between gene segments, consistent with the conclusions of 529 (37). As a consequence, we anticipate better forecasting can be achieved if the virus 530 genetic background is considered as a whole, and is not just restricted to HA. This 531 will be largely dependent on obtaining a more comprehensive understanding of the 532 phenotypic variation in other gene segments, which we recommend should be a 533 priority for future research. 534