Global estimates of the fitness advantage of SARS-CoV-2 variant Omicron

Abstract New variants of SARS-CoV-2 show remarkable heterogeneity in their relative fitness over both time and space. In this paper we extend the tools available for estimating the selection strength for new SARS-CoV-2 variants to a hierarchical, mixed-effects, renewal equation model. This formulation allows us to estimate selection effects at the global level while incorporating both measured and unmeasured heterogeneity among countries. Applying this model to the spread of Omicron in forty countries, we find evidence for very strong but very heterogeneous selection effects. To test whether this heterogeneity is explained by differences in the immune landscape, we considered several measures of vaccination rates and recent population-level infection as covariates, finding moderately strong, statistically significant effects. We also found a significant positive correlation between the selection advantage of Delta and Omicron at the country level, suggesting that other region-specific explanatory variables of fitness differences do exist. Our method is implemented in the Stan programming language, can be run on standard consumer-grade computing resources, and will be straightforward to apply to future variants.


Introduction
Since its emergence, the Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continuously generated genetic variants that increase its transmission, e.g. D614G and variants of concerns such as Alpha and Delta, causing waves of coronavirus disease 2019  surges across the globe (Korber et al., 2020;Davies et al., 2021;Volz et al., 2021;Dhar et al., 2021). Omicron, first detected in November 2021 in South Africa and Botswana, quickly spread globally, displaced Delta, and became the dominant variant in most countries. Initial studies on data from South Africa suggested that Omicron has a substantially higher transmission fitness than Delta, especially in individuals recovered from COVID-19 (Pulliam et al., 2022;Viana et al., 2022;Pearson et al., 2021). Experiments measuring plasma neutralization activity from vaccinated individuals demonstrated lower neutralizing antibody activities against Omicron even in fully vaccinated individuals (Schmidt et al., 2022;Collie et al., 2022;Cele et al., 2021;Lu et al., 2021;Planas et al., 2021); however, a booster shot increased neutralizing antibody activity substantially (Schmidt et al., 2022;Nemet et al., 2022;Planas et al., 2021). This suggests that Omicron is able to evade immunity induced in vaccinated individuals, providing a likely explanation for the rapid surge in Omicron in countries where a large proportion of individuals are fully vaccinated. Previous work has also quantified the reproductive number or relative growth advantage of Omicron in particular countries or settings, showing it to exceed that of previous emerging variants (Pearson et al., 2021;Weil et al., 2022) and to be greater among vaccinated individuals (Paton et al., 2022).
Although it is clear that Omicron has higher transmission fitness than Delta, we still lack a global perspective on the magnitude of this advantage and its heterogeneity among countries. Analyses of other SARS-CoV-2 variants have shown large differences in their relative fitness among different countries and US states (van Dorp et al., 2021;Figgins and Bedford, 2021). It is expected that the transmission advantage of Omicron is also heterogeneous across countries, due to the difference in circulating variants in each country, the different immunological landscapes arising from natural infection and vaccination, the different nonpharmaceutical intervention strategies adopted in different countries, and potentially many other factors. Unknown, however, is the extent to which among-country heterogeneity is actually explained by any identified factors.
To quantify the transmission advantage of Omicron across many countries and explore possible causes of its heterogeneity, we analyzed case count and variant frequency data with a renewal model (as Figgins and Bedford, 2021;Park et al., 2022a) that is hierarchical across countries (as Paton et al., 2022;van Dorp et al., 2021) and includes country-level covariates (as Paton et al., 2022), while allowing for a shorter generation time of Omicron (which could otherwise bias the results, cf. Pearson et al., 2021). We found that across forty countries, a portion of the heterogeneity in Omicron's fitness could be explained by immunological-related covariates. Additionally, we found some correlation across countries between the selective advantages of Omicron and Delta, suggesting that fixed country-level effects may exist. More generally, we demonstrate the efficacy of a hierarchical model structure in quantifying the overall variant fitness.

Data
Variant counts per country per day were obtained from the Global Initiative on Sharing Avian Influenza Data (GISAID) metadata (Elbe and Buckland-Merrett, 2017; Global Initiative on Sharing All Influenza Data, 2008), downloaded on 11 March 2022. The known problematic samples were removed according to the Nextstrain 'ncov' quality control pipeline (Hadfield et al., 2018;Nextstrain, 2022). For each analysis we chose one focal variant-Delta or Omicron, as assigned in the GISAID metadata-and grouped all other variants into an 'other' category.
Many countries showed a pattern in which a few early cases of a focal variant were followed by many days without it, before an eventual strong rise in variant frequency. Early stochastic dynamics like this are expected, but they lead to poor fits of our deterministic model. (In particular, the logistic shape of the variant's frequency becomes too shallow, leading to underestimates of the strength of selection.) To avoid this left tail in the data, we began the time series for each focal variant in each country on the first day at which three criteria were met: a variant frequency on that day of at least 10 per cent, at least 10 cumulative variant cases, and at least 0.05 per cent cumulative of the eventual number of variant cases. For Omicron, start dates were all in November and December 2021. For Delta, start dates ranged from January to July 2021. For each country, we ended the time series 9 weeks after it began for Omicron and 15 weeks after it began for Delta; the Delta time series was longer because its rise was slower than Omicron's. (The model is not sensitive to the end date, provided the focal variant has not yet been replaced by another and the whole population has not yet been infected.) We further required that a country has at least 21 days with any sequence data and at least 1,000 sequences of the focal variant and of any other variants. This yielded forty countries for Omicron and fifty-six countries for Delta.
To use the prevalence of Delta as a covariate for Omicron's relative fitness, we computed for each country the proportion of variant cases that were Delta in the week prior to Omicron's arrival. We also considered the vaccination status and two proxies for country-level immunological status: excess deaths and number of cases occurring before the first day of the study period for each country. Vaccination data were obtained from Our World in Data (OWID) (Ritchie et al., 2020), downloaded on 6 January 2022. The percentage of fully vaccinated people ('fullvax' covariate) was defined as the number of fully vaccinated people from OWD divided by the 2020 population size multiplied by 100, and the percentage of boosted people ('boostvax' covariate) was defined as the number of booster shots administered divided by the population size in 2020 multiplied by 100. The excess deaths data were also obtained from OWD (Ritchie et al., 2020) and were normalized by the population size in 2020 multiplied by 100 ('death' covariate). The 'cases' covariate was computed as the sum of reported cases in the period before the first day of the study period for each country. For the immunological status proxy variables, we consider both a short (2 weeks) and long period (25 weeks) preceding the first day of the study period. Distributions of covariates across countries are shown in Fig. S1. Previously (van Dorp et al., 2021), we used a hierarchical logistic regression model for the increase in frequency of a new variant over time. As we noted there, this model may be sensitive to spatial and temporal variation in the effective reproduction number because the region-specific effects represent the difference in growth rates of the two competing variants. This means that when the incidence is growing (effective reproduction number > 1), we would estimate a larger selective advantage than in an otherwise identical region where the incidence is decreasing ( < 1). The effective reproduction number within a region can also vary significantly over time, for instance due to implemented non-pharmaceutical interventions in anticipation of a recently discovered variant. To account for these confounding effects, we make use of regional case count data C t to estimate the time-varying effective reproduction number wt = of the wild-type variant. We assume that the reproduction number of the mutant (focal variant) is given by mt = (1 + ) . The incidence of wild-type ( wt ) and mutant ( mt ) variants is governed by the discrete-time renewal equation

Model
Here wt and mt denote the (discrete) probability mass functions of the generation time T G of wild-type and mutant variants, respectively. By allowing for wt ≠ mt , we take into account that the generation time can differ between variants. In Table 1 we give an overview of the variables and parameters of the model.
The reproduction number of the wild type (R t ) is modeled using a discrete-time geometric Gaussian process where = ∑ ′ =1 ′ is the sum of t independent standard-normal increments 1 , … , . The diffusion coefficient determines the randomness of the process, and the term − 1 2 2 makes sure that R t has constant expectation. Notice that in this notation R 0 is not the basic reproduction number, but just the effective reproduction number at time t = 0.
To link incidence y t with observed cases C t , we have to compute the expected number of observed cases at each observation time t. Let denote the probability that a person infected t days ago with variant ∈ {wt, mt} is counted as positive today. The number of expected cases at time t is then equal to Note that the reporting delay applies to case counts, not sequences; sequence data may be deposited weeks after sampling, but they are dated with the sampling day. As we are primarily interested in the dynamics of the reproduction number and not in the absolute incidence, we will assume that the total reporting probability is equal to one (∑ ∞ =1 = 1). Notice that we ignore potential differences in the reporting rate between the two variants. Furthermore, we make the simplifying assumption that the probability densities for reporting are identical to the probability densities for the generation time ( = , with ∈ {wt, mt}). To model the expected mutant frequency p t , we assume that sequences are taken from individuals that are reported positive at time t. Hence, we simply take We use a beta-binomial distribution for the likelihood of the observed number of wild-type and mutant sequences. This choice allows for over-dispersion in the variant count data due to biased sampling (e.g. over-reporting of Omicron as it was emerging, Scott et al., 2021), with the caveat it does not remove systematic biases. Let wt and mt denote the number of collected sequences at time t for wild-type and mutant variants, respectively. We then have where determines the dispersion of the distribution. In the limit → ∞ this converges to a binomial distribution. We choose a discretized Gamma distribution as generation time distribution with shape parameter = 4 and mean equal to 6, 4, and 3 days for Alpha, Delta, and Omicron, respectively (Hart et al., 2022a;Hart et al., 2022b;Abbott et al., 2022;Park et al., 2022b). The rate parameter is then given by = / . The discretization is as follows: where W is a cutoff value equal to 15 days. The values K t are then normalized such that they sum to 1.
To compute the incidence at times = 1, … , , we require the incidence at times − + 1, − + 2, … , 0. For this, we assume that prior to time t = 1 the effective reproduction number was constant and equal to R v 0 . We can then compute the incidence at these prior time points by making the ansatz − = 0 exp(− ), where r v is the exponential growth rate of variant v. Plugging this into the renewal equation, we get and hence 0 [exp(− )] = 1, which is known as the Lotka-Euler equation. We then ignore the fact that we use a discrete-time model and use the moment-generating function of the continuous Gamma distribution, and we can find r v by solving This leads to = ( √ 0 − 1). This calculation is done for both the wild type and mutant, using variant-specific parameters and . The initial incidence for wild type and mutant is parameterized as follows. We introduce parameters y 0 and b 0 , denoting the total initial incidence, and the fraction due to the mutant, respectively. Then, we set mt 0 = 0 0 and wt 0 = (1 − 0 ) 0 . We use a phenomenological log-normal observation model to fit the predicted case counts ( = wt + mt ) to the observed case counts ∼ Lognormal(log( ), ). To remove weekend patterns in under-reporting in the data, we aggregate the daily observations (case counts and number of sequences collected for both variants) to the week level. The same aggregation is done for the model predictions of these observations, so that we can compute the likelihood of the aggregated observations. For the aggregation of the predicted frequencies, we used a weighted average, thereby accounting for the fact that case counts can vary markedly within a week. More precisely, the aggregated predicted mutant frequency agr at week w is equal to where agr is the aggregated predicted case count at week w.
As we wish to estimate effects of covariates on the selective advantage in different regions, we use a Bayesian mixed-effects model (or hierarchical model) in which region-specific parameters are determined by unobserved random effects and possibly fixed effects dependent on the covariates. The selective advantage, s, can thus take a different value for each region. The z-scores of the covariates described above are collected in the design matrix M, and we write w s for the vector of weights of these covariates. We assume that 1 + has a log-normal distribution with location + and scale s . The dispersion parameter and initial incidence similarly vary among countries, with hyperparameters , 0 , and 0 , but for these parameters only random effects are modeled.
The model is summarized in Tables 1 and 2. We focus our results on the selective advantage, s, of the focal variant in each country and on the mean of its distribution across countries. We implemented the model in Stan (Stan Development Team, 2021).

Results and Discussion
Our model outputs case counts, variant frequencies, and epidemic growth rates over time shown in Fig. 1 for Omicron in selected countries and Fig. S2 and Fig. S3 for Omicron and Delta in all countries. Overall, the model fit the data very well, with, for Omicron, a mean absolute error in the variant frequency of only 2.5 percentage points for data aggregated by week, or 5.3 percentage points for daily data (Fig. S4). Fits for Delta were only slightly worse, with mean absolute errors of 5.8 and 8.6 percentage points by week and day, respectively. Aggregating up to the week scale addressed nearly all the structural anomalies in the epidemiological data, notably non-uniform reporting by day of week, without altering the total number of cases. Even for remaining anomalous data (e.g. the spike in UK case counts due to sudden reporting of past reinfections, Fig. 1), the model predicted a reasonably smooth estimate of the trend.
The selection coefficient inferred for a focal variant is influenced by its generation time (serial interval) relative to the background variants against which it is competing. In particular, other methods that assume the same generation time for the focal and other variants (e.g. van Dorp et al., 2021;Chen et al., 2021) will underestimate the strength of selection when the focal variant has a shorter generation time, as did both Delta and Omicron (Park et al., 2022a). For example, our main results use a mean generation time for Omicron of 3 days, yielding a point estimate of 0.69 for the overall selective advantage of Omicron. Decreasing the mean generation time to 2 days, consistent with some early studies (Abbott et al., 2022), decreases this value to 0.64 (Fig. S5). Note, however, that this is still far larger than zero, arguing that shorter generation times are not solely responsible for the increased fitness of Omicron. Conversely, assuming that the Omicron generation time is the same as the background variants (4 days, because most were Delta) increases the value to 0.84, substantially overstating the selective advantage.

Describing heterogeneity in selective advantage
Comparing the model-fitted estimates of Omicron's selective advantage, s, among countries shows wide variation in their values ( Fig. 2A). This is consistent with previous work for other variants, which also found large heterogeneity among countries and US states (van Dorp et al., 2021;Figgins and Bedford, 2021). Our model assumes that the ratio of reproduction numbers (1 + ) for each country is drawn from a log-normal distribution, which is simultaneously estimated from the data. This hierarchical structure minimizes the influence of structural biases in any one country's data, leading to a more robust and interpretable estimate of both the overall selection effect and the heterogeneity at country level.  To examine the effect of the hierarchical model structure on our results, we additionally fit a nonhierarchical equivalent of our model in which s for each country is not affected by other countries (Fig. 2). We highlight two outcomes from this comparison: estimates for individual countries and conclusions drawn across countries. First, with both the hierarchical and nonhierarchical models, we found substantial heterogeneity in the selection coefficient, s, at the country level for Omicron ( Fig. 2A). In the hierarchical model, more extreme values of s-which might arise from real effects or data quality issues-are reined in by the moderating effects of other countries. Second, there are different ways to summarize the overall selective advantage of Omicron, encompassing the heterogeneity across countries (Fig. 2B). One approach is to provide a typical value. For the hierarchical model this is the mean of the higher-level distribution, which has an associated uncertainty (0.69 [0.63, 0.76] 95 per cent CrI). For the nonhierarchical model, this could be the mean of the country-level point estimates of s, with uncertainty summarized as the standard error of the mean (0.71 [0.68, 0.73] 95 per cent confidence interval [CI]). Another approach is to describe the variation across countries. For the hierarchical model this is the higher-level distribution itself, with ∼ Lognormal( , ) − 1 (Table 2), including the uncertainties on those parameters (0.68 [0.32, 1.08] 95 per cent CrI). For the nonhierarchical model, we summarized s across countries by concatenating the Markov chain Monte Carlo samples from s for all countries (0.65 [0.38, 1.27] 95 per cent CrI). Overall, regardless of the statistics used, Omicron has a very large selective advantage, and its typical value can be estimated much more precisely than the variation among countries.

Explaining heterogeneity in selective advantage
With so much variation in s among countries, we asked whether causes of that variation could be identified. We looked for associations between country-level s and other country-level attributes, both in the model structure and in the model outputs.
Because Omicron has been shown to evade immunity from prior infection or vaccination, we used our mixed-effects model to test whether the extent of immunity in a country at the time of Omicron's arrival explained variation in Omicron's selective advantage. Three of our four proxy variables for natural immunity-based on the numbers of cases or deaths in the weeks leading up to Omicron's arrival in the country-showed a significant effect (Fig. 3). The signs of the effects mean that Omicron had a stronger selective advantage in countries with more natural immunity, consistent with the idea that the part of Omicron's fitness advantage is immune escape. To understand the effect size, for each significant covariate, we compared the predicted s in a hypothetical population where the covariate was 0 against a hypothetical population where the covariate took the highest empirical value in the data (2.11 per cent for 'cases2wk', 10.21 per cent for 'cases25wk', and 0.03 per cent for 'deaths2wk'). This comparison yielded a difference in s of 0.26, 0.30, and 0.33, respectively. Neither past cases or excess deaths are ideal proxies for the population-level immunological status of different countries, due to differences in age-stratified infection and death risk in addition to likely heterogeneity in case detection rate between and within countries. However, the significant correlation of both recent cases and excess deaths with increasing s suggests that country-level immunological status could partially explain the differences in selection effects.
The selective advantage we estimate for each focal variant is by definition relative to the fitnesses of the preexisting variants against which it is competing. Modeling all variants explicitly would substantially expand the complexity of our model, so we instead explored this topic in three simpler ways. First, we hypothesized that Omicron might have a lesser selective advantage when it was competing primarily against Delta (than against other variants, which were outcompeted by Delta in much of the world), but including the prevalence of Delta as a covariate showed no significant effect (Fig. 3). Second, we wondered if the higher selective advantage of Omicron in some countries could be explained by the prevalence of later Omicron subvariants in those countries, but we found no association between selective advantage and the genetic composition of Omicron (Fig. S6). And third, we compared our selection estimates against the amount of diversity in the preexisting, background variant landscapes, as measured by either the number of unique Pango lineages (variant richness) or the probability that any two sequences drawn at random are from different Pango lineages (Simpson's 1 − , which incorporates abundances) (Fig. S7). We found a consistently positive correlation between background diversity and the point estimate of s, although it was only significant in some cases. One interpretation is that higher-diversity background variant communities lack a relatively high-fitness variant and thus pose less competition for the invading focal variant. We also wondered if uncertainty in the s for a  . Estimates of s for each country, for Omicron and Delta. The median and 95% CrI are shown as points and lines respectively. One country not fully shown has s delta = 1.98. Note that even though Omicron universally outcompeted Delta, s omicron can be less than s delta because the selective advantage quantifies the fitness of the focal variant relative to the fitness of the background variant(s). Data are provided in Table S1. country could be driven by diversity in its background variants, but we found no correlation between these quantities (Fig. S7).
Alternatively, variation among countries could be explained not by the landscape specifically faced by Omicron, but instead by national systemic differences. With so many possible fixed country-level covariates that are likely strongly correlated, identifying specific causes is fraught. However, if any such factors do play a consistent role, we would expect their effects to be seen for variants in addition to Omicron. We therefore compared countryspecific median estimates of s for Omicron with those for Delta and found a significant correlation (Pearson's correlation r = 0.37, P = 0.02; Spearman's correlation = 0.40, P = 0.01; Fig. 4). We thus suggest that systematic differences between countries play some role in explaining the selective advantage of new variants.

Conclusion
We found a very large selective advantage for Omicron (growth rate nearly 1.7× that of the previously circulating variants, mostly Delta), but also very large heterogeneity among countries (the 95 per cent CrI spans 1.3× to 2.0×). With selection coefficients this large, it is important to recall that their magnitude is also affected by the absolute growth rate (van Dorp et al., 2021;Chen et al., 2021). We thus used not only variant counts and a model of variant frequency dynamics, but also a renewal model that included case count data (as did Figgins and Bedford, 2021;Park et al., 2022a). Additionally, we found that the selective advantage for Omicron was robust to changes in the generation time assumptions, i.e. the advantage of Omicron is not due solely to a shorter generation time.
Our hierarchical modeling approach provides a natural way to express both the overall advantage of a variant and its distribution across countries. Previous work has used a similar strategy for smaller geographic scales (Paton et al., 2022) and for mutations across lineages (Obermeyer et al., 2022). The hierarchical approach is particularly useful for global-scale analyses, in which one would like to include data from all countries even though they have vastly different data qualities and quantities. It also extends well to multiple levels, allowing, for example, a future analysis that includes heterogeneity among states within countries, in addition to among countries.
Our mixed-effects framework allows one to test hypotheses about country-level properties that might explain differences in selective advantage among countries. That such fixed properties exist is supported by the significant positive correlation between selection strength for Delta and Omicron at the country level. Identifying potential causal systemic differences may require a different approach, however, as there are many possible factorspopulation density, national wealth, type of governance, or healthcare system, to name just a few-and likely strong correlations between them.
Previous work, both laboratory assays (Lu et al., 2021;Cele et al., 2021;Planas et al., 2021) and population-level studies (Pearson et al., 2021;Paton et al., 2022;Pulliam et al., 2022), shows that Omicron better escapes natural and vaccine-induced immunity than do other variants. Consistent with this and despite using covariates that are highly imperfect proxies for true population-level immunity, we found a higher selection coefficient for Omicron in countries with more immunity.
Future studies of SARS-CoV-2 variant dynamics could focus on scientific explanations for heterogeneity in the spread of different variants and why some countries seem to slow the spread of new variants. The modeling framework and code presented in this paper facilitates this type of work by allowing for joint estimation of selection effects and explanatory variables.

Data availability
All data used in this study are publicly available. The scripts used for data processing and statistical inference can be found at https://github.com/eeg-lanl/sarscov2-selection.

Supplementary data
Supplementary data are available at Virus Evolution online.

Funding
This project was funded by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20210887ER and the National Institutes of Health under U01 AI150680 and R01AI135946.