Phylogenetic Analyses of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) B.1.1.7 Lineage Suggest a Single Origin Followed by Multiple Exportation Events Versus Convergent Evolution

Abstract. The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) heralds a new phase of the pandemic. This study used state-of-the-art phylodynamic methods to ascertain that the rapid rise of the B.1.1.7 “Variant of Concern” most likely occurred by global dispersal rather than convergent evolution from multiple sources.

Following phylogenetic and epidemiological investigations, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genetic lineage B.1.1.7 is suspected to be associated with an increase in human-to-human viral transmissibility [1,2] and was classified as a "variant of concern" (VOC B.1.1.7) on 18 December 2020 [3]. The variant was first discovered in Kent, United Kingdom, on 21 September 2020 and has since been identified in over 40 countries across the world, including the United States [3][4][5][6]. We sought to evaluate whether the breadth of VOC B.1.1.7 identification represents convergent evolution [7] or rapid local and global dispersal after this lineage's genesis. On Figure 1).
We combined these B.1.1.7 sequences with a representative set of non-B.1.1.7 sequences (n = 4768) selected on the basis of sequence homology. All sequences were aligned using MAFFT, and highly homoplasic sites were masked [10]. To reduce the data set size while maintaining an appropriate set of epidemiologically relevant background sequences, we used BLAST [11,12] to identify the 50 closest non-B.1.1.7 variants to each of the 17 118 B.1.1.7 genomic sequences in the data set [13,14]. After keeping one copy of duplicated entries that ranked among the 50 best hits, a total of 4768 of the 316 075 non-B.1.1.7 sequences available on GISAID were retained for further analyses and combined with the B.1.1.7 data set. The final set of 21 886 sequences was aligned with MAFFT [15], and a maximum likelihood (ML) phylogeny was inferred using IQ-TREE v2.1.2 [16]. The resulting phylogeny showed that all available B.1.1.7 samples clustered together with high support (Shimodaira-Hasegawa [SH] support of 0.99 [17-19]). Non-UK VOC B.1.1.7 sequences were intermixed with those from the United Kingdom (Figure 1). Because convergent evolution can induce incorrect clustering [20], the same approach was repeated after excluding the variable positions that define the B.1.1.7 lineage (Supplementary Table 2), which yielded a similar picture. These patterns are consistent with the view that this variant successfully spread around the world after it arose in the United Kingdom.
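The background-selection step can be illustrated with a minimal sketch. It assumes the BLAST output has already been parsed into a mapping from each B.1.1.7 query to its hits ranked best-first; the function name `select_background` and the toy identifiers are hypothetical and do not come from the authors' pipeline.

```python
from typing import Dict, List, Set


def select_background(ranked_hits: Dict[str, List[str]], top_n: int = 50) -> Set[str]:
    """Return the union of the top_n best hits over all queries.

    ranked_hits maps a query ID to its hit IDs sorted best-first.
    A background sequence hit by several queries is kept only once.
    """
    background: Set[str] = set()
    for hits in ranked_hits.values():
        background.update(hits[:top_n])
    return background


# Toy input (hypothetical IDs): two queries that share one close relative.
toy_hits = {
    "voc_1": ["bg_A", "bg_B", "bg_C"],
    "voc_2": ["bg_B", "bg_D", "bg_E"],
}
selected = select_background(toy_hits, top_n=2)
print(sorted(selected))
```

In the study, this kind of deduplication left 4768 distinct background sequences, giving 17 118 + 4768 = 21 886 sequences in the final alignment.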
To estimate the timing of introduction of B.1.1.7 variants outside the United Kingdom, we applied a multistep analytic approach, as previously described by our group for human immunodeficiency virus (HIV) [21,22] (see Supplementary Information). B.1.1.7 clusters of size ≥ 2 including only non-UK sequences were identified from the ML phylogeny in R [23]. For each non-UK clade, the phylogeny was rescaled into units of time with treedater [24], assuming a strict molecular clock with the rate of SARS-CoV-2 genome evolution drawn from an externally estimated distribution, as previously described [25], with a mean of 9.41 × 10⁻⁴ nucleotide substitutions per site per year and a standard deviation of 4.99 × 10⁻⁵. To incorporate uncertainty in the estimated clock rate, molecular clock estimation was replicated 100 times for each non-UK B.1.1.7 clade. We identified 90 clades of size ≥ 2 comprising only B.1.1.7 variants from outside the United Kingdom, for a total of 513 sequences (clade sizes ranging from 2 to 135). The largest cluster, of 135 sequences, was identified in Denmark across 5 regions. Two thirds (60/90) were exclusively European clusters (Supplementary Table 1), whereas 12 clusters included sequences from the United States (5 sampled in California).
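The identification of clades containing only non-UK sequences can be sketched as a tree traversal. The authors performed this step in R; the Python below is an illustration only, assuming a toy tree encoded as nested tuples with tip labels of the hypothetical form `country|id`.

```python
from typing import List, Union

# A tree is either a tip label (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def tips(node: Tree) -> List[str]:
    """Collect all tip labels below a node."""
    if isinstance(node, str):
        return [node]
    out: List[str] = []
    for child in node:
        out.extend(tips(child))
    return out


def is_uk(tip: str) -> bool:
    """Assumed label convention: the country precedes the '|' separator."""
    return tip.split("|")[0] == "UK"


def non_uk_clades(node: Tree, min_size: int = 2) -> List[List[str]]:
    """Return maximal subtrees whose tips are all non-UK, with >= min_size tips."""
    if isinstance(node, str):
        return []
    node_tips = tips(node)
    if not any(is_uk(t) for t in node_tips):
        return [node_tips] if len(node_tips) >= min_size else []
    found: List[List[str]] = []
    for child in node:
        found.extend(non_uk_clades(child, min_size))
    return found


# Toy phylogeny (hypothetical labels, not study data).
toy_tree = (
    ("UK|1", ("Denmark|1", "Denmark|2")),
    ("UK|2", ("USA|1", ("USA|2", "Spain|1"))),
    "Italy|1",
)
clades = non_uk_clades(toy_tree)
print(clades)
```

The lone `Italy|1` tip is not reported because a single sequence does not meet the size ≥ 2 threshold.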
The earliest estimated seeding of B.1.1.7 from the United Kingdom dates to 9 September 2020 in Denmark, and the most recent to 8 January 2021 in Spain (see Supplementary Table 3 and Supplementary Figure 2). The number of weekly introductions outside the United Kingdom peaked in mid-December (Figure 2). In the United States, the first introduction was estimated to have occurred on 14 November in Florida. Five distinct introductions in California were also identified from 3 December to 26 December, including one cluster of 19 sequences. Of note, 6 international non-UK clusters spanning ≥2 countries were identified, 2 of which did not include European sequences (Supplementary Table 3).
In response to the rapid increase in viral infections and spread, UK officials announced a lockdown on 31 October that came into force on 5 November and ended on 5 December. Based on time to the most recent common ancestor (TMRCA) estimates, we determined that 19% (17/90) of the exportation events that gave rise to detectable non-UK VOC B.1.1.7 transmission lineages occurred during this period (the remaining 81% occurred before or after these dates). The emergence and rapid dispersal of this new VOC led to the implementation of a new strict national lockdown in the United Kingdom on 4 January 2021 [26].
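The lockdown comparison amounts to counting TMRCA dates that fall inside the 5 November to 5 December window. The sketch below assumes that both boundary days count as inside the window, which the text does not specify; the function name is hypothetical, and the toy dates are a handful of introduction dates mentioned in the text, not the full set of 90 estimates.

```python
from datetime import date
from typing import List, Tuple

LOCKDOWN_START = date(2020, 11, 5)
LOCKDOWN_END = date(2020, 12, 5)


def share_during_lockdown(tmrcas: List[date]) -> Tuple[int, float]:
    """Count TMRCA dates inside the lockdown window (bounds inclusive, an assumption)."""
    inside = sum(LOCKDOWN_START <= d <= LOCKDOWN_END for d in tmrcas)
    return inside, inside / len(tmrcas)


# Illustrative subset of dates only.
toy_tmrcas = [
    date(2020, 9, 9),
    date(2020, 11, 14),
    date(2020, 12, 3),
    date(2020, 12, 26),
    date(2021, 1, 8),
]
count, share = share_during_lockdown(toy_tmrcas)
print(count, share)

# The reported proportion: 17 of 90 events, rounded to a whole percentage.
reported_pct = round(100 * 17 / 90)
print(reported_pct)
```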
As previously described by du Plessis et al [14], we next used the TMRCA of each non-UK clade to estimate the genomic "detection lag" for each cluster, which represents the duration that a transmission lineage went undetected before it was first sampled by genome sequencing. The mean detection lag was ~10.6 days (interquartile range [IQR] = 4-15). This largely agrees with detection lag estimates from SARS-CoV-2 importation into the United Kingdom in the first months of the pandemic [14], which averaged 8 days (IQR = 3-15; ~10 days for lineages comprising ≤10 genomes and <1 day for lineages of >100 genomes).
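The detection lag defined above can be computed as the difference between the date a clade was first sampled and its estimated TMRCA. The following is a minimal sketch under that definition; the date pairs are invented for illustration, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ from the method used in the study.

```python
from datetime import date
from statistics import mean, quantiles
from typing import List, Tuple


def detection_lags(clades: List[Tuple[date, date]]) -> List[int]:
    """Lag in days between each clade's TMRCA and its first sampled genome."""
    return [(first_sample - tmrca).days for tmrca, first_sample in clades]


# Hypothetical (TMRCA, first sampling date) pairs, not study data.
toy_clades = [
    (date(2020, 12, 1), date(2020, 12, 5)),
    (date(2020, 12, 3), date(2020, 12, 13)),
    (date(2020, 12, 10), date(2020, 12, 25)),
    (date(2020, 11, 14), date(2020, 11, 27)),
]
lags = detection_lags(toy_clades)
q1, median, q3 = quantiles(lags, n=4)  # quartile cut points
print(lags, mean(lags), (q1, q3))
```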
Of note, virus genome sequences have been determined for only a fraction of infections. Even in the United Kingdom, which has by far the largest sequencing effort, only an estimated 4.3% of infections have been sequenced (129 939 available sequences out of 3 039 797 cases reported on 14 January) [27]. For this reason, and also because not all sequenced SARS-CoV-2 genomes are being deposited in the GISAID repository, many B.1.1.7 variants that successfully established transmission chains outside of the United Kingdom likely remain undetected (for now). Our estimated number of B.1.1.7 exportation events from the United Kingdom is thus an underestimate. Sparse sampling and sequencing also limit the accuracy with which introduction events can be dated (see du Plessis and colleagues [25] for a more detailed explanation).
Our results do not suggest that the canonical mutations of VOC B.1.1.7 evolved independently in different locations. Instead, our analyses point to an origin in and spread of the VOC B.1.1.7 from the United Kingdom. As for the virus' initial [28] and subsequent [29,30] spread, global connectedness and high levels of human mobility undoubtedly facilitated VOC B.1.1.7 dissemination. The swift global spread of VOC B.1.1.7 illustrates that current restrictions are insufficient to prevent the spread of new and emerging variants [31-37]. As with Ebola [38], hepatitis C virus (HCV) [39,40], and HIV [22], countermeasures to SARS-CoV-2 spread should be developed with a broader perspective than the national level. Otherwise, without population immunity, successful local reductions in SARS-CoV-2 burden will be counteracted by imported infections that set off new waves of viral spread, possibly exacerbated by novel phenotypic characteristics of the imported strains.
Supplementary Data. Supplementary materials are available at Clinical Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.