Introduction, Transmission Dynamics, and Fate of Early Severe Acute Respiratory Syndrome Coronavirus 2 Lineages in Santa Clara County, California

Abstract We combined viral genome sequencing with contact tracing to investigate introduction and evolution of severe acute respiratory syndrome coronavirus 2 lineages in Santa Clara County, California, from 27 January to 21 March 2020. From 558 persons with coronavirus disease 2019, 101 genomes from 143 available clinical samples comprised 17 lineages, including SCC1 (n = 41), WA1 (n = 9; including the first 2 reported deaths in the United States, with postmortem diagnosis), D614G (n = 4), ancestral Wuhan Hu-1 (n = 21), and 13 others (n = 26). Public health intervention may have curtailed the persistence of lineages that appeared transiently during February and March. By August, only D614G lineages introduced after 21 March were circulating in Santa Clara County.

The coronavirus disease 2019 (COVID-19) pandemic caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged from Wuhan, China, in December 2019 and rapidly spread throughout the world, causing approximately 163 million cases and 3.4 million deaths as of 16 May 2021 [1]. The first confirmed SARS-CoV-2 case in the United States was diagnosed in a resident of Washington State on 20 January 2020 [2]; since then, multiple introductions into the United States have been reported [3][4][5][6][7][8][9], resulting in widespread community dissemination nationwide [9]. For outbreaks caused by SARS-CoV-2, public health response and action play critical roles in the recognition and isolation of suspected infectious cases. Contact tracing is a classic epidemiologic tool to study outbreaks of infectious disease and track patterns of transmission that can inform public health interventions [10].
Genomic epidemiology using viral whole-genome sequencing (WGS) complements contact tracing during outbreak investigations and can track virus evolution and spread in an epidemic [11]. WGS of SARS-CoV-2 has been used to identify (1) undetected transmission of the WA1 lineage associated with the first reported SARS-CoV-2 case in the United States from Washington State in January 2020 [3], (2) multiple introductions of SARS-CoV-2 lineages into Northern California [4], (3) coast-to-coast transmission [5], and (4) importation of a viral lineage containing a D614G mutation (A23403G single-nucleotide variant [SNV]) in the viral spike protein to New York from Europe [6,8,12], with subsequent dispersion throughout the United States [12]. However, few studies to date have included sampling and analysis of dynamic changes in SARS-CoV-2 genotypes within a single community over time. In the current study we sequenced a demographically representative sampling of SARS-CoV-2 strains circulating in Santa Clara County (SCC) from 27 January to 21 March 2020 and analyzed publicly available viral WGS data to mid-October 2020, to investigate the introduction, transmission, and persistence or disappearance of SARS-CoV-2 lineages in this community.

Ethics
Nasopharyngeal and/or oropharyngeal swab specimens were collected for the purpose of diagnostic testing as part of public health practice during the pandemic response at the SCC Public Health Laboratory (SCCPHL), California Department of Public Health, and the US Centers for Disease Control and Prevention (CDC). Viral WGS was performed at the University of California, San Francisco (UCSF) genomics laboratory with the approval of UCSF's Institutional Review Board (protocol no. 11-05519). Viral WGS studies of samples submitted to the SCCPHL were designated exempt by the Committee for the Protection of Human Subjects (project no. 2020-30; issued under the California Health and Human Services Agency's Federal Wide Assurance no. 00000681 with the Office of Human Research Protections).

Sample Collection, Quantitative Reverse-Transcription Polymerase Chain Reaction Testing, and Contact Tracing Investigation
SARS-CoV-2 nasopharyngeal and/or oropharyngeal samples were collected in SCC from 27 January to 21 March 2020 (Supplementary Methods). For the autopsy cases, formalin-fixed paraffin-embedded tissue specimens from 2 persons who died from unknown causes on 6 and 17 February 2020 were submitted to the US CDC for analysis. Quantitative real-time reverse-transcription polymerase chain reaction testing for laboratory diagnosis of COVID-19 was performed initially by the CDC and subsequently by the SCCPHL [13]. Contact tracing was performed according to standardized protocols (Supplementary Methods).

Viral WGS, Assembly, and Phylogenetic Analysis
Viral WGS and Sanger sequencing confirmation of SNVs were performed as described (Supplementary Methods) [4,14]. Complete, high-quality SARS-CoV-2 (n = 19 922) genomes from the global COVID-19 pandemic with collection date information, which had been sequenced from samples obtained from infected persons on or before 23 March 2020, were downloaded from the Global Initiative on Sharing All Influenza Data (GISAID) database (10 August 2020 build) [15,16], expanded to include SARS-CoV-2 genomes, and processed using the Nextstrain bioinformatics pipeline Augur [17]. After addition of the 101 newly sequenced genomes in the current study to the data set, a total of 20 023 genomes were aligned using MAFFT v7.4 software [18] as implemented in Augur, and a maximum-likelihood phylogenetic tree was constructed using IQTREE v1.6 software [19]. Branch locations were estimated using a maximum-likelihood discrete traits model. The resulting tree was visualized in the Nextstrain Web application Auspice [17] and using Geneious v11.1.5 software [20]. Smaller subtrees consisting of viruses in the WA1, SCC1, and SCC3 lineages were also constructed using the Augur pipeline. Multiple sequence alignments of clusters were generated using MAFFT v7.388 software [18] and visualized using Geneious software (Supplementary Methods). Lineage and cluster information extracted from the phylogenetic analyses was merged with the information stored in the California Reportable Disease Information Exchange (CalREDIE) database.

Correlation of Epidemiologic and Genomic Data
To assess whether the COVID-19 cases diagnosed by the SCCPHL were representative of those diagnosed in SCC during the period of our study, we compared them by sex, age, race/ethnicity, and home address with all cases reported to CalREDIE. For cases classified as travel associated, such as imported cases, we evaluated whether the identified genomic lineage was consistent with the reported travel history. For all other COVID-19 cases that were determined to be locally acquired, we used the genomic data to confirm all links involving ≥2 persons that had been identified by contact tracing and epidemiologic investigation.

Determining the Fate of Circulating SARS-CoV-2 Lineages
After identification of the 17 lineages represented in the study, all genomes from California collected after 23 March 2020, sequenced, and deposited in GISAID as of 18 October 2020 were downloaded from GISAID. Combined with the 101 study genomes and previously analyzed California genomes collected on or before 23 March 2020, this yielded a total of 3660 genomes. These 3660 longitudinally collected genomes from 27 January to 18 October 2020 were then screened for the presence or absence of the key single-nucleotide polymorphisms defining each of the 17 study lineages, using in-house Linux shell scripts.
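The in-house shell scripts are not described further in the text. As a minimal sketch of the screening logic only, the following Python example assumes that each genome has already been aligned to Wuhan Hu-1 reference coordinates, so that position p in the string corresponds to reference position p. The function names, the dictionary layout, the toy genomes, and the rule that a lineage is called only when all of its listed markers are present are assumptions made for illustration; the SNV names are taken from this study, and only a subset of lineages is shown.

```python
import re

# Lineage-defining SNVs named in this study (illustrative subset only).
LINEAGE_SNVS = {
    "SCC1": ["G29711T"],
    "SCC3": ["G26591T", "C27874T"],
    "Solano County": ["C9924T"],
    "D614G": ["A23403G"],
}


def parse_snv(snv):
    """Split a name such as 'G29711T' into (ref base, 1-based position, alt base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", snv)
    if match is None:
        raise ValueError(f"unrecognized SNV name: {snv}")
    return match.group(1), int(match.group(2)), match.group(3)


def screen_genome(aligned_seq, lineage_snvs=LINEAGE_SNVS):
    """Return the lineages whose listed SNVs are all present in the genome.

    aligned_seq is assumed to be in reference coordinates (hypothetical format).
    """
    hits = []
    for lineage, snvs in lineage_snvs.items():
        present = True
        for snv in snvs:
            _, pos, alt = parse_snv(snv)
            if pos > len(aligned_seq) or aligned_seq[pos - 1].upper() != alt:
                present = False
                break
        if present:
            hits.append(lineage)
    return hits


def toy_genome(snvs, length=29903):
    """Build a hypothetical genome of N's carrying only the given alt alleles."""
    seq = ["N"] * length
    for snv in snvs:
        _, pos, alt = parse_snv(snv)
        seq[pos - 1] = alt
    return "".join(seq)


print(screen_genome(toy_genome(["G29711T"])))  # ['SCC1']
```

Requiring every marker is a deliberately strict choice: a genome carrying only C27874T, as described for UC155 in the Results, would not be called SCC3 by this sketch and would need a relaxed rule or phylogenetic placement.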

Statistical Methods for Group Comparisons
For comparison of individual characteristics between COVID-19-infected persons with sequenced genomes and those for whom samples were unavailable for genomic sequencing or whose recovered genomes had insufficient coverage, we calculated P values using the χ² goodness-of-fit test. Differences were considered statistically significant at P < .05.
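To illustrate the calculation, the sketch below computes a χ² goodness-of-fit statistic and P value for a characteristic with exactly two categories, where the test has 1 degree of freedom and the upper-tail probability has a closed form. The function name and the counts are hypothetical and are not taken from Table 1; characteristics with more than two categories (such as age group or race/ethnicity) would need a general χ² distribution function from a statistics library.

```python
import math


def chi_square_gof_two_categories(observed, expected_props):
    """Chi-square goodness-of-fit test for exactly two categories (1 df)."""
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch handles exactly two categories")
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 60 and 40 persons observed in two categories,
# tested against an expected 50/50 split in the comparison group.
stat, p_value = chi_square_gof_two_categories([60, 40], [0.5, 0.5])
print(f"chi-square = {stat:.2f}, P = {p_value:.4f}")  # chi-square = 4.00, P = 0.0455
print("significant at P < .05:", p_value < 0.05)
```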

RESULTS
From 27 January to 21 March 2020, there were 558 SARS-CoV-2 positive cases diagnosed in SCC (all except 2 were in SCC residents) and reported to the statewide CalREDIE database (Figure 1). The SCCPHL and the CDC performed diagnostic testing on specimens from 143 of these 558 cases. Specimens from 101 of 143 cases (70.6%) had recoverable SARS-CoV-2 genomes with sufficient breadth of coverage (≥70%) across the genome for phylogenetic analysis. There were no statistically significant differences in sex or race/ethnicity between the 101 sequenced cases with viral WGS and 457 other cases in SCC (Table 1), but there were differences in age, with sequenced cases being older overall (P = .03). There was a higher proportion of deaths (P = .001) among the sequenced cases, a finding consistent with early criteria prioritizing testing of hospitalized persons with serious COVID-19 disease and with disproportionate sequencing of such cases during the January-March time frame of the study (Figure 1).
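The ≥70% breadth-of-coverage criterion can be sketched as a simple filter. The example assumes that uncovered positions in a consensus genome are written as N (or another ambiguity code) and that breadth is measured against the 29 903-nucleotide Wuhan Hu-1 reference length; both are assumptions of this sketch rather than details given in the text, and the function names are hypothetical.

```python
REFERENCE_LENGTH = 29903  # Wuhan Hu-1 reference length; assumed denominator


def breadth_of_coverage(consensus, reference_length=REFERENCE_LENGTH):
    """Fraction of reference positions with an unambiguous base call."""
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / reference_length


def passes_coverage(consensus, threshold=0.70):
    """True when breadth of coverage meets the inclusion threshold."""
    return breadth_of_coverage(consensus) >= threshold


# The recovery proportion reported above can be rechecked directly.
print(f"{101 / 143:.1%}")  # 70.6%
```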

International Travel as a Risk Factor for COVID-19
The first 2 cases in January 2020 were identified in international travelers (deposited in the GISAID database as US/CDC-5/2020 and USA/CDC-6/2020 and abbreviated as C-5 and C-6, with cases hereafter also referenced by their individual virus abbreviations) (Figures 2A and 3A and Supplementary Table 1). Consistent with their recent travel history to China, C-5 and C-6 were assigned to the ancestral Wuhan Hu-1 lineage (with 0 SNVs) and an Asian lineage defined by only 1 SNV relative to the Wuhan Hu-1 lineage (C21707T), respectively (Figure 3B). Of the other 8 persons in our series with a history of international travel, the viral genomes from 6 were positioned in clusters by phylogenetic analysis that included genomes from other cases sequenced from the geographic locations where they traveled. For instance, a couple (UC104 and UC105) were confirmed SARS-CoV-2 positive a few days after returning to California from a trip to the Middle East; their samples yielded genomes assigned to the D614G lineage and positioned within a cluster that included sequences from Egypt and Saudi Arabia (Figures 2A and 3A; Table 2, cluster N). Another couple (UC124 and CZB-1788) traveling aboard a cruise ship [21] became sick after disembarking and tested positive in early March. A third person (UC146), also traveling on the cruise, had COVID-19 diagnosed at approximately the same time. These 3 virus strains were found to be of the WA1 lineage (Figures 2B and 3A; Table 2, cluster O), sharing 5 SNVs with sequenced genomes from passengers and crew aboard the cruise ship and the majority of WA1 lineage viruses circulating in northern California and Washington State during February and March 2020 [3,4,21]. Another international travel-related case (UC184) occurred in a person who traveled to Asia in mid-March and died after returning home. UC184 was assigned to a lineage characterized by 4 distinct SNVs (C6312A, C13730T, C23929T, and C28311T) (Figures 2A and 3A) [22].
We found 129 cases of this 4-SNV lineage reported globally as of 21 March 2020, with major clusters in India [22], southeast Asia, and California (n = 10 cases). Two of 10 persons with international travel history (UC135, who traveled to Asia, and UC162, who returned from a trip to Central America but also attended a large party in the San Francisco Bay Area) were found to be infected with viruses of the SCC1 lineage (Figures 2C and 3A).

Retrospectively Identified COVID-19 Deaths
To assess whether there were cases and deaths associated with COVID-19 in California at a time when testing for COVID-19 was limited and widespread community transmission of COVID-19 had not yet been recognized, the California Department of Public Health provided recommendations to county medical examiners on 29 April 2020 that persons who died between 17 December 2019 and 16 March 2020 from suspected COVID-19 should have postmortem specimens collected and submitted to the CDC for analysis. CDC confirmation of SARS-CoV-2 infection in postmortem tissue specimens was obtained in April 2020 from 2 persons who had died at home in February from an unknown respiratory illness [7]. The viral genomes associated with both cases, C-D1 and C-D2, were determined by the CDC to be part of the WA1 lineage, with 5 and 3 SNVs, respectively (Figures 2B and 3C) [4], suggesting that infection had likely been acquired locally. In a third medical examiner case in an elderly man who died at home (UC187), clinical samples were tested at the SCCPHL and found to be positive for SARS-CoV-2; the virus was subsequently shown by viral WGS to belong to the D614G lineage (Figures 2A and 3A).

Introduction of New SARS-CoV-2 Lineages
On 26 February 2020, the first case of community transmission of SARS-CoV-2 in California (UC4) was reported [4], and the SCC Public Health Department was notified. One extended family member (UC195) had a positive SARS-CoV-2 test in late February during the 14-day quarantine period (Figure 3A and D; Table 2, cluster A). The genome of his SARS-CoV-2 strain had the same C9924T SNV as UC4 that defines the Solano County lineage (Figure 3D) [4]. Concomitant with this intercounty transmission event, an elderly woman (UC101) was hospitalized with SARS-CoV-2 infection, and contact tracing eventually identified an additional 4 infections in 2 family members (UC102 and UC106), a healthcare worker (HCW) at the hospital (UC121), and a close contact of the HCW (UC120). All 5 strains were assigned to a previously undescribed lineage containing a G14178 SNV, named SCC2 (Figure 3A and D; Table 2, cluster B).
A notable example of how genomic surveillance directly informed contact tracing efforts involved cases UC200, UC197, and UC161. Genomic analysis of samples collected from these 3 individuals in mid-March revealed that all 3 viruses were of the WA1 lineage and shared 5 SNVs (Figures 2B and 3A; Table 2, cluster D). The genomic linkage guided further contact investigation interviews showing that UC161, previously classified as a community transmission case with unknown source, attended the same church as UC200 and UC197. Another local cluster was identified when a member of a large household became ill in late February, followed by SARS-CoV-2 infection of an additional 8 household members (UC167, UC169, and UC170-UC175). All viral genomes sequenced from this cluster were assigned to a single lineage containing the G26591T and C27874T SNVs, named SCC3 (Figures 2A, 3A, and 3F; Table 2, cluster C). Phylogenetic analysis revealed an additional SCC3 lineage genome (UC155) containing the C27874T SNV and corresponding to a COVID-19 case diagnosed in an unrelated SCC resident (Figure 3A and 3F) [...] this person and the large household cluster was identified by contact tracing. On 29 February, the SCC Public Health Department initiated an investigation of a COVID-19 outbreak among workers at San Jose International Airport. Of 11 confirmed cases, all 9 with available viral genomes, sequenced from 5 workers, 2 household contacts, and 2 HCWs, were of the SCC1 lineage that shares the G29711T SNV (Figures 2A, 2C, and 3A; Table 2, cluster G). Overall, 41 genomes of 101 in the current study were assigned to the SCC1 lineage.
Epidemiologic links were known a priori in 27 of the 41 SCC1 cases (65.9%), grouped into 7 clusters, including the aforementioned San Jose airport cluster [4] that includes a household transmission event and 2 HCWs, a cluster associated with a grocery store that also involved a resident from Solano County [4], and 5 other household transmission events, of which 2 had a history of domestic travel and 2 a history of international travel (Table 2, clusters G-M). No epidemiologic links were found among the remaining 14 SCC1 lineage genomes, indicating cryptic transmission of this lineage in California beginning in late February 2020 (Figure 3A and Supplementary Table 1).

Wuhan Hu-1 and D614G SARS-CoV-2 Lineages
Twenty-one of 101 SARS-CoV-2 genomes in this study (20.8%) differed from the ancestral Wuhan Hu-1 lineage by 0 SNVs or 1 non-lineage-defining SNV (Figure 3A). One of these genomes was sequenced from a young man who died at home in March (UC199; nasal swab/C-D3; postmortem formalin-fixed paraffin-embedded lung tissue). Paired household cases harboring the Wuhan Hu-1 lineage were also identified in an elderly couple (UC149 and UC151) and in a parent and child (UC137 and UC159) who attended a school-hosted gathering from which there had been other reported cases (Table 2, clusters E and F). Phylogenetic analysis identified only 4 of 101 virus genomes (4.0%) containing the D614G (A23403G) spike mutation (Figure 2A). These included the aforementioned couple (UC104 and UC105) who traveled to the Middle East in February (Table 2, cluster N), an aforementioned death case (UC187; early March), and a middle-aged man (UC164; mid-March) without a known exposure risk factor.

Introduction of 10 Other SARS-CoV-2 Lineages
In addition to Wuhan Hu-1, WA1, SCC1, SCC2, SCC3, Solano County, and D614G, 10 other lineages were identified among cases in our series, including the aforementioned 4-SNV lineage in returning traveler UC184 (Figures 2A and 3A). For the majority of these lineages (9 of 10 [90%]), only 1 person from SCC was identified as being infected by a virus from each lineage, and these singleton cases were attributed to unknown community exposure. The only exceptions were UC180 and UC185; both male adults were infected with the A12557G lineage, although an epidemiologic link between the 2 cases was not determined.

Dynamic Changes of SARS-CoV-2 Genotypes in SCC Over Time
We performed genotype analysis of all 3660 full-length sequenced genomes from California deposited in the GISAID database that had been collected from 27 January to 30 September 2020. In January 2020, sequenced genomes from SCC corresponded mostly to Asian lineages, with 0-1 SNVs compared with the ancestral Wuhan Hu-1 lineage. The WA1, SCC1, and D614G lineages emerged in February, and SCC1 expanded to become the single dominant lineage in the county in March (accounting for approximately 25% of the sequenced genomes during that month and 40.6% of the complete sample set) (Figure 4A). The SCC1 and WA1 lineages declined in number and disappeared in March and June, respectively, while the proportion of genomes from the D614G lineage rapidly increased, becoming the single predominant genotype in SCC by June (Figure 4A and B). The A12557G and C25692T lineages were common in April and May, the latter lineage in part owing to its association with a large skilled nursing facility outbreak (unpublished data), but disappeared afterward (Figure 4A and B). Similarly, additional lineages that were introduced to SCC from January to March 2020, including those associated with discrete household clusters (Solano County, G14718T, and G26591T), disappeared by August 2020 (Figure 4A and 4C). Overall, similar longitudinal changes in lineage frequency were observed across the state of California (Figure 4A and B). In September 2020, all sequenced genomes in SCC and California were of the D614G lineage (Figure 4A-4C). However, an analysis of additional SNVs in the 4 D614G genomes sequenced from SCC from January to March 2020 revealed that these sublineages disappeared from SCC and California by 5 August (Figure 4C, left), indicating that continuation of the D614G lineage in SCC was most likely due to ongoing introduction into the county after March 2020 rather than persistent community transmission.
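The longitudinal tabulation described above amounts to counting lineage assignments per collection month and recording when each lineage was last observed. The following is a minimal sketch under the assumption that each genome has been reduced to a (collection month, lineage) pair; the function names and the toy records are hypothetical and do not reproduce the counts in Figure 4.

```python
from collections import Counter, defaultdict


def monthly_lineage_proportions(records):
    """records: iterable of (collection month as 'YYYY-MM', lineage) pairs."""
    counts = defaultdict(Counter)
    for month, lineage in records:
        counts[month][lineage] += 1
    proportions = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        proportions[month] = {
            lineage: n / total for lineage, n in counts[month].items()
        }
    return proportions


def last_month_seen(records):
    """Latest collection month in which each lineage was observed."""
    last = {}
    for month, lineage in records:
        if lineage not in last or month > last[lineage]:
            last[lineage] = month
    return last


# Hypothetical toy records, for illustration only.
toy_records = [
    ("2020-03", "SCC1"), ("2020-03", "SCC1"),
    ("2020-03", "WA1"), ("2020-03", "D614G"),
    ("2020-06", "WA1"), ("2020-06", "D614G"),
    ("2020-09", "D614G"),
]
print(monthly_lineage_proportions(toy_records)["2020-03"]["SCC1"])  # 0.5
print(last_month_seen(toy_records)["WA1"])  # 2020-06
```

Because 'YYYY-MM' strings sort chronologically, plain string comparison is enough to find the last month in which a lineage appears.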

DISCUSSION
In the current study, we combined the power of genomic epidemiology with public health surveillance using contact tracing to monitor the introduction and community transmission of at least 17 SARS-CoV-2 lineages circulating in SCC, California, from 27 January to 21 March 2020. We identified 2 cases in which the infection was initially thought to have been associated with international travel by contact tracing, but viral genome analysis suggested that the individuals had likely been infected by a locally circulating strain (SCC1). Viral WGS also identified a new epidemiologic link at a local church between seemingly unrelated cases. Finally, we were able to elucidate the cause of death in 3 previously unexplained cases as unrecognized SARS-CoV-2 infections, and to determine their phylogenetic placement in the WA1 lineage. Genomic epidemiology has rapidly emerged as an indispensable tool for investigating and monitoring spread of outbreaks such as COVID-19.
The 3 decedent cases in our study highlight the unmet need for expanded SARS-CoV-2 testing during the early stages of the pandemic in the United States, which would have likely revealed cases of cryptic viral transmission not linked to ostensible travel history. They also underscore the value of performing autopsies and postmortem testing early as an additional system for identifying the spread and shortening the time to assess the threat of the virus in a community. A robust public health genomic surveillance system of sufficient scope and scale to address pandemic threats such as SARS-CoV-2 needs access to many different types of samples for testing [23].
The D614G lineage containing a spike protein coding mutation is thought to have arisen in Germany from China in late January 2020 [24], and rapidly spread via travel through Europe, and from there, to the United States, associated with a large outbreak in New York City [6,8,12]. Epidemiologic, in vitro cell culture, and rodent model data to date [12,25,26] support the notion that D614G lineage viruses achieve higher viral loads and are more infectious than other strains, although, notably, there is no evidence of increased pathogenicity. Thus, a potential fitness advantage may explain the persistence and predominance of the D614G lineage in SCC, the United States, and globally [12,27], although some have attributed the rise of the D614G lineage to random founder effects [28]. The disappearance of the sublineages from the 4 sequenced D614G viruses in the study indicates that the surge in D614G cases in the county during the summer was mainly fueled by ongoing exogenous introduction.
Our results confirm that SARS-CoV-2 community transmission was already occurring by late January 2020, when available testing was extremely limited and earlier than the first officially reported case in SCC on 27 February [29]. Given the diversity of viral lineages uncovered in this study, it is likely that no local intervention, short of shutting down all travel into and out of the region, could have prevented these repeated introductions into SCC. Thus, given that "stay-in-place" mandates were not enacted in SCC until 16 March and statewide until 19 March, community transmission may have been inevitable, although earlier public health interventions, such as social distancing and masking, would likely have reduced the size of the outbreak in SCC. Nevertheless, the disappearance of all 17 introduced lineages suggests that these public health mandates may have supported local eradication. However, given ongoing introductions of SARS-CoV-2, local control of SARS-CoV-2 transmission becomes impractical without concurrent containment at the state and national levels.

Supplementary Data
Supplementary materials are available at The Journal of Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.