Changes in SARS CoV-2 Seroprevalence Over Time in Ten Sites in the United States, March – August, 2020

Abstract
Background: Monitoring of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibody prevalence can complement case reporting to inform more accurate estimates of SARS-CoV-2 infection burden, but few studies have undertaken repeated sampling over time on a broad geographic scale.
Methods: We performed serologic testing on a convenience sample of residual sera obtained from persons of all ages, at ten sites in the United States from March 23 through August 14, 2020, from routine clinical testing at commercial laboratories. We age-sex-standardized our seroprevalence rates using census population projections and adjusted for laboratory assay performance. Confidence intervals were generated with a two-stage bootstrap. We used Bayesian modeling to test whether seroprevalence changes over time were statistically significant.
Results: Seroprevalence remained below 10% at all sites except New York and Florida, where it reached 23.2% and 13.3%, respectively. Statistically significant increases in seroprevalence followed peaks in reported cases in New York, South Florida, Utah, Missouri and Louisiana. In the absence of such peaks, some significant decreases were observed over time in New York, Missouri, Utah, and Western Washington. The estimated cumulative number of infections with detectable antibody response continued to exceed reported cases in all sites.
Conclusions: Estimated seroprevalence was low in most sites, indicating that most people in the U.S. have not been infected with SARS-CoV-2 as of July 2020. The majority of infections are likely not reported. Decreases in seroprevalence may be related to changes in healthcare-seeking behavior, or evidence of waning of detectable anti-SARS-CoV-2 antibody levels at the population level. Thus, seroprevalence estimates may underestimate the cumulative incidence of infection.


Introduction
In the United States (U.S.), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection causing coronavirus disease 2019 (COVID-19) has resulted in over 12 million reported cases and 250,000 deaths as of November 25, 2020 [1]. However, the true number of infections is thought to be far greater, as an estimated 15-40% of infections are asymptomatic [2][3][4], and many persons with symptoms either do not seek medical care or are not tested. Furthermore, case counts reported by state and territorial health departments are affected by changes in testing and reporting capacity and practices, all of which can distort the apparent extent of infections and disease in the population at a given time.
Given that the majority of persons who are infected with SARS-CoV-2 develop antibodies following infection [5][6][7][8][9], the prevalence of anti-SARS-CoV-2 antibodies has been used as an indicator of cumulative infections in a given population since the start of the pandemic [10][11][12][13][14]. Seroprevalence estimates can capture mild or asymptomatic infections that may be missed in reported case counts.
They therefore allow better estimates of the true burden of infection, informing public health interventions. In addition, seroprevalence estimates can indicate population-level immunity, and the coverage of SARS-CoV-2 vaccines when they become available, both of which have implications for future transmission [15,16]. While seroprevalence rates have been used to estimate the proportion of the population that has never been infected, they are not necessarily a perfect indicator. If antibody levels wane and become undetectable over time, as has been suggested by recent reports [5,7,8], this could affect the accuracy of these estimates as the pandemic progresses.
We and others have reported on seroprevalence estimates throughout the pandemic [10,11,13,17,18], with many seroprevalence studies focused on assessing a narrow time window or single geographic location. In this study, the availability of a large number of residual sera collected from patients during the course of routine blood tests allowed us to estimate seroprevalence in 10 sites across the U.S. at multiple time points. In each of the sites, we compare the seroprevalence trajectories to reported cases for a pandemic that has been heterogeneous with respect to time and geography.

Sampling
The U.S. Centers for Disease Control and Prevention (CDC) partnered with two commercial laboratory companies to select a random sample of deidentified sera remnants from a population of convenience, as previously described [17]. To be included, specimens must have been from routine diagnostic tests, such as metabolic panels or cholesterol levels, blinded to COVID-19 symptoms or diagnosis, that were collected from patients between March 23 and August 14, 2020, and, at minimum, included demographic information on age, sex, zip code of patient, and date of specimen collection.
Our target sample size was 1,800 specimens per site per round with 450 specimens in each of four age groups (0-18, 19-49, 50-64, and 65+ years), based on calculations of statistical power needed to estimate a prevalence within +/-2% in a population with 5% prevalence. The laboratories generated de-duplicated, de-identified sample lists from their internal electronic laboratory information systems of all specimens with sufficient serum volume. Specimens were sampled by stratified random sampling by age group, with the aim of collecting equal numbers of specimens for each of the four age groups.
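The per-age-group target can be checked with a short calculation. The sketch below assumes a simple normal-approximation (Wald) interval at 95% confidence; the text does not state which formula was actually used, and the function names are illustrative only.

```python
import math

def required_sample_size(p, margin, z=1.96):
    """Smallest n whose Wald interval half-width is <= margin.

    Assumes a normal approximation to the binomial; z=1.96 corresponds
    to a two-sided 95% interval.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def half_width(p, n, z=1.96):
    """Half-width of the Wald interval for prevalence p with n specimens."""
    return z * math.sqrt(p * (1 - p) / n)

# 5% expected prevalence, +/-2% absolute precision
print(required_sample_size(0.05, 0.02))   # 457
# Precision actually achieved with 450 specimens in one age group
print(round(half_width(0.05, 450), 4))    # 0.0201
```

Under these assumptions, roughly 450 specimens per age group gives a half-width of about two percentage points, in line with the stated target.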
Each round of specimens covered a one-week period of collections, except for Utah and Minnesota, which covered a two-week period, as fewer specimens were available per week. Specimen collection rounds were a minimum of three weeks apart for each site, and sites were not synchronized.
Between three and five specimen collection rounds occurred in ten sites, one from each of the 10 Health and Human Services regions (Fig e1). This activity was reviewed by CDC and was conducted consistent with applicable federal law and CDC policy. Informed consent was waived, as de-identified data were used. The results from individual timepoints were previously released on CDC's website [19].

Laboratory Assay
Sera were tested with a CDC assay as previously described [17]. Briefly, serum specimens were tested with an enzyme-linked immunosorbent assay (ELISA) that detects pan-immunoglobulin (pan-Ig) against the spike protein of SARS-CoV-2. Signal-to-threshold ratios were calculated from optical density results, with ratios >1.0 considered reactive. A receiver operating characteristic curve was used to choose the cutoff that maximized overall accuracy.
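The reactive call described above amounts to a simple ratio test. The following is a minimal sketch, assuming the threshold is expressed as an optical density value; `threshold_od` is a hypothetical parameter, since the way the assay threshold was derived is described in [17] rather than here.

```python
def signal_to_threshold(optical_density, threshold_od):
    """Ratio of a specimen's optical density to the assay threshold.

    threshold_od is a hypothetical input; its derivation is not given
    in this section.
    """
    if threshold_od <= 0:
        raise ValueError("threshold_od must be positive")
    return optical_density / threshold_od

def is_reactive(optical_density, threshold_od, cutoff=1.0):
    """A specimen is reactive when its ratio is strictly above the cutoff."""
    return signal_to_threshold(optical_density, threshold_od) > cutoff

print(is_reactive(0.6, 0.4))  # True  (ratio 1.5)
print(is_reactive(0.4, 0.4))  # False (ratio exactly 1.0 is not reactive)
```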

Analysis
Age-sex-standardized estimates of seroprevalence with confidence intervals that account for serology assay performance were generated using a two-stage bootstrap method described previously [17]. [...] consistent over time but were updated from the second timepoint onwards in NY, WA, and PA, based on counties that provided the majority of specimens (Supplement).
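To illustrate the general idea, the sketch below combines direct standardization, a Rogan-Gladen correction for assay sensitivity and specificity, and a two-stage bootstrap that resamples the assay validation panels first and the survey specimens second. It is a generic illustration, not the procedure of [17]: the correction formula, the percentile interval, and all counts and weights are assumptions, and the strata are keyed by age group only for brevity.

```python
import random

def standardized_prevalence(strata, weights):
    """Direct standardization: weight each stratum's crude positivity by
    its share of the reference (census) population."""
    total_weight = sum(weights[key] for key in strata)
    weighted = sum(weights[key] * pos / tested
                   for key, (pos, tested) in strata.items())
    return weighted / total_weight

def rogan_gladen(apparent, sensitivity, specificity):
    """Correct an apparent prevalence for imperfect assay performance."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("assay must perform better than chance")
    return min(1.0, max(0.0, (apparent + specificity - 1.0) / denom))

def _binomial(rng, n, p):
    """One Binomial(n, p) draw; equivalent to resampling n binary
    outcomes with replacement when p is the observed proportion."""
    return sum(rng.random() < p for _ in range(n))

def two_stage_bootstrap(strata, weights, sens_panel, spec_panel,
                        reps=500, seed=1):
    """95% percentile interval for the adjusted, standardized prevalence.

    sens_panel = (known positives detected, known positives tested)
    spec_panel = (known negatives correctly negative, known negatives tested)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Stage 1: resample the assay validation panels
        sens = _binomial(rng, sens_panel[1], sens_panel[0] / sens_panel[1]) / sens_panel[1]
        spec = _binomial(rng, spec_panel[1], spec_panel[0] / spec_panel[1]) / spec_panel[1]
        # Stage 2: resample survey specimens within each stratum
        boot = {key: (_binomial(rng, tested, pos / tested), tested)
                for key, (pos, tested) in strata.items()}
        draws.append(rogan_gladen(standardized_prevalence(boot, weights), sens, spec))
    draws.sort()
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]

# Hypothetical counts and census shares, for illustration only
strata = {"0-18": (9, 450), "19-49": (27, 450), "50-64": (23, 450), "65+": (18, 450)}
weights = {"0-18": 0.23, "19-49": 0.40, "50-64": 0.20, "65+": 0.17}
point = rogan_gladen(standardized_prevalence(strata, weights), 0.96, 0.992)
low, high = two_stage_bootstrap(strata, weights, (96, 100), (496, 500))
print(round(point, 3), round(low, 3), round(high, 3))
```

Resampling the validation panels in the first stage lets uncertainty in sensitivity and specificity widen the interval, rather than treating assay performance as known exactly.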
To assess the relative magnitude and significance of changes in seroprevalence over time, while accounting for uncertainty in sampling and lab assay accuracy, we used Bayesian hierarchical modeling to obtain the joint posterior distribution of our parameters, sampled using Gibbs sampling [21]. Equations are listed in the statistical Appendix; briefly, we modeled the odds of having detectable SARS-CoV-2 antibodies at each site, assuming a binomial distribution for antibody detection and using a logit link, as a function of the following parameters: age, sex, county, and time period (an indicator variable for change between sequential collection rounds). Relative risks were calculated from the posterior probabilities of positivity as described in the Appendix. For UT, model diagnostics indicated a poor fit, so we instead used a non-parametric permutation method (Appendix).
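For orientation, a model of this general form could be written as follows. This is only a sketch of one plausible specification: the symbols, the treatment of county as a random effect, and the way assay sensitivity ($Se$) and specificity ($Sp$) enter are assumptions, and the actual equations are those given in the statistical Appendix.

```latex
\begin{align}
y_i &\sim \mathrm{Bernoulli}\!\left(p^{\mathrm{obs}}_i\right),
  \qquad p^{\mathrm{obs}}_i = Se\,p_i + (1 - Sp)\,(1 - p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_{\mathrm{age}[i]} + \beta_{\mathrm{sex}[i]}
  + u_{\mathrm{county}[i]} + \delta\, t_i,
  \qquad u_c \sim \mathcal{N}(0, \sigma_u^2) \\
\mathrm{aRR} &= \frac{\Pr(\text{seropositive} \mid t = 1)}{\Pr(\text{seropositive} \mid t = 0)}
\end{align}
```

Here $y_i$ is the assay result for specimen $i$, $p_i$ is the underlying probability of having detectable antibodies, and $t_i$ indicates the later of two sequential collection rounds. Under this sketch, a posterior interval for the aRR that excludes 1.0 would correspond to a statistically significant change between rounds.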

Estimated cases and catchment populations
Publicly reported daily case data at the county level were obtained from USAFacts [22]. County-level population projections by age and sex were downloaded from the U.S. Census Bureau [23].
Statistical analyses were programmed in R 3.6.1 and RStudio v1.2.1335 (R Foundation for Statistical Computing).

Sample characteristics
A total of 78,990 specimens were collected from patients between March 23 and August 14, 2020 across ten sites. Of those, 43.7% (34,516) were male. The median age was 55, with 59.0% (46,628) of all specimens from patients over 50 years of age. Table 1 shows the demographic characteristics of patient specimens at each site. A comparison of demographic and geographic information across timepoints shows that our samples remained generally consistent within sites for known characteristics and came from the same source counties, except for the first timepoint in NY, WA, and PA (Table e1).

Results by site
In the northeastern U.S., including NY, PA, and CT, reported cases increased markedly in April 2020, around or just before the first time point. New York had the most dynamic changes in seroprevalence and the highest seroprevalence, increasing from 6.9% by April 1 to 23.2% by April 16, as estimated by the bootstrap analysis (adjusted Risk Ratio (aRR) from Bayesian analysis: 2.3, 95% Confidence Interval (CI): 1.4-4.9, Table 2 and Figure 1a). Daily reported cases in NY then steadily declined; subsequent seroprevalence measurements indicated a small decrease by the first week of May (aRR 0.7, CI: 0.6-0.9). In PA and CT, cases also began to decline gradually starting in May (Figure e2b-c), and the aRR of detectable antibodies was not statistically different from 1.0 in CT and PA.
In MN, reported cases first increased in early May, and remained elevated throughout our study period without clear peaks or declines. Seroprevalence climbed from 2.4% by May 12 to 6.1% by July 18. Increases during late June and early July were statistically significant (Figure e2d).
In the southern U.S. (FL, LA), seroprevalence rose gradually in April and May; FL increased to 2.8% and LA to 5.8% by April 23 (Figure e2g). Both sites experienced a marked increase in cases starting in late June through late July. By July 23, seroprevalence in FL rose to 13.3% (aRR 2.5, CI: 1.7-3. [...] (Table 3).

Case ascertainment ratio
At the first time point, estimated actual infections exceeded reported cases by at least ten-fold in seven of 10 sites (range 6- to 24-fold, Table 2); by the fourth time point, the ratio of estimated infections to reported cases ranged from two-fold in MO to seven-fold in NY.
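The ratio reported here can be reproduced from three quantities: the seroprevalence estimate, the catchment population, and the cumulative reported cases in the same catchment area. The sketch below is illustrative only; the function names and the numbers in the example are hypothetical and are not taken from Table 2.

```python
def estimated_infections(seroprevalence, catchment_population):
    """Estimated cumulative infections implied by a seroprevalence estimate."""
    return seroprevalence * catchment_population

def infection_to_case_ratio(seroprevalence, catchment_population, reported_cases):
    """How many estimated infections there are per reported case."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return estimated_infections(seroprevalence, catchment_population) / reported_cases

# Hypothetical site: 5% seroprevalence, 2,000,000 residents, 10,000 reported cases
ratio = infection_to_case_ratio(0.05, 2_000_000, 10_000)
print(round(ratio, 1))  # 10.0, i.e. about ten infections per reported case
```

Applying the same calculation to the lower and upper bounds of the seroprevalence confidence interval gives a corresponding range for the ratio.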

Discussion
In a large-scale SARS-CoV-2 seroprevalence study, we collected and tested 78,990 specimens from ten U.S. sites that experienced different epidemic curves, over multiple timepoints between March and August 2020. Our use of convenience samples of commercial laboratory residual sera allowed serial measurements of seroprevalence from a large number of specimens drawn from similar sampling frames. During this period, less than 10% of the population had detectable antibodies to SARS-CoV-2 in all sites except NY and FL. Through Bayesian modeling, we determined that seroprevalence increases in NY, FL, UT, and MO, which were observed following an increase in reported cases, were statistically significant. We also observed statistically significant declines in seroprevalence in MO, UT, NY, and WA, following periods of stable, low numbers of cases. This decrease in seroprevalence, even in settings with ongoing transmission, suggests that waning of assay-detectable antibodies may be occurring; the effects of waning antibodies on overall seroprevalence may be more apparent during periods with fewer new cases, possibly in combination with changes in healthcare-seeking behaviors and changing adherence to stay-at-home orders.
Since the beginning of the COVID-19 pandemic, seroprevalence estimates have been used to estimate cumulative incidence of infection [10][11][12][13][14][18]. These studies can capture a broader range of infections than case reporting, including mild or asymptomatic cases that may go undetected and unreported. Our seroprevalence estimates at specific time points are similar to those of other studies undertaken in similar locations, including in San Francisco, Utah, and Miami-Dade [24][25][26]. In New York, seroprevalence in a state-wide survey of grocery store patrons was 14% in late March [12], and serial samples in a health care setting found a peak of seroprevalence of 19.3% by mid-April [27], both approximating our estimates. In a representative survey in Connecticut, seroprevalence was 3.1%, lower than our estimate of 6.2% at a comparable timepoint, although lower assay sensitivity and different sampling methods may explain some of the difference in results [28].
At all ten sites, across all time periods during the study, the estimated number of infections was much higher than the number of reported cases. This case ascertainment ratio changed over the course of the study, from at least 10-fold in seven sites [17], to between two- and seven-fold. This decrease may be due to several factors, including improved testing availability, changing testing patterns [29] and increased contact tracing, thus potentially resulting in the detection and reporting of a larger proportion of infections. Changes in healthcare-seeking behavior may also have affected both case reporting and seroprevalence estimates in unpredictable ways. The case ascertainment ratio based on seroprevalence is a conservative estimate, however, as some individuals infected near the end of our specimen collection period may not yet have developed antibodies, and not all infected persons develop antibodies. We would also be unable to detect antibodies lost due to waning in previously infected persons, which would lead us to underestimate the population that has been infected.
The observation that seroprevalence stayed flat, or even decreased, in some sites, while cumulative case counts increased at all ten sites, offers indirect evidence of waning levels of assay-detectable antibodies in a certain proportion of the population. In the absence of waning, repeated serological testing of similar samples from a population in which the virus continues to circulate should find that the number of persons with past infection increases monotonically. Reports on humoral response and antibody kinetics found waning anti-SARS-CoV-2 antibody levels among a subset of recovered patients along timelines that our study would have captured, with declines 40-60 days after the onset of symptoms [7,8,14]. IgG antibodies typically have a half-life of 7-21 days [30], but IgG to the SARS-CoV-2 spike protein, which our ELISA assay can detect, may persist longer [5,31]. Antibody levels following infection also appear positively correlated with disease severity [5,7]. Many infections identified through seroprevalence studies likely had mild or asymptomatic disease, and thus potentially lower initial antibody titers than reported cases. These persons may be more likely [...]

Limitations
Our study has several limitations. While use of residual clinical specimens allows for large sample sizes spanning multiple timepoints, the population from whom these are drawn could differ from the general population in multiple ways discussed previously [17]. Although the laboratory sera were sampled from routine tests not directly associated with COVID-19, increases in symptomatic infections may have driven a disproportionate number of COVID-19-infected patients to clinical settings, and the clinical specimens could have been taken as part of their acute or follow-up care.
Pediatric specimens did not meet our sample size targets in some sites, especially Utah, likely because healthy children infrequently have blood drawn.
In some instances, there was geographic variability within a site over time, with variability especially in suburban counties at some sites. Given local geographic variation in SARS-CoV-2 clusters, this may have affected our seroprevalence estimates, although our multivariable model for relative odds of seropositivity controlled for county. In addition, as noted above, healthcare-seeking behavior likely also changed over time for the general population [35]. This could potentially affect the study population from whom residual clinical samples are available, potentially biasing these seroprevalence estimates in unpredictable ways. Since our data were drawn from serial independent samples and not cohorts, the effects of antibody waning and potential sampling biases are inextricable. In addition, not all persons infected with SARS-CoV-2 mount an antibody response [5][6][7][8][9].

Conclusions
This study shows that as of August 2020, most of the predominantly adult population sampled from clinical laboratories in the 10 sites studied had no evidence of past infection with SARS-CoV-2.
Moreover, among people who become infected, a majority, many of whom likely had mild or asymptomatic infections, are not captured through case reporting, although the gap between estimated infections based on seroprevalence and reported cases has decreased over time.