Risk Factors for Community and Intrahousehold Transmission of SARS-CoV-2: Modeling in a Nationwide French Population-Based Cohort Study, the EpiCoV Study

Abstract We assessed the risk of acquiring severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from household and community exposure according to age, family ties, and socioeconomic and living conditions using serological data from a nationwide French population-based cohort study, the Epidémiologie et Conditions de Vie (EpiCoV) Study. A history of SARS-CoV-2 infection was defined by a positive anti-SARS-CoV-2 enzyme-linked immunosorbent assay immunoglobulin G result in November–December 2020. We applied stochastic chain binomial models fitted to the final distribution of household infections to data from 17,983 individuals aged ≥6 years from 8,165 households. Models estimated the competing risks of being infected from community and household exposure. The age group 18–24 years had the highest risk of extrahousehold infection (8.9%, 95% credible interval (CrI): 7.5, 10.4), whereas the oldest (≥75 years) and youngest (6–10 years) age groups had the lowest risk, at 2.6% (95% CrI: 1.8, 3.5) and 3.4% (95% CrI: 1.9, 5.2), respectively. Extrahousehold infection was also associated with socioeconomic conditions. Within households, the probability of person-to-person transmission increased with age, from 10.6% (95% CrI: 5.0, 17.9) among children aged 6–10 years to 43.1% (95% CrI: 32.6, 53.2) among adults aged 65–74 years. Transmission was higher between partners (29.9%, 95% CrI: 25.6, 34.3) and from mother to child (29.1%, 95% CrI: 21.4, 37.3) than between individuals related by other family ties. In 2020 in France, the main factors identified for extrahousehold SARS-CoV-2 infection were age and socioeconomic conditions. Intrahousehold infection mainly depended on age and family ties.


Web Appendix 1. The COVID-19 pandemic in France in 2020
The first wave of the COVID-19 pandemic peaked two weeks after the start of the first national lockdown, which was decreed from March 17 to May 11, 2020 (Web Figure 1), in a context of mask shortages and limited availability of PCR tests. This first lockdown combined drastic measures, including limited outdoor circulation, travel bans, mandatory teleworking, and the closure of schools, universities, and shops other than those selling essential supplies, which led to a very low incidence rate. The second wave started slowly at the end of August, despite wide-scale access to masks and free access to tests (both PCR and antigen tests). Following a curfew period with territorial variations, a second national lockdown was imposed from October 30 to December 15, 2020. This lockdown was less restrictive than the first, with no school closures (although universities were closed) and an extended list of shops authorized to remain open. Throughout the year, incentives for telework and other barrier measures, especially face covering and physical distancing, were maintained.

Web Appendix 2. Technical summary
We used an adapted version of chain-binomial models fitted to the final size of infections (1), i.e., the distribution of seropositive and seronegative individuals within households, to analyze the transmission process among household members. The model has been previously described by Bi et al. (2020) (2). We adapted their open-access code to the EpiCoV data. The model estimates the risk of infection from 1) extra-household sources and 2) a single infected household member.

Assumptions
The model's assumptions are as follows:
- each household member can be infected either from within the household or from extra-household sources
- household members mix at random within a household and can infect one another
- all household members were initially susceptible to infection with SARS-CoV-2
- the possibility of reinfection during the study period was neglected

In addition, we assumed no misclassification of the serological result, either positive or negative.

Data augmentation
Given that only the serological status of individuals was known, with no additional information about the chronology of infection events within the households, the model considers augmented data with all possible sequences of viral introductions to each household and subsequent transmission events within the household. These sequences are defined by assigning a generation to each household member. People infected from outside the household are assigned to generation 0. Those that they infect within the household are assigned to generation 1, those infected by generation 1 to generation 2, and so on. Uninfected individuals are assigned to generation infinity. For each household h, one possible sequence k of viral introduction and transmission, denoted $HH_{h,k}$, is one ordered assignment of generations.

For example, in a household of 3 individuals, i, j, and k, in which the 2 individuals i and k are positive and j is negative, there are three possible sequences (Web Figure 2). In the first, $HH_1$, both i and k could have been infected outside of the household: the two are assigned to generation 0. In the second, $HH_2$, i could have been infected outside and then infected k within the household: i is assigned to generation 0 and k to generation 1. The third, $HH_3$, is the opposite: k could have been infected outside and then infected i within the household: k is assigned to generation 0 and i to generation 1. In these three sequences, j is assigned to generation infinity.
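The enumeration of possible sequences can be illustrated with a short Python sketch. It is not the authors' code: the function name `enumerate_sequences` and the dictionary representation are hypothetical, and the sketch assumes that a valid assignment only requires generations to be contiguous starting from 0 (every generation g ≥ 1 needs at least one potential infector in generation g − 1).

```python
from itertools import product
import math

INF = math.inf  # generation assigned to uninfected (seronegative) members


def enumerate_sequences(status):
    """Return all valid generation assignments for one household.

    `status` maps each member to True (seropositive) or False (seronegative).
    """
    positives = [m for m, pos in status.items() if pos]
    negatives = [m for m, pos in status.items() if not pos]
    sequences = []
    # An infected member can sit in generation 0 .. (number of positives - 1).
    for gens in product(range(len(positives)), repeat=len(positives)):
        used = set(gens)
        # Keep only assignments whose generations are contiguous from 0.
        if positives and used != set(range(max(used) + 1)):
            continue
        seq = dict(zip(positives, gens))
        seq.update({m: INF for m in negatives})
        sequences.append(seq)
    return sequences


# Household from the example: i and k positive, j negative.
for seq in enumerate_sequences({"i": True, "j": False, "k": True}):
    print(seq)
```

For the three-person example this yields the three sequences described in the text, in the same order. The number of sequences grows quickly with the number of infected members (13 for three infected members under this rule).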

Likelihood of the model
The likelihood of the model is calculated by decomposing it into the contribution of each possible sequence $HH_{h,k}$ of each household h.

The likelihood of the sequence $HH_{h,k}$ is:

$$L(HH_{h,k}) = \prod_{i} \Pr(g_i \mid HH_{h,k})$$

where $\Pr(g_i \mid HH_{h,k})$ is the probability of household member i of household h having an infection generation of $g_i$ in the sequence $HH_{h,k}$.

$\Pr(g_i \mid HH_{h,k})$ is defined from the two probabilities of interest:
1) The probability $Q_{i,j}$ of a household member i escaping infection from a single infectious household member j, which is the complement of a person-to-person transmission probability.
2) The probability $B_i$ of a household member i escaping infection from the community, i.e., extra-household exposure, over the course of the epidemic.

If i is infected outside, i.e., assigned to generation 0, this probability is

$$\Pr(g_i \mid HH_{h,k}) = 1 - B_i$$

If i is infected within the household, i.e., assigned to generation 1 or greater, it is

$$\Pr(g_i \mid HH_{h,k}) = B_i \left[\prod_{j \neq i,\; g_j < g_i - 1} Q_{i,j}\right] \left[1 - \prod_{j \neq i,\; g_j = g_i - 1} Q_{i,j}\right]$$

where:
- $B_i$ represents the probability of household member i escaping infection from extra-household exposure
- $\prod_{j \neq i,\; g_j < g_i - 1} Q_{i,j}$ represents the probability of household member i escaping infection from other infected household members up to generation $g_i - 2$
- $1 - \prod_{j \neq i,\; g_j = g_i - 1} Q_{i,j}$ represents the probability of household member i being infected by any infected household member of generation $g_i - 1$

For each household h, the likelihood of observing the final infection state is the sum of the probabilities of all the possible sequences $HH_{h,k}$ that could lead to this final result:

$$L_h = \sum_{k} L(HH_{h,k})$$

The global log-likelihood of the model is the sum of the contributions of all households:

$$\log L = \sum_{h} \log L_h$$
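As an illustration of how the sum over sequences works, the following Python sketch evaluates the household likelihood for the null model with constant B and Q. It is a simplified sketch, not the authors' Stan implementation: the function names are hypothetical, and the term used for uninfected members (escaping the community and every infected member, B · Q^(number infected)) is an assumption that follows from the definitions of B and Q rather than a formula stated in this appendix.

```python
from itertools import product
import math

INF = math.inf  # generation of uninfected members


def sequences(n_pos):
    """All generation assignments, contiguous from 0, for n_pos infected members."""
    if n_pos == 0:
        return [()]
    return [g for g in product(range(n_pos), repeat=n_pos)
            if set(g) == set(range(max(g) + 1))]


def member_prob(g_i, others, B, Q):
    """Pr(g_i | sequence) for one member, given the generations of the others."""
    if g_i == 0:                      # infected from the community
        return 1.0 - B
    infected = [g for g in others if g != INF]
    if g_i == INF:                    # ASSUMPTION: escaped community and all infected members
        return B * Q ** len(infected)
    earlier = sum(1 for g in infected if g < g_i - 1)    # generations up to g_i - 2
    previous = sum(1 for g in infected if g == g_i - 1)  # potential infectors
    return B * Q ** earlier * (1.0 - Q ** previous)


def household_likelihood(n_pos, n_neg, B, Q):
    """Sum over all sequences of the product of member probabilities."""
    total = 0.0
    for gens in sequences(n_pos):
        full = list(gens) + [INF] * n_neg
        lik = 1.0
        for idx, g_i in enumerate(full):
            lik *= member_prob(g_i, full[:idx] + full[idx + 1:], B, Q)
        total += lik
    return total


def log_likelihood(households, B, Q):
    """Global log-likelihood; `households` is a list of (n_pos, n_neg) pairs."""
    return sum(math.log(household_likelihood(p, n, B, Q)) for p, n in households)
```

A useful sanity check is that, for a household of fixed size, the likelihoods of all possible final states (weighted by the number of ways of choosing which members are infected) sum to 1.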

Covariates of adjustment
In a null model, the probabilities Q and B were fixed and equal for all individuals. They were then adjusted for individual and household characteristics.

$Q_{i,j}$ was estimated as a function of the characteristics $X_i$ of the exposed household member, the characteristics $X_j$ of the potential infector, and shared characteristics $X_h$ of their household, as follows:

$$\operatorname{logit}(Q_{i,j}) = \beta_0 + X_i \beta + X_j \alpha + X_h \gamma$$

$B_i$ was estimated as a function of the characteristics $X_i$ of the exposed household member and the characteristics $X_h$ of his or her household:

$$\operatorname{logit}(B_i) = \beta'_0 + X_i \beta' + X_h \gamma'$$

The probabilities of being infected from the community and from one single infected household member were then obtained as $1 - \operatorname{expit}(\operatorname{logit}(B_i))$ and $1 - \operatorname{expit}(\operatorname{logit}(Q_{i,j}))$, respectively, the expit function being the inverse of the logit.
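The conversion from a linear predictor on the logit scale to an infection probability can be sketched as follows. The function names and the covariate coefficient in the usage lines are hypothetical and chosen only for illustration; only the value B = 0.955 comes from this appendix.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def infection_probability(intercept, covariates, coefficients):
    """Return 1 - expit(linear predictor).

    The linear predictor models the ESCAPE probability (B or Q) on the logit
    scale, so the infection probability is its complement.
    """
    eta = intercept + sum(x * b for x, b in zip(covariates, coefficients))
    return 1.0 - expit(eta)


# Null model: an escape probability of B = 0.955 gives a 4.5% community risk.
baseline = infection_probability(logit(0.955), [], [])

# Hypothetical binary covariate with coefficient -0.5 on the escape scale:
# a negative coefficient lowers the escape probability and raises the risk.
exposed = infection_probability(logit(0.955), [1], [-0.5])
print(round(baseline, 3), exposed > baseline)
```

Because the model is parameterized in terms of escape probabilities, the sign of a coefficient is the opposite of its effect on infection risk.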
We consider the following covariates.

Model selection
Associations of each of the covariates listed with $B_i$ and $Q_{i,j}$, respectively, were tested one by one in univariate models. In the final multivariate model, we adjusted for the covariates for which a decrease in the widely applicable information criterion (WAIC) and the leave-one-out cross-validation information criterion (LOOIC) was observed in univariate analyses (3).

Inference and implementation
Posterior distributions of parameters were estimated via Markov chain Monte Carlo (MCMC) using the rstan package. The default algorithm in rstan is the No-U-Turn Sampler (NUTS), an adaptive Hamiltonian Monte Carlo sampler that does not require manual tuning (4).
We set weakly informative priors on all parameters, normally distributed on the logit scale with a mean of 0 and a standard deviation of 1.5. We ran four chains of 1,500 iterations each, with 500 warm-up iterations, and assessed convergence visually and using the Gelman-Rubin convergence statistic (R-hat).
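To see what such a prior implies on the probability scale, the following sketch back-transforms the central 95% interval of a normal distribution with mean 0 and scale 1.5 through the expit function. It interprets the stated value of 1.5 as the prior standard deviation, which is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def prior_interval(mean=0.0, sd=1.5, level=0.95):
    """Central prior interval for a probability whose logit is Normal(mean, sd)."""
    prior = NormalDist(mean, sd)
    tail = (1.0 - level) / 2.0
    return expit(prior.inv_cdf(tail)), expit(prior.inv_cdf(1.0 - tail))


low, high = prior_interval()
print(f"95% prior interval on the probability scale: {low:.3f} to {high:.3f}")
```

Under this reading, the prior places about 95% of its mass on probabilities between roughly 0.05 and 0.95, which is why it can be described as weakly informative: it discourages values extremely close to 0 or 1 without favoring any particular value in between.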

Handling of missing variables
Given the very low percentage of missing data for the considered variables (<4%), models were run using the complete dataset.

Validation
In order to evaluate the ability of our framework to estimate B and Q, we conducted a simulation study in which synthetic data generated on the basis of known values of B and Q were analyzed.
The different steps of the framework evaluation were as follows (Web Figure 3):

Step 1: Synthetic data generation
We constructed a synthetic population of 500 households with the same household structure as our study population, i.e., the same proportions of households of each size from 1 to 8.
For given fixed values of B and Q, we simulated the final distribution of cases, i.e., the number k of infected persons in each household of size n. When the algorithm stopped, the household members who remained uninfected were assigned to generation g∞.

Step 2: Estimation of B and Q from the synthetic data
We ran our modeling framework to estimate B and Q from the synthetic data. The estimates of B and Q and their 95% credible intervals (CrI) were then compared with the known values set for generating the data.
These validation steps were applied for different sets of parameters B and Q. Results were very consistent, with all true values included in the estimated 95% CrI (Web Table 1).
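The data-generating step can be sketched in Python as a generation-by-generation chain-binomial simulation. This is one plausible reading of the simulation algorithm, not the authors' code: it assumes that community infections all occur in generation 0 and that each still-susceptible member escapes the n infectors of the current generation with probability Q^n. The function name is hypothetical.

```python
import random


def simulate_household(n, B, Q, rng):
    """Simulate one household of size n.

    Returns a list of generations; None marks members never infected
    (generation infinity in the text).
    """
    # Generation 0: each member is infected from the community with prob. 1 - B.
    gen = [0 if rng.random() < 1.0 - B else None for _ in range(n)]
    g = 0
    while True:
        infectors = sum(1 for x in gen if x == g)
        if infectors == 0:
            break  # no new infections in the last generation: the chain stops
        for i in range(n):
            # A susceptible member is infected unless all current infectors are escaped.
            if gen[i] is None and rng.random() < 1.0 - Q ** infectors:
                gen[i] = g + 1
        g += 1
    return gen


rng = random.Random(2020)
households = [simulate_household(4, 0.955, 0.821, rng) for _ in range(500)]
final_sizes = [sum(x is not None for x in hh) for hh in households]
print({k: final_sizes.count(k) for k in range(5)})
```

The tabulated final sizes are the kind of synthetic data from which B and Q can then be re-estimated and compared with the values used for generation.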

Step 3: Adequacy of the simulation
We evaluated the adequacy of the simulated data with the final distribution of cases for the original data of the 8,165 households of the study population, for which serological status was available for all members and with no child aged ≤5 years.
We ran 1,000 simulations according to the algorithm presented above with the fixed parameters B = 0.955 and Q = 0.821, which were the B and Q values estimated from the main analysis.
The expected distribution of the number of infected individuals by household size is shown in Web Figure 4 and closely reproduced the distribution observed in the EpiCoV household data.

Simulation of source of infection
For each household with at least one seropositive individual, we drew one sequence of viral introduction and subsequent within-household transmission from the probability distribution of all possible sequences of the household. We then estimated the number of infections acquired from extra-household exposure and the number of within-household transmission events in the drawn scenario. We simulated chains of transmission for the 8,165 households of the study population for which serological status was available for all members and with no child aged ≤5 years.

Simulations were conducted 1,000 times with the fixed parameters B = 0.955 and Q = 0.821, which were the B and Q values estimated from the main analysis. We then combined the estimated mean number of infected individuals per household by household size.
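Drawing one sequence per household in proportion to its likelihood, and then counting community versus within-household infections, might look like the sketch below. It assumes constant B and Q and at least one infected member, uses hypothetical function names, and includes an assumed term for uninfected members (escaping the community and every infected member); it is an illustration rather than the authors' implementation.

```python
from itertools import product
import random


def sequence_weights(n_pos, n_neg, B, Q):
    """Return (generations, likelihood) pairs for a household with n_pos >= 1."""
    pairs = []
    for gens in product(range(n_pos), repeat=n_pos):
        if set(gens) != set(range(max(gens) + 1)):
            continue  # generations must be contiguous from 0
        # ASSUMPTION: each uninfected member escapes the community and all infected members.
        lik = B ** n_neg * Q ** (n_pos * n_neg)
        for idx, g in enumerate(gens):
            others = gens[:idx] + gens[idx + 1:]
            if g == 0:
                lik *= 1.0 - B
            else:
                earlier = sum(1 for x in others if x < g - 1)
                previous = sum(1 for x in others if x == g - 1)
                lik *= B * Q ** earlier * (1.0 - Q ** previous)
        pairs.append((gens, lik))
    return pairs


def draw_sources(n_pos, n_neg, B, Q, rng):
    """Draw one sequence; return (community infections, household transmissions)."""
    pairs = sequence_weights(n_pos, n_neg, B, Q)
    seqs = [s for s, _ in pairs]
    weights = [w for _, w in pairs]
    gens = rng.choices(seqs, weights=weights, k=1)[0]
    community = sum(1 for g in gens if g == 0)
    return community, n_pos - community


rng = random.Random(0)
draws = [draw_sources(2, 1, 0.955, 0.821, rng) for _ in range(1000)]
share_within = sum(h for _, h in draws) / (2 * len(draws))
print(f"share of infections attributed to household transmission: {share_within:.2f}")
```

Repeating the draw many times, as in the usage lines, gives a distribution of the number of infections attributed to each source rather than a single allocation.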

Abbreviations: MCMC, Markov chain Monte Carlo; NUTS, No-U-Turn Sampler.

- family link with the individual i (categorical): partner/spouse, mother, father, child < 12 years old, child ≥ 12 years old, grandparent, grandchild < 12 years old, grandchild ≥ 12 years old, sibling < 12 years old, sibling ≥ 12 years old, other family link, no family link
- household size (categorical): 1, 2, 3, 4, ≥ 5 individuals
- region (categorical): the 13 administrative regions of mainland France
- immigration history (categorical): majority population, 1st-generation immigrant from Europe, 2nd-generation immigrant from Europe, 1st-generation immigrant from outside Europe, 2nd-generation immigrant from outside Europe. As the migration history was available only for the respondent member of the household and not for all household members, this information was treated as a household-level covariate

Web Table 4. Comparison of model performance and estimated parameters: adjustment for characteristics of the susceptible individual
Multiple models were run. They included key individual-level factors (i.e., age and sex of exposed individuals) that may be associated with risk of infection from extra-household exposures ('extra-household') and from a single infected household member ('intra-household'). Lower WAIC and LOOIC scores indicate better model fit.

Web Table 5. Comparison of model performance and estimated parameters: adjustment for characteristics of the potential infector and family ties
Multiple models were run. They included the age and sex of potential infectors, and family links between individuals, that may be associated with risk of infection from a single infected household member ('intra-household'). Lower WAIC and LOOIC scores indicate better model fit.

Web Table 6. Comparison of model performance and estimated parameters: adjustment for socioeconomic characteristics
Multiple models were run. They included household characteristics that may be associated with risk of infection from extra-household exposures ('extra-household') and from a single infected household member ('intra-household'). Lower WAIC and LOOIC scores indicate better model fit.
p_loo and p_waic, effective number of parameters for estimation of LOOIC and WAIC, respectively.
a Difference in WAIC and LOOIC compared to the null model

Web Table 7. Comparison of model performance and estimated parameters: adjustment for living conditions
Multiple models were run. They included household characteristics that may be associated with risk of infection from extra-household exposures ('extra-household') and from a single infected household member ('intra-household'). Lower WAIC and LOOIC scores indicate better model fit.
Abbreviations: LOOIC, leave-one-out information criterion; WAIC, Watanabe-Akaike information criterion; p_loo and p_waic, effective number of parameters for estimation of LOOIC and WAIC, respectively.
a Overcrowded housing defined as at least two people living in less than 18 m² per person
b Difference in WAIC and LOOIC compared to the null model

Web Table 8. Comparison of model performance and estimated parameters: adjustment for region and immigration history
Multiple models were run. They included household characteristics that may be associated with risk of infection from extra-household exposures ('extra-household') and from a single infected household member ('intra-household'). Lower WAIC and LOOIC scores indicate better model fit.

Abbreviations: LOOIC, leave-one-out information criterion; WAIC, Watanabe-Akaike information criterion; p_loo and p_waic, effective number of parameters for estimation of LOOIC and WAIC, respectively.
a Difference in WAIC and LOOIC compared to the null model