Association between household composition and severe COVID-19 outcomes in older people by ethnicity: an observational cohort study using the OpenSAFELY platform

Abstract
Background: Ethnic differences in the risk of severe COVID-19 may be linked to household composition. We quantified the association between household composition and risk of severe COVID-19 by ethnicity for older individuals.
Methods: With the approval of NHS England, we analysed ethnic differences in the association between household composition and severe COVID-19 in people aged 67 or over in England. We defined households by number of age-based generations living together, and used multivariable Cox regression stratified by location and wave of the pandemic and accounted for age, sex, comorbidities, smoking, obesity, housing density and deprivation. We included 2 692 223 people over 67 years in Wave 1 (1 February 2020–31 August 2020) and 2 731 427 in Wave 2 (1 September 2020–31 January 2021).
Results: Multigenerational living was associated with increased risk of severe COVID-19 for White and South Asian older people in both waves [e.g. Wave 2, 67+ living with three other generations vs 67+-year-olds only: White hazard ratio (HR) 1.61 95% CI 1.38–1.87, South Asian HR 1.76 95% CI 1.48–2.10], with a trend for increased risks of severe COVID-19 with increasing generations in Wave 2. There was also an increased risk of severe COVID-19 in Wave 1 associated with living alone for White (HR 1.35 95% CI 1.30–1.41), South Asian (HR 1.47 95% CI 1.18–1.84) and Other (HR 1.72 95% CI 0.99–2.97) ethnicities, an effect that persisted for White older people in Wave 2.
Conclusions: Both multigenerational living and living alone were associated with severe COVID-19 in older adults. Older South Asian people are over-represented within multigenerational households in England, especially in the most deprived settings, whereas a substantial proportion of White older people live alone. The number of generations in a household, number of occupants, ethnicity and deprivation status are important considerations in the continued roll-out of COVID-19 vaccination and targeting of interventions for future pandemics.

the address fields
- Set address IDs on the land registry house sales data using the same Address table from 2), joining on house name/number + road + post code
- End all patient addresses at properties where the patient address start date was before the property was sold (under the assumption that in the majority of cases that means the occupants moved out)
- If a patient address has an unset start date (imported data), use the registration start date for this step
5) Get the latest "active" patient address per patient
- Filter to those which were active on 01 Feb 2020 or recorded since (need to do this again now we've ended some based on property sales)
- Filter to the latest per patient
6) Create the HouseholdMember table
- Insert a row for each patient address from 5), using the Address_ID as the Household_ID (one household per address)
7) Create the Household table
- Select the distinct Address_IDs as the Household_ID into the Household table (one household per address)
- Set the size based on the number of members
- Use the PotentialCareHomeAddress table to set the CareHome flag
- Check the address fields to identify NFA / Unknown addresses (e.g. postcode "ZZ99 ...", house name "NFA", road "Unknown", ...)
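Steps 5) to 7) can be illustrated with a minimal Python sketch. It is not the code run inside TPP: the record layout (dictionaries with `patient_id`, `address_id`, `start`, `end`, `postcode`, `house_name`, `road`) and the function names are hypothetical, start dates are assumed to be already populated, and the unknown-address check covers only the example patterns quoted in step 7).

```python
from collections import Counter
from datetime import date

STUDY_START = date(2020, 2, 1)  # "01 Feb 2020" from step 5)


def is_unknown_address(address):
    """Flag NFA / unknown addresses using only the example patterns from step 7)."""
    return (
        address.get("postcode", "").upper().startswith("ZZ99")
        or address.get("house_name", "").strip().upper() == "NFA"
        or address.get("road", "").strip().upper() == "UNKNOWN"
    )


def build_households(patient_addresses, addresses, care_home_address_ids):
    """Return (household_member, household) from hypothetical address records."""
    # Step 5: keep addresses active on the study start date or recorded since
    # (i.e. not ended before it), then keep the latest start date per patient.
    latest = {}
    for row in patient_addresses:
        if row["end"] is not None and row["end"] < STUDY_START:
            continue
        current = latest.get(row["patient_id"])
        if current is None or row["start"] > current["start"]:
            latest[row["patient_id"]] = row

    # Step 6: HouseholdMember, one row per patient; the address ID doubles
    # as the household ID (one household per address).
    household_member = {pid: row["address_id"] for pid, row in latest.items()}

    # Step 7: Household, one row per distinct address, with size and flags.
    sizes = Counter(household_member.values())
    household = {
        hid: {
            "size": size,
            "care_home": hid in care_home_address_ids,
            "unknown_address": is_unknown_address(addresses.get(hid, {})),
        }
        for hid, size in sizes.items()
    }
    return household_member, household
```

Using the address ID directly as the household ID keeps the mapping one-to-one, so household size is simply the count of members sharing that ID.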

Analysis of the impact of household size (number of occupants)
In our main analysis of household composition we did not adjust for household size (number of occupants) because, conceptually, household size is a mediator of the effect of household composition. We also noted that not all categories of the household composition variable had a corresponding household size category (Supplementary Table S5), meaning that adjusting for household size would artificially bias estimates downwards.

Household TPP coverage
In England, not all residents of a household are necessarily registered with the same general practice, which means it is possible that not all residents of a household would appear in the same software system. This means that the number of residents attributed to a household in the TPP register is not necessarily equal to the true total number of people in the household. While all households in our study will have included at least one 67+ year old registered at a practice using TPP software, it is possible that some of the other people in some of the households may be registered with other practices that use software other than TPP. For these households, the calculated TPP household size (i.e. the count of records in TPP under the same household ID) will be different from the household size defined in the Master Patient Index (MPI) for the address covered by the TPP household ID. Those people who are registered to non-TPP practices will not have been counted when we created our household composition exposure variable, and this measurement error has the potential to bias results. A "TPP coverage" flag is provided for each household, which compares the TPP household size with the number of records in the Master Patient Index for the same address and is used to indicate the % of occupants of that household who are registered with general practices that use TPP.
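The coverage calculation described above amounts to a simple ratio. The sketch below is an illustration only: the function names are hypothetical, and returning `None` when the Master Patient Index count is missing or zero is an assumption rather than documented behaviour of the TPP flag.

```python
def tpp_coverage_percent(tpp_household_size, mpi_household_size):
    """Percentage of occupants at an address who are registered with TPP practices.

    tpp_household_size: count of TPP records sharing the household ID.
    mpi_household_size: count of Master Patient Index records for the same address.
    """
    if not mpi_household_size or mpi_household_size <= 0:
        return None  # assumption: coverage is undefined without an MPI count
    return 100.0 * tpp_household_size / mpi_household_size


def has_full_tpp_coverage(tpp_household_size, mpi_household_size):
    """True when every occupant recorded in the MPI also appears in TPP."""
    pct = tpp_coverage_percent(tpp_household_size, mpi_household_size)
    return pct is not None and pct >= 100.0
```

For example, a household with three TPP records at an address holding four MPI records would have 75% coverage, so one occupant would be missing from the household composition variable.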
For our main analysis, we included all households, irrespective of TPP coverage. In order to assess the impact of this design choice on our results, we performed a sensitivity analysis where we only included households with 100% TPP coverage.

Our primary outcome in the analysis was a combined outcome of COVID death or hospitalisation. In the protocol we specified these as separate outcomes but not as a combined outcome.
This additional outcome definition was included in order to increase power to detect effects in all ethnic groups analysed.
We included a step to compare the distribution of household sizes by ethnicity with ONS figures from the 2011 census.
This was to ensure that the TPP method for assigning people to households was correct.
We added a number of ways of identifying people in care homes to the final analysis (vs only a single method in the protocol).
This change was made based on best practice for identifying care homes (Schultze et al 2021) that was published during preparation of our own analysis.
In the analysis we looked for evidence of interaction between ethnicity and every other variable and, where interactions were found between ethnicity and a household-level variable (i.e. deprivation or housing density), we presented results stratified by that variable; in the protocol we had only said we would stratify by deprivation.
This was to ensure we did not miss any potentially important interactions with ethnicity.
Categorised as 1-2, 3-5, 6 or more.

Comorbidities
Presence of either 0, 1, or 2 or more comorbidities (see Table S3 below for further details)
Multiple 67+ year olds (max 4*)
*Largest allowable size of house with only 67+ year olds in this study

Software and Reproducibility
This analysis was delivered through the OpenSAFELY platform: codelists and code for data management and data analysis were specified using the OpenSAFELY tools, then transmitted securely to the OpenSAFELY-TPP platform within TPP's secure environment, where they were executed against local patient data; summary results were then reviewed for disclosiveness, released, and formatted for the final outputs. All code for the OpenSAFELY platform for data management, analysis and secure code execution is shared for review and reuse under open licenses at GitHub.com/OpenSAFELY. Data management and analysis were performed using Python 3.8 and Stata 16.1. Code for data management and analysis, as well as codelists, is archived online at https://github.com/opensafely/hh-classification-research.

Supplementary material for Results
Figure S1: Flow diagram of cohort with numbers excluded at different stages for wave 1 and wave 2
(UTLA) and adjusted for sex, number of comorbidities, categories of housing density (rural or urban setting), smoking status, socio-economic status and including an interaction between ethnicity and age (as well as the interaction between household composition and ethnicity presented here).
Note 5: Stratified on UTLA and adjusted for: sex, smoking, housing density and number of comorbidities and including interactions between ethnicity and: IMD, age and obesity (as well as the interaction with household composition presented here).

Results for the separate severe COVID-19 outcomes (death and hospitalisation)
Results for analysis of the separate severe COVID-19 outcomes (death and hospitalisation) for wave 1 and wave 2 are provided in Figure S2. Associations between household composition and both of the component outcomes were generally similar to the association between household composition and the (combined) severe COVID-19 outcome for all ethnicities, with the most notable differences for White and South Asian 67+ year olds for the death due to COVID-19 outcome in wave 2. For White people, in contrast to the severe COVID-19 analysis, there was no trend of increasing hazard of COVID-19 death with increasing multigenerational living (and all HRs were lower), while for South Asian people there was a steeper trend and larger HRs across the multigenerational categories.

Figure S2. Association between household composition and (1) hospitalisation due to COVID-19 and (2) death due to COVID-19 by ethnicity for wave 1 and wave 2 of the pandemic in England
Wave 1 models stratified by location (UTLA) and adjusted for sex, number of comorbidities, categories of housing density (rural or urban setting), smoking status, socio-economic status and including an interaction between ethnicity and age (as well as the interaction between household composition and ethnicity presented here). Wave 2 models stratified on UTLA and adjusted for: sex, smoking, housing density and number of comorbidities and including interactions between ethnicity and: IMD, age and obesity (as well as the interaction with household composition presented here).

Figure S3. Association between household composition and severe COVID-19 (death or hospitalisation due to COVID-19) by the component White and South Asian ethnicity categories for wave 2 of the pandemic in England
Model stratified on UTLA and adjusted for: sex, smoking, housing density and number of comorbidities and including interactions between ethnicity and: IMD, age and obesity (as well as the interaction with household composition presented here).

Death from a cause other than COVID-19
IMD (p test for interaction=0·610, see Note 2): 1 (affluent) 1·41 (0·85-2·33)
Note 1: Stratified on UTLA and adjusted for: sex, smoking, housing density and number of comorbidities and including interactions between ethnicity and: household composition, age and obesity (as well as the interaction between ethnicity and IMD presented here).
Note 2: Likelihood ratio test (LRT)
Table S11: Results of sensitivity analyses (for the death or hospitalisation from COVID-19 combined outcome) relating to (1) including a 5-year buffer between the 67+ year old generation and the next youngest generation, (2) only including people who lived in households with 100% TPP coverage, (3) using multiple imputation to account for missing ethnicity and (4) a complete records analysis for BMI and smoking.