Body mass index and risk of COVID-19 diagnosis, hospitalisation, and death: a cohort study of 2 524 926 Catalans

Abstract
Context: A comprehensive understanding of the association between body mass index (BMI) and COVID-19 is still lacking.
Objective: To investigate associations between BMI and risk of COVID-19 diagnosis, hospitalisation with COVID-19, and death after a COVID-19 diagnosis or hospitalisation (subsequent death), accounting for potential effect modification by age and sex.
Design: Population-based cohort study.
Setting: Primary care records covering >80% of the Catalan population, linked to region-wide testing, hospital, and mortality records from March to May 2020.
Participants: Adults (≥18 years) with at least one measurement of weight and height.
Main outcome measures: Hazard ratios (HR) for each outcome.
Results: We included 2 524 926 participants. After 67 days of follow-up, 57 443 individuals were diagnosed with COVID-19, 10 862 were hospitalised with COVID-19, and 2467 had a subsequent death. BMI was positively associated with being diagnosed and hospitalised with COVID-19. Compared to a BMI of 22 kg/m², the HR (95% CI) of a BMI of 31 kg/m² was 1.22 (1.19-1.24) for diagnosis, and 1.88 (1.75-2.03) and 2.01 (1.86-2.18) for hospitalisation without and with a prior outpatient diagnosis, respectively. The association between BMI and subsequent death was J-shaped, with a modestly higher risk of death among individuals with BMIs ≤19 kg/m² and a more pronounced increasing risk for BMIs ≥40 kg/m². The increase in risk for COVID-19 outcomes was particularly pronounced among younger patients.
Conclusions: There is a monotonic association between BMI and COVID-19 diagnosis and hospitalisation risks, but a J-shaped one with mortality. More research is needed to unravel the mechanisms underlying these relationships.


Introduction
The coronavirus disease 2019 (COVID-19), the illness caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was declared a global pandemic in March 2020.(1) A high body mass index (BMI) has previously been associated, in both linear and non-linear fashion, with an increased risk of multiple health outcomes such as metabolic and cardiovascular conditions, cancer, viral infections, and mortality. (2)(3)(4)(5) A better understanding of the relation between BMI and the progression of COVID-19 is essential for the clinical management of patients and the implementation of preventive strategies.
A review and meta-analysis of 75 studies indicated obesity (BMI ≥30 kg/m²) as a risk factor for severe COVID-19 and related mortality. (6) Additionally, two studies with data from a subsample of the UK Biobank and a New York hospital found that BMI was associated in a dose-response manner with an increased risk of testing positive for SARS-CoV-2 and in a J-shaped fashion with the risk of intubation or death, respectively. (7,8) These studies have provided relevant insights into this association. However, they have certain limitations, which include being restricted to tested or hospitalised populations (increasing the risk of collider bias), having a small sample size, accounting only partially for potential confounding, or dichotomizing BMI (with/without obesity).

Catalonia has a universal taxpayer-funded primary care-based health system in which general practitioners have been the first point of contact for care throughout the pandemic. Electronic health records (EHRs) from primary care encompassing demographic, historical lifestyle information and disease diagnoses linked to SARS-CoV-2 Reverse Transcription Polymerase Chain Reaction (RT-PCR) test results, hospital records, and regional mortality data offer a unique opportunity to study the role of BMI in the course of COVID-19. We aimed to investigate the associations between BMI and risks of COVID-19 diagnosis, hospitalisation with COVID-19, and death after a COVID-19 diagnosis or hospitalisation (subsequent death), accounting for potential effect modification by age and sex, using EHR data from Catalonia.

Study design, setting and data sources
We conducted a cohort study from 1st March to 6th May 2020. We used prospectively collected primary care records from the Information System for Research in Primary Care (SIDIAP; www.sidiap.org) in Catalonia, Spain. SIDIAP includes data from the Institut Català de la Salut (ICS, Catalan Health Institute), the largest public primary healthcare provider of Catalonia (covering 5.8 million people, 80% of the population of Catalonia) since 2006 and is representative of the Catalan population in terms of age, sex, and geographic distribution.(11) SIDIAP includes high-quality data on anthropometric measurements, disease diagnoses, laboratory tests, demographic and lifestyle information. SIDIAP has been linked to COVID-19 RT-PCR test results, hospital records, and regional mortality data, and mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). (12) The latter allowed us to structure the data in a standardised format and to apply analytical tools developed by the open-science Observational Health Data Sciences and Informatics (OHDSI) network. (13)

Multistate framework
We addressed our objectives using a multi-state framework. Multi-state models allow for a description of the progression from a time origin until the occurrence of several events, extending competing risk models by also describing transitions to intermediate events. (14) In the context of COVID-19, outpatient diagnoses and hospitalisations with the disease can be considered as intermediate events between not being (identified as) infected at one end and death at the other.
We structured our multi-state model in four states: general population, diagnosed (with COVID-19), hospitalised (with COVID-19), and death (Figure 1). The following transitions were possible: general population to either diagnosed, hospitalised or death; diagnosed to either hospitalised or death; hospitalised to death. This approach is valuable in the context of the first wave of COVID-19 because it provides a more detailed overview of the interaction between individuals and the health system according to their BMI. This framework allows us to disentangle the association between BMI and the risk of hospitalisation with COVID-19 by differentiating direct hospitalisations (among the community) from indirect ones (among people already diagnosed with COVID-19 in primary care). Similarly, this approach distinguishes the risk of death related to BMI among individuals who interacted exclusively with primary care (only had an outpatient diagnosis) from that among those who interacted with secondary care (were hospitalised) before dying. Furthermore, this approach can reduce the risk of collider bias that can be induced by assessing only one transition of interest. (9)

Participants
We identified all adults (aged ≥18 years) registered in SIDIAP as of 1st March 2020 with a BMI recorded at an age ≥18 years. We excluded individuals with less than one year of prior history available (to have sufficient time to capture participants' characteristics before study entry) and those with a previous clinical diagnosis or positive test result for COVID-19. We also excluded those who were hospitalised or living in a nursing home on 1st March 2020, because the transmission dynamics and the frequency of testing/diagnosing in these settings differed from those in the community population, which was the focus of this study. (15,16) Finally, individuals without information on smoking and socioeconomic status were also excluded.
The flow chart of inclusion and exclusion criteria for this study is presented in Figure S1 of the Supplementary Material.(17) The descriptive characteristics of the individuals excluded due to living in nursing homes are available in Table S1.(17) Individuals' follow-up period began on 1st March 2020 (index date) and ended, for any given transition, at the earliest of: exit from the database (which refers to individuals moving out of the catchment area of SIDIAP), the occurrence of the event of interest or of a competing event, or the end of the study period.
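As an illustration only, the transition structure and the end-of-follow-up rule described above could be sketched as follows. The function name `follow_up_end`, the state labels, and the tie-breaking rule (an event of interest takes precedence over a competing event or censoring on the same day) are assumptions made for this sketch and are not taken from the study's analysis code.

```python
from datetime import date

STUDY_START = date(2020, 3, 1)   # index date
STUDY_END = date(2020, 5, 6)     # end of the study period

# Allowed transitions in the four-state model (origin -> possible next states).
TRANSITIONS = {
    "general_population": ["diagnosed", "hospitalised", "death"],
    "diagnosed": ["hospitalised", "death"],
    "hospitalised": ["death"],
}

def follow_up_end(origin, target, event_dates, exit_date=None):
    """Return (end_date, status) for the transition origin -> target.

    event_dates maps a state name to the date the person entered it
    (states never entered are simply absent). status is "event",
    "competing" or "censored". Hypothetical helper for illustration.
    """
    if target not in TRANSITIONS.get(origin, []):
        raise ValueError(f"transition {origin} -> {target} is not allowed")
    # Administrative censoring at the end of the study period.
    candidates = [(STUDY_END, "censored")]
    # Censoring at exit from the database (moving out of the catchment area).
    if exit_date is not None:
        candidates.append((exit_date, "censored"))
    # Any other state reachable from the origin is a competing event.
    for state in TRANSITIONS[origin]:
        when = event_dates.get(state)
        if when is not None:
            status = "event" if state == target else "competing"
            candidates.append((when, status))
    # Earliest date wins; ties are broken event > competing > censored (assumption).
    priority = {"event": 0, "competing": 1, "censored": 2}
    return min(candidates, key=lambda c: (c[0], priority[c[1]]))
```

For example, a person diagnosed on 20 March and hospitalised on 25 March contributes an event to the general population to diagnosed transition, and a competing event (on 20 March) to the general population to hospitalised transition.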

Variables
The exposure of interest was BMI as a continuous variable (kg/m²). BMI was calculated using the weight and height of patients assessed in a standardized manner by general practitioners or nurses. (18) The exposure was assigned as the valid BMI (≥15 kg/m² and ≤60 kg/m²) recorded closest to the index date, between January 1st 2006 and February 29th 2020.
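A minimal sketch of this exposure assignment is given below, assuming each measurement is available as a (date, weight in kg, height in m) tuple. The function names and the input layout are hypothetical and chosen for illustration; the source database may store these data differently.

```python
from datetime import date

WINDOW_START = date(2006, 1, 1)
WINDOW_END = date(2020, 2, 29)   # day before the index date (1st March 2020)
BMI_MIN, BMI_MAX = 15.0, 60.0    # validity range in kg/m^2

def bmi(weight_kg, height_m):
    """Body mass index: weight divided by squared height (kg/m^2)."""
    return weight_kg / height_m ** 2

def assign_exposure(measurements):
    """Return (date, bmi) of the valid measurement closest to the index date.

    measurements is an iterable of (date, weight_kg, height_m) tuples
    (assumed layout). Returns None when no valid measurement exists.
    """
    valid = []
    for when, weight_kg, height_m in measurements:
        if not (WINDOW_START <= when <= WINDOW_END):
            continue  # outside the look-back window
        value = bmi(weight_kg, height_m)
        if BMI_MIN <= value <= BMI_MAX:
            valid.append((when, value))
    if not valid:
        return None
    # Every date in the window precedes the index date, so the most recent
    # valid measurement is also the closest one.
    return max(valid, key=lambda m: m[0])
```

Individuals for whom the function returns None would have no exposure value and would therefore fall outside the complete case analysis.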
The characteristics of interest were sex, age, smoking status, socioeconomic status, and comorbidities. We extracted participants' sex (female, male), age (in years) at index date and smoking status (never, former or current smoker). We assessed socioeconomic status using the Mortalidad en áreas pequeñas Españolas y Desigualdades Socioeconómicas y Ambientales (MEDEA) deprivation index, which is calculated at the census tract level in urban areas of Catalonia. (19) This measure is categorized into quintiles for anonymization purposes; the first quintile represents the least deprived group of the population and the fifth the most deprived one. It also includes a rural category since the MEDEA index is not available for participants living in those areas. We identified the following comorbidities using the individual's medical history: autoimmune condition, chronic kidney disease, chronic obstructive pulmonary disease (COPD), heart disease, hyperlipidemia, hypertension, malignant neoplasm (excluding non-melanoma skin cancer) and type 2 diabetes. We selected these conditions based on their relevance to the obesity and COVID-19 research fields and their availability in the OMOP-CDM mapped version of the SIDIAP database, and defined them as in a previous study conducted using SIDIAP data. (20)(21)(22)

We compared the baseline characteristics of the included individuals to those of the individuals excluded due to unavailability of BMI, smoking status and/or the MEDEA deprivation index information using standardized mean differences (SMDs). We considered an |SMD| >0.1 to indicate meaningful differences in the distribution of a given characteristic between the two groups. (24) We described the participants' time at risk at each state and the absolute number of outcomes observed for each transition, by WHO categories of BMI.
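The SMD comparison can be illustrated with the short sketch below. It uses a commonly applied definition of the SMD (difference in means or proportions divided by the square root of the average of the two group variances); the text does not state the exact formula used, so this choice, like the function names, is an assumption.

```python
import math
from statistics import mean, variance

def smd_continuous(group_a, group_b):
    """SMD for a continuous characteristic (e.g., age in years)."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

def smd_binary(prop_a, prop_b):
    """SMD for a binary characteristic given its proportion in each group."""
    pooled_sd = math.sqrt((prop_a * (1 - prop_a) + prop_b * (1 - prop_b)) / 2)
    return (prop_a - prop_b) / pooled_sd

def is_meaningful(smd, threshold=0.1):
    """Apply the |SMD| > 0.1 rule used to flag meaningful imbalance."""
    return abs(smd) > threshold
```

With these definitions, proportions of 20% versus 25% for a binary characteristic give an |SMD| of about 0.12 and would be flagged, whereas 20% versus 22% would not.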
We assessed the relationship between BMI and the risk of transitioning to a subsequent state in the multistate model by estimating cause-specific hazard ratios (HRs) and 95% confidence intervals (CIs) using Cox proportional hazard regressions. We estimated three types of models: 1) with BMI as the sole explanatory variable (unadjusted models); 2) adjusted for age and sex; 3) adjusted for age, sex, smoking status and the MEDEA deprivation index (fully adjusted models). We used a directed acyclic graph to guide decisions on the control for confounding (Figure [...]). We calculated the Bayesian Information Criterion (BIC) and we favoured the model with the lowest BIC values. We compared the model where BMI was fitted with a non-linear term against a linear model using a likelihood ratio test. We fitted age in the adjusted models using the same strategy as for BMI. We checked the proportional hazard assumptions for the variables included in the models by visual inspection of log-log survival curves. We did not model the transition from the general population to death because we were interested in deaths related to COVID-19 (subsequent deaths), which we captured by requiring passage through the diagnosed or hospitalised states (Figure 1). However, we considered death among the general population as a competing risk by censoring people at their death.
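For readers less familiar with these models, the quantities involved can be written out as below. The notation is introduced here for illustration and is an assumption: j and k index the origin and destination states, f_{jk} is the (possibly non-linear) function of BMI, z collects the adjustment covariates, ℓ is the maximised partial log-likelihood, p the number of estimated parameters, and n the sample size (the text does not specify which definition of n was used). The reference value of 22 kg/m² is the one used in the reported hazard ratios.

```latex
% Cause-specific hazard for the transition from state j to state k
h_{jk}(t \mid \mathrm{BMI}, \mathbf{z})
  = h_{jk,0}(t)\,
    \exp\!\bigl\{ f_{jk}(\mathrm{BMI}) + \boldsymbol{\beta}_{jk}^{\top}\mathbf{z} \bigr\}

% Hazard ratio for a BMI value b relative to the reference of 22 kg/m^2
\mathrm{HR}_{jk}(b) = \exp\!\bigl\{ f_{jk}(b) - f_{jk}(22) \bigr\}

% Bayesian Information Criterion (lower values are favoured)
\mathrm{BIC} = -2\,\ell(\hat{\theta}) + p\,\ln(n)

% Likelihood ratio test of the non-linear against the linear BMI term
\Lambda = 2\,\bigl\{ \ell_{\text{non-linear}} - \ell_{\text{linear}} \bigr\}
  \;\sim\; \chi^{2}_{\Delta p}
```

Here Δp is the difference in the number of parameters between the non-linear and the linear model.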
We assessed effect modification by introducing interaction terms (one at a time) between BMI and age and sex. We stratified the models in three categories of age (18-59, 60-79, and ≥80 years) and sex. As secondary analyses, we re-estimated the models fitting BMI in WHO categories and we assessed the effect of obesity-related comorbidities (hypertension, type 2 diabetes and hyperlipidemia) in the studied associations by introducing interaction terms (one at a time) between BMI and each comorbidity.
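The grouping variables used in the stratified and secondary analyses can be sketched as follows. The age strata are those stated above; the BMI cut-offs are the standard WHO thresholds, which the text refers to but does not list, so they are an assumption here, as are the function names and labels.

```python
def who_bmi_category(bmi):
    """Map a BMI value (kg/m^2) to a standard WHO category (assumed cut-offs)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obesity"

def age_stratum(age_years):
    """Map age at index date to the three strata used for stratified models."""
    if age_years < 18:
        raise ValueError("the study population is restricted to adults")
    if age_years <= 59:
        return "18-59"
    if age_years <= 79:
        return "60-79"
    return ">=80"
```

Under these cut-offs, the reference BMI of 22 kg/m² falls in the normal weight category and the comparison value of 31 kg/m² in the obesity category.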
For the main analyses, we conducted a complete case analysis (we only included individuals with complete information on BMI and the covariates of interest). To explore the possibility of selection bias due to excluding those with missing data, in a sensitivity analysis we re-estimated the main models after multiple imputations (using predictive mean matching, with 5 imputations drawn) of missing data on BMI, smoking status, and/or the MEDEA deprivation index.

[...] and subsequent death for all studied transitions in the fully adjusted models (all p for non-linearity ≤0.001) (Figure 2). Results for the crude and adjusted for age and sex models are shown in Figure S3 and [...] (Table 3).
BMI was strongly associated with an increased risk of hospitalisation with COVID-19, either with or without a prior outpatient diagnosis (Figure 2, Table 3).
The association between BMI and risk of death either after an outpatient diagnosis or a hospitalisation with COVID-19 was J-shaped (Figure 2 [...]).

There was evidence of effect modification by age and sex for four out of five studied transitions (p for interaction <0.001) (Figure 3). The risk of COVID-19 outcomes related to increased BMI was higher for those aged ≤59 years, compared to those in older age groups (Figure 3, [...]) [...] (Figure 3, Table S5 A of the Supplementary Material).(17) BMI was not associated with mortality after an outpatient diagnosis for those in the oldest age group, but there was a pronounced U-shaped association for those aged ≤59 years and a J-shaped association for those aged between 60 and 79 years (Figure 3, [...]).

The assumption of proportionality was violated for age in the first transition. To account for this, we stratified the main model by calendar month. The risk of COVID-19 diagnosis related to increased BMI was slightly higher for those diagnosed in March compared to April (Figure S4, Table S6 [...]).

Discussion
In this large cohort study that included 2 524 926 participants from the general population in Catalonia, we found a monotonic association between BMI and COVID-19 diagnosis and hospitalisation risks and a J-shaped one with mortality. The associations between BMI and COVID-19 outcomes were stronger for those aged ≤59 years and similarly shaped among females and males, with specific exceptions.
The strengths of this study include being a large longitudinal study that investigates the association between BMI and the course of the COVID-19 disease containing individual detailed BMI information and incident COVID-19 outcomes recorded in diverse healthcare settings from a large and representative population. Also, the possibility to investigate COVID-19 trajectories in a single and sufficiently powered dataset, including systematic investigation of non-linearity and effect modification, is a major strength. Further, our results were robust when we explored the violation of the models' assumptions, the possibility of selection bias and exposure misclassification.
This study also has weaknesses. The exposure was captured using a 14-year window, which for certain individuals relied on the assumption that BMI measurements were constant for a long period. However, we observed that the median of time elapsed since the BMI measurement was 1.7 years (interquartile range: 0.6 to 4.0) for the included participants. Moreover, in the sensitivity analyses where we used BMI measurements that were no older than five or two years the obtained results were very similar to those of the main analysis. We defined COVID-19 cases as individuals who had a clinical diagnosis of the disease. Although this could have resulted in false positives, we decided not to require a confirmation of an RT-PCR positive test because testing was mainly restricted to severe cases of COVID-19 and specific at-risk populations during the first wave of the pandemic. This decision resulted in including only COVID-19 diagnoses of individuals who interacted with the health system, missing asymptomatic patients or individuals who did not seek medical care.
However, Catalonia has a tax-funded almost universal healthcare system. Further, the results of this study are not generalizable to people living in nursing homes since we decided to exclude this subgroup of the population. We did not have the cause of death (only death after being diagnosed/hospitalised with COVID-19) which prevented us from attributing deaths to the disease.
However, subsequent deaths were more frequent and happened more quickly than deaths among the general population. The cumulative incidence of death was 0.2% in the general population, compared to 2.4% and 19.2% in those diagnosed and hospitalised with COVID-19, respectively (Table 2). The median time to death after a COVID-19 diagnosis or hospitalisation was much shorter (35 and 37 days, respectively) than for those in the general population (67 days) (Figure S12), which suggests subsequent deaths were COVID-19 related.(17) Additionally, we will have missed individuals who died with COVID-19 but who were not identified as having been diagnosed or hospitalised with the disease. The likelihood of this outcome misclassification was probably reduced by the exclusion of nursing home residents. We did not have data on hospital visits that did not lead to an overnight stay, nor on admissions to intensive care units; such data could be useful to study the progression of COVID-19 in more detail. We did not have information on individual socioeconomic status nor on the type of occupation of the participants; we tried to minimize this limitation by including the MEDEA deprivation index. Finally, the use of routinely collected data for research can raise concerns about data quality; however, BMI and COVID-19 data from SIDIAP have successfully been repurposed for research. (22,33,34)

The mechanisms by which higher BMI can increase COVID-19 severity include physical mechanisms (e.g., altered ventilation due to reduced diaphragm excursion), chronic inflammation and impaired immune function. (6) Higher BMI is also a risk factor for several medical conditions that could mediate the association between adiposity and the risk of COVID-19 severity, such as type 2 diabetes or hypertension (which were also common in this study among patients with obesity). (6,21) Our findings support the latter hypothesis: the positive association between BMI and the risk of being hospitalised with COVID-19 was attenuated among people with hypertension or type 2 diabetes (compared to those without). This suggests that shared biological mechanisms between obesity, hypertension and type 2 diabetes might partially explain the higher susceptibility to COVID-19 hospitalisation among individuals living with these conditions. Other proposed explanations include delays in seeking medical care among individuals with obesity due to fear of stigmatization (e.g., 26% and 39% of those diagnosed and hospitalised without an outpatient diagnosis of COVID-19, respectively, had obesity) and the difficulty of delivering supportive therapies in hospital settings. (35,36)

Obesity has been associated with the risk of SARS-CoV-2 infection and COVID-19 diagnosis. Our findings revealed a much stronger association between BMI and COVID-19 diagnosis among those aged ≤59 years and a modestly higher risk among males. While our findings are congruent with another study of the UK Biobank regarding sex differences in risk, no effect modification by age group (younger vs older than 70 years) was reported there. (38) The underlying age distribution of those participants could explain this discrepancy; unfortunately, this information was unavailable.
Our findings of a strong positive association between BMI and risk of COVID-19 hospitalisation are in line with a large meta-analysis and a population-based study conducted in another Spanish region (Navarra). (6,39) Our results also suggest the necessity to lower BMI cut-offs to establish risk groups for disease severity.
The risk of hospitalisation with COVID-19 was systematically higher for those aged ≤59 years, which is congruent with two hospital-based studies from the US. One reported a negative correlation between BMI and age among COVID-19 patients in six hospitals and another a positive association; while other studies also found this trend, their results were not significant, likely due to their smaller sample sizes. (7,45,46) We also found that mortality risk related to an increased BMI was higher among individuals aged ≤59 years compared to older adults. Four previous studies are much in line with our findings, while a meta-analysis reported the opposite. (7,38,42,45,46) The risk of death after a hospitalisation with COVID-19 associated with BMI was higher among females, which is congruent with a UK Biobank study. (38) However, a study performed in a New York hospital found a higher risk among males and others found opposite or null differences by sex. (7,42,45,46)

We provided a comprehensive analysis of the association between BMI and the course of COVID-19 during the first wave of the pandemic in Catalonia. Our analyses revealed that BMI is positively associated with being diagnosed and hospitalised with COVID-19, and in a J-shaped fashion with the risk of death following a COVID-19 diagnosis or hospitalisation; the associations were particularly [...]

Tables
Table 1. Descriptive statistics of the study population by body mass index categories
Table 2. Time at risk, absolute event rates, and cumulative incidence over time by body mass index categories
Table 3. Hazard ratios of COVID-19 outcomes related to body mass index, with 95% CIs