Interplay between demographic, clinical and polygenic risk factors for severe COVID-19

Abstract

Background: We aimed to identify clinical, socio-demographic and genetic risk factors for severe COVID-19 (hospitalization, critical care admission or death) in the general population.

Methods: In this observational study, we identified 9560 UK Biobank participants diagnosed with COVID-19 during 2020. A polygenic risk score (PRS) for severe COVID-19 was derived and optimized using publicly available European and trans-ethnic COVID-19 genome-wide summary statistics. We estimated the risk of hospital or critical care admission within 28 days or death within 100 days following COVID-19 diagnosis, and assessed associations with socio-demographic factors, immunosuppressant use and morbidities reported at UK Biobank enrolment (2006–2010) and the PRS. To improve biological understanding, pathway analysis was performed using genetic variants comprising the PRS.

Results: We included 9560 patients followed for a median of 61 (interquartile range = 34–88) days since COVID-19 diagnosis. The risk of severe COVID-19 increased with age and obesity, and was higher in men, current smokers, those living in socio-economically deprived areas, those with historic immunosuppressant use and individuals with morbidities and higher co-morbidity count. An optimized PRS, enriched for single-nucleotide polymorphisms in multiple immune-related pathways, including the ‘oligoadenylate synthetase antiviral response’ and ‘interleukin-10 signalling’ pathways, was associated with severe COVID-19 (adjusted odds ratio 1.32, 95% CI 1.11–1.58 for the highest compared with the lowest PRS quintile).

Conclusion: This study, conducted in the pre-SARS-CoV-2-vaccination era, emphasizes the novel insights to be gained from using genetic data alongside commonly considered clinical and socio-demographic factors to develop greater biological understanding of severe COVID-19 outcomes.


Introduction
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in December 2019 1 and rapidly became a threat to public health, medical systems and economies worldwide. In 2020, ~13% of patients diagnosed with COVID-19 (suspected or confirmed) in England required hospitalization, with a mortality rate of ~5%. 2 Risk factors for severe COVID-19, involving hospitalization or death, include older age, male sex, Black and minority ethnicity, current and prior smoking, high socio-economic deprivation and obesity. [3][4][5] Morbidities that have been associated with poor COVID-19 outcomes in both population and hospital studies include cardiovascular disease (CVD), hypertension, diabetes, chronic respiratory disease (CRD), chronic kidney disease (CKD), malignancies and neurological disease, including dementia. 3,4,6,7 It remains unclear whether patients with autoimmune disease are at increased risk of contracting SARS-CoV-2 infection or developing severe COVID-19 outcomes. Given the need for immunosuppressants in these patients, resolving this question is crucial to inform the optimal management of these conditions. 4 More granular data on the risks associated with individual immunosuppressants have come from the Global Rheumatology Alliance. 8
As healthcare systems increasingly embrace genomic medicine, there are opportunities to incorporate polygenic risk scores (PRSs) that summarize an individual's genetic propensity to adverse treatment outcomes and disease complications into clinical decision-making. To date, published PRSs have primarily been developed in populations of White European ethnicity, limiting their ability to predict genetic risk in other ethnicities. Differences in linkage disequilibrium (LD) structure and allele frequencies between ethnic groups may reduce the efficacy of such PRSs in non-European individuals, and careful optimization and testing are required. 9 Several genome-wide association studies (GWASs), performed on both White European and trans-ethnic cohorts, have identified genome-wide significant (P < 5 × 10⁻⁸) associations with 27 genetic loci (to date) for a range of COVID-19 phenotypes. [10][11][12][13] Many of these loci are associated with autoimmune diseases and lung function, corroborating a potential role for these pathways in an individual's response to SARS-CoV-2.
In this study, we examined the risk of hospitalization, critical care admission or death in UK Biobank participants diagnosed with COVID-19. We identified associated clinical and socio-demographic risk factors, including co-morbidities and historic immunosuppressant use. We then constructed a number of PRSs using the largest COVID-19 susceptibility GWAS data sets (Release 5), provided by the COVID19-hg consortium, 10 and evaluated the performance of our optimized PRS in independent UK Biobank data in conjunction with the clinico-demographic risk factors.

Methods
The study was reported following STROBE guidelines (Supplementary Methods and Supplementary Table S1, available as Supplementary data at IJE online). 14

Key Messages
• We derived and optimized a polygenic risk score (PRS) that was found to be associated with severe COVID-19 and death in both White European and trans-ethnic populations, after adjusting for clinico-demographic factors reported at UK Biobank enrolment (2006–2010).
• We identified antiviral and immunoregulatory immune pathways that were associated with COVID-19 disease severity.
• The magnitude of risk for the highest PRS quintile was equivalent to that reported for well-known risk factors, such as living in the most deprived areas or having cardiovascular disease.
• In addition to known risk factors, risk of severe COVID-19 was increased in immunosuppressant users, patients with autoimmune diseases and those with a higher co-morbidity count, and rose with each additional 5 kg/m² increment of body mass index between 25 and 40 kg/m².
• Genetic variation in immune pathways offers the opportunity for guiding therapeutic strategies in severe COVID-19 and ultimately may improve risk stratification for patients requiring immunosuppressant therapy.

Data source
This study used individual patient data from UK Biobank, linked to COVID-19 data sets from laboratories (test results), hospitals (inpatient and critical care admissions) and death certificates. UK Biobank is a population-based prospective study linking individual genetic, biomarker, survey and electronic health record data from >500 000 UK participants, aged 40-69 years at recruitment (2006-2010) when self-report questionnaires and biological measurements were undertaken. 15

Study population
The study population included UK Biobank participants who provided baseline assessment data, were alive at the start of the study period and had not withdrawn consent. Participants from assessment centres outside England were excluded, as COVID-19 linked data were unavailable. Study follow-up commenced at the study start date (1 January 2020) and ended at the earliest of the study end date (31 December 2020) or upon death.
COVID-19 diagnosis was defined by having a positive laboratory test result or an ICD-10 code U071 or U072 recorded in hospital or death certificate data. Cases of COVID-19 diagnosed <7 days prior to the end of the study period were excluded, to enable a minimal follow-up period for outcome recording.
The White European subpopulation was defined as those who self-reported 'White' ethnicity at baseline and fell within the European cluster based on principal components analysis of genotypic data (Supplementary Methods, available as Supplementary data at IJE online).

Study outcomes
The primary outcome was severe COVID-19, a composite outcome defined as the earliest of hospital or critical care admission within 28 days of COVID-19 diagnosis or death within 100 days following COVID-19 diagnosis. Hospitalizations or critical care admissions reported 1-3 days prior to COVID-19 diagnosis were included, to allow for delays in laboratory testing (i.e. weekends). The secondary outcome was death within 100 days following COVID-19 diagnosis. In a 'post-hoc' analysis, a composite of non-fatal severe COVID-19 disease (i.e. the earliest of hospital or critical care admission within 28 days of COVID-19 diagnosis) was evaluated.
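The composite outcome definition above can be expressed as a small classification rule. The following is a minimal sketch, not the authors' code: the function name `severe_covid`, the inclusive window boundaries and the treatment of deaths as occurring on or after the diagnosis date are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def severe_covid(diagnosis: date,
                 admission: Optional[date] = None,
                 death: Optional[date] = None) -> bool:
    """Return True if the composite severe COVID-19 outcome is met.

    Assumed rule (illustrative): a hospital or critical care admission
    counts if it falls from 3 days before to 28 days after diagnosis;
    a death counts if it falls within 100 days after diagnosis.
    """
    if admission is not None:
        days = (admission - diagnosis).days
        if -3 <= days <= 28:
            return True
    if death is not None:
        days = (death - diagnosis).days
        if 0 <= days <= 100:
            return True
    return False


# Hypothetical participants.
print(severe_covid(date(2020, 4, 10), admission=date(2020, 4, 8)))   # admitted 2 days before diagnosis
print(severe_covid(date(2020, 4, 10), admission=date(2020, 5, 20)))  # admitted 40 days after diagnosis
```

The secondary outcome (death within 100 days) corresponds to the second condition alone.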
Risk factors studied included demographics, immunosuppressant use and co-morbidities, including autoimmune disease, co-morbidity count and PRS. Age was determined at the date of first COVID-19 diagnosis for the COVID-19 sub-cohort and at the study start for the other UK Biobank patients. Demographics were measured or self-reported by participants at their first UK Biobank assessment (sex: female or male; ethnicity: White, Black or other ethnic group; smoking status: never, former or current; Townsend deprivation index 16 (derived by UK Biobank from self-reported postcode): quintiles; BMI: <18.5, 18.5 to <25, 25 to <30, 30 to <35, 35 to <40 or ≥40 kg/m²). Immunosuppressant use was self-reported and defined as oral glucocorticoid, disease-modifying anti-rheumatic drug or other immunosuppressant exposure at enrolment (Supplementary Table S2, available as Supplementary data at IJE online). Autoimmune disease and other co-morbidities [CVD, CRD, CKD, diabetes, hypertension, chronic liver disease (CLD) and neurological disease] were defined using self-reported assessment data (Supplementary Table S3, available as Supplementary data at IJE online). The count of other co-morbidities was reported as 0, 1 or ≥2.
Details of genotyping, imputation and quality control (QC), including marker-based and sample-based filters, are reported in Supplementary Methods (available as Supplementary data at IJE online). First, three PRSs (PRS e1, PRS e2 and PRS e3) were constructed using effect sizes from the European-only COVID-19 vs population susceptibility GWAS, conducted by COVID19-hg (Release 5, excluding UK Biobank samples). 10 PRS e1 was built using the clumping and thresholding approach implemented by PRSice v2.3.3. 17 PRS e2 was built using SNPs previously associated (P < 1 × 10⁻⁵) with COVID-19 susceptibility/severity phenotypes (Supplementary Table S4, available as Supplementary data at IJE online). PRS e3 combined PRS e1 and PRS e2 in a single risk score, removing duplicate loci. PRSs were tested for association with the severe COVID-19 composite, compared with non-severe COVID-19 (positive RT-PCR test result, with no hospitalization or death) in both the White European and trans-ethnic UK Biobank cohorts (ensuring no overlap between the PRS optimization and testing cohorts), using logistic regression to determine which of PRS e1, PRS e2 and PRS e3 was the best severity predictor (Supplementary Methods, available as Supplementary data at IJE online).
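As a rough illustration of the scoring step, a PRS is commonly computed as a sum of effect-allele dosages weighted by GWAS effect sizes, restricted to SNPs passing a P-value threshold. The sketch below makes several assumptions: the SNP identifiers and numbers are hypothetical, the function names are invented for this example, LD clumping is omitted entirely, and tools such as PRSice may scale or average the score differently.

```python
from typing import Dict, Tuple


def select_snps(summary_stats: Dict[str, Tuple[float, float]],
                p_threshold: float) -> Dict[str, float]:
    """Keep SNPs whose GWAS P-value is below the threshold; return {snp: beta}."""
    return {snp: beta
            for snp, (beta, p) in summary_stats.items()
            if p < p_threshold}


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) over SNPs present in both inputs."""
    return sum(beta * dosages[snp]
               for snp, beta in weights.items()
               if snp in dosages)


# Hypothetical summary statistics: SNP id -> (log odds ratio, P-value).
summary_stats = {"snpA": (0.20, 3e-8), "snpB": (-0.10, 4e-6), "snpC": (0.05, 0.02)}
weights = select_snps(summary_stats, p_threshold=1e-5)

# Hypothetical effect-allele dosages for one participant.
dosages = {"snpA": 2, "snpB": 1, "snpC": 0}
score = polygenic_risk_score(dosages, weights)
print(round(score, 3))
```

Combining two SNP sets while removing duplicate loci, as described for PRS e3, would amount to merging two such weight dictionaries before scoring.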
This procedure was repeated using effect sizes from the trans-ethnic COVID-19 vs population susceptibility GWAS (COVID19-hg, Release 5) to produce PRS t1, PRS t2 and PRS t3. These PRSs were tested for association with severe COVID-19 in the White European and trans-ethnic UK Biobank cohorts to determine the best predictor of COVID-19 severity. We also constructed a previously reported PRS (PRS d) 18 based on COVID19-hg Release 2, for comparative purposes. To investigate PRS association in the UK Biobank trans-ethnic cohort in more detail, a meta-analysis was performed, testing association of the

Statistical analyses
We described characteristics of UK Biobank participants, the COVID-19 sub-cohort and sub-cohorts with study outcomes. The median duration of follow-up (days) for the COVID-19 sub-cohort was calculated from the date of COVID-19 diagnosis.
Time to event (primary and secondary outcomes) was defined as the number of days between the date of COVID-19 diagnosis and the date at which the event occurred. Where participants were hospitalized and/or admitted to critical care either for 1-3 days prior to, or on, the date of COVID-19 diagnosis, the time to event was set to <1 day. Survival analyses were (i) performed on the trans-ethnic UK Biobank COVID-19 cohort and (ii) restricted to the White European subpopulation. Cumulative probabilities of study outcomes up to 100 days were calculated using Kaplan-Meier survival methods, both overall and stratified by demographics, immunosuppressant use, autoimmune disease, co-morbidities, co-morbidity count and PRS quintile (except for co-morbidities reported by fewer than five patients in each category).
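For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows the product-limit calculation on a toy data set. It is an illustration only: the function name and the follow-up times are hypothetical, and confidence intervals (which the study reports) are not computed here.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time where at least one event occurred.

    times: follow-up in days; events: 1 if the outcome occurred, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all participants sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up data for five participants.
curve = kaplan_meier(times=[5, 10, 10, 20, 30], events=[1, 1, 0, 1, 0])
for t, s in curve:
    # The cumulative probability of the outcome is 1 - S(t).
    print(t, round(1 - s, 3))
```

Stratified estimates, for example by PRS quintile, would apply the same function to each stratum separately.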
We used logistic regression to estimate the risk of the severe COVID-19 composite, and Cox regression to estimate the risk of death within 100 days post-diagnosis. In Cox regression, the proportional hazards assumption was assessed for each variable based on the Schoenfeld residuals. 19,20 Age was modelled as a continuous variable in logistic regression and as cubic splines (three knots) in Cox regression (hazard ratios reported for ages 55, 60, 65, 70 and 75 years). The Akaike information criterion informed model selection and how continuous variables were modelled. Risk factors with an overall likelihood ratio test P-value of <0.1 in age-adjusted models were included in models adjusted for (i) clinico-demographic factors (either using co-morbidities or co-morbidity count) and (ii) clinico-demographic factors (using co-morbidities) and PRS. In PRS-adjusted models, the PRS was modelled as a continuous variable. Hazard ratios and odds ratios are also reported for PRS quintiles, using PRSs constructed for UK Biobank COVID-19 cases in preceding genetic analyses. Patients with missing demographics or PRSs were excluded from the regression analyses.
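To make the quintile comparison concrete, the sketch below assigns scores to quintiles and computes a crude odds ratio with a Wald 95% confidence interval from a 2 × 2 table. This is deliberately simpler than the study's approach: it is unadjusted, whereas the reported estimates come from multivariable logistic regression, and the function names and counts are hypothetical.

```python
import math
from typing import List, Tuple


def assign_quintiles(scores: List[float]) -> List[int]:
    """Rank-based quintile labels (1 = lowest fifth, 5 = highest fifth)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores) + 1
    return labels


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio and Wald CI.

    a, b: severe and non-severe cases in the comparison group (e.g. top quintile);
    c, d: severe and non-severe cases in the reference group (e.g. bottom quintile).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for the highest vs lowest PRS quintile.
or_, lo, hi = odds_ratio_ci(a=60, b=140, c=40, d=160)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

An adjusted odds ratio would instead be obtained by exponentiating the quintile coefficient from a logistic model that also includes the clinico-demographic covariates.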
All analyses were performed using R Version 4.0.4 and Microsoft SQL 2017.

Pathway analysis
FUMA v1.3.6a software 21 was used to provide functional annotations of the SNPs in PRS e2, which was used in downstream analyses (Supplementary Methods, available as Supplementary data at IJE online).
Of the seven PRSs investigated, PRS e2 (based on associated SNPs from previous studies and optimized to European data) was the most strongly associated with COVID-19 severity in UK Biobank and explained the greatest estimated variance in both the White European (P = 6.23 × 10⁻⁴, R² = 2.3 × 10⁻³) and trans-ethnic (P = 9.81 × 10⁻⁴, R² = 1.87 × 10⁻³) cohorts (Supplementary Results, Supplementary Table S5, available as Supplementary data at IJE online). We conducted a meta-analysis of the association between PRS e2 and COVID-19 severity across individual ethnic groups in UK Biobank. The total coefficient for the fixed effects model was 0.08 (Supplementary Figure S1 and Supplementary Table S6, available as Supplementary data at IJE online). PRS e2 was used in all subsequent analyses.
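A fixed-effects pooled coefficient of this kind is typically obtained by inverse-variance weighting of the per-group estimates. The sketch below assumes that standard approach, which the text does not confirm in detail, and uses hypothetical per-group coefficients and standard errors rather than the study's values.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-ethnic-group PRS coefficients (log odds) and standard errors.
pooled, pooled_se = fixed_effect_meta(betas=[0.10, 0.05], ses=[0.1, 0.1])
print(round(pooled, 3), round(pooled_se, 4))
```

Groups with smaller standard errors, usually the larger ones, dominate the pooled estimate, which is one reason effect sizes in smaller ethnic groups have limited influence on the total coefficient.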
The cumulative probability of severe COVID-19 was 27.1% (95% CI 26.1-28.0%) and estimates for demographics, immunosuppressant use, PRS e2 quintile and co-morbidity count are shown in Supplementary Tables S7-S13 and Supplementary Figure S2 (available as Supplementary data at IJE online). Cumulative risk was also increased in patients with autoimmune disease and with individual co-morbidities. The results were similar for the cumulative probability of death within 100 days.
Age was associated with risk of death within 100 days following COVID-19, even in adjusted models (Table 3 and Supplementary Table S17, available as Supplementary data at IJE online). In clinico-demographic and PRS e2-adjusted models, risk was higher in men than in women [adjusted hazard ratio (AHR) 1.69, 95% CI 1.43-2.01], in Black than in White ethnicity (AHR 2.31, 95% CI 1.51-3.53), and in former or current smokers than in never smokers (AHR 1.20, 95% CI 1.01-1.43 and AHR 1.79, 95% CI 1.41-2.28, respectively). The clinico-demographic and PRS e2-adjusted risk increased with increasing levels of obesity compared with a BMI of 18.5-24.9 and was higher in immunosuppressant users (AHR 1.51, 95% CI 1.08-2.11). A positive association was found for CVD in the age-adjusted model (AHR 1.34, 95% CI 1.12-1.60) but not in the clinico-demographic or clinico-demographic and PRS e2-adjusted models. The clinico-demographic and PRS e2-adjusted models showed associations for diabetes (AHR 1.40, 95% CI 1.12-1.75) and hypertension (AHR 1.30, 95% CI 1.10-1.53) but not for autoimmune disease or other co-morbidities. There was also a positive association with increasing co-morbidity count (e.g. ≥2 compared with 0: AHR 1.68, 95% CI 1.36-2.08). Similar associations were found in the White European subpopulation, except for socio-economic deprivation, BMI and immunosuppressant use (Supplementary Table S18, available as Supplementary data at IJE online).
In both cohorts, the estimated effects of clinico-demographic risk factors showed little change when adjusted for PRS e2.

Pathway analyses
Genetic markers of PRS e2 were physically mapped (within 10 kb) to 134 genes by Ensembl VEP and an additional 23 genes by FUMA, including the previously reported OAS1-OAS3 gene cluster (Supplementary Table S19, available as Supplementary data at IJE online). 11 Furthermore, the MAGMA gene-based test (executed in FUMA) found six genes that were enriched with SNPs from PRS e2 at the Bonferroni-corrected significance level (P < 2.6 × 10⁻⁶; Supplementary Methods, available as Supplementary data at IJE online). These genes included LZTFL1, OAS1, OAS3, FYCO1, XCR1 and SLC6A20 (Supplementary

Clinico-demographic-adjusted model included age (as continuous), sex, ethnicity, smoking status, Townsend deprivation quintile, body mass index, immunosuppressant use, autoimmune disease and co-morbidities. In this model, co-morbidity count is adjusted for these variables excepting co-morbidities.
Clinico-demographic and PRS e2-adjusted model included PRS e2 (as continuous) in addition to the variables included in the clinico-demographic-adjusted model (listed above). In this model, PRS e2 (as continuous) had a significant P-value (<0.001). In this model, co-morbidity count is adjusted for PRS e2 (as continuous) in addition to the variables included when modelling co-morbidity count in the clinico-demographic-adjusted model (listed above).
a Count of the following co-morbidities: cardiovascular disease, chronic respiratory disease, chronic kidney disease, diabetes, hypertension, chronic liver disease, neurological disease.
P-value from the overall likelihood ratio test for association.
BMI, body mass index; CKD, chronic kidney disease; CLD, chronic liver disease; CRD, chronic respiratory disease; CVD, cardiovascular disease; NA, not applicable; OR, odds ratio; PRS e2, White European polygenic risk score 2.

Table 3 Age-adjusted, clinico-demographic-adjusted and clinico-demographic and polygenic risk score (PRS)-adjusted hazard ratios of death in patients diagnosed with COVID-19 (N = 8325)

Discussion
This observational study was the first to investigate the association of clinico-demographic and genetic risk factors with severe COVID-19 and death in patients diagnosed with COVID-19. In this pre-COVID-19 cohort of 502 489 individuals we have, to our knowledge, derived and optimized the best-performing, publicly available PRS for the prediction of severe COVID-19 in both White European and trans-ethnic populations, which is independent of known clinico-demographic factors. PRS e2 consisted of 133 SNPs, was associated with severe COVID-19 in both populations and was enriched for SNPs in antiviral and immune-response pathways.
In this pre-SARS-CoV-2-vaccination cohort, the risk of severe COVID-19 was associated with PRS e2 even after adjustment for clinico-demographic factors, suggesting it captures a different component of COVID-19 severity risk. Based on effect sizes, the magnitude of risk associated with the highest PRS e2 quintile for severe COVID-19 was equivalent to that afforded by well-known risk factors, such as living in the most deprived areas and having CVD.
The increased risk of severe COVID-19 with age, male sex, Black ethnicity, high BMI, CVD, CRD, diabetes, hypertension, CLD and neurological disease corroborates other studies. 3,22 We also found increased risk of severe COVID-19 from being a former or current smoker, immunosuppressant use and higher co-morbidity count. We examined the association of BMI with severe COVID-19 and death with more granularity than previous studies, 4,23 showing the increasing risk associated with each additional 5 kg/m² of BMI between 25 and 40 kg/m².
The increased risk of death with age, male sex, Black ethnicity, former or current smoking, high BMI or having CVD, diabetes or hypertension corroborates the findings of an earlier UK Biobank study. 24 We also found increased risk of death with socio-economic deprivation, immunosuppressant use and higher co-morbidity count. The smaller sample size limited the power to detect associations with risk of death for less prevalent co-morbidities, but trends towards increased risk were broadly comparable to those observed with severe COVID-19. To date, the relative safety of different immunosuppressants in the setting of a global pandemic has not been considered in health economic analyses or treatment decisions.
PRS e2 performed well in both a European and trans-ethnic severe COVID-19 cohort, although its association with individual ethnicities demonstrated some variation in effect size, likely due to smaller sample sizes and differences in LD structure between ethnic populations. This work attests to the validity of combining GWA evidence from multiple studies and related traits (e.g. COVID-19 severity, susceptibility) to collectively predict genetic risk and illustrates the value of investigating SNP/trait associations using a more liberal threshold in summary statistics (in this case, P < 1 × 10⁻⁵), highlighting the utility of PRSs in summarizing these variants.
Our bioinformatic analyses highlighted several biological pathways enriched for SNPs in PRS e2, including the Reactome 'OAS antiviral response' pathway responsible for the mediation of antiviral innate immunity through the regulation of ribonucleic acid (RNA) degradation. 25 Furthermore, SNPs of the OAS1-OAS3 gene cluster, IFNAR2 and AP000295.9 were also enriched in the 'interferon alpha-beta signalling' pathway, which involves the induction of type I interferons in virus-infected cells, representing another stage in the human antiviral response. 26 This suggests host genetic variation in antiviral immune pathways may contribute to the disparity in disease severity amongst individuals. Given the positive results of recent clinical trials for the antiviral drug molnupiravir, 27 such immune pathways may be of interest for further investigation and therapeutic targeting in severe COVID-19. 'Interleukin-10 signalling' pathway variants in IL10RB, CCR2 and CCR5 were also enriched in PRS e2 and may contribute to the development of severe COVID-19. Interleukin-10 (IL-10) is an anti-inflammatory cytokine and signalling of IL-10 is known to be important in limiting host immune response to pathogens. 28

Clinico-demographic-adjusted model included age (cubic spline with three knots), sex, ethnicity, smoking status, Townsend deprivation index quintile, body mass index, immunosuppressant use, CVD, diabetes and hypertension. In this model, co-morbidity count is adjusted for these variables excepting CVD, diabetes and hypertension.
Clinico-demographic and PRS e2-adjusted model included PRS e2 (as continuous) in addition to the variables included in the clinico-demographic-adjusted model (listed above). P-value for PRS e2 (as continuous) was 0.067. In this model, co-morbidity count is adjusted for PRS e2 (as continuous) in addition to the variables included when modelling co-morbidity count in the clinico-demographic-adjusted model (listed above).
a Count of the following co-morbidities: cardiovascular disease, chronic respiratory disease, chronic kidney disease, diabetes, hypertension, chronic liver disease, neurological disease.
P-value from the overall likelihood ratio test for association.
BMI, body mass index; CKD, chronic kidney disease; CLD, chronic liver disease; CRD, chronic respiratory disease; CVD, cardiovascular disease; HR, hazard ratio; NA, not applicable; PRS e2, White European polygenic risk score 2.

Strengths and limitations
UK Biobank recruited older participants, which enabled this study to specifically examine risk in older patients, who are most at risk of COVID-19 infection and severe outcomes. The study time frame covered the pre-COVID-19 vaccination era, to avoid potential bias from the prioritized vaccine rollout programme. It incorporated deaths up to the end of December 2020, thereby extending the findings of a previous UK Biobank study of risk of death up to the end of September, covering the second wave of the pandemic. 24 We conducted multivariable regression analyses to investigate the relationship between covariates, highlighting the important contribution of PRSs, beyond that associated with clinico-demographic factors. The analyses were performed in a White European subpopulation to ensure that the European effect sizes used to define PRS e2 did not introduce bias in how it performed between ethnicities: results from the subpopulation were consistent with the main analysis.
Study limitations included a modest number of cases of COVID-19 (n = 9560) when compared with national cohorts. Further, UK Biobank participants may not be representative of the UK population, though the investigated assessment of risk factors does not require a representative population. 29 The clinico-demographic characteristics of the cohort were collected at the time of the baseline assessment (~10 years earlier) and self-reported. Given the study period and sample size, we could not investigate temporal or seasonal patterns in the association between risk factors and the outcomes. The use of European effect sizes in the construction of PRS e2 may have biased the PRS to perform more effectively in Europeans than other subpopulations.
Further improvements in trans-ethnic PRS performance can be expected as more data are generated from non-European populations, including ethnic-specific PRSs.

Conclusion
Using a trans-ethnic cohort of >500 000 UK Biobank participants in the pre-COVID-19 vaccination era, we derived an optimized PRS that was associated with risk of severe COVID-19 and death in COVID-19 patients, even after adjustment for well-known clinico-demographic risk factors. We highlighted the risk associated with co-morbidities, immunosuppressant use and obesity. This study emphasizes the novel contribution to be gained from using genomic data alongside phenotyping in developing understanding of COVID-19 and its management.

Ethics approval
The study was approved by UK Biobank (project 24559). UK Biobank has ethical approval from the National Research Ethics Committee (REC reference 11/NW/0382) and obtained informed electronic consent from all participants. There was no patient-public involvement in the study, which used non-identifiable data.

Data availability
UK Biobank data were provided under a licence that does not permit sharing. The code-lists used in definitions and the derived results are published in the manuscript and supporting material.

Supplementary data
Supplementary data are available at IJE online.

Author contributions
Study conception and design: A.W.M., M.P.R. and M.I.; analysis planning, data collection, verification and data analysis: S.S.R.C. and N.J.M.C.; all authors contributed to data interpretation, drafting and critical revision of the article and approved the final submitted version.

Funding
This work was funded by a Medical Research Council Confidence in Concept award (grant number MC_PC_19042) and was additionally supported by a National Institute for Health Research (NIHR) Senior Investigator award to A.W.M. and by the NIHR Leeds Biomedical Research Centre and Diagnostic Evaluation Cooperative. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The funders had no role in the study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.