The causal effects of health conditions and risk factors on social and socioeconomic outcomes: Mendelian randomization in UK Biobank

Abstract
Background We aimed to estimate the causal effect of health conditions and risk factors on social and socioeconomic outcomes in UK Biobank. Evidence on socioeconomic impacts is important to understand because it can help governments, policy makers and decision makers allocate resources efficiently and effectively.
Methods We used Mendelian randomization to estimate the causal effects of eight health conditions (asthma, breast cancer, coronary heart disease, depression, eczema, migraine, osteoarthritis, type 2 diabetes) and five health risk factors [alcohol intake, body mass index (BMI), cholesterol, systolic blood pressure, smoking] on 19 social and socioeconomic outcomes in 336 997 men and women of White British ancestry in UK Biobank, aged between 39 and 72 years. Outcomes included annual household income, employment, deprivation [measured by the Townsend deprivation index (TDI)], degree-level education, happiness, loneliness and 13 other social and socioeconomic outcomes.
Results Results suggested that BMI, smoking and alcohol intake affect many socioeconomic outcomes. For example, smoking was estimated to reduce household income [mean difference = -£22 838, 95% confidence interval (CI): -£31 354 to -£14 321] and the chance of owning accommodation [absolute percentage change (APC) = -20.8%, 95% CI: -28.2% to -13.4%], of being satisfied with health (APC = -35.4%, 95% CI: -51.2% to -19.5%) and of obtaining a university degree (APC = -65.9%, 95% CI: -81.4% to -50.4%), while also increasing deprivation (mean difference in TDI = 1.73, 95% CI: 1.02 to 2.44, approximately 216% of a decile of TDI). There was evidence that asthma decreased household income, the chance of obtaining a university degree and the chance of cohabiting, and migraine reduced the chance of having a weekly leisure or social activity, especially in men. For other associations, estimates were null.
Conclusions Higher BMI, alcohol intake and smoking were all estimated to adversely affect multiple social and socioeconomic outcomes. Effects were not detected between health conditions and socioeconomic outcomes using Mendelian randomization, with the exceptions of depression, asthma and migraines. This may reflect true null associations, selection bias given the relative health and age of participants in UK Biobank, and/or lack of power to detect effects.


Introduction
Poor health has the potential to affect an individual's ability to engage with society. [1][2][3][4] For example, illnesses or adverse health behaviours could influence the ability to attend and concentrate at school or at work and hence affect educational attainment, employment and income. Illness and health behaviours may also affect an individual's ability to maintain well-being and an active social life.
From an individual perspective, maintaining good health can therefore have considerable social and socioeconomic benefits. 5 Similarly from a population perspective, improving population health could lead to a happier and more productive population. 6 Understanding the causal impacts of health on social and socioeconomic outcomes can help demonstrate the potential broader benefits of investing in effective health policy, thereby strengthening the case for cross-governmental action to improve health and its wider determinants at the population level. 7 Furthermore, patients require accurate information about how their lives might be affected by their health, for example on returning to work after cancer. 8 However, studying the social and socioeconomic consequences of ill health ('social drift') is challenging because of social causation, i.e. the strong role of social and socioeconomic circumstances in disease causation. Social causation means that associations between health and social and socioeconomic outcomes are likely to be severely biased by confounding and reverse causality. Methodological approaches strengthening causal inference in this field are therefore essential.

Key Messages
• Studies have shown associations between poor health and adverse social (e.g. well-being, social contact) and socioeconomic (e.g. educational attainment, income, employment) outcomes, but there is also strong evidence that social and socioeconomic factors influence health.
• These bidirectional relationships, as well as confounding, make it difficult to establish whether health conditions and health risk factors have causal effects on social and socioeconomic outcomes.
• Mendelian randomization is a technique that uses genetic variants robustly related to an exposure of interest (here, health conditions and risk factors for poor health) as a proxy for the exposure, and is typically less prone to both reverse causation and confounding, allowing us to estimate more causal effects of health conditions and risk factors on social and socioeconomic outcomes.
• This study suggests causal effects of higher body mass index, smoking and alcohol use on a range of social and socioeconomic outcomes, implying that population-level improvements in these risk factors may, in addition to the well-known health benefits, have social and socioeconomic benefits for individuals and society.
• There was evidence that: asthma increased deprivation and decreased household income and the chance of having a university degree; depression increased loneliness and decreased happiness; and migraine reduced the chance of having a weekly leisure or social activity, especially in men. There was little evidence for causal effects of cholesterol, systolic blood pressure or breast cancer on any social and socioeconomic outcome.

Mendelian randomization is a technique that uses genetic variants robustly related to an exposure of interest (here, health conditions and risk factors for poor health) as proxies for the exposure (instrumental variables). 9,10 Since genetic variants are randomly allocated at conception, conditional on parental genotypes, results from Mendelian randomization studies are much less likely to suffer from confounding and reverse causality than traditional observational studies. 11 In this paper, we apply Mendelian randomization within a large study of UK individuals aged between 39 and 72 years, to estimate the causal effects of health conditions and risk factors with the greatest burden on UK adults on a range of social (e.g. social contact, wellbeing and cohabitation status) and socioeconomic (e.g. education, employment, income) outcomes.

Methods
Population
UK Biobank is a population-based health research resource consisting of approximately 500 000 people, who were recruited between the years 2006 and 2010 from 22 centres across the UK. 12 Participants provided medical history and socioeconomic information via questionnaires, interviews and anthropometric measures at recruitment. Medical data from hospital episode statistics (HES) and the cancer registry have been linked to participants. The study design, participants and quality control methods have been described in detail previously. [13][14][15] UK Biobank received ethics approval from the Research Ethics Committee (REC reference for UK Biobank is 11/NW/0382).
We restricted analyses to unrelated individuals of White British ancestry. Full details of inclusion criteria and genotyping are in Supplementary Information Section 1, available as Supplementary data at IJE online. After exclusions, 336 997 participants remained in the dataset.

Measures of health conditions and risk factors (exposures)
We used the Global Burden of Disease Study 2010 16 to identify health conditions and risk factors that contributed 100 or more disability-adjusted life years lost per 100 000 adults in the UK. From this list, we restricted our analysis to health conditions and risk factors with known genetic determinants and a prevalence of ≥2% among UK Biobank participants. This resulted in the inclusion of eight health conditions: asthma, breast cancer, coronary heart disease, depression, eczema, migraine, osteoarthritis and type 2 diabetes; and five risk factors: alcohol consumption, body mass index (BMI), cholesterol, smoking and systolic blood pressure (Supplementary Figure 1, 17 where participants were considered to have depression if they self-reported seeing a GP or psychiatrist for nerves, anxiety or depression and reported at least a 2-week duration of depression or unenthusiasm, or had the relevant ICD-9 or ICD-10 codes for depression. Participants were considered to not have depression if they did not report ever visiting a GP or psychiatrist for nerves, anxiety or depression, did not self-report having depression and did not have an ICD code for depression. Only 10 centres asked the questions related to depression, so only participants from these centres were considered in the depression analyses. The measurement of health risk factors is described in Box 1.

Polygenic risk scores (instrumental variables)
We searched previous genome-wide association studies (GWASs) for single nucleotide polymorphisms (SNPs) with strong evidence of associations for each health condition and risk factor, defined as having a P-value at genome-wide significance (P < 5 × 10⁻⁸) (further details in Supplementary Information Section 2 and Supplementary Tables 2 and 3, available as Supplementary data at IJE online). The polygenic risk scores (PRSs) for each health condition and risk factor were then calculated as the sum of the effect alleles for all SNPs associated with the health condition or risk factor, with each SNP weighted by the regression coefficient from the GWAS from which the SNP was identified.
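As an illustration of the weighting described above, the following minimal sketch computes a PRS for one participant as a weighted sum of effect-allele counts. The SNP identifiers, coefficients and the function name are hypothetical and chosen for illustration only; this is not the code used in the study.

```python
from typing import Mapping


def polygenic_risk_score(
    allele_counts: Mapping[str, float],
    gwas_betas: Mapping[str, float],
) -> float:
    """Sum of effect-allele counts, each weighted by its GWAS regression coefficient.

    Raises KeyError if a SNP in gwas_betas is missing from allele_counts.
    """
    return sum(beta * allele_counts[snp] for snp, beta in gwas_betas.items())


# Hypothetical SNP names and GWAS coefficients (not taken from any real GWAS).
betas = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": -0.02}
# Effect-allele counts (0, 1 or 2) for one hypothetical participant.
participant = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_risk_score(participant, betas)
print(round(score, 3))
```

With imputed genotypes the allele counts would typically be fractional dosages rather than integers; the weighted sum is unchanged.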

Covariates
Age, sex and UK Biobank recruitment centre were reported at the baseline assessment, and genetic principal components (used to control for population stratification 19 ) were derived by UK Biobank.

Social and socioeconomic measures (outcomes)
We selected social and socioeconomic outcomes measured at the UK Biobank baseline assessment centre. Where possible, we dichotomized outcomes to simplify interpretability and comparability across outcomes. Box 2 contains a list of all outcomes; Supplementary Information Section 3, and Supplementary Table 4, available as Supplementary data at IJE online, give further information on how each outcome was measured.
We considered breast cancer, coronary heart disease, osteoarthritis, cholesterol and systolic blood pressure unlikely to have plausible causal effects on the chance of obtaining a university degree, given that these health conditions usually occur later in life; the Mendelian randomization effect estimates for these associations were thus used as negative controls (i.e. where no effect should be expected). 20,21

Box 1 Measurement of health risk factors

Alcohol intake
We removed self-reported former drinkers, participants with a very high number of units per week (>200 units), and participants who did not report they were never drinkers but who answered none of the questions about weekly alcohol intake, leaving 252 585 participants (75%).

Body mass index
BMI was estimated as measured weight in kilograms divided by measured height in metres squared.

Cholesterol
Cholesterol was measured by UK Biobank at baseline (measured by CHO-POD analysis on a Beckman Coulter AU5800).

Smoking
We used two measures of self-reported smoking:
• Lifetime smoking index: a composite (continuous) measure of relevant smoking variables with a simulated half-time constant representing the decreasing effect of smoking on health outcomes over time. This variable was created by Wootton et al. and used in a paper studying smoking and depression/schizophrenia. 18
• Smoking initiation: a binary measure indicating whether participants had ever versus never smoked, based on whether the lifetime smoking index value had a non-zero value.

Systolic blood pressure
Systolic blood pressure was measured using an automated device, and two measurements were taken a few moments apart. If the standard automated device could not be employed, two manual readings were taken instead.

Main Mendelian randomization analysis
We used Mendelian randomization to estimate the causal association between each health condition and risk factor and each outcome, using the PRS as an instrumental variable, with age at baseline assessment, sex, UK Biobank recruitment centre and 40 genetic principal components as covariates. We used the ivreg2 package in Stata (version 15.1) with robust standard errors, and tested for weak instrument bias (using Kleibergen-Paap Wald rk F statistics) to assess whether the PRSs were sufficiently predictive of the exposures. 23 This Mendelian randomization analysis estimates mean and risk differences for continuous and binary outcomes, respectively, using additive structural mean models. [24][25][26] Mean differences are interpreted as the average change in the outcome over all participants for having the exposure, and risk differences are interpreted as the absolute percentage point change in proportion of participants with the outcome for having the exposure (as in a linear probability model). For health conditions, we are measuring the effects of genetic liability to the health condition. 27 The analysis of breast cancer as an exposure was restricted to women.
Despite the limitations of an approach based on statistical significance, 28 the number of results generated in these analyses necessitated a decision about which results to present in the main paper. Therefore, in the main table of results, we report results with a P-value less than 0.0026 (a Bonferroni-corrected P-value of 0.05 divided by 19 outcomes, with no correction for multiple exposures), and full results are reported in Supplementary Tables, available as Supplementary data at IJE online. However, we considered the public health implications of all effect estimates when interpreting results.
To compare the Mendelian randomization results with associations from non-genetic analysis, we estimated the multivariable-adjusted associations between the exposures and outcomes using linear regression, with age, sex, recruitment centre and 40 genetic principal components as covariates, i.e. observational analyses without genetic variables. These are linear probability models for binary outcomes (rather than logistic regression models), which were necessary to be able to compare with the Mendelian randomization analyses, as they are equivalent to additive structural mean models. We also performed endogeneity tests 29 to test whether the Mendelian randomization and multivariable-adjusted association estimates differed, where a low P-value indicates there was evidence that the Mendelian randomization and multivariable effects were different.
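To make the contrast between the instrumental variable and multivariable-adjusted estimates concrete, the sketch below simulates a confounded exposure and compares an ordinary least squares slope with a single-instrument ratio estimate (equivalent to two-stage least squares with one instrument and no covariates). It is a simplified illustration under stated assumptions: the data are simulated, covariates and robust standard errors are omitted, and the F statistic is the ordinary first-stage F rather than the Kleibergen-Paap statistic reported by ivreg2. All variable names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    return cov(x, y) / cov(x, x)


def iv_estimate(z, x, y):
    """Single-instrument IV (ratio) estimate: cov(Z, Y) / cov(Z, X)."""
    return cov(z, y) / cov(z, x)


def first_stage_f(z, x):
    """Ordinary F statistic from regressing the exposure on one instrument."""
    n = len(x)
    r2 = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))
    return r2 * (n - 2) / (1 - r2)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 20000
true_effect = -0.5  # assumed causal effect of exposure on outcome

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument (e.g. a standardized PRS)
u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # exposure
y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # outcome

ols = ols_slope(x, y)       # biased towards zero here by the confounder
iv = iv_estimate(z, x, y)   # should be close to true_effect
f_stat = first_stage_f(z, x)
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  first-stage F: {f_stat:.0f}")
```

In this simulation the confounder raises both exposure and outcome, so the observational slope is pulled away from the true negative effect, whereas the ratio estimate is not. With a binary outcome coded 0/1, the same calculation gives a risk difference, as in a linear probability model.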

Sensitivity analyses
The robustness of Mendelian randomization analyses is reliant on the assumption that the SNPs, and therefore PRSs, do not affect the outcome except through the exposure, i.e. the SNPs are not pleiotropic. We tested this assumption by conducting sensitivity Mendelian randomization analyses, including inverse-variance weighted (IVW), MR Egger (an indicator of directional pleiotropy), weighted median, weighted mode and simple mode analyses. [30][31][32] We also measured Cochran's Q statistic from the IVW analyses (a measure of heterogeneity in the effects of individual SNPs on the outcome), an indicator of pleiotropy 33 or problems with modelling assumptions. 34 From these analyses, we determined: (i) whether the results were consistent with the main Mendelian randomization analysis, which would indicate that the results of the main analysis were robust; and (ii) whether there was evidence of pleiotropy from both the Egger regression constant term and Cochran's Q statistic. We also visually inspected plots of the sensitivity Mendelian randomization analyses, which would indicate possible bias in the results of the main analysis. Sensitivity Mendelian randomization analyses could only be performed when there were three or more SNPs included in each PRS.
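The summary-level estimators named above can be sketched as follows. This is a minimal illustration, assuming per-SNP associations with the exposure (bx), with the outcome (by) and the outcome standard errors (se_y) are available; the numbers are invented, the function names are hypothetical, and standard errors and P-values for the estimates are omitted.

```python
def ivw(bx, by, se_y):
    """Inverse-variance weighted estimate and Cochran's Q.

    The estimate is the slope of a weighted regression of by on bx through
    the origin, with weights 1 / se_y**2. Q sums the weighted squared
    residuals and has len(bx) - 1 degrees of freedom.
    """
    w = [1.0 / s ** 2 for s in se_y]
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sum(
        wi * x * x for wi, x in zip(w, bx)
    )
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    return beta, q


def mr_egger(bx, by, se_y):
    """MR Egger: weighted regression of by on bx with an intercept.

    SNPs are first oriented so that the exposure association is positive.
    A non-zero intercept suggests directional pleiotropy.
    """
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    slope = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys)) / sum(
        wi * (x - mx) ** 2 for wi, x in zip(w, xs)
    )
    intercept = my - slope * mx
    return intercept, slope


# Invented per-SNP summary statistics for five hypothetical SNPs.
bx = [0.05, 0.08, 0.10, 0.12, 0.20]
se_y = [0.010, 0.012, 0.008, 0.011, 0.009]
by_valid = [0.3 * x for x in bx]          # no pleiotropy, causal effect 0.3
by_pleio = [0.02 + 0.3 * x for x in bx]   # constant pleiotropic effect of 0.02

beta_valid, q_valid = ivw(bx, by_valid, se_y)
beta_pleio, q_pleio = ivw(bx, by_pleio, se_y)
egger_intercept, egger_slope = mr_egger(bx, by_pleio, se_y)
print(beta_valid, beta_pleio, egger_intercept, egger_slope)
```

In the second scenario every SNP has the same direct effect on the outcome, so the IVW estimate is biased and Q is inflated, while the Egger intercept absorbs the pleiotropic effect and the slope recovers the assumed causal effect.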
We also conducted split-sample GWAS and Mendelian randomization analysis using UK Biobank data, in which we randomly split UK Biobank into halves, and for each half conducted a GWAS for each health condition and risk factor using the MRC IEU UK Biobank GWAS pipeline. 35 The results of each GWAS were used to create PRSs in the other half of UK Biobank, avoiding sample overlap, 36 and we repeated the Mendelian randomization analysis with the two PRSs separately, then combined the two results with fixed-effect meta-analysis to give a single estimate. The split-sample analysis: (i) allowed us to analyse lifetime smoking, as this has only been generated in UK Biobank, and thus no previous GWAS could have been used to inform the PRS; (ii) allowed us to potentially increase the size and power of the GWASs, possibly improving the predictive ability of the PRSs; and (iii) guaranteed homogeneity of the GWASs and analysis populations, which removes the potential bias from using data from an external GWAS to inform the creation of the PRSs, for example, through differences in populations giving different effects of SNPs. We also performed sensitivity Mendelian randomization analyses on each split to check the robustness of the split-sample results.
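The final combination step, a fixed-effect (inverse-variance weighted) meta-analysis of the two split-specific estimates, can be sketched as below. The estimates and standard errors are invented for illustration, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Invented Mendelian randomization estimates from the two halves.
split_estimates = [-0.20, -0.10]
split_ses = [0.05, 0.10]

pooled, pooled_se = fixed_effect_meta(split_estimates, split_ses)
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"{pooled:.3f} (95% CI: {ci_low:.3f} to {ci_high:.3f})")
```

The more precise half receives the larger weight, so the pooled estimate lies closer to it, and the pooled standard error is smaller than either input.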
Supplementary Table 5, available as Supplementary data at IJE online, shows a summary of all PRSs created and used in the split-sample analyses, and all GWAS significant SNPs from the split-sample GWASs are detailed in Supplementary Table 6, available as Supplementary data at IJE online.

Secondary analyses
We conducted secondary analyses to check the robustness of results, looking at whether: (i) results are different by sex and deprivation at birth; (ii) results for household income are affected by household size (income equivalization); (iii) results for employment outcomes are different when restricting to working age participants; (iv) results for household income are different when restricting to participants who have not retired; and (v) results for smoking are robust when only looking at the SNP rs1051730, known to affect smoking heaviness. 37 Additionally, we estimated the correlation between each of the PRSs in both the main analyses and within each split in the split-sample analyses, to determine whether any of the PRSs share genetic information. Further information for the secondary analyses and results are in Supplementary Information Section 4, available as Supplementary data at IJE online.
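The correlation check between pairs of PRSs can be illustrated with a plain Pearson correlation. The scores below are invented, and the function name is hypothetical; this is a sketch of the calculation rather than the study code.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Invented PRS values for five hypothetical participants.
prs_first = [0.10, 0.40, 0.35, 0.80, 0.60]
prs_second = [0.20, 0.50, 0.30, 0.90, 0.70]

r = pearson_r(prs_first, prs_second)
r_squared = r ** 2  # proportion of variance shared by the two scores
print(round(r, 3), round(r_squared, 3))
```

Squaring r gives the shared variance, so a correlation of r = 0.17 corresponds to an R² of roughly 0.03.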

Patient and public involvement
This study was conducted using UK Biobank. Details of patient and public involvement in the UK Biobank are available online [www.ukbiobank.ac.uk/about-biobankuk/] and [https://www.ukbiobank.ac.uk/wp-content/uploads/2011/07/Summary-EGF-consultation.pdf?phpMyAdmin=trmKQlYdjjnQIgJ%2CfAzikMhEnx6]. No patients were specifically involved in setting the research question or the outcome measures, nor were they involved in developing plans for recruitment, design or implementation of this study. No patients were asked to advise on interpretation or writing up of results. There are no specific plans to disseminate the results of the research to study participants, but the UK Biobank disseminates key findings from projects on its website.

Data and code availability
The empirical dataset will be archived with UK Biobank and made available to individuals who obtain the necessary permissions from the study's data access committees. The code used to clean and analyse the data is available as Supplementary Materials at IJE online, and here: [https://github.com/sean-harrison-bristol/Effects-of-Health-Conditions-and-Risk-Factors-on-Socioeconomic-Outcomes].

Results
Summary demographics, including prevalence of health conditions, risk factors and all outcomes, are presented in Table 1. The mean age of participants was 56.9 years (standard deviation: 8.0 years), mean household income (estimated from household income category midpoints) was £44 409 (standard deviation: £33 181) and 46% of participants were male. Results from the main Mendelian randomization analysis are displayed as a heat map of P-values in Figure 1, in which each cell shows the P-value of one analysis and the colour of the cell increases in intensity as the P-value decreases. Table 2 shows results from the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses for all outcomes where the main or split-sample Mendelian randomization analysis had a P-value less than 0.0026. All health conditions (except osteoarthritis) and risk factors in the main Mendelian randomization analysis had a low risk of weak instrument bias, and 75% of regressions had F statistics above 1000.
Forest plots showing the results for the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses for health conditions and risk factors on household income are shown in Figures 2 and 3, although there was evidence of heterogeneity between SNPs in sensitivity Mendelian randomization analyses for some exposures on income (Cochran's Q statistic P <0.01 for alcohol intake, BMI, breast cancer, depression, smoking initiation, systolic blood pressure), indicating possible pleiotropy. As such, results for income for these exposures should be interpreted with some caution, although there was little evidence of directional pleiotropy from MR Egger analyses of these exposures on income. As additional examples, the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses for health conditions and risk factors on loneliness are shown in Figures 4 and 5; plots for all other analyses are presented in the Supplementary Materials, available as Supplementary data at IJE online.

Asthma
In the main Mendelian randomization analysis, asthma was estimated to reduce household income [mean difference = -£13 474, 95% confidence interval (CI): -£18 749 to -£8199], the chance of obtaining a university degree [absolute percentage change (APC) = -17.1%, 95% CI: -25.4% to -8.7%] and the chance of cohabiting (APC = -11.0%, 95% CI: -17.9% to -4.0%). There was little evidence that asthma affected other outcomes. Split-sample Mendelian randomization analysis estimates similarly showed detrimental estimates of effects of asthma on obtaining a university degree and income, but not on cohabiting, and there was only evidence of pleiotropy in sensitivity Mendelian randomization analyses for obtaining a university degree. The multivariable-adjusted association estimates tended to be weaker than the Mendelian randomization estimates, and in some cases (e.g. the chance of obtaining a university degree) in the opposite direction.

Type 2 diabetes
In the main and split-sample Mendelian randomization analyses, there were no strong associations for type 2 diabetes with any outcome. Directions of effects were inconsistent across outcomes. Multivariable-adjusted association estimates tended to be larger than Mendelian randomization estimates, and associations were apparent with several outcomes, most notably satisfaction with health (APC for multivariable-adjusted association estimate = -19.1%, 95% CI: -20.1% to -18.2%).

Other health conditions
The CIs in Mendelian randomization analyses for breast cancer, coronary heart disease and osteoarthritis were very wide for all outcomes, and as such, these analyses were inconclusive. For breast cancer and coronary heart disease, there was no clear pattern of the direction of effects across outcomes. As expected, given life course temporal relationships, there was little evidence from the main or split-sample Mendelian randomization analyses that breast cancer, coronary heart disease or osteoarthritis were associated with the chance of obtaining a university degree (included as negative controls). In the multivariable-adjusted analysis, breast cancer was not associated with the chance of obtaining a university degree, whereas coronary heart disease and osteoarthritis were (APC = -8.1%, 95% CI: -9.0% to -7.1% and APC = -6.3%, 95% CI: -6.9% to -5.6%, respectively), indicating, together with the null estimates from the Mendelian randomization analyses, possible social causation of the health conditions rather than vice versa. Osteoarthritis was excluded from the sensitivity Mendelian randomization analysis as there were fewer than three GWAS-significant SNPs in the osteoarthritis GWAS.
In the multivariable-adjusted analysis, breast cancer was only associated with increased chances of being non-employed and retired and a decreased satisfaction with health, whereas coronary heart disease and osteoarthritis were negatively associated with all economic outcomes and most social outcomes, though not satisfaction with friendships or work nor with weekly friend visits.

Risk factors
Alcohol intake
All results are expressed for a 5 units per week increase in alcohol intake.
In the main Mendelian randomization analysis, alcohol was estimated to reduce household income (mean difference = -£2446, 95% CI: -£3362 to -£1530) and the chance of owning accommodation (APC = -1.8%, 95% CI: -2.4% to -1.2%) and to increase deprivation (mean difference in TDI = 0.18, 95% CI: 0.11 to 0.25, approximately 23% of a decile of TDI). In the split-sample Mendelian randomization analysis, alcohol was estimated to reduce the chance of cohabiting (APC = -1.5%, 95% CI: -2.4% to -0.6%) and owning accommodation (APC = -1.2%, 95% CI: -1.7% to -0.6%) and to increase deprivation (mean difference in TDI = 0.14, 95% CI: 0.08 to 0.19, approximately 18% of a decile of TDI). There was no evidence of causal effects on other outcomes. There was evidence of heterogeneity in SNP effects for being happy, household income and receiving a university degree, but no evidence of directional pleiotropy in Egger regression. The multivariable-adjusted analysis estimated that alcohol increased (rather than reduced) household income (mean difference = £442, 95% CI: £400 to £484, P-value from endogeneity test = 1.6 × 10⁻¹⁰), and no associations were seen with other outcomes.

Body mass index
All results are expressed for a 5 kg/m² increase in BMI.

Cholesterol
All results are expressed for a 1-mmol/litre increase in cholesterol.
In the main and split-sample Mendelian randomization analyses, there was no evidence of effects of cholesterol on any outcome. In the multivariable-adjusted analyses, cholesterol was beneficial for all socioeconomic outcomes and most social contact and well-being outcomes. Together with the null estimates from the Mendelian randomization analyses, this could imply confounding or reverse causation in the multivariable-adjusted association estimates. Cholesterol was excluded from the sensitivity Mendelian randomization analysis as there were fewer than three GWAS-significant SNPs in the cholesterol GWAS.

Lifetime smoking
All results are expressed for a one standard deviation increase in the continuous lifetime smoking index value. We did not perform a main Mendelian randomization analysis, as there was no previous GWAS for lifetime smoking.

Figure 2 Forest plot showing effects of health conditions on household income for the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses (note: confidence intervals are so narrow for the multivariable-adjusted analyses that they cannot be seen).

Smoking initiation
In the main Mendelian randomization analysis, smoking initiation was estimated to reduce household income (mean difference = -£22 838, 95% CI: -£31 354 to -£14 321), the chance of owning accommodation (APC = -20.8%, 95% CI: -28.2% to -13.4%), being satisfied with health (APC = -35.4%, 95% CI: -51.2% to -19.5%) and obtaining a university degree (APC = -65.9%, 95% CI: -81.4% to -50.4%), and to increase deprivation (mean difference in TDI = 1.73, 95% CI: 1.02 to 2.44, approximately 216% of a decile of TDI). All effects were also seen in the split-sample analysis. Smoking initiation was also estimated to reduce the chance of having a skilled job (APC = -37.0%, 95% CI: -50.0% to -23.9%) and to increase the chance of being non-employed, both including and excluding retired participants (APC = 13.3%, 95% CI: 6.3% to 20.2% and APC = 19.0%, 95% CI: 9.0% to 29.0%, respectively), and of having weekly friend visits (APC = 19.8%, 95% CI: 9.2% to 30.5%), but only in the main Mendelian randomization analysis. Additionally, smoking initiation was estimated to reduce the chance of being satisfied with one's financial situation (APC = -22.7%, 95% CI: -36.0% to -8.9%) in the split-sample Mendelian randomization analysis, with a similar effect size in the main Mendelian randomization analysis. CIs were wide for all outcomes. There was evidence of heterogeneity in SNP effects for most outcomes, but no evidence of directional pleiotropy from Egger regression. Multivariable-adjusted association estimates tended to be closer to the null than the Mendelian randomization estimates.

Figure 3 Forest plot showing effects of risk factors on household income for the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses (note: confidence intervals are so narrow that they cannot be seen for most associations).

Figure 4 Forest plot showing effects of health conditions on being lonely for the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses (note: confidence intervals are so narrow for the multivariable-adjusted analyses that they cannot be seen).

Systolic blood pressure
All results are expressed for a 10-mmHg increase in systolic blood pressure (BP).
In the main and split-sample Mendelian randomization analyses, there was no evidence of effects of systolic BP on any outcome.

Further analyses
Full results from main Mendelian randomization, sensitivity Mendelian randomization, split-sample Mendelian randomization and split-sample sensitivity Mendelian randomization analyses are shown in Supplementary Tables 7-10, available as Supplementary data at IJE online, respectively, with secondary and sensitivity analyses results in Supplementary Tables 11 and 12, available as Supplementary data at IJE online. For all health conditions and risk factors, forest plots showing results for the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses (presented both as each exposure on social and socioeconomic outcomes, and for each outcome on health conditions and risk factors) are available in the Supplementary Materials, available as Supplementary data at IJE online, along with forest plots of SNPs and plots showing IVW, MR Egger, simple mode, weighted median and weighted mode Mendelian randomization analyses. There was little evidence of correlation between any PRSs in the main analysis (all R² values below 0.01); however, for the split-sample PRSs, there was evidence of correlations between asthma and eczema (r = 0.17 for both splits combined), and between smoking initiation and lifetime smoking (r = 0.37 for both splits combined) (see Supplementary Table 13, available as Supplementary data at IJE online).

Figure 5 Forest plot showing effects of risk factors on being lonely for the main Mendelian randomization, split-sample Mendelian randomization and multivariable-adjusted analyses (note: confidence intervals are so narrow for the multivariable-adjusted analyses that they cannot be seen).

Discussion
We estimate the putative causal effects of a variety of health conditions and risk factors on socioeconomic and social outcomes using Mendelian randomization, a genetically informed methodology typically less affected by confounding and reverse causality than observational analyses that adjust for measured confounders. 11 Our results indicate that higher BMI, greater alcohol intake and smoking all negatively affect socioeconomic outcomes, and depression negatively affects many social outcomes. We do not observe an effect of cholesterol or systolic BP on any outcome, which may reflect effective treatments for high cholesterol and hypertension protecting participants from adverse consequences. For breast cancer, coronary heart disease, migraine and osteoarthritis, the confidence intervals for all Mendelian randomization analyses were all very wide, meaning that it is not possible to draw firm conclusions about the social and socioeconomic consequences of these conditions from our analyses. However, we estimated that migraine reduced the chance of going to a pub or social club weekly, possibly as alcohol increases the risk of migraines. 38,39 Potential reasons for adverse effects of high BMI, alcohol use and smoking on social and socioeconomic outcomes include increased disease burden, social stigma (e.g. bias against obese people, smokers etc.) or behaviours which make employment, retention of employment or social interaction challenging. Our previous analyses of UK Biobank have shown evidence of effects of BMI on social and socioeconomic outcomes in both Mendelian randomization and non-genetic within-sibling analyses. 4 Here, we build on these previous analyses by including a broader set of social and socioeconomic outcomes, conducting additional sensitivity and secondary analyses and facilitating comparisons across a range of health conditions and risk factors.
Higher genetic propensities towards asthma and eczema were estimated to reduce household income (mean difference = -£13 519, 95% CI: -£18 794 to -£8243 for asthma, and mean difference = -£46 987, 95% CI: -£71 048 to -£22 925 for eczema). However, it is possible that these estimates are susceptible to bias from pleiotropy, given the extreme size of the effects. Asthma and eczema share many genetic loci, along with inflammatory bowel disease and other autoimmune conditions. 40 Therefore, the Mendelian randomization results for eczema and asthma may reflect an underlying genetic predisposition toward autoimmune condition susceptibility, rather than asthma or eczema specifically. This would not be detectable with Mendelian randomization sensitivity analyses if all SNPs included in the PRSs were affecting autoimmune susceptibility rather than the conditions themselves (directional unbalanced pleiotropy). Additionally, the PRS for smoking initiation may capture impulsivity and risk taking as well as a propensity to smoke.
For some health conditions (asthma, breast cancer, eczema, migraine), we saw little evidence for observational (multivariable-adjusted) associations with either socioeconomic or social outcomes, despite previous evidence often showing strong associations. For example, breast cancer has been associated with lower income, 41 but there was no observational association between breast cancer and household income in UK Biobank. This could result from selection bias in UK Biobank, 42 with participants potentially liable to have less severe/advanced forms of the condition or quicker recovery than all breast cancer patients across a population, and also to have greater financial support and better employment conditions than the general population. The effects of health conditions may also diminish over time; there is some evidence that the negative effect on income among breast cancer survivors reduces over time. 41 It is therefore possible that our study does not have the correct time frame to capture the effects of each health condition, or that well-functioning insurance markets and pension provision could mitigate socioeconomic effects of health conditions, at least within this generally affluent UK population. 43 Additionally, if a participant developed any health condition after baseline, we would only know if the participant had a hospital episode which mentioned the condition.
There was evidence that depression was detrimental to multiple social outcomes, including reduced happiness and reported satisfaction rates and increased loneliness. Given these are common features of depression, this result was expected and gives us confidence that the PRS for depression was suitably predictive of depression.

Strengths and limitations
The main strengths of this analysis are that Mendelian randomization analyses are generally less affected by confounding and reverse causation than multivariable-adjusted (observational) analyses, 44 and that UK Biobank is a very large sample with sufficient data to enable us to examine multiple health exposures and multiple socioeconomic and social outcomes. For some associations, there were marked differences between the Mendelian randomization and multivariable-adjusted association estimates, which could result from reverse causation or confounding in the multivariable-adjusted association estimates. For example, coronary heart disease was associated with a decreased chance of obtaining a university degree (APC = -8.1%, 95% CI: -9.0% to -7.1%) in the multivariable-adjusted analysis, which is implausible given that coronary heart disease usually occurs later in life than attending university, and this association was not seen in the Mendelian randomization analysis. Additionally, the SNPs contributing to the PRSs were drawn from GWASs that excluded UK Biobank to avoid biases caused by sample overlap, 36 and all reached genome-wide significance. Finally, the results from the main and split-sample analyses were largely consistent across exposures and outcomes, reducing the possibility of bias from differences in SNP effects between the GWAS and UK Biobank populations.
However, Mendelian randomization rests on assumptions that cannot be proven to be true. 44 Assessing pleiotropy was difficult or impossible for many exposures, due to the low number of SNPs and wide CIs, but there was evidence for heterogeneity between SNPs for some associations (e.g. for income), and directional pleiotropy from Egger regression for a limited number of associations (e.g. for BMI on obtaining a university degree). As the outcomes were social and socioeconomic, not biological, the exclusion restriction assumption would be strong for any genetic variant (i.e. that the genetic variant affects the outcome only through the exposure). For example, we cannot assume that an SNP associated with income affects any health condition or risk factor solely through income. We therefore did not perform bidirectional Mendelian randomization, 45 and so cannot rule out reverse causation for any analysis.
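To illustrate how Egger regression can flag directional pleiotropy, the following is a minimal sketch using summary-level data, not the code used in this study. The function name `ivw_and_egger` and all numbers are hypothetical: four made-up SNPs whose outcome associations include a constant pleiotropic offset of 0.01 on top of a true causal slope of 0.5. Standard errors of the estimates are omitted for brevity.

```python
import numpy as np

def ivw_and_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (ivw_slope, egger_intercept, egger_slope) from per-SNP summary statistics."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2  # inverse-variance weights

    # Orient every SNP so that its association with the exposure is positive.
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign

    # Inverse-variance weighted estimate: weighted regression through the origin.
    ivw_slope = np.sum(w * bx * by) / np.sum(w * bx * bx)

    # Egger regression: the same weighted regression with a free intercept.
    # A non-zero intercept suggests directional (unbalanced) pleiotropy.
    X = np.column_stack([np.ones_like(bx), bx])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], by * sw, rcond=None)
    return ivw_slope, coef[0], coef[1]

# Hypothetical SNP-exposure and SNP-outcome associations (illustration only).
bx = np.array([0.02, 0.04, 0.06, 0.08])
by = 0.01 + 0.5 * bx            # pleiotropic offset 0.01, true slope 0.5
se = np.full(4, 0.01)

ivw_slope, egger_intercept, egger_slope = ivw_and_egger(bx, by, se)
print(ivw_slope, egger_intercept, egger_slope)
```

In this toy example the regression through the origin absorbs the pleiotropic offset into its slope and overstates the effect, whereas the Egger fit separates the offset (intercept) from the slope. With few SNPs, as for several exposures here, the intercept is estimated very imprecisely, which is why pleiotropy was hard to assess.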
The PRSs represent lifetime exposure to or risk for the health condition or risk factor, and interventions to reduce the exposure or risk of the exposure at different time points in a person's life may have different effects; effects at specific points in life cannot be explored with the methodology used in this paper. As we used linear prediction models for all analyses, some effect estimates may also be impossibly large (i.e. over 100%), which could occur when precision is very low, though this was rare. Although Mendelian randomization is generally less affected by confounding and reverse causality than multivariable regression analyses, an important potential source of bias in these analyses is family-level effects. Recent evidence suggests that assortative mating and dynastic effects can lead to bias in Mendelian randomization effect estimates, 46 with estimates of the effect of BMI on educational attainment being consistent with the null in within-family Mendelian randomization models using data from UK Biobank and the Norwegian HUNT study. In our previous analysis of UK Biobank, 4 within-family Mendelian randomization models in UK Biobank alone were too imprecise to draw conclusions about whether the estimated effects of BMI on social and socioeconomic outcomes are robust to potential confounding by family-level factors. Since BMI is the exposure for which we have greatest statistical power (due to the strength of the genetic instrumental variable), we have not repeated the within-family analyses for our other exposures, as power will be extremely limited. However, as more datasets are available that include genetic information for multiple family members, examination of whether these effects can be detected with a within-family Mendelian randomization design will be a high priority.
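The remark above that linear models can yield estimates beyond 100% can be illustrated with a simple ratio (Wald) estimator on the risk-difference scale. This is a sketch under stated assumptions, not the estimation procedure used in this study: the function `wald_ratio` is hypothetical, the input numbers are invented, and the confidence interval uses a first-order approximation that ignores uncertainty in the PRS-exposure association.

```python
def wald_ratio(beta_zy, se_zy, beta_zx):
    """Ratio estimate and approximate 95% CI on the risk-difference scale.

    beta_zy: change in outcome probability per unit of the PRS
    se_zy:   standard error of beta_zy
    beta_zx: change in exposure probability per unit of the PRS
    """
    est = beta_zy / beta_zx
    se = abs(se_zy / beta_zx)  # first-order approximation (assumption)
    return est, est - 1.96 * se, est + 1.96 * se

# Invented numbers: the PRS shifts exposure prevalence by 0.5 percentage points
# and outcome prevalence by -0.6 percentage points (SE 0.4 percentage points).
est, lo, hi = wald_ratio(beta_zy=-0.006, se_zy=0.004, beta_zx=0.005)
apc = 100 * est  # absolute percentage change implied by the linear model
print(f"APC = {apc:.1f}% (95% CI: {100 * lo:.1f}% to {100 * hi:.1f}%)")
```

Because the denominator is small when the PRS predicts the exposure weakly, modest noise in the numerator is enough to push the point estimate past -100% or +100%, even though a probability cannot change by more than 100 percentage points. The accompanying interval is correspondingly wide.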
UK Biobank, although large, is not representative of the UK population as participants tend to be wealthier and healthier compared with the country as a whole, which may impart bias to our analyses. 47 It is likely that this biased some estimates towards the null, as wealthier and healthier people may be more resistant to any detrimental effects of health conditions and risk factors. Additionally, there is evidence of a geographical structure in the UK Biobank genotype data which cannot be accounted for using adjustment for principal components, which may also have biased our analyses. 48 However, recent evidence suggests that whereas geographical structure may be present after controlling for principal components in the PRSs for BMI, coronary heart disease, smoking and alcohol consumption (and these may all be related to educational attainment), there was little evidence for geographical structure in the PRSs for other health conditions. 49 Additionally, a recent GWAS of income showed that only 8% of the inflation in the GWAS test statistics was due to residual stratification or confounding, indicating that population structure is unlikely to severely bias many of our results. 50 Some outcomes were dichotomized, which may have reduced our ability to detect associations (e.g. satisfaction with health).
For health conditions, the uncertainty around the Mendelian randomization effect estimates was large. As many health conditions had small associations with outcomes on multivariable-adjusted analyses, this often meant the Mendelian randomization estimates were larger than the observed estimates or had a different sign, but this can be explained by the imprecision in the Mendelian randomization estimates. The uncertainty is due in part to the relatively poor ability of the PRS to predict some health conditions. There were minimal differences in prevalence between UK Biobank and the UK for most health conditions studied (apart from migraine and depression, which were less and more prevalent in UK Biobank respectively), but it is possible the health conditions were milder or better managed in UK Biobank participants compared with the population as a whole. 51 Therefore, null results should be interpreted as a lack of evidence for a causal effect, not evidence of a lack of a causal effect.

Conclusion
The results of this study imply that higher BMI, smoking and alcohol consumption are likely detrimental to socioeconomic outcomes. Whereas the prevalence of smoking is decreasing in the UK, 52 the average BMI has risen and is continuing to rise worldwide. 53 Reducing average BMI levels, and further reducing smoking and alcohol intake, in addition to health benefits may also improve socioeconomic outcomes for individuals and populations.
There was little evidence of causal effects of health conditions on socioeconomic outcomes, which may reflect true absence of causal effects or bias due to the characteristics of UK Biobank participants, or the low precision of our estimates for health condition effects.

Supplementary data
Supplementary data are available at IJE online.