Bias in two-sample Mendelian randomization by using covariable-adjusted summary associations

Two-sample Mendelian randomization (MR) is increasingly used to strengthen causal inference using observational data. This method allows the use of freely accessible summary association results from genome-wide association studies (GWAS) for a range of traits. Some GWAS studies adjust for heritable covariables in an attempt to estimate direct effects of genetic variants on the trait of interest. One, both or neither of the genetic instrumental variables (IVs)-exposure association or genetic IVs-outcome association may have been adjusted for heritable covariables (referred to as GWAS covariables). However, it is unclear how this may affect two-sample MR analysis. We evaluated this in a simulation study comprising different scenarios that could motivate covariable adjustment in a GWAS. Our results indicate that the impact of covariable adjustment is highly dependent on the underlying causal structure. In the absence of residual confounding between exposure and covariable, between exposure and outcome, and between covariable and outcome, using covariable-adjusted summary associations for two-sample MR may eliminate bias due to horizontal pleiotropy. However, the presence of residual confounding (especially between the covariable and the outcome) leads to bias upon covariable adjustment, even in the absence of horizontal pleiotropy. Bias was lower when the true causal effect of the exposure on the outcome was zero compared to a non-zero causal effect. In an analysis using real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UK Biobank, the direction of the causal effect estimate of waist circumference on blood pressure changed upon adjustment of waist circumference for body mass index. Our findings indicate that using covariable-adjusted summary associations in MR should generally be avoided. When that is not possible, careful consideration of the causal relationships underlying the data (including potentially unmeasured confounders) is required to direct sensitivity analyses and interpret results with appropriate caution.


Abstract
Two-sample Mendelian randomization (MR) is increasingly used to strengthen causal inference using observational data. This method allows the use of freely accessible summary association results from genome-wide association studies (GWAS) for a range of traits. Some GWAS studies adjust for heritable covariables in an attempt to estimate direct effects of genetic variants on the trait of interest. One, both or neither of the genetic instrumental variables (IVs)-exposure association or genetic IVs-outcome association may have been adjusted for heritable covariables (referred to as GWAS covariables). However, it is unclear how this may affect two-sample MR analysis. We evaluated this in a simulation study comprising different scenarios that could motivate covariable adjustment in a GWAS. Our results indicate that the impact of covariable adjustment is highly dependent on the underlying causal structure. In the absence of residual confounding between exposure and covariable, between exposure and outcome, and between covariable and outcome, using covariable-adjusted summary associations for two-sample MR may eliminate bias due to horizontal pleiotropy. However, the presence of residual confounding (especially between the covariable and the outcome) leads to bias upon covariable adjustment, even in the absence of horizontal pleiotropy. Bias was lower when the true causal effect of the exposure on the outcome was zero compared to a non-zero causal effect. In an analysis using real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UK Biobank, the direction of the causal effect estimate of waist circumference on blood pressure changed upon adjustment of waist circumference for body mass index. Our findings indicate that using covariable-adjusted summary associations in MR should generally be avoided. When that is not possible, careful consideration of the causal relationships underlying the data (including potentially unmeasured confounders) is required to direct sensitivity analyses and interpret results with appropriate caution.

Introduction
Mendelian randomization (MR) uses genetic variants to assess the influence of modifiable exposures on health outcomes (1,2). As germline genetic variants are generally independent of confounding factors and are determined at conception, MR offers a more robust approach to confounding and reverse causation than other methods used in observational studies (3). Two-sample MR is an extension to the one-sample MR design, where estimates for the association of genetic variants with exposure and with outcome are derived from different (non-overlapping) samples from the same underlying population (4). These estimates are combined to obtain the causal effect estimate of exposure on outcome (5). Given that genetic variants typically explain a small proportion of the variation in the exposure of interest, large sample sizes are required for adequately-powered MR studies. Therefore, in recent years, twosample MR has substantially grown in popularity (6) since it capitalizes on the use of publiclyavailable summary association results from large genome-wide association studies (GWAS).
In GWAS, estimates for the association of genetic variants with the trait of interest are often conditioned on covariables. As an example, GWAS of waist-to-hip ratio have adjusted estimates for body mass index (BMI) (7), GWAS of lung function have adjusted estimates for height and stratified analysis by smoking status (8), and GWAS of birth weight has excluded pre-term births (9). Typically, the aim of conditioning on covariables is estimating the direct effect of genetic variants on the trait (i.e. effects independent of the covariable) or to improve statistical power by reducing residual variance. However, this strategy could introduce bias in GWAS association estimates if the covariable is a collider (or a descendant of a collider) on a pathway linking the genetic variant to the trait of interest. It has previously been shown that conditioning on heritable covariables can bias GWAS association estimates and that the magnitude of this bias is a function of the effect of the genetic variant on the covariable and the correlation structure between the covariable and the trait of interest (10,11).
Several two-sample MR studies have used summary association data from GWAS that have estimated the effect of genetic variants on the trait of interest conditioned on heritable covariables (e.g. (12)(13)(14)(15)(16)). The use of such GWAS data might have biased the findings of these MR studies for the reasons outlined above. In addition, it is challenging to predict the impact of such bias since data from two independent GWAS (one for the exposure and other for the outcome) are used in two-sample MR studies, meaning that the conditional estimates could be restricted to the association of genetic instruments with the exposure, with the outcome, or with both exposure and outcome. Despite the widespread use of covariable-adjusted summary associations (e.g. (12)(13)(14)(15)(16)), few considerations have been made about how this could affect the validity of results (e.g. (12-14, 17, 18)), particularly in the context of two-sample MR. In this paper we explore how covariable adjustment in GWAS affects two-sample MR findings using simulated and real data in scenarios that could motivate conditioning on a heritable covariable in a GWAS.

Simulation study
We performed a series of simulations to evaluate the consequences of covariable adjustment in MR. We were interested in evaluating situations where the genetic instrument is marginally associated with both the exposure ( ) and a covariable ( ). In these situations, the GWAS analyst might decide to adjust for as an attempt to estimate the effect of genetic variants on independent of , thus generating covariable-adjusted summary association results that will be used by two-sample MR analysts, who do not have access to covariable-unadjusted results.
We performed simulations under two main contexts: i) all genetic variants have directionallyconsistent direct effects on the same traits (hereafter referred to as "homogeneous genetic variants"); ii) some genetic variants have direct effects only on some traits (hereafter referred to as "heterogeneous genetic variants"). By direct effect, we mean that there is an arrow from the genetic variant directly into the trait in the causal diagram.
Scenarios simulated under situation i) are illustrated in Figure 1. The measured variables are the genetic instrument , the exposure variable , the outcome variable and the covariable that has been adjusted for in a GWAS. In all situations, and are genetically correlated (ie, both are marginally associated with ). The aim is to estimate the causal effect of on using summary GWAS results in a two-sample MR framework. Therefore, there are four possible situations: no adjustment for ; adjusted -association but unadjustedassociation; unadjusted -association but adjusted -association; or both -andassociations adjusted for . Figure 1 depicts the assumed causal structures that we evaluated in the simulations. In scenarios A1-A5, fully mediates the effect of on . In scenarios B1-B5, independently affects both and (in other words, is a confounder of the -association). In scenarios C1-C5, fully mediates the effect of on . In scenarios D1-D5, the effect of on and is mediated by a common cause , so that the effect of on is correlated with the effect of on , even though there is no causal effect from to or vice-versa. In all scenarios A1-5 to D1-5, the instrumental variable assumptions hold, so that is a valid instrument to estimate the causal effect of on . However, this is not the case in scenarios E1-E5 and F1-F5, which are identical to B1-B5 and D1-D5 (respectively), except that has a direct effect on (ie, horizontal pleiotropy). More specifically, the -and -associations are independent in scenarios E1-E5, meaning that in these scenarios there is horizontal pleiotropy, but the InSIDE (Instrument Strength Independent of Direct Effects) assumption holds. However, in scenarios F1-F5, the InSIDE assumption is violated because Z has an effect on a common cause of and .
To simulate data under situation ii), the same underlying causal structure used for situation i) was used, with the exception that, in all simulations, there were four non-overlapping subgroups of genetic variants: some with direct effects on only, some on only, some on both and (but not ), and some on only.
Even though Figure 1 depicts as a having a non-null causal effect on , all of these scenarios were also simulated for both a null and non-null causal effect from X to Y. A detailed description of the data generating model is provided in the Supplementary Material.
For analyses using unadjusted variant-exposure summary association results (regardless of whether the variant-outcome associations were adjusted for ), we selected variants with an unadjusted association with the exposure achieving a statistic of 10 or more. The same applies for analysis using adjusted variant-exposure summary association results.

Real data example: assessing the causal effect of waist circumference on blood pressure
We conducted an illustrative analysis to explore the impact of covariable adjustment in MR in a real data setting. The exposure of interest was waist circumference (WC), the outcome variables were systolic (SBP) and diastolic blood pressure (DBP), and the covariable was BMI.
We selected genetic instruments of unadjusted WC and BMI-adjusted WC from the Genetic Investigation of ANthropometric Traits (GIANT) consortium (7) and calculated the corresponding instrument-BP summary association results using an interim release of UK Biobank data (19). Details on the data sources are provided in the Supplementary Material.
BMI was used as a covariable due to its strong correlation with WC, meaning that variants that affect WC might also affect BMI due to their effect on overall adiposity. Here, we assume that two distinct causal structures ( Figure 2) are plausible. In panel A, the genetic variant has a direct effect on a latent variable (which we refer to as adiposity), which manifests itself in measurable constructs such as WC and BMI. Panel B depicts a scenario where the genetic variant has a direct effect on WC rather than on adiposity. Those mechanisms are not mutually exclusive, since different genetic variants can present those different direct effects, or even a single genetic variant can have direct effects on both. Of note, WC, BP, BMI and adiposity are analogous to , , and (respectively) in Figure 1.
We aimed at replicating scenarios in which the summary association results (either unadjusted or adjusted for the covariable) are already available. Therefore, we selected two sets of WC genetic instruments: one using the unadjusted GWAS results; and another using the BMIadjusted GWAS. In each case, independent genetic variants were selected as WC instruments if P-value <5×10 -8 . After quality control (described in detail in the Supplementary Materials), 37 single nucleotide polymorphisms (SNPs) and 60 SNPs were retained as genetic instruments for BMI-unadjusted and BMI-adjusted WC, respectively. Prior to analysis, data was harmonised following the steps described in Hartwig et al (6) (as detailed in the Supplementary Material).
As in the simulation study, four situations were considered: i) unadjusted instrument-WC and unadjusted instrument-BP associations; ii) adjusted instrument-WC and unadjusted instrument-BP associations; iii) unadjusted instrument-WC and adjusted instrument-BP associations; and iv) adjusted instrument-WC and adjusted instrument-BP associations.

Statistical analyses
Causal effect estimates were obtained using multiplicative random effects inverse-variance weighting (IVW) (5,20). In the simulation study, coverage and average causal effect estimates were obtained across 5 000 simulated datasets. Coverage was defined the proportion of times that 95% confidence intervals included the true causal effect.

Simulation study
Supplementary Table 1

Homogeneous genetic instruments
For homogeneous genetic instruments, we only evaluated scenarios A5-F5 because in other scenarios adjustment for yielded in no variants achieving the instrument strength criterion.
As shown in Supplementary Table 2, using covariable-adjusted associations did not eliminate bias due to horizontal pleiotropy in presence of unmeasured confounders (scenarios E5 and F-5), and introduced bias even in the absence of horizontal pleiotropy. The bias was mainly driven by the use of covariable-adjusted instrument-associations, and was generally higher when the true causal effect was non-zero. Similar trends were observed in the coverage of the 95% confidence intervals.

Discussion
Our results indicate that the impact of covariable adjustment in two-sample MR depends on the causal relation and confounding structure between genetic instruments, exposure, covariable and outcome. In addition, the magnitude and direction of bias will vary depending on whether associations between instrument-exposure, instrument-outcome or both are adjusted for the same covariables. In an analysis using real data from GIANT consortium and UK Biobank, the estimated causal effect of waist circumference on blood pressure changed substantially upon adjustment for BMI.
The strong dependence of the results on the underlying causal structure was expected. In the absence of unobserved common causes (confounders) between exposure-covariable, exposure-outcome and covariable-outcome, covariable adjustment eliminates bias due to horizontal pleiotropy mediated by such covariable (scenarios E and F). However, absence of unobserved confounding is unrealistic in observational studies and is one of the primary motivations for performing MR. In the presence of unobserved confounding, mainly between the covariable and the outcome, covariable adjustment will likely lead to bias even in the absence of horizontal pleiotropy due to collider bias where genetic instruments are marginally associated with the covariable. Therefore, minimising horizontal pleiotropy is generally an invalid justification for using covariable-adjusted summary association results for two-sample MR.
Bias was generally weaker when the true causal effect of the exposure on the outcome is null, and this was consistent among the several scenarios we evaluated. This suggests that covariable adjustment has a lower impact on testing the null hypothesis than on estimating a non-zero causal effect. Indeed, even in the absence of any unmeasured confounding, covariable adjustment may lead to bias when the true causal effect is not null. Therefore, null results from a MR analysis using covariable-adjusted summary association results are generally more reliable (in the sense of being less likely to be a consequence of covariable-adjustment bias) than non-null results, assuming that it is generally unlikely that bias due to using covariable-adjusted summary associations perfectly balances out a given non-null true causal effect.
Using covariable-adjusted instrument-outcome summary associations was also consistently related to more bias compared to using covariable-adjusted instrument-exposure summary associations. The exception was scenarios where there was horizontal pleiotropy. However, since attempting to minimise horizontal pleiotropy via covariable adjustment is generally unjustified due to the likely presence of unmeasured confounders, we are primarily concerned with the scenarios where there is no horizontal pleiotropy. Another result that was generally consistent across the different scenarios was that selecting genetic instruments after covariable adjustment (i.e., using covariable-adjust summary instrument-exposure association results) presented less bias than selection before adjustment. This also minimises other possible problems, such as running into lack of association between the instrument and the exposure upon covariable adjustment, which could compromise precision and lead to weak instrument bias.
To our knowledge, this is the first study to systematically assess the impact of covariable adjustment in (two-sample) MR. Conditioning on a heritable covariable can introduce bias when the covariable is a collider in the pathway between instrument-exposure and/or instrument-outcome. Collider bias can also be introduced in MR studies in other settings, such as in disease progression studies which include a selected (i.e., case-only) group of individuals (17).
It is important to emphasise that using covariable-adjusted GWAS summary association results in two-sample MR studies differs from applying multivariable MR (MVMR) on unadjusted GWAS summary association results to estimate the effect of two or more exposures on an outcome (18). The focus of our paper is to investigate bias in two-sample MR due to using covariable-adjusted summary association results, especially in the case when those are the only available option (e.g., when using summary data from GWAS consortia that only performed covariable-adjusted analyses). We are not primarily concerned with whether using covariable-adjusted summary associations reduces bias due to horizontal pleiotropy, even though our results indicate that this is unlikely to be the case in the presence of unmeasured confounding. When one aims to use covariable data for this aim, MVMR should be preferred, because MVMR uses genetically predicted variations in both the exposure and covariable(s), and is therefore less susceptible to collider bias (21).
One of the strengths of our study was that our simulations covered a wide range of scenarios, thus allowing a detailed evaluation of covariable-adjustment bias in a variety of situations. The simulation study was also complemented with a real data example illustrating the strong influence that covariable-adjustment may have not only on the magnitude, but also on the presence and direction of the causal effect estimate. However, any simulation study is a simplification of a reality that is likely to be much more complex. It is impossible to simulate all possible scenarios that might be of relevance to this topic. Moreover, some results, especially quantitative estimates of bias and coverage, are highly dependent on the data-generating model. Therefore, our results should be interpreted qualitatively, as general indications of some of the main aspects related to covariable-adjustment in MR. It was reassuring that some results (as described above) were consistent across scenarios, which indicates that they may apply generally (although not universally) to covariable-adjusted MR.
In conclusion, our findings indicate that using summary association results adjusted for heritable covariables may lead to bias in two-sample MR due to unmeasured confounding. We recommend avoiding adjustment for such covariables in the context of MR. When only covariable-adjusted data is available, it is important to carefully consider the causal structure underlying the research question to understand the potential impact on the results. In such cases, we recommend that researchers refrain from interpreting the causal estimate too literally (which indeed requires parametric assumptions in addition to the core instrumental variable assumptions even in the absence of covariable adjustment) and focus on testing the causal null hypothesis. To account for horizontal pleiotropy due to measured covariables, MVMR should be preferred over non-genetic covariable adjustment.    : True causal effect of the exposure ( ) on the outcome ( ). Scenarios A-F assume different causal relationships among , , the instrument ( ), the covariate ( ) and a common cause of and affected by ( ). In scenarios A1-F1, there are no unmeasured confounders. In scenarios A2-F2, there is an unmeasured common cause of and . In scenarios A3-F3, there is an unmeasured common cause of and . In scenarios A4-F4, there is an unmeasured common cause of and . In scenarios A5-F5, all these three unmeasured confounders are present. The scenarios are illustrated in Figure 1 and described in detail in the "Simulation study" section. Effect estimates are expressed as mean difference, and 95% CI, of SBP or DBP (in mmHg) per standard unit increase in WC. 37 SNPs and 60 SNPs were used as instruments for waist circumference unadjusted and adjusted for BMI, respectively. Effect estimates are expressed as mean difference, and 95% CI, of SBP or DBP (in mmHg) per standard unit increase in WC. 37 SNPs and 60 SNPs were used as instruments for WC unadjusted and adjusted for BMI, respectively. After removing instruments associated with BMI (p-value < 0.05), 2 SNPs and 45 SNPs were retained as instruments for unadjusted and BMI-adjusted WC, respectively.