Contextualizing selection bias in Mendelian randomization: how bad is it likely to be?

Abstract
Background: Selection bias affects Mendelian randomization investigations when selection into the study sample depends on a collider between the genetic variant and confounders of the risk factor-outcome association. However, the relative importance of selection bias for Mendelian randomization compared with other potential biases is unclear.
Methods: We performed an extensive simulation study to assess the impact of selection bias on a typical Mendelian randomization investigation. We considered inverse probability weighting as a potential method for reducing selection bias. Finally, we investigated whether selection bias may explain a recently reported finding that lipoprotein(a) is not a causal risk factor for cardiovascular mortality in individuals with previous coronary heart disease.
Results: Selection bias had a severe impact on bias and Type 1 error rates in our simulation study, but only when selection effects were large. For moderate effects of the risk factor on selection, bias was generally small and Type 1 error rate inflation was not considerable. Inverse probability weighting ameliorated bias when the selection model was correctly specified, but increased bias when selection bias was moderate and the model was misspecified. In the example of lipoprotein(a), strong genetic associations and strong confounder effects on selection mean the reported null effect on cardiovascular mortality could plausibly be explained by selection bias.
Conclusions: Selection bias can adversely affect Mendelian randomization investigations, but its impact is likely to be less than other biases. Selection bias is substantial when the effects of the risk factor and confounders on selection are particularly large.


Appendix - Additional simulations
We provide additional simulations to further investigate which aspects of a Mendelian randomization study affect the magnitude of selection bias and the performance of inverse probability weighting.

A1 Direction of selection bias
We explored the relationship between the direction of the confounder effects on the risk factor and the outcome, and the direction of selection bias. For the baseline simulation of Scenario 1, where selection depends only on the risk factor, we varied the signs of these two confounder effects, letting $\alpha_U = \pm\sqrt{0.5}$ and $\beta_U = \pm\sqrt{0.5}$. Results are reported in Supplementary Table A1. The simulation results indicate that the causal effect estimate is biased downwards if the confounder effects on the risk factor and the outcome have the same sign, and upwards otherwise.
[Supplementary Table A1]
Note that we have made the simplifying assumption that the confounder $U$ represents the cumulative effect of all possible sources of confounding for the risk factor-outcome association, so $\alpha_U$ and $\beta_U$ represent the total effect of all confounders on the risk factor and the outcome. In practice, the signs of these parameters may be difficult to determine if different confounders have opposite effects on the risk factor or the outcome.
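The sign pattern described above can be illustrated with a minimal simulation sketch. It is not the code used for Supplementary Table A1: the data-generating details (a single variant with allele frequency 0.3, a genetic effect of 0.5 on the risk factor, unit-variance errors, a logistic selection model with $\gamma_X = 2$ and a ratio estimate) are assumptions made here for illustration, as is the function name.

```python
import numpy as np

def selected_ratio_estimate(alpha_u, beta_u, gamma_x=2.0, beta_x=0.0,
                            n=200_000, seed=0):
    """Ratio (Wald) estimate of the causal effect within the selected sample.

    All numeric defaults are illustrative assumptions, not the paper's values.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n)                   # genetic variant G
    u = rng.normal(size=n)                             # confounder U
    x = 0.5 * g + alpha_u * u + rng.normal(size=n)     # risk factor X
    y = beta_x * x + beta_u * u + rng.normal(size=n)   # outcome Y
    p_sel = 1.0 / (1.0 + np.exp(-gamma_x * x))         # selection depends on X only
    keep = rng.random(n) < p_sel
    cov_gy = np.cov(g[keep], y[keep])[0, 1]
    cov_gx = np.cov(g[keep], x[keep])[0, 1]
    return cov_gy / cov_gx

if __name__ == "__main__":
    r = np.sqrt(0.5)
    for a_u in (r, -r):
        for b_u in (r, -r):
            est = selected_ratio_estimate(a_u, b_u)
            print(f"alpha_U={a_u:+.2f}  beta_U={b_u:+.2f}  estimate={est:+.3f}")
```

Since the true causal effect is set to zero, the sign of each printed estimate is the sign of the bias in that configuration.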
We also performed additional simulations, summarized in Supplementary Table A2, to assess the direction of selection bias when selection depends on both the risk factor and the confounder. For simplicity, we focus only on the direction of bias and ignore its magnitude.

[Supplementary Table A2]
A change in the direction of bias (a "±" or a "∓" sign) is observed when the signs of $\gamma_U$ and $\gamma_X \alpha_U$ are different. Intuitively, these parameters express the direct ($\gamma_U$) and indirect ($\gamma_X \alpha_U$, mediated by the risk factor) effect of the confounder on selection. If these two effects act in the same direction, the direction of selection bias is again determined by the effects $\alpha_U$, $\beta_U$ of the confounder on the risk factor and the outcome, as in Supplementary Table A1.
When $\gamma_U$ and $\gamma_X \alpha_U$ have opposite signs, the confounder affects selection in two opposite ways. Selection bias due to the confounder effect as mediated by the risk factor acts in the direction dictated by the $\alpha_U$, $\beta_U$ coefficients, as discussed previously, while selection bias due to the direct effect of the confounder on selection acts in the opposite direction.
The relative magnitudes of $\gamma_X \alpha_U$ and $\gamma_U$ determine which effect is stronger, and hence the direction of bias. In the simulations in the main body of the paper (Scenario 5), $\gamma_X$ was the only parameter whose value we varied, so the direction of bias depended on that parameter.
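The qualitative rule described in this section can be written down as a small helper. This is only a sketch of the reasoning in the text: the function name and the returned labels are hypothetical, and the rule assumes non-zero $\alpha_U$, $\beta_U$ and $\gamma_X$. It does not reproduce the entries of Supplementary Table A2 and says nothing about the magnitude of bias or the point at which the direction switches.

```python
def bias_direction(alpha_u, beta_u, gamma_x, gamma_u):
    """Qualitative direction of selection bias, following the rule in the text.

    Hypothetical helper: assumes alpha_u, beta_u and gamma_x are non-zero.
    """
    if alpha_u == 0 or beta_u == 0 or gamma_x == 0:
        raise ValueError("rule only stated for non-zero alpha_u, beta_u, gamma_x")

    # Direction when selection acts through the risk factor (Supplementary Table A1):
    # downward if the confounder effects share a sign, upward otherwise.
    baseline = "downward" if alpha_u * beta_u > 0 else "upward"
    opposite = "upward" if baseline == "downward" else "downward"

    direct = gamma_u              # direct effect of U on selection
    indirect = gamma_x * alpha_u  # effect of U on selection mediated by X

    if direct * indirect >= 0:
        # Both pathways act in the same direction (or there is no direct pathway).
        return baseline
    # Opposite pathways: the direct effect dominates for moderate X-S associations,
    # the mediated effect dominates for strong ones.
    return f"{opposite} for moderate X-S associations, {baseline} for strong"
```

For example, `bias_direction(0.7, 0.7, 1.0, -1.0)` corresponds to the "±" pattern in the legend of Supplementary Table A2 (upward for moderate, downward for strong associations).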

A2 Selection bias for a non-null causal effect
To investigate whether selection bias depends on the true value of the risk factor-outcome causal effect, we reproduced the simulations of Table 1 with the causal effect parameter set to $\beta_X = 0.5$ instead of $\beta_X = 0$. Supplementary Table A3 contains the results of this simulation. The magnitude of selection bias was very similar to that reported in Table 1.
This implies that when selection only depends on the risk factor, the magnitude of selection bias is independent of the value of the causal effect $\beta_X$. Similar results (not reported here) were obtained for a range of different $\beta_X$ values, as well as for a negative causal effect ($\beta_X = -0.5$).
[Supplementary Table A3]

A3 Outcome-dependent selection mechanism
The selection mechanism used in the simulations of Tables 1 and 2 depended only on the risk factor, except in Scenario 5 where selection also depended on the confounder. Here, we consider an alternative selection procedure, in which selection depends on the outcome and possibly on the confounder (see Figure 1.b of the main document). Applications in which selection depends on the outcome are not uncommon in practice. For example, consider an analysis studying a disease outcome, where data are collected from hospital admission registries. Selection bias on the outcome will occur, since hospitalized individuals are more likely to suffer from the disease studied. Survivor bias can also arise as a result of selection on the outcome, for example if a study samples individuals at random from an elderly population and the outcome studied is all-cause mortality or relates to a life-threatening disease such as cancer.
To implement the simulations, we modified the data-generating model by letting the probability of selection depend on the outcome and the confounder. Simulations were performed by varying the strength of the outcome-selection parameter $\gamma_Y$, allowing it to take values −2, −1, −0.5, −0.2, 0, 0.2, 0.5, 1, 2, and the confounder-selection parameter $\gamma_U$, allowing it to take values 0 and +1. All other parameters were the same as in Scenario 1.
As illustrated in Supplementary Table A4, there is no selection bias under the null causal hypothesis ($\beta_X = 0$). In this case, nominal Type 1 error rates are also maintained.
Therefore, a Mendelian randomization study in which selection depends only on the outcome (and possibly on the confounder) will not lead to false positive results. It is possible that a null finding is a false negative due to selection bias, but this is somewhat less likely: it would only occur if selection bias were of the same magnitude as the causal effect and acted in the opposite direction.
On the other hand, when the causal effect parameter $\beta_X$ is non-zero, causal effect estimates exhibit noticeable bias for strong selection effects. The simulation results of Supplementary Table A4 illustrate that the direction of selection bias is the same as in simulations with selection on exposure. The magnitude of bias is slightly reduced compared to the selection-on-exposure simulations (Table 1) and the corresponding standard errors are also lower.
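The behaviour under outcome-dependent selection can be sketched in the same way. The logistic form of the selection probability, the genetic effect of 0.5, the allele frequency of 0.3, the unit-variance errors and the function name are all assumptions made for this illustration; the selection equation and the Scenario 1 parameter values are not given in this appendix.

```python
import numpy as np

def outcome_selected_estimate(beta_x, gamma_y, gamma_u=0.0,
                              n=200_000, seed=1):
    """Ratio estimate when selection depends on the outcome (and optionally on U).

    Assumed selection model: logit P(S = 1) = gamma_y * Y + gamma_u * U.
    """
    rng = np.random.default_rng(seed)
    alpha_u = beta_u = np.sqrt(0.5)                    # confounder effects
    g = rng.binomial(2, 0.3, size=n)                   # genetic variant G
    u = rng.normal(size=n)                             # confounder U
    x = 0.5 * g + alpha_u * u + rng.normal(size=n)     # risk factor X
    y = beta_x * x + beta_u * u + rng.normal(size=n)   # outcome Y
    p_sel = 1.0 / (1.0 + np.exp(-(gamma_y * y + gamma_u * u)))
    keep = rng.random(n) < p_sel
    return np.cov(g[keep], y[keep])[0, 1] / np.cov(g[keep], x[keep])[0, 1]

if __name__ == "__main__":
    for beta_x in (0.0, 0.5):
        for gamma_u in (0.0, 1.0):
            est = outcome_selected_estimate(beta_x, gamma_y=2.0, gamma_u=gamma_u)
            print(f"beta_X={beta_x}  gamma_U={gamma_u}  estimate={est:.3f}")
```

Under the null, the genetic variant is independent of the outcome and the confounder in this model, so selecting on them leaves the numerator of the ratio estimate centred at zero; with a non-zero causal effect the outcome becomes a descendant of the variant and the estimate is pulled away from the true value.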
[Supplementary Table A4]

A4 Binary outcomes
So far in our simulations, we have focused on quantifying the effect of selection bias in Mendelian randomization studies with a continuous outcome variable. Studying a binary outcome (such as disease status) is also common in practice, so we briefly investigate this case here. We note that in the context of genetic association studies, a few authors have already suggested that the impact of selection bias may only be modest when a binary outcome is studied (see [9] and references therein).
We performed a set of simulations using a logistic-linear model to simulate the binary outcome, as in the lipoprotein(a) application. In this case, the causal effect represents the log odds ratio for the outcome per unit increase in the risk factor. In our simulations, we set the causal effect equal to $\beta_X = 0$ and let the remaining parameters take the same values as in Scenario 1. We then varied the constant term $\beta_0$, which dictates the prevalence of the disease outcome in the population. We allowed $\beta_0$ to take values 0, −1.4 and −3, corresponding approximately to an average disease prevalence of 50%, 20% and 5% respectively.
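The link between the constant term $\beta_0$ and the outcome prevalence can be checked with a short sketch. The exact outcome model is not reproduced in this appendix, so the form assumed here, logit P(Y = 1) = $\beta_0 + \beta_X X + \beta_U U$ with $\beta_U = \sqrt{0.5}$ and a standard normal confounder, is an assumption, as are the function name and the remaining numeric defaults; the prevalence obtained depends on these choices.

```python
import numpy as np

def outcome_prevalence(beta_0, beta_x=0.0, n=500_000, seed=2):
    """Average prevalence of a binary outcome simulated from a logistic model.

    Assumed model: logit P(Y = 1) = beta_0 + beta_x * X + beta_u * U.
    """
    rng = np.random.default_rng(seed)
    alpha_u = beta_u = np.sqrt(0.5)                    # confounder effects
    g = rng.binomial(2, 0.3, size=n)                   # genetic variant G
    u = rng.normal(size=n)                             # confounder U
    x = 0.5 * g + alpha_u * u + rng.normal(size=n)     # risk factor X
    p_y = 1.0 / (1.0 + np.exp(-(beta_0 + beta_x * x + beta_u * u)))
    y = rng.random(n) < p_y                            # binary outcome Y
    return y.mean()

if __name__ == "__main__":
    for beta_0 in (0.0, -1.4, -3.0):
        print(f"beta_0={beta_0:+.1f}  prevalence={outcome_prevalence(beta_0):.3f}")
```

Because the confounder term is averaged over, the prevalence is not exactly the inverse logit of $\beta_0$, which is consistent with the correspondence above being described as approximate.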
[Supplementary Table A5]
Results are reported in Supplementary Table A5. Selection bias is present here, and the magnitude of bias is similar to that for a continuous outcome. The disease prevalence parameter $\beta_0$ has practically no effect on the magnitude of selection bias, but a rare disease (strongly negative $\beta_0$) is associated with an increased standard error for the causal effect estimate.
In general, the standard error will be minimized when disease frequency is about 50%, as happens in a case-control setting.
It is worth discussing case-control studies in more detail, since they are a common example of epidemiological studies with a binary outcome and Mendelian randomization is sometimes performed on case-control data. In principle, selection into a case-control study depends on the outcome; however, this dependence will not necessarily introduce bias. In a well-designed case-control study, the cases will constitute a random subsample of the population of cases (or even the entire population, if data are available) and the controls will be a random subsample of the population of controls. The only atypical aspect of such a sample compared to the overall population is the frequency of cases, and this is not enough to cause selection bias, as Supplementary Table A5 illustrates. Similar findings have been reported outside the context of Mendelian randomization (for example, [9]).
Finally, additional simulations (not reported here) suggested that the performance of inverse probability weighting with a binary outcome is similar to that with a continuous outcome.

A5 Inverse probability weighting with a misspecified weighting model
Inverse probability weighting can yield biased estimates if the model for computing the weights is misspecified. Nevertheless, for the simulations in this paper, selection depended on the risk factor and the confounder, but a reasonable approximation to the true causal effect was obtained via weighting by the risk factor only.
The behaviour of inverse probability weighting can be significantly worse if the confounder only has a weak influence on the risk factor. We illustrate this by conducting a simulation similar to that of Table 3, with a weak confounder effect on the risk factor. We set $\gamma_U = 1$ and $\alpha_U = \sqrt{0.1}$ and leave the other parameters unchanged. Results are presented in Supplementary Table A6.

In this simulation, causal effect estimates are subject to significant bias when the risk factor-selection effect is strong. This is the case even when using the inverse probability weighting approach. Again, trimming weights was of little consequence in this example.
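A minimal sketch of inverse probability weighting with a correctly specified and a misspecified weighting model is given below. It is an illustration under assumptions rather than the procedure behind Supplementary Table A6: the selection model is taken to be logistic in the risk factor and the confounder, the weighting models are logistic regressions fitted by Newton-Raphson on the full simulated population (which presumes the risk factor is observed for unselected individuals), weights are not trimmed, and all function names and numeric defaults other than $\gamma_U = 1$, $\alpha_U = \sqrt{0.1}$ and $\beta_U = \sqrt{0.5}$ are hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(design, s, n_iter=25):
    """Logistic regression coefficients via plain Newton-Raphson iterations."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ coef)
        grad = design.T @ (s - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        coef += np.linalg.solve(hess, grad)
    return coef

def weighted_ratio(g, x, y, w):
    """Weighted covariance of (G, Y) divided by weighted covariance of (G, X)."""
    g_c = g - np.average(g, weights=w)
    x_c = x - np.average(x, weights=w)
    y_c = y - np.average(y, weights=w)
    return np.average(g_c * y_c, weights=w) / np.average(g_c * x_c, weights=w)

def ipw_estimates(alpha_u, gamma_x, gamma_u=1.0, n=400_000, seed=3):
    """Unweighted and weighted ratio estimates under a null causal effect."""
    rng = np.random.default_rng(seed)
    beta_u = np.sqrt(0.5)
    g = rng.binomial(2, 0.3, size=n).astype(float)     # genetic variant G
    u = rng.normal(size=n)                             # confounder U
    x = 0.5 * g + alpha_u * u + rng.normal(size=n)     # risk factor X
    y = beta_u * u + rng.normal(size=n)                # outcome Y, beta_X = 0
    s = (rng.random(n) < expit(gamma_x * x + gamma_u * u)).astype(float)
    keep = s == 1
    ones = np.ones(n)
    designs = {
        "misspecified": np.column_stack([ones, x]),    # weights from X only
        "correct": np.column_stack([ones, x, u]),      # weights from X and U
    }
    out = {"unweighted": weighted_ratio(g[keep], x[keep], y[keep],
                                        np.ones(int(keep.sum())))}
    for name, design in designs.items():
        p_hat = expit(design @ fit_logistic(design, s))
        out[name] = weighted_ratio(g[keep], x[keep], y[keep], 1.0 / p_hat[keep])
    return out

if __name__ == "__main__":
    # Weak confounder effect on the risk factor, strong selection on the risk factor.
    print(ipw_estimates(alpha_u=np.sqrt(0.1), gamma_x=2.0))
```

Comparing the "misspecified" entry with the "unweighted" and "correct" entries for different values of `alpha_u` and `gamma_x` gives a rough sense of how much of the bias weighting by the risk factor alone removes.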
[Supplementary Table A6]

A6 Simulation Tables
Table A1: Median, standard deviation (SD), median standard error and 5% empirical Type 1 error rate of causal effect estimates, for varying directions of the confounder-exposure ($\alpha_U$) and the confounder-outcome ($\beta_U$) effects.
Table A2: Direction of selection bias of causal effect estimates when selection depends on the risk factor and the confounder. "+": upward bias, "−": downward bias, "±": upward bias for moderate X-S associations, downward bias for strong associations, "∓": downward bias for moderate X-S associations, upward bias for strong associations.
Table A3: Mean, median, standard deviation (SD), median standard error and empirical power to reject the null causal hypothesis at a 5% significance level for causal effect estimates in Scenario 1, with the true causal effect set to $\beta_X = 0.5$.
Table A4: Median, standard deviation (SD), median standard error and empirical power to reject the null causal hypothesis at a 5% significance level (for $\beta_X = 0$, this is equal to the empirical Type 1 error rate) for causal effect estimates where selection depends only on the outcome ($\gamma_U = 0$) or on the outcome and the confounder ($\gamma_U = 1$).
Table A5: Median, standard deviation (SD), median standard error and 5% empirical Type 1 error rate for risk factor-outcome causal effect estimates, in simulations with a binary outcome and a varying outcome frequency (50%, 20% and 5%, for $\beta_0 = 0, -1.4, -3$ respectively) for different values of the selection effect ($\gamma_X$).
Table A6: Median, standard deviation (SD), median standard error (med SE) of estimates and empirical Type 1 error rate (%) for risk factor-outcome causal associations with a misspecified inverse probability weighting model ($\gamma_U = 1$) and a weak confounder-risk factor effect ($\alpha_U = \sqrt{0.1}$), for different values of the selection effect ($\gamma_X$).