Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways

Background: Mendelian randomization uses genetic variants, assumed to be instrumental variables for a particular exposure, to estimate the causal effect of that exposure on an outcome. If the instrumental variable criteria are satisfied, the resulting estimator is consistent even in the presence of unmeasured confounding and reverse causation.

Methods: We extend the Mendelian randomization paradigm to investigate more complex networks of relationships between variables, in particular where some of the effect of an exposure on the outcome may operate through an intermediate variable (a mediator). If instrumental variables for the exposure and mediator are available, direct and indirect effects of the exposure on the outcome can be estimated, for example using either a regression-based method or structural equation models. The direction of effect between the exposure and a possible mediator can also be assessed. Methods are illustrated in an applied example considering causal relationships between body mass index, C-reactive protein and uric acid.

Results: These estimators are consistent in the presence of unmeasured confounding if, in addition to the instrumental variable assumptions, the effects of both the exposure on the mediator and the mediator on the outcome are homogeneous across individuals and linear without interactions. Nevertheless, a simulation study demonstrates that even considerable heterogeneity in these effects does not lead to bias in the estimates.

Conclusions: These methods can be used to estimate direct and indirect causal effects in a mediation setting, and have potential for the investigation of more complex networks between multiple interrelated exposures and disease outcomes.

A controlled direct effect is defined as the effect of a change in the exposure keeping the mediator fixed at a given level, say Z = z [4,5]. The controlled direct effect may depend on the choice of z:

$$\mathrm{CDE}(z; x, x+1) = E[Y(x+1, z)] - E[Y(x, z)]$$

A natural direct effect is defined as the effect of a change in the exposure with the mediator fixed at the level it would naturally take if the exposure were fixed at a given level, say X = x:

$$\mathrm{NDE}(x; x, x+1) = E[Y(x+1, Z(x))] - E[Y(x, Z(x))]$$

A natural indirect effect is defined as the effect of a change in the mediator from the value it would naturally take if the exposure were unchanged to the level it would take if the exposure were changed. The exposure itself is kept fixed at a given level, say X = x + 1:

$$\mathrm{NIE}(x+1; x, x+1) = E[Y(x+1, Z(x+1))] - E[Y(x+1, Z(x))]$$

In the linear case, the natural direct and indirect effects represent a decomposition of the total effect, in that

$$\mathrm{TE}(x, x+1) = \mathrm{NDE}(x; x, x+1) + \mathrm{NIE}(x+1; x, x+1).$$

If there is no interaction between X and Z in their effects on Y, so that $Y(x+1, z_1) - Y(x, z_1) = Y(x+1, z_2) - Y(x, z_2)$ for all values of Z = $z_1$, $z_2$, and for all individuals, the controlled direct effect is equal to the natural direct effect [4].

The natural direct effect has a clearer intuitive interpretation as a measure of mediation than the controlled direct effect, which can be interpreted even if Z is not a mediator. However, it is not possible to conceive of an experiment which would produce the natural direct effect, as the quantity requires the outcome if the exposure were set at two different levels (for example, in NDE(x; x, x + 1), Y(x + 1, Z(x)) requires X = x + 1 for Y, but X = x for Z). This is known as a "cross-world" quantity, as setting the exposure to two different values is only possible in two different worlds [6]. More generally, in a non-parametric context, evaluation of natural direct and indirect effects requires the distribution of $Y(x, Z(x'))$. This can only be evaluated under the assumption that $Y(x, z)$ is independent of $Z(x')$ for $x \neq x'$. This is a cross-world assumption and cannot be empirically verified.
Even if the distributions of Y (x, z) and Z(x) can be estimated, for example using instrumental variables, it is not possible to express an estimate of the natural direct or indirect effect without making the cross-world assumption. In contrast, estimation of the controlled direct effect does not require any cross-world assumption, and can be obtained directly at a given value of X = x and Z = z from estimates of the distributions of Y (x, z) and Z(x).
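The definitions above can be made concrete with a minimal Python sketch. It assumes a hypothetical individual-level linear model, Z(x) = beta_x * x and Y(x, z) = gamma_x * x + gamma_z * z + gamma_xz * x * z. The function names and coefficient values are illustrative assumptions and are not taken from the main paper.

```python
# Illustrative sketch only: a hypothetical linear potential-outcome model,
# not the data-generating model of the main paper.

def make_linear_model(beta_x, gamma_x, gamma_z, gamma_xz=0.0):
    """Return potential-outcome functions Z(x) and Y(x, z)."""
    def z_of(x):
        return beta_x * x

    def y_of(x, z):
        return gamma_x * x + gamma_z * z + gamma_xz * x * z

    return z_of, y_of


def controlled_direct_effect(y_of, x, z):
    # Change the exposure from x to x + 1 with the mediator held at z.
    return y_of(x + 1, z) - y_of(x, z)


def natural_direct_effect(z_of, y_of, x):
    # Mediator held at the value it would naturally take under X = x.
    return y_of(x + 1, z_of(x)) - y_of(x, z_of(x))


def natural_indirect_effect(z_of, y_of, x):
    # Exposure held at x + 1; mediator moves from Z(x) to Z(x + 1).
    return y_of(x + 1, z_of(x + 1)) - y_of(x + 1, z_of(x))


def total_effect(z_of, y_of, x):
    return y_of(x + 1, z_of(x + 1)) - y_of(x, z_of(x))


# Usage: with an X-Z interaction the controlled direct effect varies with z,
# while the natural direct and indirect effects still sum to the total effect.
z_of, y_of = make_linear_model(beta_x=0.5, gamma_x=1.0, gamma_z=2.0, gamma_xz=0.3)
print(controlled_direct_effect(y_of, x=0.0, z=0.0))
print(controlled_direct_effect(y_of, x=0.0, z=5.0))
print(natural_direct_effect(z_of, y_of, 0.0) + natural_indirect_effect(z_of, y_of, 0.0))
print(total_effect(z_of, y_of, 0.0))
```

Setting gamma_xz to zero in this sketch makes the controlled direct effect the same for every z and equal to the natural direct effect, which matches the no-interaction case described in the text.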

A.2 Impact of interactions on estimates of the direct and indirect effect
To assess the impact of an interaction between X and Z in their effects on Y on estimates of the direct and indirect effects, we perform further simulations. Data were simulated on 5000 individuals indexed by i. The data-generating model is the same as that considered in the main paper, except that an additional term ($\gamma_{XZi} x_i z_i$) has been added to the data-generating model for Y to allow for an interaction between X and Z. We consider three scenarios for the parameter values ($\mu_{\gamma_{XZ}}$, $\psi^2$), the mean and variance of $\gamma_{XZi}$:

1. $\mu_{\gamma_{XZ}} = 0$, $\psi^2 = 0.3^2$: interaction is present at an individual level, but absent on average. The average direct and indirect effects of X on Y controlling for Z are $\mu_{\gamma_X} = 1$ and $\mu_{\beta_X} \mu_{\gamma_Z}$, as before. An equivalent model could be achieved by omitting the additional term ($\gamma_{XZi} x_i z_i$) and allowing the $\gamma_{Xi}$ and $\gamma_{Zi}$ parameters to be correlated in their distributions.
In both the second and third scenarios, the average direct and indirect effects depend on the interaction between X and Z, and the individual-level direct effects will depend on the value of Z. All other parameters take the same values as in the simulation study in the main paper.
For scenario 1, we present estimates of the direct and indirect effect, and compare these with the theoretical values (Web Table A1). For scenarios 2 and 3, we present estimates of the direct effect only, and compare this with the average direct effect, calculated by adding one to the exposure for each individual in the data-generating model for the outcome but keeping the mediator constant (Web Table A2).
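The calculation of the average direct effect described above (adding one to each individual's exposure in the outcome model while keeping the mediator constant) can be sketched as follows. Since the data-generating model of the main paper is not reproduced in this appendix, the model below is an assumed stand-in: the allele frequency, the confounder structure, and the values used for the effect of X on Z and of Z on Y are all hypothetical, and only the interaction term and $\mu_{\gamma_X} = 1$ follow the text.

```python
import numpy as np


def simulate(n=5000, mu_gxz=0.0, psi=0.3, seed=0):
    """Simulate data with an individual-level X-Z interaction in the outcome.

    Assumed (hypothetical) model, not the main paper's exact specification:
      x = 0.3 * g_x + u + e_x
      z = 0.5 * g_z + 1.0 * x + u + e_z
      y = 1.0 * x + 1.0 * z + gamma_xz_i * x * z + u + e_y
    """
    rng = np.random.default_rng(seed)
    g_x = rng.binomial(2, 0.3, size=n)   # hypothetical genetic variants
    g_z = rng.binomial(2, 0.3, size=n)
    u = rng.normal(size=n)               # unmeasured confounder
    gamma_xz = rng.normal(mu_gxz, psi, size=n)
    x = 0.3 * g_x + u + rng.normal(size=n)
    z = 0.5 * g_z + 1.0 * x + u + rng.normal(size=n)
    eps_y = rng.normal(size=n)

    def outcome(x_val, z_val):
        # Outcome model with confounder and error held fixed per individual.
        return 1.0 * x_val + 1.0 * z_val + gamma_xz * x_val * z_val + u + eps_y

    return x, z, gamma_xz, outcome


def average_direct_effect(x, z, outcome):
    # Add one to the exposure for every individual, keep the mediator constant.
    return float(np.mean(outcome(x + 1, z) - outcome(x, z)))


x, z, gamma_xz, outcome = simulate(mu_gxz=0.0)
print(average_direct_effect(x, z, outcome))
```

In this sketch the individual-level direct effect is 1 + gamma_xz_i * z_i, so the average direct effect stays close to 1 when the interaction has zero mean and shifts away from 1 when the mean interaction is non-zero and the mediator has a non-zero mean.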
We see that estimates of the direct and indirect effects, which are similar between the regression-based and SEM methods, are not substantially biased by the presence of a zero-mean interaction term. However, with a non-zero mean interaction, estimates of the direct effect differ somewhat from the average direct effect. If an interaction between the exposure and mediator is expected, this can be modelled explicitly using the multiple-stage least squares approach [7].

Web Table A1: Mean estimates of the direct and indirect effects of X on Y controlling for Z from regression-based and structural equation model (SEM) methods in simulation study with zero mean interaction between X and Z (Scenario 1)

Web Table A2: Mean estimates of the direct effect of X on Y controlling for Z from regression-based and structural equation model (SEM) methods in simulation study with non-zero mean interaction between X and Z (Scenarios 2 and 3)
Panel title: Scenario 2: Non-zero mean interaction, homogeneous across individuals. Column headers: $\mu_{\beta_X}$, $\mu_{\gamma_Z}$, average direct effect; regression-based and SEM estimates for $\tau^2 = 0$, $0.2^2$, $0.4^2$.

A.3 Impact of heterogeneity in the genetic effects on estimates of the direct and indirect effect
To assess the impact of heterogeneity in the genetic effects of $G_X$ on X and of $G_Z$ on Z on estimates of the direct and indirect effects, we perform further simulations. Data were simulated on 5000 individuals indexed by i. The data-generating model is the same as that considered in the main paper, except that the fixed coefficients $\alpha_G$ and $\beta_G$ are replaced with individual-specific coefficients $\alpha_{Gi}$ and $\beta_{Gi}$, drawn from normal distributions for each individual i. The mean values of these distributions are set at $\mu_{\alpha_G} = 0.3$, and at $\mu_{\beta_G} = 0.5$ when $\mu_{\beta_X} = 1$ and $\mu_{\beta_G} = 0.36$ when $\mu_{\beta_X} = -1$. These are the same as the values of $\alpha_G$ and $\beta_G$ in the original set of simulations. All other parameters take the same values as in the simulation study in the main paper.
Results are given in Web Table A3. No material differences are observed from those in the original simulation study in the main paper. We repeated the simulation with the coefficients $\alpha_{Gi}$ and $\beta_{Gi}$ drawn from a multivariate normal distribution with correlation 0.4 or −0.4; almost identical results were obtained, with differences between mean values of estimates compatible with chance variation (results not shown).

Web Table A3: Mean estimates of the direct and indirect effects of X on Y controlling for Z from regression-based and structural equation model (SEM) methods in simulation study with heterogeneous genetic effects on X and Z
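The following is a minimal sketch of a simulation with heterogeneous genetic effects, paired with one possible regression-based (two-stage least squares) estimator of the direct and indirect effects. It is not a reproduction of the main paper's code: the allele frequency, the confounder structure, the standard deviations of the genetic effects, and the function names are assumptions, and the estimator may differ in detail from the regression-based method used in the paper. Only the means 0.3 and 0.5 (for $\mu_{\beta_X} = 1$) are taken from the text.

```python
import numpy as np


def simulate_heterogeneous(n=5000, sd_alpha=0.1, sd_beta=0.1, seed=1):
    """Simulate data with individual-specific genetic effects (assumed model)."""
    rng = np.random.default_rng(seed)
    g_x = rng.binomial(2, 0.3, size=n)           # hypothetical variants
    g_z = rng.binomial(2, 0.3, size=n)
    alpha_g = rng.normal(0.3, sd_alpha, size=n)  # effect of g_x on x, per individual
    beta_g = rng.normal(0.5, sd_beta, size=n)    # effect of g_z on z, per individual
    u = rng.normal(size=n)                       # unmeasured confounder
    x = alpha_g * g_x + u + rng.normal(size=n)
    z = beta_g * g_z + 1.0 * x + u + rng.normal(size=n)
    y = 1.0 * x + 1.0 * z + u + rng.normal(size=n)
    return g_x, g_z, x, z, y


def two_stage_direct_indirect(g_x, g_z, x, z, y):
    """Two-stage least squares sketch: returns (direct, indirect) estimates."""
    ones = np.ones(len(y))
    inst = np.column_stack([ones, g_x, g_z])
    # First stage: fitted values of exposure and mediator from both instruments.
    x_hat = inst @ np.linalg.lstsq(inst, x, rcond=None)[0]
    z_hat = inst @ np.linalg.lstsq(inst, z, rcond=None)[0]
    # Second stage: coefficient on fitted x is the direct effect estimate.
    design = np.column_stack([ones, x_hat, z_hat])
    _, direct, _ = np.linalg.lstsq(design, y, rcond=None)[0]
    # Total effect: instrument x with g_x alone, then regress y on fitted x.
    inst_x = np.column_stack([ones, g_x])
    x_hat_total = inst_x @ np.linalg.lstsq(inst_x, x, rcond=None)[0]
    total = np.linalg.lstsq(np.column_stack([ones, x_hat_total]), y, rcond=None)[0][1]
    return float(direct), float(total - direct)


direct, indirect = two_stage_direct_indirect(*simulate_heterogeneous())
print(direct, indirect)
```

Under the assumed model both the direct and the indirect effect equal 1, so estimates from this sketch can be compared against those values. With 5000 individuals a single simulated dataset gives noisy estimates, and averaging over many replicates would be needed to assess bias.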

A.4 Impact of correlations in the causal effect parameters
In the simulations in the main paper, the causal effect parameters $\beta_{Xi}$, $\gamma_{Xi}$, and $\gamma_{Zi}$ were allowed to vary between individuals, but they were assumed to vary independently. We perform a further simulation to consider estimates of direct and indirect effects when the parameters vary dependently. Specifically, the vector $(\beta_{Xi}, \gamma_{Xi}, \gamma_{Zi})^T$ for each individual i is drawn from a multivariate normal distribution with mean $(\mu_{\beta_X}, \mu_{\gamma_X}, \mu_{\gamma_Z})$ and variance-covariance matrix consisting of diagonal elements $\tau^2$ and off-diagonal elements $\rho\tau^2$, where $\rho$ is taken to be +0.4 and −0.4. This means that the correlation between each pair of $\beta_{Xi}$, $\gamma_{Xi}$, and $\gamma_{Zi}$ is $\rho$. All other aspects of the simulation (including the data-generating model and the parameter values) are taken as in the original set of simulations in the main paper.

Results are given in Web Table A4. No material differences are observed from those in the original simulation study in the main paper for estimates of the indirect effect. Slightly increased estimates of the direct effect are observed with $\rho = +0.4$, and slightly decreased estimates with $\rho = -0.4$, with bias increasing as the heterogeneity parameter $\tau$ increases.

Web Table A4: Mean estimates of the direct and indirect effects of X on Y controlling for Z from regression-based and structural equation model (SEM) methods in simulation study with correlations ($\rho = \pm 0.4$) in causal effect parameters of X on Z, X on Y, and Z on Y
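The correlated draws described in this section can be sketched as below, using NumPy's multivariate normal sampler with a variance-covariance matrix that has $\tau^2$ on the diagonal and $\rho\tau^2$ elsewhere. The function names and the mean vector (1, 1, 1) are assumptions chosen for illustration.

```python
import numpy as np


def causal_effect_covariance(tau, rho, dim=3):
    """Covariance matrix with tau^2 on the diagonal and rho * tau^2 elsewhere."""
    cov = np.full((dim, dim), rho * tau**2)
    np.fill_diagonal(cov, tau**2)
    return cov


def draw_causal_effects(n, means, tau, rho, seed=0):
    """Draw (beta_Xi, gamma_Xi, gamma_Zi) for n individuals, one row each."""
    rng = np.random.default_rng(seed)
    cov = causal_effect_covariance(tau, rho, dim=len(means))
    return rng.multivariate_normal(means, cov, size=n)


# Usage: hypothetical means (1, 1, 1), tau = 0.4, rho = -0.4.
effects = draw_causal_effects(5000, means=[1.0, 1.0, 1.0], tau=0.4, rho=-0.4)
beta_x, gamma_x, gamma_z = effects.T
print(np.corrcoef(effects, rowvar=False).round(2))
```

A three-dimensional matrix of this equicorrelated form is positive definite only when $\rho$ lies strictly between −0.5 and 1, so both values used here (+0.4 and −0.4) give valid covariance matrices.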