MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy

Abstract The proliferation of genome-wide association studies (GWAS) has prompted the use of two-sample Mendelian randomization (MR) with genetic variants as instrumental variables (IVs) for drawing reliable causal relationships between health risk factors and disease outcomes. However, the unique features of GWAS demand that MR methods account for both linkage disequilibrium (LD) and ubiquitously existing horizontal pleiotropy among complex traits, which is the phenomenon wherein a variant affects the outcome through mechanisms other than exclusively through the exposure. Therefore, statistical methods that fail to consider LD and horizontal pleiotropy can lead to biased estimates and false-positive causal relationships. To overcome these limitations, we proposed a probabilistic model for MR analysis in identifying the causal effects between risk factors and disease outcomes using GWAS summary statistics in the presence of LD and to properly account for horizontal pleiotropy among genetic variants (MR-LDP) and develop a computationally efficient algorithm to make the causal inference. We then conducted comprehensive simulation studies to demonstrate the advantages of MR-LDP over the existing methods. Moreover, we used two real exposure–outcome pairs to validate the results from MR-LDP compared with alternative methods, showing that our method is more efficient in using all-instrumental variants in LD. By further applying MR-LDP to lipid traits and body mass index (BMI) as risk factors for complex diseases, we identified multiple pairs of significant causal relationships, including a protective effect of high-density lipoprotein cholesterol on peripheral vascular disease and a positive causal effect of BMI on hemorrhoids.


Validity of instrumental variables
Denote G ∈ R n×p the matrix for p SNPs among n independent individuals, x ∈ R n×1 and y ∈ R n×1 the corresponding exposure and outcome variable, respectively, and U ∈ R n×q the matrix for q confounding factors among n samples. The SNP-exposure and SNP-outcome true effects are denoted as γ ∈ R p×1 and Γ ∈ R p×1 , respectively, for all p SNPs. Then, we can depict the causal model through a directed acyclic graph in Figure S1, where the dashed line represents the horizontal pleiotropy of genetic variants on outcome, denoted as α ∈ R p×1 . We start from the classical assumption that all IVs are valid (namely, core assumptions include: all γs are non-zero, SNPs are independent of the confounding factors, and no horizontal pleiotropy holds), where the associations of the exposure and the outcome can be framed as the following linear structural model [5]: where β 0 is the effect size of the exposure on the outcome, g j is the j-th column of G, η x and η y are effects on confounding factors for exposure and outcome, respectively, and x and y are independent random noises. Importantly, in this model (S1, β 0 can be interpreted as the causal effect of the exposure on the outcome as long as the core assumptions for IV are satisfied [9,4,5]. Clearly, the effect size of genetic variant g j on the outcome y can be explicitly expressed as Γ j = β 0 γ j . Thus Γ j is linear to γ j , and β 0 can be interpreted as the causal effect between exposure and outcome in the study [19]. In practice, horizontal pleiotropy is abundant in complex traits and violation of such an assumption can induce severe bias in MR analysis. To investigate the impact of horizontal pleiotropy, one may consider a modified linear structural model [19]: x = p j=1 g j γ j + Uη x + x , y = p j=1 g j α j + β 0 x + Uη y + y , where genetic variants have direct effects on the outcome, denoted as α = [α 1 , . . . , α p ] T . Implicitly, model (S2) also assumes a linear relationship between γ j and Γ j but with a intercept deviating from origin: Γ j = α j + β 0 γ j , which is essential to model causality in the presence of horizontal pleiotropy. The linear structural models (S1) and (S2) are not useful in practice as confounding factors are usually not observed in observational studies. However, these two models shed a fundamental insight into modeling causality using summary statistics, namely two-sample MR analysis. Figure S1: Causal diagram representing three IV assumptions.

Model for MR-LD
Suppose that all genetic variants satisfy with the core assumptions for an IV model. Invoking by [9,19], the linear structural model (S1) suggests that we can model the causal effect in the absence of pleiotropy as where β 0 is the causal effect of interest. We assign a Gaussian prior on each γ k , that is γ ∼ N (0, σ 2 γ I p ). The Gaussian prior is widely used in genetics studies due to polygenicity [16,20,21], which offers a great computational advantage over the complicated ones. Combining Equations (3) and (S3), the likelihood for summary statistics from SNP-outcome can be writen as Taking γ as the latent variables, the completely-data likelihood can be written as follows where θ = {β 0 , σ 2 γ } is the collection of model parameters. The marginal likelihood can be obtained through integrating out the latent variable γ, that is Note that we further expand MR-LD (S5) as the following equation.
We could obtain the expression of log q(γ j ) as follows with the help of equation (10.9) in [2] log q(γ j ) = where the notation E q −j [·] denotes an expectation with respect to the q distributions over all variables γ i , for i = j. Hence the posterior distribution log(γ j ) should be Gaussian N (µ j , σ 2 j ) with parameters where γ k def = E q (γ k ).

Variational M-step:
As illustrated before, the ELBO of the marginal log-likelihood can be written as follows given the current estimates of all parameters θ.
The second term is independent with the parameters θ, the first term is the expectation of log-complete-data likelihood evaluated under the variational distribution, which can be written as The second equation is due to the fact: if x ∼ N (x|µ, Σ), then E(x T Ax) = µ T Aµ+T r(AΣ) for any symmetric matrix A.

Statistical Inference for MR-LD
As VB searches within a factorizable family for posterior distribution, one can only obtain an approximation for the posterior distribution of latent variables. Earlier works showed that VBEM provides useful and accurate posterior mean estimates [3]. Despite its computational efficiency and accuracy for estimating posterior mean, VB suffers from under-estimating the variance of target distribution Thus, the ELBO from VB-type algorithm cannot be used directly as a proxy to log-likelihood. In this paper, we follow Yang et al. [17] and adopt the similar strategy to calibrate ELBO as well as mitigate the bias of variance. We first set up the following hypothesis test to formally examine the significance between the exposure and the outcome: A likelihood ratio test (LRT) statistic is given by where θ and θ 0 are parameters obtained by maximizing the marginal likelihood, under both the null hypothesis and alternative hypothesis, respectively. As the marginal likelihood cannot be obtained but the ELBO with accurate posterior means, we calibrate the ELBO(denoted by L(θ, µ γ )) as follows where µ γ and Σ γ are the posterior mean and posterior variance for latent variable γ and the form of Σ γ is from EM/PX-EM by plugging the posterior mean estimates and parameter estimates from VBEM/PX-VBEM which can be expressed as In summary, we recalibrate the ELBO for MR-LD as follows: 1. Obtain parameters together with variational means under both H 0 and H 1 .
3. Calculate the test statistic (S4) using the recalibrated ELBO as a proxy to the marginal log-likelihood.
As demonstrated in Yang et al. [17], this procedure can produce the calibrated ELBO that is highly correlated with the marginal log-likelihood obtained from EM algorithm. Moreover, our empirical results in validation studies show that this procedure works well.
Therefore, we can denote the posterior of γ by

Statistical Inference for MR-LDP
The procedure of statistical inference for MR-LDP is similar with MR-LD, we first calibrate the ELBO(denoted by L(θ, µ γ , µ α )) as follows where µ γ and Σ γ are the posterior mean and posterior variance for latent variable γ, µ α and Σ α are the posterior mean and posterior variance for latent variable α, the forms of Σ γ ) and Σ α ) are from EM/PX-EM by plugging the posterior mean estimates and parameter estimates from VBEM/PX-VBEM which can be expressed as Similarly, we recalibrate the ELBO for MR-LDP as follows: 1. Obtain parameters together with variational means under both H 0 and H 1 .
3. Calculate the test statistic (S4) using the calibrated ELBO as a proxy to the marginal log-likelihood.

More simulation results for different settings
The results of type-I error and point estimates for the dense pleiotropy with n 3 = 2, 500; 4, 000 are shown in Figure S2 -S3, respectively. For sparse horizontal pleiotropy, we consider the sparsity level at 0.2 and 0.4. Figure S4 and To show robustness of proposed methods, we conducted additional simulations using non-generative distributions. Specifically, for each j = 1, · · · , p, both γ j and α j are from t-distributions with degrees of freedom either at 5 or 10. The result can be found in Figure S9.
As one can see, our method still controls type-I error at nominal level 0.05 while point estimates are roughly unbiased.
Moreover, we conducted simulations to make comparisons of statistical power for all methods in Figure S10, where we varied heritability of indirect effect γ with h 2 γ ∈ {0, 0.01, 0.02, 0.03, 0.04, 0.05}. As statistical power is only meaningful when type-I error is under control, we conducted SNPs pruning for alternative methods to ensure that their type-I error is controlled at 0.05 level for null. As shown in Figure S10, we controlled type-I error at its nominal level 0.05 under the null when h 2 γ = 0. Moreover, one can observe that when there is no horizontal pleiotropy, power for MR-LD and MR-LDP are similar. When their exists horizontal pleiotropy, MR-LD cannot control type-I error and thus the power is inflated but MR-LDP still controls type-I error at nominal level with improved power. Finally, statistical power from MR-LDP outperforms those from alternative methods especially when LD is high.
To mimic the real world, we adopted a screening dataset to select the significant SNPs in the following simulation study. In details, the screening dataset are generated in the way as the exposure dataset, where G s , γ s , U s , η s and e s are independent and identically distributed with their corresponding counterparts, G 1 , γ, U x , η 1 and e 1 . The genotype matrix is G s ∈ R n 0 ×p with n 0 = 20, 000, p = 20, 000. The effect size γ s is a p-dimensional vector, where 90% of γ are zero and the rest nonzero entries of γ were simulated from N (0, σ 2 γ ). This setting is realistic as [6] argued that complex traits are highly polygenic, e.g., for human height, there are around 10% of all SNPs associated with it. After conducting the single-variant analysis between G s and x s , we selected the genetic variants with p-value less than 0.01. The corresponding summary statistics for the selected genetic instruments in both SNP-exposure and SNP-outcome were chosen to complete the subsequent MR analysis. As shown in Figure S11, MR-LD and MR-LDP perform equally well in type-I error control in the case of no horizontal pleiotropy. When horizontal pleiotropy is getting larger (h 2 α = 0.05 or 0.1), MR-LDP is still capable of controlling type-I error at its nominal level. In addition, both our methods and GSMR are unbiased in point estimates while our methods have smaller standard errors.
In addition, we consider the binary outcome in the following simulations. In details, we generated data for binary outcome using the following logistic model: where H(t) = 1/(1 + exp(−t)).
The population prevalence was set to be 0.1. We first generated a large population pool of outcomes and sampled 10, 000 cases and 10, 000 controls for the following analysis. Since the odds ratio is not collapsible, the original estimation of casual effect is biased [10], in light of [7] and [18], we should correct the casual effect using the formula β 0 = β/ 1 + (σ/1.7) 2 , where σ 2 can be estimated using var((βx + G 2 α + U y η y ). We display the result of simulation study in Figure S12. Clearly, the results are generally similar to those in quantitative traits. Only MR-LDP can control type-I error when there are both strong LD and horizontal pleiotropy.
The standard errors of MR-LDP is smaller due to the inclusion of all SNPs within LD.

CAD-CAD study
We display the scatter plot of γ (C4D) against Γ (CAD1) in Figure S13, each point is augmented by the standard error of γ i and Γ i on the vertical and horizontal sides. In Figure   S14, we report detailed results of CAD-CAD study with the shrinkage parameter λ = 0.15.

Height-Height study
We display the scatter plot of γ (height for males) against Γ (height for females) in Figure   S15, each point is augmented by the standard error of γ i and Γ i on the vertical and horizontal sides. In Figures S16 and S17, we report detailed results of Height-Height study with the shrinkage parameters λ = 0.1 and λ = 0.15, respectively.  Figure S16: The result of Height-Height using UK10K as the reference penal with shrinkage parameter λ = 0.1 under different p-value thresholds to choose genetic variants in the screening dataset, e.g., p-value = 5e-6, 1e-05. MR-LD, MR-LDP and LS methods use all SNPs selected by the screening dataset (denoted as P All ), but IVW, MR-Egger, RAPS and GSMR use pruned SNPs, where the default value of r 2 is used for GSMR (the number of SNP used: P GSMR ) and r 2 = 0.001 is used for IVW, MR-Egger and RAPS (the number of SNP used: P Ind ). Figure S17: The result of Height-Height using UK10K as the reference penal with shrinkage parameter λ = 0.15 under different p-value thresholds to choose genetic variants in the screening dataset, e.g., p-value = 5e-6, 1e-05. MR-LD, MR-LDP and LS methods use all SNPs selected by the screening dataset (denoted as P All ), but IVW, MR-Egger, RAPS and GSMR use pruned SNPs, where the default value of r 2 is used for GSMR (the number of SNP used: P GSMR ) and r 2 = 0.001 is used for IVW, MR-Egger and RAPS (the number of SNP used: P Ind ).

Applications to the effect of lipids and BMI on common diseases 4.2.1 Lipids-disease outcome
The association result of lipids on common human diseases with shrinkage parameter λ = 0.15 is presented in Table S1, where the threshold for selecting instrumental variants in the screening dataset is set to 1 × 10 −4 . Furthermore, we analysis the association of HDL-C on CAD1, CAD2, and PVD using a sequence of thresholds with different shrinkage parameters, the results are illustrated in Figures S18 -S22.  Table S1: Causal associations of lipids on common diseases using UK10K as the reference penal with the shrinkage parameter λ = 0.15. MR-LDP uses all SNPs selected by the screening dataset (denoted as P All ), but IVW, MR-Egger, RAPS and GSMR use pruned SNPs, where the default value of r 2 is used for GSMR (the number of SNP used: P GSMR ) and r 2 = 0.001 is used for IVW, MR-Egger and RAPS (the number of SNP used: P Ind ). Statistically significant results are indicated in blue.

BMI-disease outcome
Similarly, the association result of BMI on common human diseases with shrinkage parameter λ = 0.15 is showed in        6.3 Fit MR-LDP using CAD-CAD study.
Furthermore, we give an example to illustrate the implements of MR.LDP for real data analysis. The following datasets("heart attack myocardial infarction.txt", "c4d.txt", "cardiogram.txt", "all chr 1000G.bed", "all chr 1000G.fam", "all chr 1000G.bim", "fourier ls-filescreen, fileexposure, fileoutcome are the datasets names for screen, exposure and outcome, respectively. These three datasets must have the following format (note that it must be tab delimited):