Detecting and correcting for bias in Mendelian randomization analyses using Gene-by-Environment interactions

Abstract

Background: Mendelian randomization (MR) has developed into an established method for strengthening causal inference and estimating causal effects, largely due to the proliferation of genome-wide association studies. However, genetic instruments remain controversial, as horizontal pleiotropic effects can introduce bias into causal estimates. Recent work has highlighted the potential of gene–environment interactions in detecting and correcting for pleiotropic bias in MR analyses.

Methods: We introduce MR using Gene-by-Environment interactions (MRGxE) as a framework capable of identifying and correcting for pleiotropic bias. If an instrument–covariate interaction induces variation in the association between a genetic instrument and exposure, it is possible to identify and correct for pleiotropic effects. The interpretation of MRGxE is similar to conventional summary MR approaches, with a particular advantage of MRGxE being the ability to assess the validity of an individual instrument.

Results: We investigate the effect of adiposity, measured using body mass index (BMI), upon systolic blood pressure (SBP) using data from the UK Biobank and a single weighted allelic score informed by data from the GIANT consortium. We find MRGxE produces findings in agreement with two-sample summary MR approaches. Further, we perform simulations highlighting the utility of the approach even when the MRGxE assumptions are violated.

Conclusions: By utilizing instrument–covariate interactions in MR analyses implemented within a linear-regression framework, it is possible to identify and correct for horizontal pleiotropic bias, provided the average magnitude of pleiotropy is constant across interaction-covariate subgroups.


Introduction
Mendelian randomization (MR) has developed into a multifaceted approach to assessing causal relationships in epidemiology. 1,2 In many cases, MR analyses employ genetic variants as instrumental variables (IVs), allowing consistent estimation of causal effects in the presence of unmeasured confounding. This requires candidate variants to be associated with the exposure of interest (IV1), to be independent of confounders of the exposure and outcome (IV2) and to be independent of the outcome outside of the mediating effects of the exposure (IV3). 3 An instrument satisfying these assumptions is considered valid, although IV2 and IV3 cannot be directly tested.
Pleiotropy plays a central role in MR analyses and can be subcategorized into vertical and horizontal forms. Vertical pleiotropy exists in cases where a single genetic variant influences a phenotype, which in turn influences another. 4 This is the primary mechanism underpinning the utility of MR in causal-effect estimation. However, a particular concern when applying MR is horizontal pleiotropy, which occurs when a genetic variant is associated with a study outcome through biological pathways additional to the exposure of interest. 2,5 This violates assumption IV3, introducing bias into effect estimates in the direction of the horizontal pleiotropic (henceforth, pleiotropic) effect. 5,6 Where multiple instruments are available, one strategy is to combine causal estimates using each individual variant in turn within a meta-analysis framework. Provided the genetic variants are uncorrelated, an inverse-variance-weighted (IVW) estimate will be equivalent to two-stage least-squares (TSLS) regression and, where pleiotropy is suspected, MR-Egger, median and modal regression can be adopted as sensitivity analyses. 4,5,7,8 In the single instrument setting, Slichter regression has emerged from the econometrics literature as a method for evaluating instrument validity within a potential outcomes framework. 9,10 This involves observing or extrapolating to a population subgroup for which the instrument and exposure are independent (defined as a no-relevance group) and measuring the corresponding association between the instrument and outcome. The instrument-outcome association for a no-relevance group provides an estimate of pleiotropic effect and allows bias correction within a statistical model. Slichter regression builds upon several key developments in econometrics, in particular the identification and estimation of local average treatment effects put forward by Imbens and Angrist, 10 and the works of Card, 11 Conley et al. 12 and Small. 13
In this paper, we introduce Slichter regression within the context of epidemiology, formalizing the increasing use of gene-environment interactions in assessing instrument validity. 14–20 We present MR using Gene-by-Environment interactions (MRGxE) as a statistical framework and sensitivity analysis to identify and correct for pleiotropic bias in MR studies using gene-covariate interactions. Importantly, MRGxE can assess the validity of a single instrument, in contrast to methods examining heterogeneity across a set of MR estimates using many instruments, and is not reliant upon the existence of an observed no-relevance group. This represents an improvement upon similar methods such as Pleiotropy Robust Mendelian Randomization (PRMR) that, while sharing a similar intuitive framework, are reliant upon the existence of an actual no-relevance group being observed within the data, severely limiting the applicability of the approach. 21 Two features differentiate MRGxE from analogous methods in the econometrics literature. First, MRGxE adopts a linear-regression framework as opposed to utilizing local linear regression, improving the ease with which MRGxE can be implemented. Additionally, MRGxE can be applied using both individual- and summary-level data. Such data could be obtained from previously published studies where subgroup-specific estimates are provided or alternatively requested from consortia.
To illustrate the utility of MRGxE, we present an applied example examining the effect of body mass index (BMI) upon systolic blood pressure (SBP), utilizing data from the GIANT consortium and the full release of the UK Biobank (July 2017), respectively. 22 We find evidence suggesting a positive association between BMI and SBP, and substantial agreement between MRGxE and two-sample summary MR estimates. Finally, we conduct a simulation study demonstrating the effectiveness of the approach under varying conditions.

Non-technical intuition
Consider a situation in which the instrument-exposure association is found to vary between subgroups of the target population. We follow Slichter in defining an observed subgroup for which the instrument does not predict the exposure of interest as a no-relevance group. 9 As a valid IV can only be associated with the outcome of interest through the exposure, it follows that the IV would be independent of the outcome for the no-relevance group. An observed non-zero association for the no-relevance group therefore serves as evidence of pleiotropy.
This intuitive approach has been considered in several epidemiological studies. For example, Chen et al. 19 considered differences in drinking behaviour by gender in East Asian populations within a fixed-effects meta-analysis of the ALDH2 genetic variant and blood pressure. This interaction has received further attention in work such as Taylor et al. 23 and Cho et al. 14 Previous applications also extend beyond simple gender differences. For example, Tyrrell et al. identified genetically predicted BMI as a weaker instrument for participants experiencing lower levels of socio-economic deprivation, utilizing negative controls to examine residual confounding. 16 In presenting MRGxE, we highlight similarities to the approach of Cho et al., 14 in which a gender-ALDH2 interaction term was incorporated within a TSLS model to estimate the effect of alcohol consumption. We clarify how it works when individual-level data are available and crucially demonstrate how MRGxE extends this approach to summary data.

The MRGxE framework
Consider an MR study consisting of $N$ participants (indexed by $i = 1, \ldots, N$). For each participant, we record observations of a genetic instrument $G_i$, an exposure $X_i$, an outcome $Y_i$ and a further covariate $Z_i$, which induces variation in the association between $G_i$ and $X_i$ through an interaction $GZ_i$. The relationship between each variable is illustrated in Figure 1, with $U$ representing a set of all unmeasured variables confounding $X$ and $Y$, and $I_{GZ}$ representing the interaction term.
The exposure $X$ is a linear function of $G$, $Z$, $GZ$, $U$ and an independent error term, $\epsilon_X$, whilst the outcome $Y$ is a linear function of $G$, $Z$, $GZ$, $U$, $X$ and an independent error term, $\epsilon_Y$. Using $\gamma$ and $\beta$ to denote regression coefficients for the first- and second-stage models, respectively, a two-stage model can be defined as:

$$X_i = \gamma_0 + \gamma_1 G_i + \gamma_2 Z_i + \gamma_3 GZ_i + \gamma_4 U_i + \epsilon_{X_i} \quad (1)$$

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 G_i + \beta_3 Z_i + \beta_4 GZ_i + \beta_5 U_i + \epsilon_{Y_i} \quad (2)$$

The causal effect of $X$ on $Y$ is denoted by $\beta_1$ and the pleiotropic effect of the instrument is $\beta_2$. Note that regressing $Y$ upon $X$ would be prone to confounding bias and applying TSLS would give biased estimates when $\beta_2 \neq 0$. This is demonstrated in the Supplementary Material, available as Supplementary data at IJE online.
MRGxE adopts a gene-covariate interaction as an instrument, subsequently placing restrictions on the interaction analogous to the IV assumptions. A suitable interaction $GZ$ is therefore:
GxE1: Associated with the exposure of interest ($\gamma_3 \neq 0$).
GxE2: Not associated with confounders of the exposure and outcome ($GZ \perp U$).
GxE3: Not associated with the outcome outside of the exposure of interest ($\beta_4 = 0$).
The first assumption is assessed by directly fitting the first-stage model. For the second assumption, it is important to stress that it pertains to the independence of the interaction with respect to confounders, and not G and Z individually. The third assumption requires pleiotropic effects remain constant across the population. Variation in pleiotropic effects can be driven by violations of the second assumption, as outlined in the following section.
A value of $Z$ defining a no-relevance group (observed or hypothetical) can be derived as the covariate value $Z = z_X$ at which $G$ and $X$ are independent, calculating the partial effect of $G$ upon $X$ and rearranging such that:

$$\frac{\partial X}{\partial G} = \gamma_1 + \gamma_3 z_X = 0 \quad (3)$$

This yields the trivial solution

$$z_X = -\frac{\gamma_1}{\gamma_3} \quad (4)$$

Where $z_X$ is observed in the population, regressing $Y$ upon $G$ for the subset of participants with $Z = z_X$ provides a pleiotropy estimate (that is, for $\beta_2$) as the coefficient of $G$. Unfortunately, this is difficult to implement in practice, either because the value $z_X$ is not observed or the subset of participants is too small to provide sufficient power. Consequently, it is often appropriate to estimate pleiotropy at a theoretical (or extrapolated) no-relevance group, using differences in instrument-exposure associations across $Z$.
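To illustrate how this is possible, a reduced-form IV model is constructed (that is, models for $X$ given $G$, and $Y$ given $G$) by rewriting Model (1) as

$$X_i = (\gamma_0 + \gamma_2 Z_i + \gamma_4 U_i + \epsilon_{X_i}) + (\gamma_1 + \gamma_3 Z_i) G_i \quad (5)$$

and Model (2) as

$$Y_i = [\beta_0 + \beta_1\gamma_0 + (\beta_3 + \beta_1\gamma_2) Z_i + (\beta_5 + \beta_1\gamma_4) U_i + \beta_1\epsilon_{X_i} + \epsilon_{Y_i}] + [\beta_2 + \beta_1\gamma_1 + (\beta_4 + \beta_1\gamma_3) Z_i] G_i \quad (6)$$

The change in G–X and G–Y associations for a given change in $Z$ can be identified as the coefficient of $G$ in Models (5) and (6), respectively (with $\beta_4$ set to 0), as

$$\gamma_1 + \gamma_3 Z \quad \text{and} \quad \beta_2 + \beta_1(\gamma_1 + \gamma_3 Z)$$

The Wald ratio 24 estimand for the causal effect of $X$ on $Y$ would then be equal to:

$$\frac{\beta_2 + \beta_1(\gamma_1 + \gamma_3 Z)}{\gamma_1 + \gamma_3 Z} = \beta_1 + \frac{\beta_2}{\gamma_1 + \gamma_3 Z} \quad (7)$$

This gives the causal effect, $\beta_1$, plus a non-zero bias term whenever $\beta_2$ is non-zero. In the Cho et al. 14 analysis, an estimate for $\beta_1$ was obtained by performing TSLS regression using the interaction as the instrument, fitting Models (8) and (9) below:

$$X_i = \gamma_0 + \gamma_1 G_i + \gamma_2 Z_i + \gamma_3 GZ_i + \epsilon_{X_i} \quad (8)$$

$$Y_i = \beta_0 + \beta_1 \hat{X}_i + \beta_2 G_i + \beta_3 Z_i + \epsilon_{Y_i} \quad (9)$$

where $\hat{X}_i$ is the fitted value from Model (8). In this case, the coefficient $\beta_2$ represents the degree of pleiotropy for the genetic instrument $G$.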
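As a purely numerical illustration of the bias term in the Wald ratio, the sketch below evaluates the G–X and G–Y associations implied by a linear model at a few covariate values. It assumes that the G–X association at $Z = z$ is $\gamma_1 + \gamma_3 z$ and that the G–Y association is $\beta_2 + \beta_1(\gamma_1 + \gamma_3 z)$ (that is, $\beta_4 = 0$). The parameter values and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical parameter values, chosen only for illustration.
BETA1 = 0.5    # causal effect of X on Y
BETA2 = 0.2    # constant pleiotropic effect of G on Y
GAMMA1 = 1.0   # main effect of G on X
GAMMA3 = 0.5   # G-by-Z interaction effect on X


def gx_association(z):
    """G-X association at covariate value z: gamma1 + gamma3 * z."""
    return GAMMA1 + GAMMA3 * z


def gy_association(z):
    """G-Y association at covariate value z, assuming beta4 = 0."""
    return BETA2 + BETA1 * gx_association(z)


def wald_ratio(z):
    """Ratio of the G-Y association to the G-X association at z."""
    return gy_association(z) / gx_association(z)


# Covariate value at which G no longer predicts X (the no-relevance point).
z_no_relevance = -GAMMA1 / GAMMA3

for z in (0.0, 2.0, 6.0):
    bias = wald_ratio(z) - BETA1
    print(f"z = {z:4.1f}  Wald ratio = {wald_ratio(z):.3f}  bias = {bias:.3f}")

# At the no-relevance point the G-Y association reduces to the pleiotropic effect.
print("G-Y association at the no-relevance point:", gy_association(z_no_relevance))
```

Under these assumed values the bias shrinks as the G–X association grows, and the G–Y association at the no-relevance point equals the pleiotropic effect, which is the quantity MRGxE targets with its intercept.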
Whilst this approach does not require an observed no-relevance group, it has two limitations. First, as a consequence of utilizing TSLS, it is restricted to individual-level data. Second, it assumes an underlying linear model, which may not hold in practice. For example, if considering adiposity as an exposure, individuals at extreme values could be at greater risk, implying a curved relationship.
MRGxE overcomes these limitations by reframing the model within a two-sample summary MR context, and executing the following three-step procedure:
1. Estimate G–X and G–Y associations at a range of values of $Z$.
2. Regress the G–Y associations on the G–X associations within a linear regression.
3. Estimate the causal effect $\beta_1$ as the slope of the regression, and the mean pleiotropic effect as the intercept of the regression.
Let $Z_j$ denote the $j$th subgroup of $Z$ ($j = 1, \ldots, J$). For each group $Z_j$, we estimate the instrument-exposure association and standard error (Step 1) using the following regression model:

$$X_i = \gamma_{j0} + \gamma_{j1} G_i + \epsilon_{jX_i} \quad (10)$$

Note that we include a subscript $j$ to distinguish the regression parameters from the first-stage Model (1). The coefficient $\gamma_{j1}$ is therefore interpreted as the G–X association for group $Z_j$. Next, we fit the corresponding instrument-outcome regression model (also Step 1):

$$Y_i = \delta_{j0} + \delta_{j1} G_i + \epsilon_{jY_i} \quad (11)$$

We use $\delta_{j1}$ to denote the G–Y association coefficient for group $Z_j$, distinguishing Model (11) from Model (6). Thus, from Models (10) and (11), we obtain sets of G–X associations ($\hat{\gamma}_{j1}$) and G–Y associations ($\hat{\delta}_{j1}$) across the $J$ subgroups of $Z$. Finally, we regress the set of $\hat{\delta}_{j1}$ estimates upon the set of $\hat{\gamma}_{j1}$ estimates (Steps 2 and 3):

$$\hat{\delta}_{j1} = \beta_{GxE0} + \beta_{GxE1} \hat{\gamma}_{j1} + \epsilon_j \quad (12)$$

In Model (12), $\beta_{GxE0}$ is the pleiotropy estimate ($\beta_2$), whilst $\beta_{GxE1}$ is the effect of $X$ upon $Y$ correcting for pleiotropy ($\beta_1$). To illustrate, recall that $\beta_2$ represents a constant pleiotropic effect across $Z$. Model (12) is an average of the ratio estimates across the subgroups of $Z$, with the bias parameter $\beta_2$ estimated as the intercept. A diagram illustrating these features is given in the Supplementary Material, available as Supplementary data at IJE online, accompanied by a demonstration of how the functional form of the interaction can be inferred from the distribution of the subgroup estimates.
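The three-step procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (their R code is in the Supplementary Material): it uses unweighted least squares throughout, forms equally sized subgroups by ordering participants on $Z$, and runs on simulated data whose parameter values, error distributions and function names (`fit_line`, `mrgxe`) are all assumptions made here.

```python
import random


def fit_line(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mrgxe(g, x, y, z, n_groups=5):
    """Unweighted MRGxE sketch; returns (intercept, slope)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    size = len(order) // n_groups
    gx_assoc, gy_assoc = [], []
    for j in range(n_groups):
        # The last subgroup absorbs any remainder.
        stop = (j + 1) * size if j < n_groups - 1 else len(order)
        idx = order[j * size:stop]
        g_j = [g[i] for i in idx]
        # Step 1: subgroup-specific G-X and G-Y associations.
        gx_assoc.append(fit_line(g_j, [x[i] for i in idx])[1])
        gy_assoc.append(fit_line(g_j, [y[i] for i in idx])[1])
    # Steps 2 and 3: regress G-Y on G-X associations; read off both terms.
    return fit_line(gx_assoc, gy_assoc)


# Simulated data with hypothetical parameter values (constant pleiotropy).
rng = random.Random(2019)
beta1, beta2 = 0.3, 0.2      # causal and pleiotropic effects
gamma1, gamma3 = 0.2, 0.5    # main and interaction effects of G on X
g, x, y, z = [], [], [], []
for _ in range(20000):
    g_i, z_i, u_i = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    x_i = gamma1 * g_i + 0.3 * z_i + gamma3 * g_i * z_i + u_i + rng.gauss(0, 1)
    y_i = beta1 * x_i + beta2 * g_i + 0.3 * z_i + u_i + rng.gauss(0, 1)
    g.append(g_i)
    x.append(x_i)
    y.append(y_i)
    z.append(z_i)

pleiotropy_estimate, causal_estimate = mrgxe(g, x, y, z)
print(f"intercept (pleiotropy): {pleiotropy_estimate:.3f}")
print(f"slope (causal effect):  {causal_estimate:.3f}")
```

In an applied analysis the final regression would usually be weighted by the precision of the subgroup G–Y estimates; the unweighted fit is used here only to keep the sketch short.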
To show how the intercept estimates $\beta_2$, consider the reduced-form Model (6) evaluated for the no-relevance group, $Z = z_X = -\gamma_1/\gamma_3$. Then, by substitution (with $\beta_4 = 0$), the coefficient of $G$ becomes:

$$\beta_2 + \beta_1\left(\gamma_1 + \gamma_3\left(-\frac{\gamma_1}{\gamma_3}\right)\right) = \beta_2 \quad (13)$$

Where the intercept is zero, the MRGxE causal-effect estimate is identical to an IVW estimate using the subgroup ratio estimates. This mirrors the equivalence of IVW and MR-Egger regression in the multiple instrument setting with balanced pleiotropy. R code for implementing MRGxE is provided in the Supplementary Material, available as Supplementary data at IJE online.
Before continuing, we highlight several factors to consider when implementing MRGxE. First, an appropriate number of $Z$ subgroups must be defined so as to accurately characterize the underlying gene-covariate interaction. Second, effects should not be transformed to be positive, as is done for MR-Egger regression. Doing so mischaracterizes the interaction term, attenuating causal-effect estimates. Finally, where instrument-exposure associations are present for all groups in the same direction, the accuracy in extrapolating the regression line towards a theoretical no-relevance group will be a function of the distance from the minimum $Z_j$ instrument-exposure association and variation in the set of $Z_j$ instrument-exposure associations. Further guidance and illustrations of these features of MRGxE are presented in the Supplementary Material, available as Supplementary data at IJE online.

The constant-pleiotropy assumption
As a single (constant) parameter, $\beta_2$ equates to the 'correct' intercept for MRGxE, that is, the intercept that must be estimated in order to identify the correct causal effect $\beta_1$. Consistent estimates for both $\beta_1$ and $\beta_2$ are produced in cases where the pleiotropic effect remains constant across all values of $Z$ ($\beta_4 = 0$). If $\beta_4 \neq 0$, then the constant-pleiotropy assumption is violated: the MRGxE intercept then targets $\beta_2 - \beta_4\gamma_1/\gamma_3$ rather than the true pleiotropic effect $\beta_2$, and the causal estimand for $\beta_1$ is biased such that:

$$\beta_{GxE1} = \beta_1 + \frac{\beta_4}{\gamma_3} \quad (14)$$

The derivation of this result is provided in the Supplementary Material, available as Supplementary data at IJE online. From Equation (14), it is clearly possible to mitigate such bias when the instrument-covariate interaction ($\gamma_3$) is large relative to the variation in the pleiotropic effect $\beta_4$, with the bias tending towards zero as $\gamma_3$ increases. However, as it is not possible to directly estimate $\beta_4$, justifying the relative effect sizes of the first- and second-stage interactions requires a priori knowledge.
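The full derivation is given in the Supplementary Material. The following is only a sketch of one way to arrive at the intercept and slope quoted above, assuming linear exposure and outcome models with coefficients $\gamma_1$ and $\gamma_3$ on $G$ and $GZ$ in the exposure model, and $\beta_1$, $\beta_2$ and $\beta_4$ on $X$, $G$ and $GZ$ in the outcome model. The symbol $a$ for the G–X association is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
\text{G--X association at } Z = z:\quad
  & a = \gamma_1 + \gamma_3 z
  \;\Longrightarrow\; z = \frac{a - \gamma_1}{\gamma_3}, \\[4pt]
\text{G--Y association at } Z = z:\quad
  & \beta_2 + \beta_1\gamma_1 + (\beta_4 + \beta_1\gamma_3)\, z \\
  &= \beta_2 + \beta_1\gamma_1
     + (\beta_4 + \beta_1\gamma_3)\,\frac{a - \gamma_1}{\gamma_3} \\
  &= \underbrace{\left(\beta_2 - \frac{\beta_4\gamma_1}{\gamma_3}\right)}_{\text{intercept}}
     + \underbrace{\left(\beta_1 + \frac{\beta_4}{\gamma_3}\right)}_{\text{slope}} a .
\end{align*}
```

Setting $\beta_4 = 0$ recovers an intercept of $\beta_2$ and a slope of $\beta_1$, which is the constant-pleiotropy case.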
Violations of the constant-pleiotropy assumption can result from specific confounding structures in the underlying true model. Specifically, there must be no downstream pathway from $G$ to $Z$ via the confounders $U$, no pathway from $Z$ to $G$ through $U$, and $U$ cannot be a joint determinant of $G$ and $Z$. Figure 2 shows four possible scenarios in which $G$ and $Z$ are associated with $U$. In Figure 2, Scenarios (a), (b) and (d) introduce bias in MRGxE estimates, whilst Scenario (c) and associations of either $Z$ or $G$ alone with $U$ do not. Further details on the underlying mechanisms behind such bias are presented in the Supplementary Material, available as Supplementary data at IJE online. As a consequence, the range of interaction covariates suitable for use within MRGxE is not as restrictive as one might naively assume. In an MR context, there are limited cases in which a confounder will be a determinant of a genetic instrument, and this is only problematic where the confounder is simultaneously associated with the interaction covariate. It seems that MRGxE estimates will be most susceptible to bias where the instrument is a determinant of one or more confounders, which in turn are determinants of the interaction covariate. We recommend care be taken in examining such pathways and suggest MRGxE be implemented as one component of a series of sensitivity analyses, as with other such approaches. 4

MRGxE as a sensitivity analysis
In cases where the constant-pleiotropy assumption is assumed to be violated, MRGxE can still be applied in sensitivity analyses to select a subset of valid instruments. To demonstrate, consider that an invalid instrument can be detected, in principle, whenever $\beta_2 - \beta_4\gamma_1/\gamma_3 \neq 0$, due to either $\beta_2 \neq 0$, $\beta_4 \neq 0$ or both. Consequently, MRGxE can be used to assess the validity of individual instruments, informing instrument selection and evaluating the appropriateness of their incorporation in allelic scores. There are, however, two important considerations when applying this approach. First, it is not possible to distinguish the average pleiotropic effect across interaction-covariate subgroups ($\beta_2$) from the change in pleiotropic effect between instrument-covariate subgroups ($\beta_4$). It is therefore a test of invalidity due to either factor and cannot be used to correct MRGxE estimates directly. Second, MRGxE will incorrectly fail to detect invalid instruments (a Type II error) in the special case where $\beta_2 - \beta_4\gamma_1/\gamma_3$ is close to zero, that is, where $\beta_2$ is close to $\beta_4\gamma_1/\gamma_3$.
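Both considerations can be checked numerically with noise-free subgroup associations. The sketch below assumes that the G–X association at $Z = z$ is $\gamma_1 + \gamma_3 z$ and that the G–Y association is $\beta_2 + \beta_1\gamma_1 + (\beta_4 + \beta_1\gamma_3) z$. The parameter values and helper names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def subgroup_associations(z_values, beta1, beta2, beta4, gamma1, gamma3):
    """Noise-free G-X and G-Y associations implied by the assumed linear model."""
    gx = [gamma1 + gamma3 * z for z in z_values]
    gy = [beta2 + beta1 * gamma1 + (beta4 + beta1 * gamma3) * z for z in z_values]
    return gx, gy


Z_VALUES = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical subgroup covariate values
BETA1, GAMMA1, GAMMA3 = 0.3, 0.2, 0.5    # hypothetical effects

# Invalid instrument with a non-zero intercept: flagged, but beta2 and beta4
# cannot be separated from the intercept alone.
gx, gy = subgroup_associations(Z_VALUES, BETA1, 0.2, 0.1, GAMMA1, GAMMA3)
detected_intercept, detected_slope = fit_line(gx, gy)

# Invalid instrument that goes undetected: beta2 equals beta4 * gamma1 / gamma3.
beta4 = 0.1
beta2 = beta4 * GAMMA1 / GAMMA3
gx, gy = subgroup_associations(Z_VALUES, BETA1, beta2, beta4, GAMMA1, GAMMA3)
masked_intercept, masked_slope = fit_line(gx, gy)

print(f"detected case: intercept = {detected_intercept:.3f}, slope = {detected_slope:.3f}")
print(f"masked case:   intercept = {masked_intercept:.3f}, slope = {masked_slope:.3f}")
```

In the second case the intercept is zero even though the instrument is invalid, while the slope remains biased away from the assumed causal effect of 0.3.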

Causal effect of BMI upon SBP
Previous observational studies, 25 randomized controlled trials 26 and MR studies 27–29 have reported evidence of a positive association between BMI and SBP. However, the magnitude of this association differs markedly between such studies, with observational studies often recording greater effect sizes than those using MR.
As an applied example, we perform two-sample summary MR and MRGxE analyses examining the effect of BMI upon SBP using variants identified from the GIANT consortium 22 and two non-overlapping random samples of UK Biobank. The decision to use two subsamples of the UK Biobank, as opposed to summary estimates from the GIANT consortium, is motivated by potential differences in the standardization of BMI between each sample. As MRGxE utilizes BMI values from the UK Biobank, selecting two subsamples for which BMI has been identically standardized allows a more effective comparison of the approaches.
The purpose of performing both two-sample summary and MRGxE analyses is to highlight the extent to which pleiotropic effect estimates obtained using MRGxE with a single instrument agree with conventional MR approaches. Initially, the UK Biobank sample contained a total of 502 614 individuals. From this sample, we excluded participants who failed to meet quality control, specifically in cases where genetic and reported sex conflicted, where sex chromosome karyotypes were putatively different from XX and XY, and individuals who were outliers with respect to heterozygosity and missing rates. Further, we removed participants of non-European ancestry and related individuals by preferentially removing individuals related to the greatest number of individuals until no related pairs remained. This resulted in a total of 358 928 participants being included in the analyses. In conducting a two-sample summary analysis, effect estimates and standard errors for 96 genetic variants identified by the GIANT consortium as being robustly associated with BMI ($p = 5 \times 10^{-8}$) were obtained from a 50% random sample of the UK Biobank. 22 Corresponding estimates for each genetic variant with respect to SBP were obtained using the remaining UK Biobank sample. In contrast, MRGxE was implemented by constructing a weighted allelic score informed using estimates from the GIANT consortium. The MRGxE analysis can be viewed as analogous to two-sample summary MR, using instrument-exposure estimates for BMI as external weights and individual data from a separate sample to inform instrument-outcome association estimates. In each analysis, BMI, SBP and the weighted allelic score were standardized using a z-score transformation.
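For readers unfamiliar with allelic scores, the construction and standardization steps can be sketched as follows. The genotype matrix and external weights below are made-up toy values, not GIANT or UK Biobank data, and the use of the sample standard deviation in the z-score is an assumption, as the text does not state which was used.

```python
import statistics


def weighted_allelic_score(genotypes, weights):
    """Sum of allele counts weighted by external per-variant effect estimates.

    genotypes: one list of allele counts (0, 1 or 2) per participant.
    weights:   one external effect estimate per variant.
    """
    return [sum(w * count for w, count in zip(weights, row)) for row in genotypes]


def z_score(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Toy data: four participants, three variants (hypothetical values).
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
]
weights = [0.02, 0.05, 0.03]

scores = weighted_allelic_score(genotypes, weights)
standardized_scores = z_score(scores)
print("raw scores:         ", [round(s, 3) for s in scores])
print("standardized scores:", [round(s, 3) for s in standardized_scores])
```

The standardized score then plays the role of the single instrument $G$ in the subgroup regressions.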

Two-sample summary analyses
We implement several two-sample summary MR methods utilizing the mrrobust software package 30 in Stata SE 14.0. 31 IVW provides estimates with greater precision than alternative summary approaches; however, as such estimates can exhibit bias in the presence of pleiotropy, MR-Egger regression, weighted median and weighted modal approaches are implemented as sensitivity analyses.
A range of methods are adopted in sensitivity analyses, as each method relies upon differing assumptions with respect to the underlying distribution of pleiotropic effects. MR-Egger regression requires the effect of genetic variants on the exposure to be independent of their pleiotropic effects on the outcome (InSIDE). 5 The weighted median requires more than 50% of the variants to be valid instruments accounting for weighting, 7 whilst the modal estimator assumes that the most frequent value of the pleiotropic bias across the set of genetic variants is zero (ZEMPA). 32
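To make the differences between these estimators concrete, the sketch below implements simplified versions of the IVW, MR-Egger and weighted median point estimates from summary data. It is not the mrrobust implementation: standard errors, the modal estimator and SIMEX correction are omitted, and the summary statistics are hypothetical, noise-free values constructed so that the behaviour of each estimator is easy to see.

```python
def ivw(gx, gy, se_gy):
    """Inverse-variance-weighted slope of gy on gx through the origin."""
    w = [1.0 / s ** 2 for s in se_gy]
    num = sum(wi * a * b for wi, a, b in zip(w, gx, gy))
    den = sum(wi * a * a for wi, a in zip(w, gx))
    return num / den


def mr_egger(gx, gy, se_gy):
    """Weighted regression of gy on gx with an intercept; returns (intercept, slope)."""
    # Orient each variant so that its exposure association is positive.
    sign = [1.0 if a >= 0 else -1.0 for a in gx]
    gx = [s * a for s, a in zip(sign, gx)]
    gy = [s * b for s, b in zip(sign, gy)]
    w = [1.0 / s ** 2 for s in se_gy]
    sw = sum(w)
    mean_x = sum(wi * a for wi, a in zip(w, gx)) / sw
    mean_y = sum(wi * b for wi, b in zip(w, gy)) / sw
    sxx = sum(wi * (a - mean_x) ** 2 for wi, a in zip(w, gx))
    sxy = sum(wi * (a - mean_x) * (b - mean_y) for wi, a, b in zip(w, gx, gy))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def weighted_median(gx, gy, se_gy):
    """Interpolated weighted median of ratio estimates (needs >= 2 variants)."""
    ratios = [b / a for a, b in zip(gx, gy)]
    weights = [(a / s) ** 2 for a, s in zip(gx, se_gy)]
    pairs = sorted(zip(ratios, weights))
    total = sum(weights)
    running, cum = 0.0, []
    for _, w in pairs:
        running += w
        cum.append((running - 0.5 * w) / total)
    below = max(i for i, c in enumerate(cum) if c < 0.5)
    r0, r1 = pairs[below][0], pairs[below + 1][0]
    return r0 + (r1 - r0) * (0.5 - cum[below]) / (cum[below + 1] - cum[below])


# Hypothetical summary statistics for five variants; assumed true effect 0.4.
gx = [0.10, 0.12, 0.08, 0.15, 0.11]
se_gy = [0.01] * 5
gy_valid = [0.4 * a for a in gx]                      # no pleiotropy
gy_directional = [0.02 + 0.4 * a for a in gx]         # constant pleiotropy 0.02
gy_mixed = [0.4 * a + (0.05 if j in (2, 4) else 0.0)  # two invalid variants
            for j, a in enumerate(gx)]

print("IVW, valid:               ", round(ivw(gx, gy_valid, se_gy), 3))
print("IVW, directional:         ", round(ivw(gx, gy_directional, se_gy), 3))
print("MR-Egger, directional:    ", [round(v, 3) for v in mr_egger(gx, gy_directional, se_gy)])
print("Weighted median, mixed:   ", round(weighted_median(gx, gy_mixed, se_gy), 3))
```

In this toy setting IVW is pulled away from 0.4 by directional pleiotropy, MR-Egger recovers both the intercept and the slope, and the weighted median is unaffected because the invalid variants carry less than half of the total weight.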

MRGxE analyses using Townsend Deprivation Index
In implementing MRGxE, Townsend Deprivation Index (TDI) was selected as a continuous covariate for which instrument strength varies, based on findings from previous studies. 16,33 TDI is a common derived measure of socioeconomic deprivation, using many variables such as car ownership, occupation type and educational attainment. 34 It is measured at an area level (electoral wards), with participants assigned a score based upon the area in which they lived. 34 Missing values were removed prior to performing the analysis, with observational and TSLS estimates presented in the Supplementary Material, available as Supplementary data at IJE online.

Simulation overview
To illustrate the effectiveness of MRGxE, and further consider the importance of the constant-pleiotropy assumption with respect to causal-effect estimation, we performed a simulation study within a two-sample MR framework. Considering a realistic case, two sets of simulations were performed, the first using a null causal effect ($\beta_1 = 0$) and indexed as A, and the second a positive causal effect ($\beta_1 = 0.05$) indexed as B. Individual-level data are generated, from which the necessary summary-data estimates are extracted. In each case, a total of 5 population subgroups are used from a sample size of 50 000, with further details provided in the Supplementary Material, available as Supplementary data at IJE online.
Four distinct cases were considered. The results for each case represent the mean values across 10 000 simulated datasets.

Analysis I: two-sample summary analysis
Estimates obtained from implementing each two-sample summary MR approach are presented in Table 1. All of the methods performed, with the exception of SIMEX-corrected MR-Egger, show evidence of a positive association between BMI and SBP. There also appears limited evidence of a pleiotropic effect, with the IVW estimate lying within the confidence intervals of both the weighted median and weighted modal estimates.

Analysis II: MRGxE using TDI
To perform MRGxE, we divided the sample using quantiles of TDI into 5, 10, 20 and 50 population subgroups, after which IVW and MRGxE estimates were produced. The results of each analysis are presented in Table 2, with IVW referring to an inverse-variance-weighted estimate using interaction-covariate subgroups.
The estimates in Table 2 largely agree with the two-sample summary MR estimates in several aspects, with the direction of effect remaining consistent across each of the methods applied. This again implies a positive effect of BMI upon SBP. Considering the MR-Egger and MRGxE intercept estimates, there also appears to be little evidence of substantial pleiotropic bias. Constraining the MRGxE intercept to zero yields an estimate equivalent to the two-sample summary IVW estimate presented in Table 1.
Notably, whilst the MR-Egger estimates are consistent with the MRGxE estimates, the effects are markedly different, with MRGxE returning a positive point estimate greater in magnitude than the IVW estimate. The difference in these estimates can be attributed to the differing intercept estimates that, while close to zero in both cases, are different in terms of direction. As the MR-Egger and MRGxE intercept estimates lie within their overlapping confidence intervals, we would highlight this as a case where the discrepancy may be due to a lack of precision. In this case, it could be argued to be appropriate to conclude that there is a lack of robustly identified directional pleiotropic effect and adopt the IVW estimate. Figure 3 displays both the IVW and MRGxE estimates for the five-group case, whilst corresponding plots for other groups are presented in the Supplementary Material, available as Supplementary data at IJE online.
Considering Figure 3, the ordering of the TDI groups supports the assumption that the instrument-exposure association varies across levels of TDI. In particular, the least deprived groups (Group 1 and Group 2) have the weakest association, suggesting that genetically predicted BMI is a weaker predictor of BMI for participants experiencing lower levels of deprivation. A further observation is that the positioning of each estimate provides some evidence of a linear interaction, with instrument strength increasing monotonically as subgroup TDI increases. However, the close proximity of Groups 1 and 2, as well as Groups 3 and 4, could be indicative of non-linearity, as they could represent inflection points in the underlying distribution of the interaction (see the Supplementary Material, available as Supplementary data at IJE online, for inference guidelines).
One important consideration in performing MR analyses is that causal-effect estimates are often uncertain, due to

Table 1. Two-sample summary MR estimates for the effect of body mass index (BMI) upon systolic blood pressure (SBP). A smoothing parameter ($\phi = 1$) was selected in implementing the modal estimator and a value of $I^2_{GX} = 0.89$ using MR-Egger is indicative of regression dilution of approximately 11% towards the null.
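The link between $I^2_{GX}$ and the quoted dilution figure can be sketched as follows, assuming the commonly used definition $I^2_{GX} = (Q - (J - 1))/Q$, where $Q$ is Cochran's heterogeneity statistic computed on the $J$ instrument-exposure associations. The exact (possibly weighted) variant used to obtain the value in Table 1 is not stated here, and the input values below are hypothetical.

```python
def i_squared_gx(gx, se_gx):
    """I^2 statistic for instrument-exposure associations, floored at zero."""
    w = [1.0 / s ** 2 for s in se_gx]
    weighted_mean = sum(wi * a for wi, a in zip(w, gx)) / sum(w)
    # Cochran's Q: weighted squared deviations from the weighted mean.
    q = sum(wi * (a - weighted_mean) ** 2 for wi, a in zip(w, gx))
    return max(0.0, (q - (len(gx) - 1)) / q)


# Hypothetical instrument-exposure associations and standard errors.
gx = [0.10, 0.20, 0.30, 0.40]
se_gx = [0.05, 0.05, 0.05, 0.05]

i2 = i_squared_gx(gx, se_gx)
print(f"I^2_GX = {i2:.2f}; approximate dilution towards the null = {1 - i2:.0%}")
```

Under this reading, a value of 0.89 corresponds to an expected attenuation of roughly $1 - 0.89 = 11\%$ in the MR-Egger slope.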

Simulations
Results of the simulation analyses are presented in Table 3. The mean F statistic remains the same for each case, with substantial variation in the F statistic between interaction-covariate groups. This is essential, as the variation in instrument strength is representative of variation in instrument relevance across population subgroups. Estimates using IVW and MRGxE, as well as significance values, were taken directly from each regression output without using regression weights, as the variant-outcome associations have the same standard errors.
In the valid instrument case, both IVW and MRGxE provide unbiased effect estimates, though the IVW estimate is more accurate. This is similar to comparisons between IVW and MR-Egger regression, supporting use of IVW in cases where pleiotropy is absent. Type I error rates remained at approximately 5% for both IVW and MRGxE. In the second case, IVW exhibits bias, whilst MRGxE continues to produce unbiased estimates.
In the third case, the instrument is not valid, but pleiotropic effects change across population subgroups. Here, both IVW and MRGxE produce biased causal-effect estimates, with the MRGxE effect estimates showing a greater degree of bias than the IVW estimates. This increases the Type I error rate relative to IVW. In this situation, the MRGxE test for pleiotropy is particularly powerful, though this seeming increase in power can be attributed to violation of the constant-pleiotropy assumption ($\beta_4 \neq 0$) leading to over-estimation of the magnitude of the pleiotropic effects. In the final case, both IVW and MRGxE produce estimates with similar bias and precision. Here, the MRGxE test for pleiotropy is suggestive of a null pleiotropic effect, remaining at 5%. This represents a situation in which $\beta_2 = \beta_4$, invalidating the use of MRGxE as a sensitivity analysis.
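The qualitative pattern in the constant-pleiotropy case (IVW biased, MRGxE approximately unbiased) can be reproduced with a much smaller Monte Carlo experiment. The sketch below is not the simulation design used for Table 3: the sample size, number of replicates, parameter values and data-generating model are scaled-down assumptions chosen so that it runs quickly in plain Python.

```python
import random


def fit_line(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def one_replicate(rng, n, n_groups, beta1, beta2, gamma1, gamma3):
    """Simulate one dataset; return (IVW estimate, MRGxE slope estimate)."""
    rows = []
    for _ in range(n):
        g, z, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = gamma1 * g + gamma3 * g * z + u + rng.gauss(0, 1)
        y = beta1 * x + beta2 * g + u + rng.gauss(0, 1)  # beta4 = 0
        rows.append((z, g, x, y))
    rows.sort()            # tuples sort on z first, ordering subgroups by z
    size = n // n_groups   # n is assumed to be divisible by n_groups
    gx, gy = [], []
    for j in range(n_groups):
        chunk = rows[j * size:(j + 1) * size]
        g_j = [r[1] for r in chunk]
        gx.append(fit_line(g_j, [r[2] for r in chunk])[1])
        gy.append(fit_line(g_j, [r[3] for r in chunk])[1])
    # IVW here is an unweighted regression through the origin.
    ivw = sum(a * b for a, b in zip(gx, gy)) / sum(a * a for a in gx)
    mrgxe_slope = fit_line(gx, gy)[1]
    return ivw, mrgxe_slope


rng = random.Random(42)
BETA1, BETA2 = 0.05, 0.2   # hypothetical causal and pleiotropic effects
results = [one_replicate(rng, 2000, 5, BETA1, BETA2, 0.5, 1.0) for _ in range(100)]
mean_ivw = sum(r[0] for r in results) / len(results)
mean_mrgxe = sum(r[1] for r in results) / len(results)
print(f"true effect = {BETA1}, mean IVW = {mean_ivw:.3f}, mean MRGxE = {mean_mrgxe:.3f}")
```

Setting `beta2` to zero in the same function gives a valid-instrument scenario; adding a $G \times Z$ term to the outcome model would be needed to mimic the cases in which constant pleiotropy is violated.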

Discussion
In this paper, we present MRGxE as a simple and intuitive method to identify and correct for pleiotropic bias in MR studies using instrument-covariate interactions. MRGxE enables the pleiotropic effect of individual instruments (or single allele scores) to be assessed and, when such pleiotropy exists and satisfies the constant-pleiotropy assumption, MRGxE provides improved causal estimation compared with IVW. In the absence of such pleiotropy, the IVW approach is more accurate and should be preferred.
In cases where the constant-pleiotropy assumption is violated, a sensible approach would be to prune invalid variants using pleiotropy estimates from MRGxE and then implement IVW using valid variants. In this sense, MRGxE can be viewed very much as a tool for sensitivity analysis.

Two-sample summary MRGxE
Whilst this paper has focused on the application of MRGxE to individual-level data (albeit by extracting and then meta-analysing summary statistics obtained from it), it clearly applies where interaction-subgroup-specific summary data on instrument-exposure and instrument-outcome associations are available. An alternative approach would be to meta-analyse summary statistics obtained from many separate studies under the assumption that study-specific estimates relate to a study-specific characteristic. For example, the work of Robinson et al. 36 highlights the interaction between age and adult BMI heritability as one potential candidate, given that age is likely to vary naturally across contributing studies.
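Where only subgroup-specific summary statistics are available, the final regression can be fitted directly to them. The sketch below assumes inverse-variance weights based on the standard errors of the G–Y associations and conventional weighted-least-squares standard errors. The function name `mrgxe_summary` and the summary statistics are hypothetical, and the authors' supplementary R code may differ in its weighting and variance conventions.

```python
import math


def mrgxe_summary(gx, gy, se_gy):
    """Weighted regression of G-Y on G-X subgroup estimates (needs >= 3 subgroups).

    Returns (intercept, slope, se_intercept, se_slope).
    """
    w = [1.0 / s ** 2 for s in se_gy]
    sw = sum(w)
    mean_x = sum(wi * a for wi, a in zip(w, gx)) / sw
    mean_y = sum(wi * b for wi, b in zip(w, gy)) / sw
    sxx = sum(wi * (a - mean_x) ** 2 for wi, a in zip(w, gx))
    sxy = sum(wi * (a - mean_x) * (b - mean_y) for wi, a, b in zip(w, gx, gy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance from the weighted residual sum of squares.
    rss = sum(wi * (b - intercept - slope * a) ** 2 for wi, a, b in zip(w, gx, gy))
    sigma2 = rss / (len(gx) - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / sw + mean_x ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope


# Hypothetical subgroup-specific summary statistics (five subgroups).
gx = [0.05, 0.10, 0.15, 0.20, 0.25]
gy = [0.030, 0.041, 0.049, 0.061, 0.069]
se_gy = [0.010, 0.010, 0.012, 0.012, 0.015]

intercept, slope, se_intercept, se_slope = mrgxe_summary(gx, gy, se_gy)
print(f"pleiotropy (intercept): {intercept:.3f} (SE {se_intercept:.3f})")
print(f"causal effect (slope):  {slope:.3f} (SE {se_slope:.3f})")
```

The same function could be applied to study-specific estimates in a meta-analysis setting, with each contributing study playing the role of a subgroup.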

Limitations of MRGxE
A number of factors must be considered before implementing MRGxE. First, the constant-pleiotropy assumption is essential for causal estimate correction. If there is reason to believe that pleiotropic effects differ between population subgroups, then the approach will give misleading effect estimates. One useful aspect to this problem, however, is that, provided the first-stage interaction is sufficiently strong, bias from changes in pleiotropic effect may be sufficiently small as to be negligible in analyses. This may well be the case in situations such as the Cho et al. 14 study, where the difference in instrument effect between gender groups is very strong in comparison to potential variation in pleiotropic effect. In our analyses, the use of an allele score as a single (strong) instrument meant that it was naturally much more robust to bias than any individual component SNP. One strategy to overcome this limitation would be to carry out several analyses using differing interaction covariates. Provided that the instrument-covariate interaction is of sufficient strength, it would be expected that resulting estimates would be in agreement. In cases where substantial disagreement is observed, such disagreement could be indicative of violation of the constant-pleiotropy assumption or characteristics of the underlying confounding structure. The work of Emdin et al. 15 and Krishna et al. 37 follows this reasoning. Further work will consider the implications of interaction-covariate selection and the role of confounding within the context of MRGxE.
A second limitation is that, owing to the limited availability of summary-data estimates for particular covariate groups, it may be difficult to implement in a summary-data setting. At present, researchers may be limited to common groupings such as gender.
Finally, it is important to consider results from MR gene-environment interaction approaches within the context of existing evidence using alternate estimation approaches, within the triangulation framework 38,39 in which differences in estimates across a range of approaches can be indicative of sources of bias potentially unique to each research design. Identifying disagreement in estimated effects across studies of differing design can therefore prove valuable in identifying avenues for further research, whilst substantial agreement strengthens confidence in the resulting findings and subsequent inference.

Supplementary data
Supplementary data are available at IJE online.