Factorial Mendelian randomization: using genetic variants to assess interactions

Abstract Background Factorial Mendelian randomization is the use of genetic variants to answer questions about interactions. Although the approach has been used in applied investigations, little methodological advice is available on how to design or perform a factorial Mendelian randomization analysis. Previous analyses have employed a 2 × 2 approach, using dichotomized genetic scores to divide the population into four subgroups as in a factorial randomized trial. Methods We describe two distinct contexts for factorial Mendelian randomization: investigating interactions between risk factors, and investigating interactions between pharmacological interventions on risk factors. We propose two-stage least squares methods using all available genetic variants and their interactions as instrumental variables, and using continuous genetic scores as instrumental variables rather than dichotomized scores. We illustrate our methods using data from UK Biobank to investigate the interaction between body mass index and alcohol consumption on systolic blood pressure. Results Simulated and real data show that efficiency is maximized using the full set of interactions between genetic variants as instruments. In the applied example, between 4- and 10-fold improvement in efficiency is demonstrated over the 2 × 2 approach. Analyses using continuous genetic scores are more efficient than those using dichotomized scores. Efficiency is improved by finding genetic variants that divide the population at a natural break in the distribution of the risk factor, or else divide the population into more equal-sized groups. Conclusions Previous factorial Mendelian randomization analyses may have been underpowered. Efficiency can be improved by using all genetic variants and their interactions as instrumental variables, rather than the 2 × 2 approach.


Introduction
Mendelian randomization is the use of genetic variants as proxies for interventions on risk factors to answer questions of cause and effect using observational data. 1,2 Formally, Mendelian randomization can be viewed as instrumental variable (IV) analysis using genetic variants as IVs. 3,4 Factorial Mendelian randomization is the use of genetic variants to answer questions about interactions. It does this by proposing a statistical model for the outcome as a function of the risk factors (or their genetic predictors) and a product term.
A statistical interaction is observed when the coefficient for the product term in the model is non-zero. A statistical interaction in the causal model for the outcome may represent a causal interaction, meaning that the effect of one risk factor on the outcome is dependent upon the value of the other risk factor. 5,6 This may arise due to a functional or biological interaction, in which there is a mechanistic connection between the two risk factors in how they influence the outcome. However, a statistical interaction may also arise due to non-linearity in the effect of a risk factor, or due to effect modification, in which the effect of one risk factor varies in strata of the other. Hereafter, we take the word 'interaction' to mean a statistical interaction in the causal model for the outcome, without implying a functional interaction between the risk factors.
Factorial Mendelian randomization was proposed in the seminal paper on Mendelian randomization by Davey Smith and Ebrahim in 2003. 1 The term is credited by the authors to Sheila Bird (https://en.wikipedia.org/wiki/ Sheila_Bird). However, the idea was not readily taken up in applied practice. The concept was raised again by Davey Smith and Hemani in a 2014 review, 7 who suggested that genetic predictors of obesity and alcohol consumption could be used to investigate the interaction between the two risk factors on risk of liver disease. This question was investigated for markers of liver function using data from the Copenhagen General Population Study in 2018; 8 no evidence for an interaction was found.
In parallel to this, the term factorial Mendelian randomization has been used for analyses employing genetic variants as proxies for pharmacological interventions. Ference et al. performed factorial Mendelian randomization to compare the effect of lowering low density lipoprotein (LDL) cholesterol levels on the risk of coronary heart disease (CHD) with two different LDL-cholesterol lowering agents (ezetimibe and statin), and with a combination of both. 9 Genetic variants associated with LDL-cholesterol were identified in the NPC1L1 gene (proxies for ezetimibe), and the HMGCR gene (proxies for statins), and combined into separate gene scores. To mimic a 2 Â 2 factorial randomized trial, the two gene scores were dichotomized to create a 2 Â 2 contingency table. The gene scores were dichotomized at their median values so that the numbers of participants were balanced across the four groups. Ference has conducted similar analyses for PCSK9 inhibitors and statins, 10 and for CETP inhibitors and statins. 11 A similar 2 Â 2 approach was used in each case, as well as in the analysis of obesity and alcohol mentioned above. 8 In this paper, we consider various aspects relating to the conceptualization, design, analysis and interpretation of a factorial Mendelian randomization investigation. First, we demonstrate the analogy between factorial Mendelian randomization and a factorial randomized trial, we make a connection with multivariable Mendelian randomization, and we describe two contexts in which factorial Mendelian randomization may have utility: for investigating interactions between risk factors, and for investigating interactions between pharmacological interventions on risk factors. We present simulated data demonstrating that the 2 Â 2 approach to analysis, while being conceptually appealing, is inefficient for detecting interactions. The same conclusion is reached in an applied investigation considering interactions between body mass index (BMI) and alcohol consumption on blood pressure using data from UK Biobank. Finally, we discuss the implications of our work for applied factorial Mendelian randomization investigations.

Key Messages
• Factorial Mendelian randomization is an extension of the Mendelian randomization paradigm to answer questions about interactions.
• There are two contexts in which factorial Mendelian randomization can be used: for investigating interactions between risk factors, and interactions between pharmacological interventions on risk factors.
• While most applications of factorial Mendelian randomization have dichotomized the population as in a 2 Â 2 factorial randomized trial, this approach is generally inefficient for detecting statistical interactions.
• In the first context, efficiency is maximized by including all genetic variants and their cross-terms as instrumental variables for the two risk factors and their product term.
• In the second context, efficiency is maximized by using continuous genetic scores rather than dichotomized scores.

Factorial randomized trials and Mendelian randomization
A factorial randomized trial allows for the simultaneous assessment of two or more treatments in a single study. 12 In its simplest form, a 2 Â 2 factorial trial investigates the effect of two binary treatments A and B on a binary outcome Y. Participants are randomly allocated to one of four groups: to receive treatment A only; to receive treatment B only; to receive both treatments A and B; or to receive neither treatment A nor B. The analogy between Mendelian randomization and a randomized trial has been made many times, 13,14 and the analogy between factorial Mendelian randomization and a factorial randomized trial has also been made previously in the context of multivariable Mendelian randomization ( Figure 1, adapted from 15 ).
Multivariable Mendelian randomization was motivated by the problem that some genetic variants are associated with multiple risk factors, such that it is impossible to find genetic variants that are specifically associated with a particular risk factor. 15 For illustration, we assume there are two risk factors (X 1 and X 2 ), and fit a model for the outcome in terms of the risk factors: We assume that we have genetic variants G that satisfy the multivariable IV assumptions for risk factors X 1 and X 2.
15 Specifically: i. Each variant is associated with at least one of the risk factors. ii. Each risk factor is associated with at least one of the genetic variants. iii. Variants are not confounded in their associations with the outcome. iv. Variants are not associated with the outcome conditional on the risk factors and confounders.
If we have at least two genetic variants that are valid multivariable IVs for X 1 and X 2 , then causal effects h 1 and h 2 can be estimated from the two-stage least squares method by first regressing the risk factors on the genetic variants, and then regressing the outcome on the fitted values of the risk factors from the first-stage regressions. 16 If summarized data on the genetic associations with the outcome ( b b Y ) and the risk factors ( b b X1 ; b b X2 ) are available, then the same estimates can be obtained by weighted linear regression of the beta-coefficients with the intercept set to zero: where weights are the reciprocals of variances of the geneoutcome associations seð b b Y Þ À2 . 17 In the language of a factorial randomized trial, this is referred to as an analysis performed 'at the margins'. 18 Estimates represent the average direct effect of each of the risk factors. 19 If there is an interaction between the risk factors, then these are marginal estimates-they are averaged over the distribution of the other risk factor.
We can extend multivariable Mendelian randomization by adding a term to the outcome model to estimate an interaction between the risk factors: where X 12 is the product X 1 Â X 2 , and h 12 is the interaction effect on an additive scale. In a factorial randomized trial, this is referred to as an analysis performed 'inside the table', as in a 2 Â 2 setting, the interaction can be estimated from the 2 Â 2 contingency table. 20 A factorial Mendelian randomization analysis is primarily interested in assessing the presence of, and estimating the interaction effect h 12 .
For simplicity, we initially assume that the associations of the genetic variants with the risk factors are homogeneous in the population and do not vary with time, also that the model relating the risk factors to the outcome is correctly specified, and the effects of the risk factors (and their product) on the outcome are also homogeneous in the population and do not vary with time. We return to the question of how to interpret estimates in this and in more realistic scenarios in the Discussion.

Two contexts: interactions between risk factors and interactions between interventions
Factorial Mendelian randomization study has been considered in two broad scenarios: (a) to estimate interaction effects between risk factors by using genetic variants as predictors of the risk factors; and (b) to identify interactions between interventions by using genetic variants as proxies for specific treatments. In the first case, the aim is to identify an interaction in the effect of two distinct risk factors on the outcome. In the second case, there may not even be two distinct risk factors (as in the example of two LDL-cholesterol lowering interventions discussed by Ference et al. 9 ) and the aim is to identify an interaction in the associations of the genetic variants with the outcome. In this case, an interaction is inferred between the interventions for which the genetic variants are proxies. We consider these two scenarios in turn.

Interactions between risk factors
The multivariable IV assumptions imply that there is no effect of the genetic variants on the outcome except potentially indirectly via one or both of the risk factors. We divide the genetic variants into three groups: G 1 contains variants that are associated with X 1 , G 2 contains variants that are associated with X 2 , and G c contains shared variants that are associated with X 1 and X 2 ( Figure 2). We can now perform two-stage least squares by first regressing the risk factors X 1 , X 2 , and the product X 12 on the genetic variants, and then regressing the outcome on the fitted values of these risk factors. This analysis treats X 12 as if it is a separate risk factor unrelated to X 1 and X 2. 21 For the second-stage regression model to be identified, at least three IVs are required, as three parameters are estimated, and all risk factors (X 1 , X 2 , X 12 ) must be associated with at least one IV. If we assume that the risk factors X 1 and X 2 are linear in the genetic variants: then an interaction between the risk factors means that the statistical model for the outcome includes cross-terms between the genetic variants (such as G 11 Â G 21 ). 22 This motivates the use of cross-terms between the genetic variants as separate IVs. If all the genetic variants and their cross-terms are used as IVs, then under the homogeneity assumptions, the fitted values of the risk factors and their product term can be consistently estimated, and hence the regression model for the outcome on these fitted values (as in the two-stage least squares method) will be correctly specified. Thus the homogeneity assumptions lead to consistent estimates of the parameters in equation (3).

Simulation study 1: interactions between risk factors
To investigate the performance of methods for estimating interactions between risk factors, we conduct a simulation study. We assume there are 10 genetic variants that are associated with X 1 and 10 genetic variants that are associated with X 2 , and vary the number of shared variants that are associated with both X 1 and X 2 from 0 (20 distinct genetic variants, each associated with one risk factor) to 10 Figure 2. Causal directed acyclic graph illustrating relationships between variables in a factorial Mendelian randomization framework for two risk factors (X 1 and X 2 ). There are three sets of genetic variants: G 1 (affecting X 1 only), G 2 (affecting X 2 only) and G c (shared variants, affecting X 1 and X 2 ). X 12 represents the product X 1 Â X 2 . The main effects of the risk factors X 1 and X 2 on the outcome Y are h 1 and h 2 , and the interaction effect of X 1 and X 2 on Y is h 12 . U 1 and U 2 are sets of confounders.
(all 10 genetic variants associated with both risk factors). All genetic variants are simulated as independent (i.e. not in linkage disequilibrium). We compare four methods: Method 1. Full set of interactions: we consider as IVs all the genetic variants and all cross-terms-so when there are 3 shared variants, there are 114 IVs in total: 7 þ 7 þ 3 ¼ 17 linear terms, 3 quadratic terms (shared variants only), 3 shared Â shared variant cross-terms, 42 shared Â non-shared variant cross-terms, and 49 non-shared Â non-shared variant cross-terms. Method 2. Reduced set of interactions: we consider as IVs all the genetic variants and all cross-terms between non-shared variants-so when there are 3 shared variants, there are 17 linear terms and 49 cross-terms. Method 3. Continuous gene scores: we construct weighted gene scores for each risk factor using external weights, and take the two gene scores and their product as IVs. Method 4. Dichotomized gene scores: we dichotomize both gene scores at their median, and take the two dichotomized gene scores and their product as IVs. This is equivalent to a 2 Â 2 analysis.
The data-generating model for the simulation study is provided in the Supplementary Material, available as Supplementary data at IJE online. Data were generated 10 000 times for each set of parameters on 10 000 individuals. Parameters were set such that the set of genetic variants explains around 10% of the variance in each risk factor. The effect of X 1 on the outcome was h 1 ¼ 0:3, the  effect of X 2 on the outcome was h 2 ¼ 0:2, and the interaction effect of X 12 on the outcome took values h 12 ¼ 0:1, 0.3, and 0.5.

Simulation study 2: interactions between interventions
We performed a further simulation study to investigate methods for detecting interactions between interventions. We assume there are 3 independent genetic variants that are proxies for intervention A, and the same for intervention B.
Fewer variants are considered here as typically variants for such an analysis will come from a single gene region for each intervention. 9 We compare two approaches.
i. Continuous gene scores: we construct weighted gene scores for changes in the risk factor corresponding to each intervention using external weights, and take the two gene scores and their product as IVs. ii. Dichotomized gene scores: we dichotomize both gene scores at their median, and take the two dichotomized gene scores and their product as IVs. This is equivalent to a 2 Â 2 analysis.
In each case, we regressed the outcome on the IVs, and estimated an interaction term between the gene scores that act as proxies for the interventions. As before, the datagenerating model for the simulation study is provided in the Supplementary Material, available as Supplementary data at IJE online. Data were generated 10 000 times for each set of parameters on 10 000 individuals. The interaction effect took values 0.1, 0.3, and 0.5. We varied the minor allele frequencies of the genetic variants used as proxies for interventions A and B, drawing from a uniform distribution between 0.1 and 0.2 (uncommon), or between 0.4 and 0.5 (common), and the proportion of variance explained by the genetic variants (3, 5 or 7%).
Applied example: the effects of BMI and alcohol on systolic blood pressure Increased systolic blood pressure (SBP) is associated with a range of health conditions, including cardiovascular disease and diabetes. 23,24 Whereas there have been numerous studies highlighting the adverse effects of increased BMI on SBP, 25,26 and the adverse effects of increased alcohol consumption, 27 there has been little research on the combined effect of BMI and alcohol consumption on SBP. We illustrate factorial Mendelian randomization by performing an analysis using individual participant data from UK Biobank to estimate the interaction effect of BMI and alcohol consumption on SBP. UK Biobank is a prospective, population-based cohort consisting of $500 000 participants aged from 40 to 69 years at baseline living in the UK. For the analysis, we considered 291 781 unrelated participants of European descent who passed data quality control measures and had genetic data available. We used the 77 genome-wide significant variants from a meta-analysis by the Genetic Investigation of ANthropometric Traits (GIANT) consortium in participants of European ancestry to act as IVs for BMI. 28 For alcohol, we identified 10 genetic variants in the ADH1B gene region that have been shown to be associated with alcohol consumption. 29 We performed factorial Mendelian randomization analyses using the full set of interactions, continuous gene scores, and dichotomized gene scores. We also performed analyses separately using the lead variant from the ADH1B gene region (rs1229984) as the sole IV for alcohol consumption, as was done in the analysis by Carter et al. 8

Simulation study 1: interactions between risk factors
Results from the simulation study for estimating interactions between risk factors are displayed in Table 1 (no shared variants) and Table 2 (varying the number of shared variants). All four approaches provided unbiased estimates of the interaction effect in all scenarios, with coverage for the 95% confidence interval close to the nominal 95% level. Power varied considerably between the methods. With no shared variants, method 1 (full set of interactions) and method 2 (reduced set of interactions) are equivalent and gave the most efficient estimates throughout. Method 3 (continuous gene scores) was less efficient, and method 4 (dichotomized gene scores) was the least efficient. With shared variants, method 1 was the most efficient throughout, and its efficiency was not strongly affected by the risk factors having Table 4. Subgroups defined by genetic predictors of BMI and alcohol consumption: numbers (%) of participants and mean (standard deviation) of body mass index, alcohol consumption and systolic blood pressure in 2 Â 2 subgroups when either 10 genetic variants or the rs1229984 variant used as IVs for alcohol consumption genetic predictors in common. Between methods 2 and 3, method 2 was more efficient when most of the variants were non-shared, whereas method 3 was more efficient when most of the variants were shared. Again, method 4 was the least efficient in all scenarios. This suggests that the 2 Â 2 approach may be underpowered in practice, and instead approaches using all genetic variants and their cross-terms should be considered. We also varied the strength of the genetic variants due to potential concerns about weak instruments. 30 We considered scenarios in which the genetic variants explained 1% and 5% of variance in the risk factors. Although substantial weak instrument bias was observed for the main effects, no bias was observed for the interaction term, even when there were 100 IVs in the analysis and F-statistics and conditional F-statistics 31 for the product term were $1 (Supplementary Tables A1 and A2, available as Supplementary data at IJE online). Similar findings were observed in a one-sample setting when varying the direction of confounder effects on the risk factor and outcome (results not shown). We also performed the simulation study centering the values of the risk factors to reduce the impact of collinearity. This changed the mean estimates of the main effects h 1 and h 2 and improved precision for the main effect estimates, but estimates and inferences for the interaction term h 12 were unchanged (Supplementary  Table A3, available as Supplementary data at IJE online). These additional simulations suggest that factorial Mendelian randomization should only be used when the interaction is the main object of interest, and numerical estimates for the main effects from this model should be interpreted with caution.

Simulation study 2: interactions between interventions
Results from the simulation study for estimating interactions between the gene scores that act as proxies for the interventions are displayed in Table 3. Whereas the numerical values of estimates differed between the two approaches, a consistent finding was that power to detect an interaction was greater using continuous gene scores than using dichotomized gene scores. Varying the proportion of variance explained by the genetic variants had no discernable effect on the power to detect an interaction. This can be seen by comparing scenarios 1, 2 and 3, and scenarios 5 and 6. However, varying the minor allele frequency had a strong effect on power, with greater power when the minor allele frequency was close to 0.5. This can be seen by comparing scenarios 2, 4 and 5, and scenarios 3 and 6. This suggests that ensuring comparable size between subgroups is an important factor for efficient detection of interactions, and can be more important than ensuring that the strongest variant is used in the analysis.

Applied example: the effects of BMI and alcohol on systolic blood pressure
The lead variant (rs1229984) explained 0.24% of the variance in alcohol consumption, whereas the 10 variants explained 0.28% of the variance. Although the alcoholdecreasing allele of the rs1229984 variant is dominant, its frequency is only 2.5%. Dichotomizing participants based on this variant led to unequal groups in the population, whereas dichotomizing based on the 10 variant score led to equal groups (Table 4). However, the difference in mean alcohol levels between subgroups was reduced when using the 10 variant score, as most of the difference is due to the rs1229984 variant.
Estimates of the interaction between BMI and alcohol consumption are displayed in Table 5. For the dichotomized gene scores, efficiency is greater when the rs1229984 variant is used, suggesting the importance of dichotomizing the risk factor at a natural break in its distribution (if one exists) rather than ensuring that subgroups are equal in size. However, efficiency is strikingly improved using the full set of interactions, with the standard error decreasing over 10-fold using the 10 variants, and by a factor of 4 using the rs1229984 variant, compared with the 2 Â 2 analysis. All estimates are compatible with the null, suggesting a lack of interaction in the effects of BMI and alcohol on SBP. There was no evidence of weak instrument bias, even though up to 857 IVs were used in the analyses and F-statistics were generally low (Supplementary Table A4, available as Supplementary data at IJE online).

Discussion
In this paper, we have provided a brief review of factorial Mendelian randomization, an approach that uses genetic variants as IVs to detect interactions. We have described two broad scenarios in which factorial Mendelian randomization has been implemented: to explore interactions between risk factors, and to explore interactions between interventions. Although most (perhaps even all) factorial Mendelian randomization analyses have been conducted using a 2 Â 2 approach in which the sample is divided into four subgroups, we have shown that this approach is generally inefficient, particularly for exploring interactions between risk factors. This has been demonstrated in simulation studies, and in an applied example in which a 4-to 10-fold improvement in efficiency was observed by an analysis using the full set of interactions between the genetic variants as IVs.

Choice of variants
Our findings suggest that factorial Mendelian randomization analyses should be conducted using all available genetic variants that are valid instruments, i.e. that satisfy the multivariable IV assumptions. Analyses should not only include the genetic variants as main effects, but also all relevant two-way cross-terms. A similar conclusion was made in a different context by Bollen and Paxton. 22 If investigators want to perform a 2 Â 2 analysis, this should be done to illustrate the method rather than being the main analysis for testing the presence of an interaction. For a 2 Â 2 analysis, the primary consideration for choosing genetic variants should be to divide the population at a natural break in the distribution of the risk factor, in order to maximize the difference between the mean level of the risk factor in the two halves of the population. If there is no natural break in the distribution, then investigators should find a division that splits the population as far as possible into equal groups. This may entail selecting genetic variants that explain less variance in the risk factor, but have minor allele frequency closer to 50%. There can also be substantial benefit in including multiple variants in a single gene region in an analysis, even if these variants only explain a small additional proportion of variance in the risk factor.

Weak instrument bias and efficiency
Conventionally, it is discouraged to use large numbers of genetic variants that are not strongly associated with the risk factor in a Mendelian randomization analysis due to weak instrument bias. 32 Although we did not detect any bias from weak instruments on interaction terms in our simulations, we acknowledge that users of the method may be reluctant to use hundreds of cross-terms as IVs. We would therefore encourage the use of continuous gene score methods as sensitivity analyses. Such analyses estimate fewer parameters, so should be less susceptible to bias. However, this advice is precautionary; no evidence of weak instrument bias in interaction estimates was observed in our simulations.

Summarized data
Whereas multivariable Mendelian randomization can be performed using summarized data that are typically reported from genome-wide association studies by large consortia, this is not possible for factorial Mendelian randomization. If summarized association estimates are available on genetic associations with the product of the two risk factors, as well as associations with the risk factors individually, then the interaction effect can in principle be estimated by weighted linear regression of the betacoefficients as in multivariable Mendelian randomization. However, if association estimates are only available for genetic variants, then the regression model is not identified asymptotically due to collinearity, and finite-sample estimates will be biased. 33 Association estimates for some cross-terms of genetic variants are additionally required. Hence, factorial Mendelian randomization can be performed using summarized data, but only if bespoke summarized data are available on associations of genetic variants and their cross-terms with the risk factors and their product.

Interpretation of the interaction effect
If genetic variants each satisfy the assumptions of an IV, then an interaction between risk factors has a causal interpretation. If the two risk factors are associated with the outcome then an interaction will exist on at least one of the additive or multiplicative scales. 6 However, there is no way of distinguishing a purely statistical interaction from a mechanistic or biological interaction based on observational data. We therefore advise caution in the interpretation of interaction findings, as a statistical interaction can arise due to non-linearity in the effect of a risk factor, or because of the scale on which the outcome is measured (for example, an interaction may occur on the original scale, but not on a log-transformed scale). When considering an interaction between interventions, researchers can investigate whether there is an interaction between the interventions on the risk factor(s) as well as on the outcome. This may help reveal where any biological interaction may take place.
Causal estimates from IV analysis have a clear interpretation in two cases: under the monotonicity assumption, and under the homogeneity assumption. 34 In a randomized controlled trial in which random allocation is taken as the IV and the treatment is the risk factor, monotonicity means that there are no individuals in the population (known as 'defiers') who would take the treatment only if they were randomly allocated to the control group, and not if they were allocated to the treatment group. Under monotonicity, all individuals are either 'always-takers' (they would always take the treatment whether assigned to or not), 'never-takers' (they would never take the treatment whether assigned to or not), or 'compliers' (they would take the treatment if and only if assigned to do so). 35 Under the monotonicity assumption, the IV estimate represents the complier average causal effect-the average causal effect amongst compliers. 36 However, these definitions suppose that the IV and risk factor are binary. In Mendelian randomization, these variables are typically continuous, and so the straightforward interpretation of an IV estimate as a single complier average causal effect is lost-it instead represents a weighted average of complier average causal effects. 37 In contrast, the IV estimate under the homogeneity assumption represents the average causal effect. In its simplest form, the homogeneity assumption states that causal effects are identical in all individuals in the population. Weaker versions of this assumption have been proposed.
If there is a non-zero interaction between the risk factors, then the homogeneity assumption in the multivariable Mendelian randomization model is violated, and the IV estimate only has a clear interpretation under the monotonicity assumption. However, the homogeneity assumption in the factorial Mendelian randomization model may still hold, if there is homogeneity in the effects of the two risk factors and their product on the outcome. Hence under homogeneity, the interaction effect has an interpretation as an average causal effect.
A further potential complication arises if genetic associations with the risk factor or outcome vary over time. As genetic variants are assigned at conception for all individuals and tend to influence risk factor levels throughout the life-course, Mendelian randomization estimates are naturally interpreted as the impact of a life-long change in the trajectory of a risk factor. 38 Hence the natural interpretation of an interaction effect is that of a statistical interaction in the relationship between the outcome and the risk factors that relates to long-term changes in the risk factors. If genetic associations vary over time, then the interpretation of the causal estimate from Mendelian randomization is unclear. This is true for a conventional Mendelian randomization analysis as well as for a factorial Mendelian randomization analysis. One notable case to consider is if the risk factors have mutual effects on each other, as in the case of a feedback mechanism. In this situation, provided that the associations of the genetic variants with the risk factors remain linear (which would occur if all relationships between variables are linear), then this would mean that all genetic variants are associated with both risk factors. A factorial Mendelian randomization analysis would still hold for the causal interaction between the risk factors, as in the examples with shared genetic variants described earlier in the paper. Hence feedback between the risk factors does not necessarily lead to a non-zero interaction estimate. However, if the two variables of interest have a complex longitudinal relationship, and in particular if there are mutual dependencies that might vary over time, then extra caution should be taken in interpreting results from a Mendelian randomization investigation, especially numerical estimates of causal effects. This advice is also relevant if the effects of the risk factors on the outcome may vary over time (for example if there is a critical period when exposure to the risk factor influences the outcome). If the associations between variables became non-linear, then it may be worth considering using the control function approach, an extension to the two-stage least squares method that makes stronger assumptions, but can result in more efficient estimation. 39 Comparison with previous work Previous work investigating interactions using IVs has been limited. A formal framework for defining interaction effects in the context of clinical trials was proposed by Blackwell, 40 who used the language of principal stratification (compliance classes and monotonicity) to define local average interaction effects in a similar way to how local average causal effects (also called complier-averaged causal effects) are defined for single risk factors. 41 However, the principal stratification framework presupposes that risk factors are binary (or categorical) to assign compliance classes, whereas risk factors in Mendelian randomization are typically continuous. Additionally, the principal stratification framework presupposes a single binary IV, whereas Mendelian randomization investigations often use multiple genetic variants. There is therefore little practical advice in the literature on how to perform a factorial Mendelian randomization analysis.

Limitations
There are several limitations to this work. We rely on the assumption that all genetic variants included in our analyses are valid IVs. The IV assumptions may be violated by including genetic variants that are associated with the outcome independently of the risk factors. This violation would result in biased estimates, and could potentially lead to incorrect inferences on the presence of an interaction effect. Our recommendations rely on simulated data. Different choices for the parameters included in the simulation studies may have resulted in different conclusions. However, our findings were robust to different choices of parameters considered in this paper, they correspond to what we know about the theoretical properties of estimators, and similar conclusions were observed from the applied analysis. We have only considered interactions on an additive scale, although interactions could be considered on a multiplicative scale by log-transforming the outcome. Finally, we have not considered the impact of model misspecification on estimates. It would not be possible to perform simulation studies corresponding to all possible ways that model misspecification could occur, meaning that our recommendations cannot be proven to be optimal in all settings. We believe that we have chosen parameters and scenarios that are relevant to modern Mendelian randomization analyses.

Conclusion
Overall, factorial Mendelian randomization is a promising technique for assessing interactions using genetic variants as IVs. Our findings suggest that current applications of factorial Mendelian randomization based on a 2 Â 2 analysis could be improved by better selection of genetic variants, and by better choice of analysis method.

Supplementary data
Supplementary data are available at IJE online.

Introduction
Have you ever wondered how Mendelian randomization (MR) studies can estimate a lifetime effect when the exposure is only measured once? 1 This is incredible, considering that other familiar methods 2 would require that the exposure (and time-varying covariates) be measured repeatedly and frequently throughout the life course to estimate the same effect. MR avoids this by making important assumptions about time to estimate effects. For example, the assumption that the relationship between the genetic variant(s) and the exposure is constant through time 3 allows the estimation of a lifetime effect (as defined in Table 1) even when exposure is only measured once. Regardless of the methods used to infer causality, it is not