The methodology and application of Mendelian randomization to study causal mechanisms in health and disease have developed dramatically over the past decade. New methods, large-scale genome-wide analyses, molecular epigenetics and other new -omics technologies are all providing exceptional opportunities for the exploitation of Mendelian randomization approaches to understand causes of complex traits and disease outcomes. This research has the potential to identify new approaches for the prevention and treatment of common conditions.
The origins of what is now termed ‘Mendelian randomization’ (Figure 1, see caption for assumptions) can be traced back over half a century,1 although the first extended presentation of the principles was in this journal just over a decade ago.2 Since then it has become a widely utilized methodology, with publications covering many branches of biomarker,3–12 behavioural13–16 and infectious disease17,18 epidemiology. Mendelian randomization studies with clear implications for pharmacotherapeutics are also becoming commonplace,19–21 and applications to social science and to economics (the field in which the statistical technique of instrumental variables analysis central to Mendelian randomization was initially conceived22) are being developed.23,24
Over the past few years, several methodological advances have been made. The basic assumption—that genetic variants which can proxy for a potentially modifiable exposure are essentially unrelated to confounding factors—has been demonstrated to have widespread plausibility.25 The connection between the standard Mendelian randomization experiment and the theory of instrumental variables has been elaborated upon.26,27 Extensions to use multiple genetic variants for increasing power and investigating the influence of pleiotropy have been theorized28 and implemented.29–32 Bidirectional Mendelian randomization for informing the direction of causal effects has been exemplified33,34 and extended to consider more complex networks.35 Methods for the estimation of non-linear causal effects have been proposed.36,37 Causal effects of related phenotypes with common genetic predictors in a multivariable analysis framework have been estimated.38,39 Factorial Mendelian randomization to predict the separate and combined effect of treatments using different genetic proxies has been undertaken.40 Sensitivity analyses for investigating the biasing effects of pleiotropy have been developed.41,42 Extensions to consider gene-by-environment interactions have been outlined and applied.43–45 The integration of epigenetic profiles as an intermediate phenotype has been proposed46,47 and implemented.48,49 The development of Mendelian randomization into the hypothesis-free resolution of causal directions in correlated networks has been outlined.50 In summary, methodological development has been undertaken in response to the challenges of new substantive applied questions and increasingly detailed genetic data. This development has enabled (and continues to enable) more sophisticated questions to be answered using the framework of Mendelian randomization.
Mendelian randomization in the post genome-wide association study era
Initial applications of Mendelian randomization generally incorporated a single genetic variant, and assessed the causal relationship of the modifiable intermediate phenotype on the outcome in a single sample. The proliferation of genome-wide association study (GWAS) data, and in particular publicly available GWAS data51 (such as summary genetic associations with coronary artery disease in over 60 000 cases and 130 000 controls from the CARDIoGRAMplusC4D consortium52) provides opportunities to extend this via the use of the following.
Increased sample sizes. Consortia with GWAS data on large sample sizes are available for many phenotypic traits and disease outcomes. This increases the power of Mendelian randomization investigations.53
Multiple genetic variants. For many intermediate phenotypes investigated in Mendelian randomization studies, GWAS investigations have been able to identify multiple genetic variants contributing to variation in the phenotype. Again, this increases the power of Mendelian randomization investigations.54
Two-sample Mendelian randomization. The ideal context for the precise estimation of genetic associations with modifiable intermediate phenotypes is population-based cohort studies. In contrast, the ideal context for the precise estimation of genetic associations with disease outcomes is case-control studies. Two-sample Mendelian randomization is a design strategy whereby genetic associations with the phenotype and with the outcome are taken from separate samples.55 Provided that the samples come from the same underlying population (for example, the same ethnicity), valid causal estimates can be obtained even if concomitant data on the genetic variants, intermediate phenotype and outcome are not available for any individuals. Moreover, such estimates can be obtained from summarized data rather than individual-level data.56,57 This allows the efficient evaluation of causal effects in large sample sizes without requiring sharing of individual-level data.
As two-sample Mendelian randomization does not require the genetic associations with the phenotype and with the outcome to be estimated in the same individuals, these associations can be taken from large consortia, thus potentially greatly increasing power compared with a one-sample Mendelian randomization analysis.51
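As an illustration of how a causal estimate can be obtained from summarized data alone, the following minimal sketch combines per-variant associations using a fixed-effect inverse-variance weighted (IVW) formula. It assumes uncorrelated variants and ignores uncertainty in the variant-phenotype associations. The function names and all numerical values are hypothetical and are not taken from any of the studies cited here.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Ratio estimate for a single variant with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across variants.

    Assumes the variants are uncorrelated and that the variant-exposure
    associations are estimated without error (a common simplification).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / sy ** 2
        numerator += bx * by * weight
        denominator += bx * bx * weight
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summarized associations for three variants:
# bx from a cohort (sample 1), by and sy from a case-control consortium (sample 2).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
sy = [0.01, 0.02, 0.015]

estimate, se = ivw_estimate(bx, by, sy)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

Because only the association estimates and their standard errors enter the calculation, no individual-level data need to be shared between the two samples.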
Over the past decade, the heritability of many complex traits has been explored using GWAS. In general, common genetic variants have small effects on complex traits. In the recently completed UK10K study [www.uk10k.org], novel genetic variants with relatively large phenotypic effects were observed.58 However, large effect sizes seemed to be confined to the rarest detectable signals and, for the most part, effects attributable to common genetic variants were small. This is rather disappointing from the viewpoint of developing predictive tools for even highly heritable traits. Studies like UK10K assessing the genetic architecture of complex traits more thoroughly through sequencing suggest that, for complex traits, this picture is unlikely to change. But even variants with modest effect sizes provide opportunities for the investigation of potential novel causal pathways using Mendelian randomization, particularly given the development of novel statistical tools for detecting and adjusting for pleiotropy from multiple genetic variants.41
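One way in which pleiotropy might be detected from multiple variants is by regressing the variant-outcome associations on the variant-phenotype associations without forcing the line through the origin, so that a non-zero intercept indicates an average directional pleiotropic effect. The sketch below is a simplified version of this idea (often referred to as Egger regression). We assume, rather than assert, that it resembles the tools referred to above. The function name and data are hypothetical, and standard errors are omitted for brevity.

```python
def egger_regression(beta_exposure, beta_outcome, se_outcome):
    """Weighted linear regression of outcome on exposure associations.

    Returns (intercept, slope). The intercept is an estimate of average
    directional pleiotropy; the slope is a pleiotropy-adjusted causal estimate.
    Weights are the inverse variances of the variant-outcome associations.
    """
    xs, ys, ws = [], [], []
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        # Orient each variant so that its exposure association is positive.
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / sy ** 2)

    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: every variant has a pleiotropic effect of 0.02 on the
# outcome in addition to a causal effect of 0.5 per unit of the phenotype.
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.02 + 0.5 * x for x in bx]
sy = [0.01, 0.01, 0.01, 0.01]

intercept, slope = egger_regression(bx, by, sy)
print(f"intercept: {intercept:.3f}, slope: {slope:.3f}")
```

In this constructed example a simple ratio-based estimate would be biased upwards, whereas the regression slope recovers the underlying effect because the pleiotropic effects are independent of the variant-phenotype associations.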
The promise of -omics
Mendelian randomization studies have generally focused on a limited number of intermediate phenotypes, but recent applications of -omic technologies into large-scale population-based studies present new opportunities for identifying novel predictive biomarkers and causal links between established phenotypes and disease outcomes.47,59–63 Both metabolomic and DNA methylation data are increasingly being exploited.49,64
Metabolomic data, representing multiple metabolic pathways in systemic metabolism, can be quantified by targeted mass spectroscopy or by proton nuclear magnetic resonance spectroscopy. With this, it has been possible to examine the causal role of risk factors such as body mass index (BMI) in the formation of metabolomic profiles and thus to consider the finer aetiology of possible disease effects.65 Furthermore, many metabolites have substantial heritability and robust genetic variant associations have already been identified.66,67 Metabolite profiles have proved useful in the prediction of cardiometabolic disease,68,69 although their role as modifiable targets for intervention or causal mediators of disease risk is unclear. The availability of genetic instruments for many metabolites provides opportunities to assess the causal effects of metabolites on disease risk. Both bi-directional (see above) and hypothesis-generating (see below) applications of Mendelian randomization are likely to be useful in exploiting these data.
Methylation of DNA is a partially stable mechanism for gene regulation, occurring from the earliest stages of development onwards, under genetic, environmental and stochastic influences.70 In a similar way to metabolomic data, the availability of large collections of genome-wide data on epigenetic marks presents a valuable opportunity to consider the role of gene regulation in the aetiology of complex disease. In this case, methylation-related genetic variants (mQTLs) are used as proxy markers of DNA regulatory variation, which may be causally implicated in diseases. A theoretical framework for this work has been developed46,47,71 and applied48,49 (Figure 2).
As well as being potential targets for intervention, both metabolomic72,73 and methylation data may serve as indicators of exposure to difficult-to-measure intermediate phenotypes. In the case of DNA methylation data in particular, these could provide proxy measures of long-term74 or critical period exposure75,76 that could otherwise not be assessed on large population samples.
Taxonomy of Mendelian randomization investigations
Limitations in our understanding of genetic variants used in Mendelian randomization have led to suggestions that evidence from Mendelian randomization studies in informal evidence synthesis should be down-weighted.77–79 However, not all applications of Mendelian randomization are the same in terms of their aims, procedures and quality of evidence generated. We provide a taxonomy of Mendelian randomization investigations into three broad categories, based largely on the nature of the intermediate phenotype evaluated and the biological plausibility of the genetic variants for use in assessing causal effects. These categories are presented separately but form a spectrum of evidence quality, as some investigations will not fall neatly into a single category.
Validation of potential drug targets
Some phenotypes have a genetic aetiology dominated by a relatively small number of key coding or functionally relevant loci (such as C-reactive protein,3 interleukin-6,19,20 lipoprotein-associated phospholipase A2,80 or secretory phospholipase A2,81 bilirubin,82 uric acid83). Mendelian randomization investigations conducted using a small number of genetic variants in a single gene region having clear biological links to the intermediate phenotype provide the closest parallels to a randomized trial.84 These are the most plausible Mendelian randomization investigations, in terms of the validity of the instrumental variable assumptions that the variants are specific proxies for the phenotype, as well as providing evidence to aid the prioritization and development of pharmacological interventions which have a reasonable likelihood of producing health benefits.85 This type of Mendelian randomization experiment mirrors the potential effects of a drug acting on the same pathway. Such applications have advantages for pharmaceutical companies in prioritizing drugs for clinical trials, and for investigating unintended consequences of drugs (both for drug repositioning and for investigating safety signals).
There are several examples of Mendelian randomization investigations relevant to pharmacological investigations. Drugs to inhibit C-reactive protein were not developed further after Mendelian randomization experiments demonstrated no causal role of C-reactive protein in cardiovascular disease.3,86 In contrast, the interleukin-6 receptor can be blocked by a monoclonal antibody (tocilizumab) which was developed for the treatment of rheumatoid arthritis. A variant in the IL6R gene region shows an association with coronary heart disease risk,87,88 so tocilizumab would be worthwhile taking forward into trials for cardiovascular risk prevention.89 As another example, statins are associated with an increased risk of type 2 diabetes. A Mendelian randomization study using genetic variants coding for HMGCoA reductase (the protein target that is inhibited by statins) demonstrated that these variants were associated with an increased risk of type 2 diabetes.90,91 The inference from these findings is that attempts to make statins more specific and thereby reduce off-target effects will not avoid the increased risk of diabetes. Genetic variants in the CETP gene region have been used as proxies for cholesterylester transfer protein (CETP) inhibitors, such as dalcetrapib.92 These drugs are developed to raise high-density lipoprotein cholesterol levels. Variants in the CETP region have shown null associations with coronary artery disease risk,21 although null associations with blood pressure suggest that the blood pressure-increasing effect of torcetrapib93 is an off-target effect rather than a downstream consequence of CETP inhibition.94
A recent investigation to assess the impact of interleukin-1 inhibition (e.g. by use of the drug anakinra, which is beneficial in rheumatoid arthritis) on cardiometabolic disorders found that genetic variants which proxy the effects of sustained dual interleukin-1α/β inhibition were associated with an increased risk of cardiovascular diseases.95 Two notable aspects of this investigation are the use of positive control variables (variables that should be affected by the phenotype according to biological knowledge) and the consideration of multiple outcomes. Clinical trials of anakinra show decreases in C-reactive protein and interleukin-6 levels that are also predicted by the associations of the genetic variants. The concordant associations with these positive controls increase the plausibility that the genetic variants are good proxies for the pharmacological intervention. The investigation of large numbers of outcomes, made practical by publicly available GWAS data, enables both the search for potential causal mediators of disease risk (in this case, proatherogenic lipids) and drug repositioning. Here, rather than finding another disease outcome that may be beneficially treated by anakinra, an important safety signal was discovered.
Investigation of complex intermediate phenotypes
Many intermediate phenotypes are not regulated by single metabolic pathways but are influenced by multiple genetic variants. Examples of multifactorial and polygenic risk factors include body mass index,96 height97 and blood pressure.56 In these situations, Mendelian randomization investigations often proceed in a different manner, and on the basis of a large number of genetic variants in different gene regions. These variants may be discovered in GWAS investigations and the biological pathways linking each variant to the intermediate phenotype may be unknown. Clearly, the formal instrumental variable assumption that the only causal pathway from the genetic variants to the outcome passes via the phenotype of interest is rarely satisfied.98 Plausibility of a causal effect can be increased by empirical evidence that the genetic variants are not associated with measured confounders, as well as by demonstrating consistency and directional concordance of the causal estimate across genetic variants in multiple gene regions with different biological effects on the same phenotype. If many different independent genetic variants all suggest the same direction of causal effect, and if the overall statistical result is not dependent on one or two variants, then a causal conclusion is most plausible.50 However, the associations of genetic variants with unmeasured or unknown confounders cannot be assessed, and so the instrumental variable assumptions are not fully testable. Additionally, even if a genetic variant is associated with a measured covariate, it is not possible to tell empirically whether this association is a (horizontally) pleiotropic effect of the genetic variant (hence a violation of the assumptions), or an effect of the intermediate phenotype (a mediated, or vertically pleiotropic effect). In the latter case, provided that the only causal pathway from the genetic variant to the outcome is via the intermediate phenotype, the instrumental variable assumptions are not violated.
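The two informal checks described above, namely directional concordance of the per-variant estimates and robustness of the overall result to the omission of any single variant, can be sketched as follows. This is a minimal illustration using a fixed-effect inverse-variance weighted estimate. The helper names and the data are hypothetical, and in practice such checks would be accompanied by measures of uncertainty.

```python
def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate."""
    num = sum(x * y / s ** 2 for x, y, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(x * x / s ** 2 for x, s in zip(beta_exposure, se_outcome))
    return num / den


def directionally_concordant(beta_exposure, beta_outcome):
    """True if every per-variant ratio estimate has the same sign."""
    ratios = [y / x for x, y in zip(beta_exposure, beta_outcome)]
    return all(r > 0 for r in ratios) or all(r < 0 for r in ratios)


def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """IVW estimates obtained by omitting each variant in turn."""
    estimates = []
    for i in range(len(beta_exposure)):
        estimates.append(
            ivw(
                beta_exposure[:i] + beta_exposure[i + 1:],
                beta_outcome[:i] + beta_outcome[i + 1:],
                se_outcome[:i] + se_outcome[i + 1:],
            )
        )
    return estimates


# Hypothetical summarized associations for four independent variants.
bx = [0.10, 0.20, 0.15, 0.30]
by = [0.05, 0.10, 0.075, 0.15]
sy = [0.01, 0.01, 0.01, 0.01]

print("concordant:", directionally_concordant(bx, by))
print("leave-one-out estimates:", [round(e, 3) for e in leave_one_out(bx, by, sy)])
```

If the leave-one-out estimates vary widely, or change sign, the overall result depends heavily on individual variants and a causal interpretation is correspondingly less secure.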
In these cases, the aim of a Mendelian randomization investigation is not only to give a more definitive answer as to whether the intermediate phenotype is causal or not, but also to investigate mechanisms linking the phenotype to the outcome. Particularly for phenotypes such as adult height, which is not readily modifiable, the findings of the analysis usually go beyond a simple instrumental variable analysis and investigate potential causal pathways.
A final category of analyses (which some may feel are not true Mendelian randomization analyses) are termed ‘hypothesis-generating investigations’. As with GWAS, these are undertaken particularly for intermediate phenotypes that do not have strong known genetic determinants, such as educational attainment.99,100 Automated analyses of associations between a range of risk factors and outcomes have been undertaken using whole-genome scores101 and summarized data from across the whole genome,102 to investigate whether common genetic predictors correlate with phenotypic and outcome traits. Such investigations have given mixed results, and should be regarded as hypothesis-generating rather than assessments of causation. Nonetheless, they represent a natural extension to the methods of Mendelian randomization. Findings will be more speculative, but the statistical power to detect a causal effect may be greater. To this end, the application of automated two-sample Mendelian randomization in a hypothesis-generating approach is likely to expand rapidly the capacity of conventional epidemiology to generate plausible hypotheses. In this case, derived genetic instruments may be exported to existing large GWAS collections of any disease or outcome and employed to give estimates of the causal implications of exposure to novel modifiable risk factors. This would yield a potential return on the large collections of genetic variant data in the GWAS community which are, as yet, underutilized.
In this themed issue of the journal we have published both methodological developments and substantive findings from many research groups. Methodology for improving quality of reporting,103 bias detection due to invalid instruments41 and mediation in causal pathways35 are covered. The effects of a wide range of intermediate phenotypes on disease outcomes using genetic instruments are also examined. These include sex-hormone binding globulin,64 tobacco (smoking does lower body weight),104 coffee,105 milk106 and alcohol107 intakes, obesity,108,109 vitamin D110 and testosterone.111,112 These analyses using genetic instruments provide a means of interrogating potential causal associations, particularly in circumstances where associations are likely to be heavily confounded and randomized experiments are not feasible.
Caution and conclusion
Potential limitations of the Mendelian randomization strategy were discussed extensively in its initial formal presentation2 and have been reiterated elsewhere.113–116 Largely as a function of the potentially overwhelming collection of genetic variants available to the epidemiologist looking to practise Mendelian randomization, the potential to fall into one of a series of analytical traps has increased. Power, linkage disequilibrium, pleiotropy, canalization and population stratification have all been recognized as potential flaws in the Mendelian randomization approach as methods have been developed. While avoidance strategies for these limitations are now beginning to appear, further limitations are being realized. In circumstances where we are less likely to have well-characterized and biologically understood genetic variants as instruments, it is tempting to use the totality of available variants in an analysis, for example in a genetic risk score approach.117 Although it is attractive at the outset to amalgamate genetic variants into comprehensive genetic scores which have the potential to increase variance in the phenotype explained (and thus increase power),118 it is increasingly clear that where these scores are not understood completely, the potential for inferential complication is greater now than ever.
Using the example of educational attainment, large-scale GWAS meta-analysis has successfully identified genetic variants reliably correlated with education.99 However, these signals represent a small fraction of the total variability in educational attainment.100 Genome-wide predictors will enhance the power of a Mendelian randomization analysis, with genetic scores including all variants (even those not associated at a conventional level of significance) explaining around 3% of the variance (see Figure 3). However, as a result of the combined impact of linkage disequilibrium, genetic contributions from many different biological pathways and the possible biasing effects of pleiotropy, the use of such a genome-wide estimator may produce effect estimates which suffer from limitations similar to those of more conventional observational estimates.
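To make the notions of a genetic score and of variance explained more concrete, the following minimal sketch computes a weighted allele score for one individual and converts a proportion of variance explained into the conventional F statistic for instrument strength. The F statistic is not discussed in the text above and is introduced here as an illustrative assumption. The function names, the weights and the sample size of 10 000 are hypothetical, and only the figure of 3% is taken from the example above.

```python
def weighted_allele_score(allele_counts, weights):
    """Weighted genetic score: sum over variants of allele count times weight."""
    return sum(g * w for g, w in zip(allele_counts, weights))


def f_statistic(r_squared, n_samples, n_instruments):
    """F statistic from the regression of the phenotype on the instruments.

    r_squared is the proportion of variance in the phenotype explained,
    n_samples the sample size and n_instruments the number of instruments.
    """
    return (r_squared / (1.0 - r_squared)) * ((n_samples - n_instruments - 1) / n_instruments)


# Hypothetical individual carrying 0, 1 and 2 risk alleles at three variants.
score = weighted_allele_score([0, 1, 2], [0.1, 0.2, 0.3])
print(f"allele score: {score:.2f}")

# A score explaining 3% of the variance in a hypothetical sample of 10 000,
# used as a single instrument versus 100 variants entered separately.
print(f"F (single score): {f_statistic(0.03, 10000, 1):.1f}")
print(f"F (100 instruments): {f_statistic(0.03, 10000, 100):.1f}")
```

Instrument strength and power are only part of the picture: a score that explains more variance is not thereby protected against the linkage disequilibrium and pleiotropy concerns raised above.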
The next decade will see a deeper understanding of the properties of genetic variants which will be crucial to the appropriate implementation and interpretation of Mendelian randomization analyses. Over the past decade, Mendelian randomization has provided a novel and flexible paradigm to understand the causal nature of associations between modifiable risk factors and common diseases. Mendelian randomization has made use of the massive investment in human genetic research, focusing on causal mechanisms that have the promise of identifying worthwhile targets for pharmacological research and for preventive public health interventions that are already making a difference and will continue to do so in the coming decade.