## Abstract

**Background** Mendelian randomization (MR) studies assess the causality of associations between exposures and disease outcomes using data on genetic determinants of the exposure. In this work, we explore the effect of exposure and outcome measurement error in MR studies.

**Methods** For continuous traits, we describe measurement error in terms of a theoretical regression of the measured variable on the true variable. We quantify error in terms of the slope (calibration) and the *R*^{2} values (discrimination or classical measurement error). We simulated cohort data sets under realistic parameters and used two-stage least squares regression to assess the effect of measurement error for continuous exposures and outcomes on bias, precision and power. For simulations of binary outcomes, we varied sensitivity and specificity.

**Results** Discrimination error in continuous exposures and outcomes did not bias the MR estimate, and only outcome discrimination error substantially reduced power. Calibration error biased the MR estimate when the exposure and the outcome measures were not calibrated in a similar fashion, but power was not affected. For binary outcomes, exposure calibration error introduced substantial bias (with negligible impact on power), but exposure discrimination error did not. Reduced outcome specificity and, to a lesser degree, reduced sensitivity biased MR estimates towards the null.

**Conclusions** Understanding the potential effects of measurement error is an important consideration when interpreting estimates from MR analyses. Based on these results, future MR studies should consider methods for accounting for such error and minimizing its impact on inferences derived from MR analyses.

## Introduction

Mendelian randomization (MR) is an approach for determining whether there is a causal relationship between an exposure and a correlated disease outcome using data on a genetic determinant(s) of the exposure.^{1}^{,}^{2} Exposures with genetic determinants are typically biomarkers, such as circulating molecules or physical traits, and are often subject to confounding in their associations with health outcomes. Causal effects can be estimated in the MR setting when a genetic determinant is used as an instrumental variable (IV) and is analysed jointly with exposure and outcome data to derive an estimate of the exposure’s effect on the outcome. MR estimates are not as precise as simple association-based measures of effect, but, in theory, they represent the causal component of an observed association, not the components due to confounding or reverse causation.

IVs are required to be (i) associated with the exposure, (ii) independent of the outcome conditional on the exposure and confounders (measured or unmeasured) and (iii) independent of all unmeasured confounders of the exposure–outcome association.^{1}^{,}^{2} Genetic factors are attractive as IVs because they are randomly assigned before conception and should not be affected by any potential confounders other than ancestry, which can be measured and adjusted for using additional genetic data.^{3}^{,}^{4} Owing to the correlations among neighbouring genetic variants (i.e. linkage disequilibrium), multiple variants may be acceptable proxies for some unmeasured causal variant, although the mechanism by which the causal variant influences the exposure is often poorly understood. Most genetic factors can be measured with high accuracy using modern genotyping technologies and careful quality control (QC) procedures.^{5}^{,}^{6}

In contrast, non-genetic exposures and outcomes are often measured with substantial error. In ordinary least squares (OLS) regression models, the presence of non-differential classical measurement error in the exposure (i.e. errors that are randomly distributed around a true value and unrelated to the outcome) will bias exposure–outcome associations towards the null (i.e. regression dilution or attenuation).^{7}^{,}^{8} However, non-differential classical errors in continuous outcome measures do not systematically bias association estimates but increase their standard error. Systematic (non-random) measurement error in an exposure or an outcome, error whose value depends on some other feature(s) of the data, can lead to bias.^{9} Differential measurement errors, where errors in exposure depend on outcome status (or vice versa), can lead to substantial and often unpredictable biases.^{10}

To date, no studies have examined the effects of measurement error on MR estimates. Such studies are needed because MR is becoming a common approach in the epidemiological literature,^{11} in large part because recent genome-wide association studies have identified single nucleotide polymorphisms (SNPs) associated with a wide array of biomarkers with relevance for disease traits, such as body mass index,^{12} lipid-related traits^{13} and C-reactive protein.^{14} In this work, we explore the effects of various types of non-differential measurement error on bias, precision and power in MR studies of continuous exposures in the cohort setting. We do so analytically and using simulated data sets generated using plausible parameters for epidemiological data.

## Methods

### Theoretical framework for measurement error

Measurement error for a continuous exposure (or outcome) can be described in terms of a theoretical regression of the error-prone measure (X*) on the true exposure (X)^{15–17} (Figure 1). The regression *R*^{2} represents discrimination (classical measurement error) or the degree to which individuals with higher measured values tend to have higher true values. The regression slope represents calibration or the sensitivity of the measured value to variation in the true value. The intercept represents bias, the degree to which, on average, the true value is over- or underestimated. By varying these regression characteristics in simulated data, we can systematically assess the effects of measurement error on the MR estimate. Examples of these types of non-differential error are shown in Figure 2.
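This framework can be illustrated with a short simulation sketch (illustrative parameter values, not taken from the paper): an error-prone measure X* is built from the true value X with an assumed slope (calibration), intercept (bias) and noise level, and all three characteristics are then recovered by regressing X* on X.

```python
import numpy as np

# Sketch: describe an error-prone measure X* via the regression of X* on
# the true value X. Slope -> calibration, intercept -> bias,
# R^2 -> discrimination. Parameter values below are assumed.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                       # true exposure
slope, intercept, noise_sd = 0.75, 0.2, 0.5  # assumed error parameters
x_star = intercept + slope * x + rng.normal(scale=noise_sd, size=n)

# Recover the three error characteristics from the X*-on-X regression
b1, b0 = np.polyfit(x, x_star, 1)
r2 = np.corrcoef(x, x_star)[0, 1] ** 2
print(round(b1, 2), round(b0, 2), round(r2, 2))
```

With a large sample, the fitted slope and intercept recover the assumed calibration and bias, while the *R*^{2} reflects the noise level (discrimination).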

### Effect of measurement error on bias in MR studies of continuous outcomes

Let G be a genetic risk measure (a genotype or a risk score), let X be a continuous exposure affected by G and let Y be a continuous outcome affected by X. If the three requirements for MR are met, the standard MR Wald estimator is

β_{MR} = β_{gy}/β_{gx} (Equation 1)

where β_{gy} is the coefficient for the regression of Y on G, and β_{gx} is the coefficient for the regression of X on G. Assuming β_{gy} and β_{gx} are generated with proper control for population stratification,^{3}^{,}^{4} these coefficients can be interpreted as effect estimates for G (on X and Y). The MR Wald estimator is equivalent to that obtained from a two-stage least squares (2SLS) regression,^{1} and these are the most common analysis techniques for MR studies of continuous exposures and outcomes.

Treating the coefficients in Equation 1 as true effects rather than estimates and assuming the effects of G on X and X on Y are linear, we can decompose β_{gy} into β_{gx}β_{xy} (a standard decomposition for path analysis and, more generally, linear structural equation models^{18}):

β_{MR} = β_{gy}/β_{gx} = β_{gx}β_{xy}/β_{gx} = β_{xy} (Equation 2)

Acknowledging that G, X and Y are measured with error, we can include measurement error in the MR causal diagram (Figure 3). In this framework, G, X and Y exert some effect on their measured values (G*, X* and Y*). Note that in the presence of measurement error, there is no path from G to Y that passes through X* and avoids X, which maintains MR assumption (ii) for the true exposure. Assuming linear effects for the causal model in Figure 3, the MR estimator using the measured rather than the true variables is given by the following equation:

β*_{MR} = β_{g*y*}/β_{g*x*} = β_{gx}β_{xy}β_{yy*}/(β_{gx}β_{xx*}) (Equation 3)

where the effects of X on X* and Y on Y* are represented by β_{xx*} and β_{yy*} (any attenuation owing to error in G* appears in both the numerator and the denominator and cancels from the ratio). This equation can be simplified to obtain an equation that relates the calibration of X* (β_{xx*}) and Y* (β_{yy*}) to the MR estimate:

β*_{MR} = β_{xy}(β_{yy*}/β_{xx*}) (Equation 4)

Thus, if X* and Y* are not perfectly calibrated (i.e. β_{xx*} ≠ 1 or β_{yy*} ≠ 1), then the MR estimate will be biased, unless X* and Y* are mis-calibrated in an identical fashion (β_{xx*} = β_{yy*}).
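This calibration-bias result can be checked numerically. The sketch below (with assumed parameter values, not the paper's simulation code) applies mis-calibration to both measures and shows that the Wald ratio recovers β_{xy}·(β_{yy*}/β_{xx*}) rather than β_{xy}.

```python
import numpy as np

# Numerical check of the calibration-bias result: with mis-calibrated
# X* and Y*, the Wald ratio targets beta_xy * (beta_yy* / beta_xx*).
rng = np.random.default_rng(1)
n = 200_000
b_gx, b_xy = 0.5, 0.3            # assumed true effects
b_xxs, b_yys = 0.5, 1.25         # assumed calibration slopes for X*, Y*

g = rng.binomial(2, 0.3, size=n)             # a genetic instrument
u = rng.normal(size=n)                       # unmeasured confounder
x = b_gx * g + 0.5 * u + rng.normal(size=n)
y = b_xy * x + 0.5 * u + rng.normal(size=n)
x_star, y_star = b_xxs * x, b_yys * y        # mis-calibrated measures

# Wald estimator: ratio of the G-Y* and G-X* regression slopes
wald = np.polyfit(g, y_star, 1)[0] / np.polyfit(g, x_star, 1)[0]
print(round(wald, 2), round(b_xy * b_yys / b_xxs, 2))
```

Here β_{xx*} = 0.5 and β_{yy*} = 1.25, so the estimate converges to 0.3 × (1.25/0.5) = 0.75 rather than the true 0.3, illustrating bias away from the null when β_{xx*} < β_{yy*}.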

### Effect of measurement error on precision in MR studies of continuous outcomes

The standard error of the MR estimate^{19} is

SE(β_{MR}) = σ_{ε}σ_{g}/(√n σ_{g,x}) (Equation 5)

where σ_{g} and σ_{ε} represent the standard deviations of G and ε (the error term in the regression of Y on X), respectively, and σ_{g,x} represents the covariance of G and X. This is equivalent to^{19}:

SE(β_{MR}) = σ_{ε}/(√n σ_{x}ρ_{g,x}) (Equation 6)

where ρ_{g,x} is the correlation between G and X. In the context of measurement error, σ_{ε}^{2} is equivalent to the Var(Y*) minus the variance in Y* that is explained by X* [i.e. Var(β_{x*y*}X*), under the simplifying assumption cov(X, ε) = 0, which may not hold in the MR context]. Expanding this expression based on Figure 3, we obtain the following equation:

SE(β*_{MR}) = √[Var(Y*) − β_{x*y*}^{2}Var(X*)]/(√n σ_{x*}ρ_{g,x*}) (Equation 7)

From this equation, it is clear that increases in Var(Y*) (e.g. discrimination error) will increase the standard error of the MR estimate, if all other parameters remain constant. Describing the effects of other types of measurement error is not straightforward because the measurement error may affect multiple parameters in Equation 7.
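The analytic standard error of Martens *et al.* can be checked against simulation. The sketch below (illustrative parameters, no confounding or measurement error, so σ_ε = 1) compares the empirical spread of the Wald estimate over repeated simulations with the formula σ_{ε}/(√n σ_{x}ρ_{g,x}).

```python
import numpy as np

# Sketch: compare the analytic SE of the MR estimate with the empirical
# standard deviation of the Wald estimator over repeated simulations.
rng = np.random.default_rng(2)
n, reps, b_gx, b_xy = 5000, 500, 0.25, 0.3   # assumed parameters
est = []
for _ in range(reps):
    g = rng.normal(size=n)
    x = b_gx * g + rng.normal(size=n)
    y = b_xy * x + rng.normal(size=n)
    est.append(np.polyfit(g, y, 1)[0] / np.polyfit(g, x, 1)[0])

sigma_eps = 1.0                      # sd of the Y-on-X error term
sigma_x = np.sqrt(b_gx**2 + 1)       # sd of X
rho_gx = b_gx / sigma_x              # corr(G, X), since Var(G) = 1
analytic_se = sigma_eps / (np.sqrt(n) * sigma_x * rho_gx)
print(round(np.std(est), 4), round(analytic_se, 4))
```

The two quantities agree closely; inflating Var(Y*) (e.g. by adding outcome noise) scales σ_ε upwards and widens the empirical spread accordingly.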

### Measurement error when outcomes are binary

Several authors have described the bias present in MR studies of binary outcomes,^{2}^{,}^{20}^{,}^{21} including Palmer *et al.*,^{20} who derived equations for the MR estimate. Analytically, integrating measurement error into these equations is not straightforward. However, this bias may be small and inconsequential in certain study settings, such as epidemiological studies of rare disease outcomes.^{22} In this work, we assess the effects of measurement error in MR studies of binary outcomes using simulated data and show that many of the lessons learned from continuous outcomes also apply to MR studies of binary outcomes.

### Simulation 1: the effect of discrimination error on bias and power in MR studies

We evaluated the effect of discrimination error on bias, precision and power in MR studies of a continuous exposure and outcome using simulated cohort data sets. For each simulated scenario, we generated 10 000 data sets consisting of 5000 observations and five variables: a genetic susceptibility score (G), a true exposure value (X) influenced by G, an error-prone measured value of X (X*), a true value of an outcome (Y) influenced by X and an error-prone measured Y value (Y*). We introduced an unmeasured confounding variable U that affects both X and Y. G and U were generated as random numbers drawn from a standard normal distribution. X was generated with a standard normal error term plus linear effects exerted by G and U:

X = β_{gx}G + β_{ux}U + ε_{x}, with ε_{x} ~ N(0, 1)

β_{gx} was chosen to produce an *R*^{2} of 0.05 for the regression of X on G using the following equation:

β_{gx} = √[*R*^{2}_{gx}(β_{ux}^{2} + 1)/(1 − *R*^{2}_{gx})]

Y was modelled as a standard normal error term plus linear effects of X and U:

Y = β_{xy}X + β_{uy}U + ε_{y}, with ε_{y} ~ N(0, 1)

β_{xy} was set to 0.0, 0.10, 0.20 or 0.30. The values of β_{ux} and β_{uy} were both set to 0.5.

Error-prone measures of X and Y (discrimination error only) were generated by adding normally distributed error components to X and Y. For example, X* was generated as follows:

X* = X + ε_{x*}, with ε_{x*} ~ N(0, δ_{x*}^{2})

δ_{x*} was chosen to produce a specific *R*^{2} value (1.00, 0.75, 0.50 or 0.25) for the regression of X* on X, using the following equation:

δ_{x*} = √[Var(X)(1 − *R*^{2}_{xx*})/*R*^{2}_{xx*}]

Y* was generated in a fashion similar to X*, where lower *R*^{2} values represent increasing discrimination error. Each simulation differed only in the amount of discrimination error in X* (*R*^{2}_{xx*}) and Y* (*R*^{2}_{yy*}) and the true effect of X on Y (β_{xy}).
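The noise-calibration step above can be sketched as follows (illustrative values, not the paper's Stata code): the error standard deviation δ is chosen so that the X*-on-X regression attains each target *R*^{2}.

```python
import numpy as np

# Sketch: choose the error sd (delta) so the regression of X* on X
# attains a target R^2, then verify the attained R^2 empirically.
rng = np.random.default_rng(3)
x = rng.normal(size=200_000)             # true exposure
results = {}
for target_r2 in (0.75, 0.50, 0.25):
    delta = np.sqrt(np.var(x) * (1 - target_r2) / target_r2)
    x_star = x + rng.normal(scale=delta, size=x.size)
    results[target_r2] = np.corrcoef(x, x_star)[0, 1] ** 2
print({k: round(v, 2) for k, v in results.items()})
```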

For each simulation, 2SLS regression was performed on each of the 10 000 simulated data sets using Stata’s ivregress command. This procedure can be viewed as two regressions, although Stata uses a one-step procedure as described in Baum.^{23} Stage 1 of 2SLS is a regression of X* on the IV (G). Stage 2 is a regression of Y* on the fitted X values from stage 1. MR estimates and standard errors were obtained. Power was defined as the proportion of the 10 000 data sets in which a statistically significant positive effect of X on Y was detected (two-sided *P* < 0.05).
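The two-stage procedure can be sketched in a few lines (assumed parameters; the paper itself used Stata's one-step ivregress, and a larger n than the simulated 5000 is used here so a single run is stable). The sketch also checks the equivalence of the 2SLS estimate and the Wald ratio noted earlier.

```python
import numpy as np

# Minimal sketch of 2SLS as two explicit regressions:
# stage 1: X* on G; stage 2: Y on the stage-1 fitted values.
rng = np.random.default_rng(4)
n = 200_000
g = rng.normal(size=n)
u = rng.normal(size=n)                       # unmeasured confounder
x = 0.25 * g + 0.5 * u + rng.normal(size=n)
y = 0.2 * x + 0.5 * u + rng.normal(size=n)   # true beta_xy = 0.2
x_star = x + rng.normal(size=n)              # discrimination error in X*

G = np.column_stack([np.ones(n), g])
b1 = np.linalg.lstsq(G, x_star, rcond=None)[0]    # stage 1
x_hat = G @ b1
X2 = np.column_stack([np.ones(n), x_hat])
b2 = np.linalg.lstsq(X2, y, rcond=None)[0]        # stage 2
mr_2sls = b2[1]                                   # MR estimate of beta_xy

# Equivalent Wald ratio: slope(Y on G) / slope(X* on G)
wald = np.cov(g, y)[0, 1] / np.cov(g, x_star)[0, 1]
print(round(mr_2sls, 2), round(wald, 2))
```

Despite confounding by U and classical error in X*, the point estimate targets the true β_{xy} = 0.2, and the two-stage coefficient coincides with the Wald ratio.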

These simulations were repeated in the absence of confounding (with different U variables affecting X and Y), yielding similar results and conclusions. It has previously been shown that X–Y confounding does not affect bias or power for 2SLS when IVs are strong. IV strength is measured by the *F* statistic from the first-stage regression of X on G.^{24} IVs with *F* > 10 are typically considered strong. The mean first-stage *F* values for all scenarios considered in this work were > 50, and the analyses were thus free of appreciable weak-IV biases.
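For a single instrument, the first-stage *F* statistic follows directly from the first-stage *R*^{2}; a quick sketch (illustrative values chosen to mirror the simulations, with G explaining roughly 5% of the variance of X and n = 5000):

```python
import numpy as np

# Sketch: first-stage F statistic for a single instrument, computed
# from the R^2 of the X-on-G regression. F > 10 is the usual rule of
# thumb for a strong instrument.
rng = np.random.default_rng(6)
n = 5000
g = rng.normal(size=n)
x = 0.23 * g + rng.normal(size=n)        # G explains ~5% of Var(X)
r2 = np.corrcoef(g, x)[0, 1] ** 2
f_stat = r2 / (1 - r2) * (n - 2)         # F with 1 and n-2 df
print(round(f_stat, 1))
```

With these parameters the expected *F* is in the hundreds, comfortably above the weak-instrument threshold, consistent with the mean first-stage *F* > 50 reported above.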

### Simulation 2: the effect of calibration error on bias and power in MR studies

Similar simulations consisting of 10 000 data sets were carried out to evaluate the effect of calibration error on bias, precision and power. G, X, U and Y were generated in an identical fashion to simulation 1. However, X* and Y* were generated with calibration error rather than discrimination error. For example, X* was generated as

X* = β_{xx*}X

Calibration error was introduced by setting β_{xx*} equal to 1.50, 1.25, 1.00, 0.75 or 0.50, with 1.00 representing perfect calibration. Y* was generated in a similar fashion, varying β_{yy*}. 2SLS was used to analyse all simulated data sets.

### Simulation 3: exposure discrimination error in MR studies of binary outcomes

To examine the effect of exposure measurement error in MR studies of binary outcomes, data on G, X and U were generated as in simulations 1–2, but Y was generated as a binary outcome using a logistic model for 5000 individuals in 10 000 data sets.

The logistic model for Y includes effects for both X and U:

logit[Pr(Y = 1)] = β_{0} + β_{xy}X + β_{uy}U

β_{xy} was chosen to produce specific odds ratios for the true effect of X on Y (odds ratio = 1.0, 1.25, 1.5, 1.75 or 2.0), and β_{0} was chosen to produce an average population risk of 0.10 (this was varied in supplementary analyses). β_{ux} and β_{uy} were set to 0.5. For X*, we introduced discrimination error (*R*^{2}_{xx*} = 1.0, 0.75, 0.50 or 0.25) as described in previous simulations. Each simulated data set of 5000 observations was analysed using a two-stage regression: linear regression of X* on G, followed by a logistic regression of Y on the predicted X value from the stage-1 regression. In the second stage, standard errors were obtained using the ‘robust’ option in Stata.
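The two-stage linear-logistic analysis can be sketched as follows (illustrative parameter values, with the second stage fitted here by Newton-Raphson rather than Stata; a larger n than the simulated 5000 keeps a single run stable).

```python
import numpy as np

# Sketch of the two-stage linear-logistic approach:
# stage 1: linear regression of X* on G;
# stage 2: logistic regression of Y on the stage-1 fitted values.
rng = np.random.default_rng(5)
n = 200_000
g = rng.normal(size=n)
u = rng.normal(size=n)                        # unmeasured confounder
x = 0.25 * g + 0.5 * u + rng.normal(size=n)
b_xy = np.log(1.5)                            # true OR of 1.5 per unit X
p = 1 / (1 + np.exp(-(-2.5 + b_xy * x + 0.5 * u)))
y = rng.binomial(1, p)                        # binary outcome
x_star = x + rng.normal(size=n)               # exposure discrimination error

# Stage 1: fitted exposure values from the instrument
G = np.column_stack([np.ones(n), g])
x_hat = G @ np.linalg.lstsq(G, x_star, rcond=None)[0]

# Stage 2: logistic regression of Y on the fitted exposure (Newton-Raphson)
X2 = np.column_stack([np.ones(n), x_hat])
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X2 @ beta))
    w = mu * (1 - mu)
    beta += np.linalg.solve(X2.T @ (X2 * w[:, None]), X2.T @ (y - mu))
or_hat = np.exp(beta[1])                      # MR odds-ratio estimate
print(round(or_hat, 2))
```

As discussed below, the two-stage estimate is expected to be mildly attenuated relative to the true odds ratio under confounding, and exposure discrimination error should add no further bias.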

### Simulation 4: exposure calibration error in MR studies of binary outcomes

Data on G, X, U and Y were generated, and analyses were conducted, in an identical fashion to simulation 3. However, for X*, we introduced calibration error (β_{xx*} = 1.5, 1.25, 1.0, 0.75 or 0.5) as described in previous simulations.

### Additional simulations

We conducted additional simulations investigating measurement error in binary outcomes in the MR setting, by varying the sensitivity and specificity of the outcomes measure. We also investigated the effect of varying the population risk for the outcome in the context of reduced sensitivity and specificity. Details on these simulations can be found in the supplementary material.

## Results

### Simulation 1: discrimination error affects power but does not introduce bias (Table 1)

For all scenarios evaluated, the mean MR effect estimates were equal to the true effect. However, discrimination error in Y* resulted in substantial increases in the mean standard error of the MR estimates and corresponding decreases in power for all scenarios in which β_{xy} did not equal zero. These increases became more pronounced as *R*^{2}_{yy*} decreased and the true effect size (β_{xy}) increased. Discrimination error for X* had very minor effects on the standard error and power as compared with Y*. When discrimination errors in X* and Y* were examined jointly, their effects on bias, precision and power were similar to their effects when examined independently (Supplementary Table S1, available as Supplementary data at *IJE* online).

True effect of X on Y | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Discrimination error | β_{xy} = 0.0 | β_{xy} = 0.1 | β_{xy} = 0.2 | β_{xy} = 0.3 | |||||||||

Exposure (*R*^{2}_{xx*}) | Outcome (*R*^{2}_{yy*}) | Mean MR estimate | Mean standard error | Power | Mean MR estimate | Mean standard error | Power | Mean MR estimate | Mean standard error | Power | Mean MR estimate | Mean standard error | Power |

Discrimination error for X only | |||||||||||||

1.00 | 1.00 | 0.00 | 0.062 | 0.03 | 0.10 | 0.062 | 0.37 | 0.20 | 0.062 | 0.88 | 0.30 | 0.062 | 0.99 |

0.75 | 1.00 | 0.00 | 0.062 | 0.03 | 0.10 | 0.062 | 0.37 | 0.20 | 0.063 | 0.88 | 0.30 | 0.063 | 0.99 |

0.50 | 1.00 | 0.00 | 0.062 | 0.02 | 0.10 | 0.063 | 0.37 | 0.20 | 0.064 | 0.88 | 0.30 | 0.065 | 0.99 |

0.25 | 1.00 | 0.00 | 0.063 | 0.02 | 0.10 | 0.064 | 0.36 | 0.20 | 0.067 | 0.87 | 0.30 | 0.072 | 0.99 |

Discrimination error for Y only | |||||||||||||

1.00 | 1.00 | 0.00 | 0.062 | 0.03 | 0.10 | 0.062 | 0.37 | 0.20 | 0.062 | 0.88 | 0.30 | 0.062 | 0.99 |

1.00 | 0.75 | 0.00 | 0.072 | 0.03 | 0.10 | 0.072 | 0.29 | 0.20 | 0.073 | 0.77 | 0.30 | 0.073 | 0.97 |

1.00 | 0.50 | 0.00 | 0.088 | 0.03 | 0.10 | 0.089 | 0.21 | 0.20 | 0.090 | 0.60 | 0.30 | 0.092 | 0.89 |

1.00 | 0.25 | 0.00 | 0.124 | 0.02 | 0.10 | 0.126 | 0.12 | 0.20 | 0.130 | 0.35 | 0.30 | 0.134 | 0.60 |


Estimates were derived using a two-stage least squares regression on 10 000 simulated data sets. Simulated data sets consisted of 5000 samples. In these simulations, G explains 5% of the variation in X (*R*^{2} = 0.05).

### Simulation 2: calibration error can bias the MR estimate but does not affect power (Table 2)

When calibration error for X* and Y* was equal (β_{xx*} = β_{yy*}), the MR estimate was unbiased (Table 2). However, when X and Y were measured with different amounts of calibration error and the true effect of X on Y (β_{xy}) was not zero, MR estimates were biased. Specifically, when β_{xx*} > β_{yy*}, bias was towards the null, and when β_{xx*} < β_{yy*}, bias was away from the null. Absolute bias increased with β_{xy}, but the relative bias was constant (β_{yy*}/β_{xx*}). When β_{xy} = 0, calibration error did not introduce bias. Standard errors decreased when bias towards the null was present and increased when bias away from the null was present, resulting in power estimates and type-II error rates that were unaffected by calibration error.

True effect of X on Y | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Calibration error | β_{xy} = 0.00 | β_{xy} = 0.10 | β_{xy} = 0.20 | β_{xy} = 0.30 | |||||||||

Exposure (β_{xx*}) | Outcome (β_{yy*}) | Mean MR estimate | Mean standard error | Power | Mean MR estimate | Mean standard error | Power | Mean MR estimate | Mean standard error | Power | Mean MR estimate | Mean standard error | Power |

Calibration error for X only | |||||||||||||

1.50 | 1.00 | 0.00 | 0.041 | 0.03 | 0.07 | 0.041 | 0.38 | 0.13 | 0.041 | 0.88 | 0.20 | 0.041 | 1.00 |

1.25 | 1.00 | 0.00 | 0.050 | 0.03 | 0.08 | 0.050 | 0.38 | 0.16 | 0.050 | 0.88 | 0.24 | 0.050 | 0.99 |

1.00 | 1.00 | 0.00 | 0.062 | 0.03 | 0.10 | 0.062 | 0.37 | 0.20 | 0.062 | 0.89 | 0.30 | 0.062 | 1.00 |

0.75 | 1.00 | 0.00 | 0.083 | 0.03 | 0.13 | 0.083 | 0.37 | 0.27 | 0.083 | 0.88 | 0.40 | 0.083 | 0.99 |

0.50 | 1.00 | 0.00 | 0.124 | 0.02 | 0.20 | 0.124 | 0.37 | 0.40 | 0.124 | 0.89 | 0.60 | 0.124 | 1.00 |

Calibration error for Y only | |||||||||||||

1.00 | 1.50 | 0.00 | 0.093 | 0.03 | 0.15 | 0.093 | 0.38 | 0.30 | 0.093 | 0.88 | 0.45 | 0.093 | 0.99 |

1.00 | 1.25 | 0.00 | 0.078 | 0.02 | 0.12 | 0.078 | 0.37 | 0.25 | 0.077 | 0.89 | 0.37 | 0.078 | 0.99 |

1.00 | 1.00 | 0.00 | 0.062 | 0.03 | 0.10 | 0.062 | 0.38 | 0.20 | 0.062 | 0.89 | 0.30 | 0.062 | 1.00 |

1.00 | 0.75 | 0.00 | 0.047 | 0.03 | 0.07 | 0.047 | 0.37 | 0.15 | 0.047 | 0.88 | 0.22 | 0.047 | 1.00 |

1.00 | 0.50 | 0.00 | 0.031 | 0.03 | 0.05 | 0.031 | 0.38 | 0.10 | 0.031 | 0.88 | 0.15 | 0.031 | 0.99 |


Estimates were derived using two-stage least squares regression on 10 000 simulated data sets. Simulated data sets consisted of 5000 samples. In these simulations, the IV (G) explains 5% of the variation in X (*R*^{2} = 0.05).

When calibration errors in X* and Y* were examined jointly, their effects on bias, precision and power were similar to their effects when examined independently (Supplementary Table S2, available as Supplementary data at *IJE* online). When calibration and discrimination error were examined jointly, their effects on bias and power were similar to their effects when examined independently (Supplementary Table S3, available as Supplementary data at *IJE* online).

### Simulations 3 and 4: exposure mis-calibration introduces bias in studies of binary outcomes

As previously reported,^{20} our results show that MR studies of continuous exposures and binary outcomes using a two-stage linear-logistic approach produce biased effect estimates. Under the realistic scenarios examined, discrimination error in X* introduced no additional bias into the MR estimate and resulted in slight increases in the width of the confidence interval, with no detectable effect on power (Table 3). Calibration error in X* introduced substantial additional bias into the MR estimate when a true effect was present (odds ratio ≠ 1), with bias away from the null when β_{xx*} < 1 and bias towards the null when β_{xx*} > 1. Calibration error in X* had no clear effect on power (Table 4).

Estimates from MR | ||||
---|---|---|---|---|

Discrimination error for X (*R*^{2}_{xx*}) | True effect of X on Y (OR_{xy}) | OR | 95% CI | Power |

1.00 | 1.00 | 1.00 | 0.68–1.47 | 0.02 |

1.25 | 1.24 | 0.85–1.81 | 0.20 | |

1.50 | 1.46 | 1.01–2.11 | 0.52 | |

1.75 | 1.66 | 1.16–2.37 | 0.79 | |

2.00 | 1.83 | 1.29–2.60 | 0.92 | |

0.75 | 1.00 | 1.00 | 0.68–1.48 | 0.02 |

1.25 | 1.24 | 0.85–1.81 | 0.20 | |

1.50 | 1.46 | 1.01–2.12 | 0.52 | |

1.75 | 1.66 | 1.16–2.37 | 0.79 | |

2.00 | 1.83 | 1.29–2.61 | 0.93 | |

0.50 | 1.00 | 1.00 | 0.68–1.48 | 0.03 |

1.25 | 1.24 | 0.84–1.81 | 0.19 | |

1.50 | 1.46 | 1.01–2.12 | 0.52 | |

1.75 | 1.66 | 1.16–2.38 | 0.79 | |

2.00 | 1.83 | 1.29–2.61 | 0.92 | |

0.25 | 1.00 | 1.00 | 0.67–1.48 | 0.03 |

1.25 | 1.24 | 0.84–1.82 | 0.19 | |

1.50 | 1.46 | 1.01–2.13 | 0.52 | |

1.75 | 1.67 | 1.16–2.41 | 0.79 | |

2.00 | 1.84 | 1.29–2.63 | 0.92 |


Estimates were derived using a two-stage regression (i.e. a linear regression of X* on G, followed by a logistic regression of Y on the predicted X value from the first regression) on 10 000 simulated data sets. Simulated data sets consisted of 5000 samples. In these simulations, the IV (G) explains 5% of the variation in X (*R*^{2} = 0.05). ORs and CIs are derived from the means of the betas and standard errors from the simulations.

OR, odds ratio; CI, confidence interval.

Estimates from MR | ||||
---|---|---|---|---|

Calibration error for X (β_{xx*}) | True effect of X on Y (OR_{xy}) | OR | 95% CI | Power |

1.50 | 1.00 | 1.00 | 0.79–1.26 | 0.02 |

1.25 | 1.15 | 0.92–1.44 | 0.23 | |

1.50 | 1.29 | 1.03–1.60 | 0.61 | |

1.75 | 1.40 | 1.13–1.74 | 0.87 | |

2.00 | 1.49 | 1.21–1.84 | 0.97 | |

1.25 | 1.00 | 1.00 | 0.76–1.32 | 0.03 |

1.25 | 1.18 | 0.90–1.55 | 0.24 | |

1.50 | 1.35 | 1.04–1.76 | 0.61 | |

1.75 | 1.50 | 1.16–1.93 | 0.87 | |

2.00 | 1.62 | 1.26–2.08 | 0.97 | |

1.00 | 1.00 | 1.00 | 0.70–1.41 | 0.03 |

1.25 | 1.24 | 0.88–1.74 | 0.23 | |

1.50 | 1.45 | 1.04–2.02 | 0.61 | |

1.75 | 1.66 | 1.20–2.28 | 0.87 | |

2.00 | 1.83 | 1.34–2.50 | 0.97 | |

0.75 | 1.00 | 1.00 | 0.63–1.59 | 0.02 |

1.25 | 1.33 | 0.84–2.09 | 0.23 | |

1.50 | 1.65 | 1.06–2.57 | 0.62 | |

1.75 | 1.95 | 1.27–3.00 | 0.87 | |

2.00 | 2.23 | 1.47–3.38 | 0.97 | |

0.50 | 1.00 | 1.00 | 0.50–2.00 | 0.03 |

1.25 | 1.54 | 0.78–3.04 | 0.24 | |

1.50 | 2.13 | 1.10–4.12 | 0.61 | |

1.75 | 2.72 | 1.43–5.18 | 0.87 | |

2.00 | 3.33 | 1.78–6.23 | 0.97 |


Estimates were derived using a two-stage regression (i.e. a linear regression of X* on G, followed by a logistic regression of Y on the predicted X value from the first regression) on 10 000 simulated data sets. Simulated data sets consisted of 5000 samples. In these simulations, G explains 5% of the variation in X (*R*^{2} = 0.05). ORs and CIs are derived from the means of the betas and standard errors from the simulations. OR, odds ratio; CI, confidence interval.

### Additional simulations

Additional simulations with a binary outcome showed that the bias of the two-stage estimator decreased with rarer outcomes and that imperfect specificity generated more bias than imperfect sensitivity. Further details are available in the supplementary materials.

## Discussion

To our knowledge, this is the first article to systematically consider the effect of measurement error on IV estimates in the MR setting. We have examined two types of measurement error that are common in epidemiological research: discrimination error and calibration error. The third characteristic of our theoretical framework, ‘bias’ (Figure 1), corresponds to the intercept term of the calibration regression; it is not expected to affect bias or power in MR studies, as it does not alter the non-intercept regression coefficients. Our results confirm this expectation (results not reported).

Using simulated data, we observed that MR estimates for continuous outcomes are not biased by exposure discrimination error (in contrast to OLS regression) or by outcome discrimination error (similar to OLS), consistent with expectations from the equations derived in this work. This is expected, as 2SLS is known to be a consistent estimator of the coefficient for the true exposure^{25} and is not susceptible to the ‘regression dilution bias’ that affects OLS when exposures are measured with classical (discrimination) error. This consistency in the presence of classical measurement error has been cited as a key advantage of MR that should motivate its use in epidemiology.^{1}^{,}^{11}^{,}^{26}^{,}^{27}

Our simulations also show that increasing discrimination error in the measured exposure (X*) has little impact on precision and power, another important advantage of MR. In contrast, increasing discrimination error in the measured outcome (Y*) will increase standard errors and decrease power (as in OLS regression), and these effects grow as the true effect of X on Y increases. This result is consistent with analytic expectations based on the equation for the standard error of the MR estimate provided by Martens *et al.* (Equations 5–7),^{19} where increases in the variance of Y* have a clear effect on the variance of the MR estimator, but increases in the variance of X* have a less straightforward effect owing to accompanying changes in β_{x*x} and β_{xx*}. For continuous outcomes, both our analytic and simulation-based results show that calibration errors in X and Y can bias the MR estimate, and this bias can be understood in terms of ‘differential calibration error’ between the measured exposure (X*) and the measured outcome (Y*). In other words, if X* and Y* have similar calibration error, then bias will be small, but if X* and Y* are calibrated in different ways with respect to X and Y, then bias may be substantial. Differential calibration error affects the standard error of the MR estimate in such a way that power remains equivalent to the scenario where calibration is perfect (similar to OLS regression).^{10}

We have also demonstrated the unique aspects of dealing with measurement error in MR studies of binary outcomes (in the cohort setting). Using a two-stage linear-logistic regression approach, MR estimates are typically somewhat biased, with the confounding structure influencing the magnitude of the bias.^{20} Several authors have described this bias,^{2}^{,}^{20}^{,}^{28} which arises because a non-linear relationship between X and Y depends on the unknown distribution of U, whereas a linear relationship does not. However, based on the realistic scenarios examined here, this bias will typically be mild, and power will not be substantially affected. Similar to continuous outcomes, exposure discrimination error results in slight decreases in power and precision, whereas exposure calibration error introduces additional bias. Suboptimal specificity for the measured outcome results in much larger biases towards the null, and larger reductions in power, than suboptimal sensitivity, because reduced specificity misclassifies a larger number of individuals.^{29} The bias observed in our analyses of binary outcomes decreased as the prevalence of the outcome decreased, consistent with observations made by Bowden and Vansteelandt^{22} in the context of structural mean models.
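The asymmetry between sensitivity and specificity follows from simple arithmetic at low outcome prevalence; the values below are assumed for illustration:

```python
# Illustrative arithmetic (assumed values): at low outcome prevalence, a given
# drop in specificity mislabels many more individuals than the same drop in
# sensitivity, which is why it biases the MR estimate towards the null more.
prevalence = 0.10
sens, spec = 0.90, 0.90

false_negatives = (1 - sens) * prevalence        # true cases missed
false_positives = (1 - spec) * (1 - prevalence)  # non-cases mislabelled

print(round(false_negatives, 3), round(false_positives, 3))  # 0.01 0.09
```

At 10% prevalence, a sensitivity of 0.90 misclassifies 1% of the cohort, whereas a specificity of 0.90 misclassifies 9%, nine times as many individuals.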

Consideration of measurement error issues may be important for the interpretation of estimates from MR studies of error-prone exposures. For instance, self-reported body mass index (BMI), commonly used as an exposure in MR studies,^{30}^{,}^{31} is known to be increasingly under-reported as true BMI increases, resulting in a BMI measure that is not perfectly calibrated to its true value (β_{xx*} < 1). Thus, if self-reported BMI is used as an exposure (or an outcome) in the MR setting, the MR effect estimate may be underestimated (or overestimated, respectively). If BMI is considered a proxy for exposures such as peripheral or central adiposity or percent body fat, the measurement error structure is even more complicated, an issue that warrants further study. Similarly, self-reported hours of sleep are not well calibrated, with over-reporting among individuals who sleep less.^{15}

Error in molecular biomarker measurements can occur for several reasons. For example, biomarkers can vary by time of day, month or season, or in response to acute events or preclinical disease.^{32} Hence, the timing of sample collection may lead to discrepancies between measured values and longer-term average values. Furthermore, measurements made outside the relevant aetiological time window may not accurately reflect the relevant historical exposure value.^{33} A wide array of laboratory factors can affect biomarkers in stored samples, including freeze–thaw cycling, storage conditions, contamination (e.g. trace elements) and anticoagulants or stabilizing agents.^{32}^{,}^{34} Additional variation in measured values could be introduced by temporal variation in the laboratory environment or inherent limitations of the measurement method.^{35} Unfortunately, the measurement error structure for any given biomarker or outcome may not always be measurable owing to a lack of gold-standard measures. However, there are various strategies for reducing the influence of measurement error in biomarker studies, including careful sample handling, the inclusion of QC samples to assess and account for errors^{35} and the taking of multiple measurements over a period of time.^{36}^{,}^{37}

We did not explore the effects of more complex types of measurement error in this work, such as limit-of-detection errors, where low values cannot be detected. Schisterman and Little^{38} have described appropriate strategies for dealing with such data. We also did not consider ‘differential’ measurement error (i.e. errors whose values depend on other features of the data, such as covariates or outcomes). Although this work does not consider these more complex forms of measurement error, their effects can be conceptualized to some extent through their impact on the parameters described in this work. For example, limit-of-detection errors, if not properly accounted for, would likely result in calibration error, and hence bias, owing to a mass of X* values at zero.

For this analysis, we have used a continuous variable as an IV, representing a genetic risk score for X. Such a score may be unrealistic for exposures with few genetic determinants, or undesirable because of potential violations of the assumptions required for IV analyses. However, gene-based IVs can be modelled in multiple ways (e.g. single or multiple allele-count variables, or dummy variables for specific genotypes/haplotypes), and it has previously been shown that the key factor influencing power is the *R*^{2} of the first-stage regression, regardless of the type of IV used.^{24} Thus, our findings for a given first-stage *R*^{2} will apply approximately to any type of instrument, be it continuous, discrete or a set of multiple instruments. Our analyses were performed using an *R*^{2} of 0.05 for the effect of the IV on the exposure. This value is becoming realistic for many disease-related biomarkers of interest, including various lipid-related traits^{13} and inflammatory biomarkers,^{14}^{,}^{39}^{,}^{40} assuming multi-SNP IVs are used, but it remains unrealistically high for other biomarkers. MR studies of exposures whose genetic determinants have weaker effects are more likely to require unrealistic sample sizes. Our simulations were conducted under the assumption that IVs were valid (i.e. all the IV assumptions are met). First-stage *F* values for all scenarios considered in this work were > 50, and therefore free of detectable weak-IV biases; hence, our results apply to strong IVs only. One key difference between weak-IV biases and the discrimination and calibration biases discussed in this work is that measurement error biases do not inflate the type-I error rate, whereas weak-IV biases can.
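For a single instrument, the link between the first-stage *R*^{2} and the first-stage *F* statistic can be written out directly; the sample size of 1000 below is an assumed value for illustration:

```python
# Sketch: for a single instrument, the first-stage F statistic is a simple
# function of the first-stage R^2 and the sample size n.  With R^2 = 0.05,
# F exceeds 50 once n is roughly 1000 (n chosen here for illustration).
def first_stage_f(r2: float, n: int) -> float:
    return (n - 2) * r2 / (1 - r2)

print(round(first_stage_f(0.05, 1000), 1))   # 52.5
```

This makes clear why the scenarios considered here, with *R*^{2} = 0.05 and cohort-scale sample sizes, are comfortably in strong-instrument territory.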
Genetic variants are typically measured with little error, assuming modern genotyping technologies and adequate QC measures are used.^{5}^{,}^{6} Thus, we did not devote substantial attention to genotyping error in this article.

Data for our analyses were generated according to the two-stage models that were used to analyse the data, assuming no interactions. In scenarios where these assumptions are violated, effect estimates will likely be biased.^{41}^{,}^{42} Other models are available for binary outcomes, such as probit structural equation models and generalized method of moments estimators; however, we chose the two-stage linear-logistic approach (a first-stage linear regression followed by a second-stage logistic regression) because it is most familiar to epidemiologists. MR studies of binary outcomes are prone to biases that are difficult to account for completely using standard statistical methods; however, such bias can be reduced by including the residuals from the first-stage linear regression in the second-stage logistic regression, assuming the confounding variable is normally distributed and not a modifier of the effect of X on Y.^{20}^{,}^{43} This ‘residual inclusion’ method has also been shown to reduce (but not eliminate) 2SLS bias when effects are non-linear,^{44} although the interpretation of the resulting odds ratio parameter is not straightforward. The causal inference literature provides additional methods for handling such settings,^{45–47} but these impose an assumption of homogeneity (i.e. a constant causal effect across units in the population) or give only local causal effects (i.e. effects only for the sub-population in which the instrument changes the exposure). The 2SLS MR estimate has been called the ‘linear IV average effect estimator’ and corresponds to the population, individual or local average causal effect, depending on which model assumptions are made.^{41} The two-stage linear-logistic estimate, also called the ‘Wald odds ratio’, corresponds to the causal odds ratio or the local causal odds ratio.^{41}
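The residual inclusion idea can be sketched end to end in a small simulation. All parameter values, and the bare-bones Newton solver for the logistic second stage, are illustrative assumptions rather than the settings or software used in this work:

```python
import math
import random

# Illustrative sketch (assumed parameters) of the 'residual inclusion'
# estimator for a binary outcome: a linear first stage of X on the instrument
# Z, then a logistic second stage of Y on X plus the first-stage residuals,
# which partly absorb the unmeasured confounder U.
random.seed(3)
n = 30_000
beta = 0.5                                   # assumed causal log-odds effect

z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unmeasured confounder
x = [0.22 * zi + 0.5 * ui + random.gauss(0, 0.5) for zi, ui in zip(z, u)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-2 + beta * xi + 0.5 * ui)))
     else 0 for xi, ui in zip(x, u)]

# First stage: OLS of X on Z; keep the residuals.
mz, mx = sum(z) / n, sum(x) / n
bzx = (sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
       / sum((zi - mz) ** 2 for zi in z))
resid = [xi - mx - bzx * (zi - mz) for zi, xi in zip(z, x)]

# Second stage: logistic regression of Y on [1, X, residual] by Newton's method.
def newton_logistic(rows, y, iters=8):
    k = len(rows[0])
    b = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            t = max(-30.0, min(30.0, sum(bj * rj for bj, rj in zip(b, r))))
            p = 1 / (1 + math.exp(-t))
            w = p * (1 - p)
            for a in range(k):
                grad[a] += (yi - p) * r[a]
                for c in range(k):
                    hess[a][c] += w * r[a] * r[c]
        # Solve hess * step = grad by Gaussian elimination (hess is positive definite).
        for a in range(k):
            for c in range(a + 1, k):
                f = hess[c][a] / hess[a][a]
                for d in range(a, k):
                    hess[c][d] -= f * hess[a][d]
                grad[c] -= f * grad[a]
        step = [0.0] * k
        for a in range(k - 1, -1, -1):
            s = grad[a] - sum(hess[a][c] * step[c] for c in range(a + 1, k))
            step[a] = s / hess[a][a]
        b = [bj + sj for bj, sj in zip(b, step)]
    return b

coef = newton_logistic([[1.0, xi, ri] for xi, ri in zip(x, resid)], y)
print(round(coef[1], 2))   # should land reasonably near beta for this design
```

Dropping the residual term from the second stage gives the naive logistic fit, which here would be inflated by confounding through U; including the residual absorbs most of that confounding, at the cost of some attenuation because U is only partly captured, illustrating why the method reduces but does not eliminate bias.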

Although this work applies to cohort studies, case–control studies are a common setting for the analysis of binary outcomes. MR analyses of case–control data are ideally conducted using methods that integrate information on the prevalence or incidence of the outcome, or on the distribution of the IV in the population.^{22}^{,}^{48} Alternatively, a rare-disease assumption can be used (when appropriate) to obtain approximate estimates.^{22} In addition, only continuous exposures are considered in this work. Binary exposures are likely to be less common in MR applications; however, additional research is needed to evaluate analysis methods and potential biases in these scenarios.^{49} Future studies should explore the effects of measurement error in these settings.

In conclusion, measurement error in both the exposure and the outcome can affect both bias and precision in MR studies. Understanding the potential impact of such errors will help researchers interpret estimates derived from MR analyses. Sensitivity analyses and QC procedures can be used to explore the degree to which the results of MR studies may be affected by measurement error. In future work, we will consider methods for quantifying and accounting for measurement error in MR analyses.

## Supplementary Data

Supplementary Data are available at *IJE* online.

## Funding

This work was supported by the Department of Defense [W81XWH-10-1-0499 to B.P.].

**Conflict of interest:** None declared.

## Key Messages

- Classical measurement error (i.e. discrimination error) in continuous exposures and outcomes will not bias MR estimates (under traditional assumptions); precision and power will be reduced in the presence of outcome error but are essentially unaffected by exposure error.
- Calibration error in exposure and outcome measures can bias MR estimates if a true effect exists, but power will not be affected.
- For binary outcomes, the (already biased) two-stage linear-logistic estimator is additionally biased by calibration error, but not by discrimination error, in the measured exposure, and the magnitude and direction of this bias depend on the nature of the mis-calibration.

## References

*Department of Biostatistics Working Papers 2008*(Working Paper 198). http://www.bepress.com/jhubiostat/paper198 (5 September 2012, date last accessed)