Maya B Mathur, Louisa H Smith, Kazuki Yoshida, Peng Ding, Tyler J VanderWeele, E-values for effect heterogeneity and approximations for causal interaction, International Journal of Epidemiology, Volume 51, Issue 4, August 2022, Pages 1268–1275, https://doi.org/10.1093/ije/dyac073
Abstract
Estimates of effect heterogeneity (i.e. the extent to which the causal effect of one exposure varies across strata of a second exposure) can be biased if the exposure–outcome relationship is subject to uncontrolled confounding whose severity differs across strata of the second exposure.
We propose methods, analogous to the E-value for total effects, that help to assess the sensitivity of effect heterogeneity estimates to possible uncontrolled confounding. These E-value analogues characterize the severity of uncontrolled confounding strengths that would be required, hypothetically, to ‘explain away’ an estimate of multiplicative or additive effect heterogeneity in the sense that appropriately controlling for those confounder(s) would have shifted the effect heterogeneity estimate to the null, or alternatively would have shifted its confidence interval to include the null. One can also consider shifting the estimate or confidence interval to an arbitrary non-null value. All of these E-values can be obtained using the R package EValue.
We illustrate applying the proposed E-value analogues to studies on: (i) effect heterogeneity by sex of the effect of educational attainment on dementia incidence and (ii) effect heterogeneity by age on the effect of obesity on all-cause mortality.
Reporting these proposed E-values could help characterize the robustness of effect heterogeneity estimates to potential uncontrolled confounding.
Key Messages
Effect heterogeneity estimates can be biased if the exposure–outcome relationship is subject to uncontrolled confounding whose severity differs across strata of the second exposure.
We propose sensitivity analyses, analogous to the E-value, that characterize the severity of uncontrolled confounding strengths that would be required to ‘explain away’ an estimate of effect heterogeneity.
We provide an R package, EValue, to conduct all proposed sensitivity analyses.
Background
Estimates of effect heterogeneity (i.e. the extent to which the causal effect of an exposure, X, varies across strata of another variable, Z) can be biased if the exposure–outcome relationship is subject to uncontrolled confounding whose severity differs across strata of Z.1,2 For example, suppose that in the Z = 1 stratum, the exposure–outcome relationship is biased upward due to uncontrolled confounding, but that in the Z = 0 stratum, the relationship is not biased. The uncontrolled confounding in the Z = 1 stratum could, for example, spuriously increase the magnitude of the observed ratio or difference between the two strata’s estimated exposure–outcome effects, thus increasing the observed estimate of effect heterogeneity. Such bias in effect heterogeneity estimates can be consequential in practice. For example, effect heterogeneity estimates on the additive scale can inform decisions of which individuals (i.e. which stratum of Z) to treat in order to most reduce the disease caseload if treatment resources are limited.1 In this context, a biased estimate of effect heterogeneity could potentially suggest a treatment allocation scheme that fails to minimize, or even increases, the overall caseload.
Although sensitivity analyses exist for causal interactions (i.e. effects of jointly manipulating X and Z vs each independently),3 we are not aware of comparable methods for effect heterogeneity. We propose sensitivity analyses that characterize the severity of uncontrolled confounding strengths that would be required, hypothetically, to ‘explain away’ an estimate of multiplicative or additive effect heterogeneity in the sense that controlling for those confounder(s) would have shifted the effect heterogeneity estimate to the null, or alternatively would have shifted its confidence interval to include the null. We recommend reporting E-values for both the point estimate and the confidence interval; the latter is especially important for effect heterogeneity estimates, for which statistical precision is often considerably lower than for total effects. We also discuss ‘non-null’ E-values required to shift the estimate or its confidence interval to any arbitrary value. These metrics are straightforward extensions of the standard E-value for total effects, which represents the minimum strength of association, on the risk ratio (RR) scale, that uncontrolled confounder(s) would need to have with the exposure, the outcome, or both, conditional on any measured and controlled confounders, to explain away the total effect.4,5 As we describe below, it is equivalent to interpret the E-value as the minimum strengths of association that uncontrolled confounder(s) would need to have with both the exposure and the outcome if these two strengths of association are taken to be of equal magnitude.
Like the standard E-value, our proposed E-values for effect heterogeneity do not require assumptions on the nature of the uncontrolled confounder(s).4–6 That is, these metrics represent bounds under hypothetical worst-case confounding: they consider the maximum bias that could be generated by confounders of a given strength, but actual uncontrolled confounders might not generate that much bias.4,5 We also give alternative E-values that can be applied under the assumption that uncontrolled confounding operates in the same direction in each stratum of Z (‘unidirectional confounding’). We provide software to calculate all of these E-values, discussed below.
Details and reporting guidelines for the standard E-value are discussed elsewhere.4–7 The E-value has limitations, which have been discussed and debated elsewhere.8–11 For example, to avoid making assumptions about the prevalence or distribution of uncontrolled confounder(s), the E-value does not make use of potential known information about these and hence may understate the amount of confounding required to explain away an effect. Additionally, the E-value does not account for biases or threats to inferential validity other than uncontrolled confounding, such as measurement error, selective reporting or uncontrolled multiple testing; we provide methods and recommendations regarding these issues elsewhere.6,12–14 The same considerations and limitations apply for the analogues we propose.
Setting and notation
All proofs, with formalized assumptions and definitions, appear in the Supplementary material (available as Supplementary data at IJE online). Let the binary variable Z define the strata between which the causal effect of X might vary, and let Y(x) be a potential outcome when intervening to set X = x. (If Z is categorical rather than binary, the same results can be applied to contrasts between two specified levels of Z.) We assume that the confounded estimate of effect heterogeneity is greater than the null (e.g. >1 for multiplicative measures or >0 for additive measures); otherwise, one can simply reverse the coding of Z before applying the results below. We also assume that, other than the omission of uncontrolled confounder(s), the estimator used to obtain the confounded estimate of effect heterogeneity is correctly specified; i.e. the estimate would have been unbiased for the causal estimand if the uncontrolled confounder(s) had in fact been controlled. For example, if the statistical interaction coefficient for X × Z in a regression model is taken to be the effect heterogeneity estimate, this requires assuming that X does not affect Z, even if there were no uncontrolled confounding.
We define two sensitivity parameters for each stratum, called the ‘within-stratum confounding strengths’. These are the same parameters as used for the standard E-value, but applied separately to each stratum of Z.5 Within stratum Z = z, let Uz denote the uncontrolled confounder(s), defined as a set of one or more variables that would suffice to control for confounding of the exposure–outcome relationship in this stratum.
The first parameter, the exposure–confounder strength, represents, within stratum Z = z, the maximal RR of Uz = u for X = 1 vs X = 0 across strata of Uz. The second parameter, the confounder–outcome strength, represents the maximal RR relating Uz to Y within stratum Z = z, conditional on X. As for the standard E-value, the same sensitivity parameters and results accommodate the possibility that Uz includes one or more confounders of any type (e.g. binary, categorical or continuous) and distribution. For example, if Uz is binary, the confounder–outcome strength is simply the RR relating Uz to Y within stratum Z = z. Precise interpretations of the sensitivity parameters when Uz contains multiple confounders or is continuous are given elsewhere.6
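For reference, the two within-stratum confounding strengths can be written out explicitly. The following is a sketch that transcribes the standard E-value parameter definitions (reference 5) with conditioning on Z = z added, assuming binary X and Y. The symbols RR_{XU|z}, RR_{UY|z} and B_z are notation assumed here for illustration and may differ from the notation of the published article.

```latex
\begin{align*}
RR_{XU \mid z} &= \max_{u} \frac{P(U_z = u \mid X = 1, Z = z)}{P(U_z = u \mid X = 0, Z = z)}, \\[4pt]
RR_{UY \mid z} &= \max_{x \in \{0, 1\}}
  \frac{\max_{u} P(Y = 1 \mid X = x, U_z = u, Z = z)}{\min_{u} P(Y = 1 \mid X = x, U_z = u, Z = z)}, \\[4pt]
B_z &= \frac{RR_{XU \mid z} \, RR_{UY \mid z}}{RR_{XU \mid z} + RR_{UY \mid z} - 1}.
\end{align*}
```

Under the standard result for total effects, B_z is the largest factor by which uncontrolled confounding of these strengths could bias the RR within stratum Z = z.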
We define the E-value for an effect heterogeneity point estimate as the minimum magnitude that at least one of the four within-stratum confounding strengths must have, on the RR scale, such that fully controlling for confounding would have shifted the estimate to the null. Like the standard E-value, this E-value essentially sets the four confounding strengths equal to one another to obtain the required joint minimum for all of them. For the special case in which confounding is assumed to be unidirectional, we define E-values as the minimum value that at least one of the two confounding strengths in at least one stratum must have in order to explain away the effect heterogeneity.
To dispel a common misconception, we note that mathematically setting the confounding strengths equal to one another in this manner does not require assuming that, in reality, the confounding strengths actually are equal. The E-value is derived by considering all possible combinations of confounding strengths that could produce enough bias to explain away the effect heterogeneity, and then solving for the combination that minimizes the maximum of these confounding strengths (Supplementary material, Section 2.1, available as Supplementary data at IJE online). This unique combination, it turns out, is the one in which the confounding strengths are equal. If, in reality, there are uncontrolled confounders whose confounding strengths are not equal, the E-value still applies; it states that at least one of the confounding strengths must exceed the E-value in order to explain away the effect. Again, this interpretation is mathematically equivalent to considering confounding strengths of equal magnitude. Additionally, we note that setting the confounding strengths equal to one another does not require assuming that these associations arise from the same confounders in the two strata.
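The minimax argument can be checked numerically. The sketch below treats the simpler case of a single stratum with two confounding strengths, and assumes the standard bounding factor from the E-value literature, RR_XU × RR_UY / (RR_XU + RR_UY − 1). The function names and the target bias of 2.0 are hypothetical choices for illustration, not part of the article or of the EValue package.

```python
import math


def bounding_factor(rr_xu: float, rr_uy: float) -> float:
    """Maximum multiplicative bias that strengths (rr_xu, rr_uy) could produce."""
    return rr_xu * rr_uy / (rr_xu + rr_uy - 1.0)


def required_rr_uy(rr_xu: float, target_bias: float) -> float:
    """Solve bounding_factor(rr_xu, rr_uy) == target_bias for rr_uy.

    Only meaningful for rr_xu > target_bias: the bounding factor is always
    smaller than either strength, so a weaker rr_xu can never reach the target.
    """
    return target_bias * (rr_xu - 1.0) / (rr_xu - target_bias)


def minimax_strength(target_bias: float, steps: int = 20000, upper: float = 50.0):
    """Grid-search the pair that just reaches target_bias with the smallest maximum."""
    best = None
    for i in range(1, steps + 1):
        rr_xu = target_bias + (upper - target_bias) * i / steps
        rr_uy = required_rr_uy(rr_xu, target_bias)
        candidate = (max(rr_xu, rr_uy), rr_xu, rr_uy)
        if best is None or candidate < best:
            best = candidate
    return best


target = 2.0  # hypothetical bias factor that must be explained away
worst, rr_xu, rr_uy = minimax_strength(target)
closed_form = target + math.sqrt(target * (target - 1.0))  # standard E-value formula
print(f"grid search: max strength {worst:.3f} at rr_xu={rr_xu:.3f}, rr_uy={rr_uy:.3f}")
print(f"closed form: {closed_form:.3f}")
```

Up to the grid resolution, the smallest achievable maximum occurs where the two strengths coincide and agrees with the closed-form E-value, even though nothing in the search forces the strengths to be equal.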
Multiplicative effect heterogeneity
The E-value for multiplicative effect heterogeneity [Equation (1)] represents the minimum magnitude that at least one of the four within-stratum confounding strengths must have in order to explain away the effect heterogeneity (i.e. to shift the multiplicative effect heterogeneity estimate to its null value of 1). Equivalently, this E-value represents the minimum magnitude of all four within-stratum confounding strengths that would be required to explain away the effect heterogeneity if all four confounding strengths are taken to be of equal magnitude. We use ‘magnitude’ to indicate that the confounding strengths on the RR scale are taken to be ≥1 regardless of the direction of association, as for the standard E-value.4 In fact, this bound is attained when the effects within each stratum have bias of the same magnitude, but in opposite directions. It might be quite unlikely in practice that any given uncontrolled confounder has all four confounding strengths of equal magnitude, or that confounding bias operates in different directions in each stratum. We return to these points in the section ‘Interpreting E-values in light of their mathematical conservatism’. In the Supplementary material, Section 2.3 (available as Supplementary data at IJE online), we establish connections between this bound and classical bounds on bias in total effects due to uncontrolled confounding.15,16
The E-value required to shift the estimate to a non-null value, rather than to the null, can be obtained by replacing the effect heterogeneity estimate in Equation (1) with the ratio of the estimate to that non-null value. The E-value required to shift the confidence interval to include the null or another specified value can be obtained by replacing the estimate with the lower confidence interval limit. All of these E-values can be obtained using existing software that calculates E-values for total effects17,18 by simply performing the calculation using the square root of the effect heterogeneity estimate rather than the estimate itself, and likewise for the confidence interval limit or the ratio to a non-null value, because doing so is equivalent to applying Equation (1).
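The calculation described above can be written compactly. This is a reconstruction that is consistent with the text and with the numerical results in the applied examples, not a transcription of the article's Equation (1). The symbols g, \widehat{EH}^{c}_{\text{mult}} (the confounded multiplicative effect heterogeneity estimate) and EH^{t}_{\text{mult}} (the value to which the estimate is shifted) are assumed notation.

```latex
\begin{align*}
g(r) &= r + \sqrt{r \, (r - 1)}, \quad r \ge 1
  && \text{(standard E-value for a risk ratio } r\text{)} \\[4pt]
\text{E-value}_{\text{mult}} &= g\!\left( \sqrt{\widehat{EH}^{c}_{\text{mult}}} \right)
  && \text{(shift to the null)} \\[4pt]
\text{E-value}_{\text{mult}}\!\left( EH^{t}_{\text{mult}} \right)
  &= g\!\left( \sqrt{\widehat{EH}^{c}_{\text{mult}} \big/ EH^{t}_{\text{mult}}} \right)
  && \text{(shift to a non-null value)}
\end{align*}
```

For instance, an estimate of 3.47 gives g(√3.47) ≈ 3.13, which matches the value reported in the education and dementia example.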
If confounding is assumed to be unidirectional, the corresponding E-value [Equation (2)] represents the minimum magnitude that at least one of the within-stratum confounding strengths must have in at least one stratum of Z in order to explain away the effect heterogeneity. Equivalently, this E-value represents the minimum magnitude that both within-stratum confounding strengths must have in at least one stratum of Z if both confounding strengths in that stratum are taken to be equal. This bound is attained when the effect within the other stratum is unbiased. The expression is in fact equivalent to the standard E-value for the effect heterogeneity estimate itself.
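Both multiplicative E-values can be computed with a few lines of code. The sketch below assumes the standard E-value formula for a risk ratio, RR + sqrt(RR × (RR − 1)), and applies it to the square root of the estimate (general case) or to the estimate itself (unidirectional confounding), as the text describes. The function names are hypothetical and are not the EValue package's API; the inputs 3.47 and 1.26 are the point estimate and lower confidence limit from the education and dementia example.

```python
import math


def evalue_rr(rr: float) -> float:
    """Standard E-value for a risk ratio of at least 1."""
    if rr < 1.0:
        raise ValueError("expects a ratio >= 1; recode Z so the estimate exceeds the null")
    return rr + math.sqrt(rr * (rr - 1.0))


def evalue_eh_mult(eh: float, true_value: float = 1.0, unidirectional: bool = False) -> float:
    """E-value for a multiplicative effect heterogeneity estimate or lower CI limit.

    eh             : confounded estimate, or its lower confidence limit
    true_value     : value to shift to (1.0 is the null)
    unidirectional : assume confounding acts in the same direction in both strata
    """
    ratio = eh / true_value
    if ratio <= 1.0:
        # Already at or below the target (e.g. a CI that includes it):
        # no confounding is needed to reach it.
        return 1.0
    return evalue_rr(ratio if unidirectional else math.sqrt(ratio))


general = evalue_eh_mult(3.47)
general_ci = evalue_eh_mult(1.26)
unidir = evalue_eh_mult(3.47, unidirectional=True)
unidir_ci = evalue_eh_mult(1.26, unidirectional=True)
print(f"general: {general:.2f} (CI limit {general_ci:.2f})")
print(f"unidirectional: {unidir:.2f} (CI limit {unidir_ci:.2f})")
```

Because the inputs here are the rounded published values, the unidirectional results can differ in the second decimal place from those reported in the article.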
Additive effect heterogeneity
The bound underlying the E-value for additive effect heterogeneity [Equation (3)] is attained when the within-stratum effects are biased in opposite directions, but with potentially different amounts of absolute bias in each stratum. (This asymmetry arises because, in the additive case, the amount of absolute bias produced by a given fixed set of sensitivity parameters depends on nuisance parameters, such as exposure prevalences and outcome probabilities, that can differ between strata of Z. In contrast, for the multiplicative case, the amount of multiplicative bias is independent of any such nuisance parameters.) The Supplementary material (available as Supplementary data at IJE online) provides an E-value for the confidence interval, a generalization for shifting the confounded interaction contrast (ICc) to a non-null value, and E-values that apply if the confounding bias is assumed to be unidirectional and positive, unidirectional and negative, or unidirectional with the direction unknown. All of these E-values for interaction contrasts can be obtained in R via EValue::evalues.IC.17
Applied examples
Education and dementia
Letenneur et al.20 investigated the effect of low vs high education (≤7 years of schooling vs ≥12 years) on dementia incidence, additionally estimating effect heterogeneity by sex. They pooled data from population-based longitudinal studies of aged women (n = 3,352) and men (n = 2,395). In analyses that adjusted for baseline confounders (age, smoking, myocardial infarction, stroke and study centre), the authors estimated a strong association between low education and dementia incidence in women [RR = 3.78 (95% CI: 1.64, 8.72)], but an apparently much weaker association in men [RR = 1.09 (0.61, 1.94)]. The authors suggested that uncontrolled confounding might have produced this apparent effect heterogeneity. For example, they speculated that socio-economic status, an uncontrolled confounder, might produce different biases for each sex.
On the multiplicative scale, we estimated an effect heterogeneity of 3.47 (95% CI: 1.26, 9.57; P = 0.02), i.e. the ratio of the RR for women to the RR for men. Without making assumptions on the direction of uncontrolled confounding bias for each sex, the E-values for this estimate and its lower confidence interval limit were thus 3.13 and 1.49, respectively [from Equation (1)]. Thus, at least one of the four confounding strengths would need to be at least 3.13 on the RR scale in order to explain away the effect heterogeneity and would need to be at least 1.49 to shift the confidence interval to include the null. If we assume that uncontrolled confounding operated in the same direction for each sex, then the E-values for the point estimate and its lower confidence interval limit increase to 6.39 and 1.82, respectively [from Equation (2)]. Given these studies’ control of several known confounders, it may be somewhat implausible that uncontrolled confounders, such as socio-economic status, were strong enough to attain these confounding strengths, although such judgments would need to be informed by domain expertise.
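The multiplicative estimate above can be approximately recovered from the two published stratum-specific RRs. The article does not state how its interval was computed; the sketch below is one plausible route, assuming that the estimates for women and men are independent and that each log RR is approximately normal, so that standard errors can be backed out of the 95% confidence limits. The function names are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of a log ratio implied by its 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_975)


def ratio_of_ratios(est1, ci1, est0, ci0):
    """Ratio of two independent ratio estimates with a 95% CI and two-sided P-value."""
    log_eh = math.log(est1) - math.log(est0)
    # Independent strata: variances of the log estimates add.
    se = math.hypot(log_se_from_ci(*ci1), log_se_from_ci(*ci0))
    lower = math.exp(log_eh - Z_975 * se)
    upper = math.exp(log_eh + Z_975 * se)
    p_value = math.erfc(abs(log_eh / se) / math.sqrt(2.0))  # two-sided normal test
    return math.exp(log_eh), lower, upper, p_value


# Women: RR = 3.78 (1.64, 8.72); men: RR = 1.09 (0.61, 1.94)
eh, lower, upper, p_value = ratio_of_ratios(3.78, (1.64, 8.72), 1.09, (0.61, 1.94))
print(f"EH = {eh:.2f} ({lower:.2f}, {upper:.2f}); P = {p_value:.2f}")
```

The result agrees with the reported 3.47 (1.26, 9.57) up to rounding of the published inputs.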
On the additive scale, we estimated risk differences of 0.04 (0.02, 0.05) for women and 0.01 (–0.01, 0.02) for men, such that the interaction contrast was 0.03 (0.01, 0.05); P = 0.01. This analysis does not adjust for confounders because Letenneur et al.20 reported only unadjusted prevalences. Without assumptions on the direction of confounding bias, the E-values for the interaction contrast and its lower confidence interval limit were thus 2.54 and 1.44, respectively [from Equation (3)]. If we assume that uncontrolled confounding operated in the same unspecified direction for both sexes, then these E-values respectively become 3.30 and 1.63 (Supplementary material, Section 3.2, available as Supplementary data at IJE online).
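The arithmetic behind the interaction contrast can be traced from the two risk differences. This is a rough check rather than the authors' computation: it assumes independent strata and normal-theory intervals, works from the rounded published values, and uses RD and IC^c as assumed notation for the risk differences and the confounded interaction contrast.

```latex
\begin{align*}
\widehat{IC}^{c} &= \widehat{RD}_{\text{women}} - \widehat{RD}_{\text{men}} = 0.04 - 0.01 = 0.03, \\[4pt]
\widehat{SE}_{\text{women}} &\approx \frac{0.05 - 0.02}{2 \times 1.96} \approx 0.0077, \qquad
\widehat{SE}_{\text{men}} \approx \frac{0.02 - (-0.01)}{2 \times 1.96} \approx 0.0077, \\[4pt]
\widehat{SE}\!\left( \widehat{IC}^{c} \right) &\approx \sqrt{0.0077^{2} + 0.0077^{2}} \approx 0.0108, \\[4pt]
95\%\ \text{CI} &\approx 0.03 \pm 1.96 \times 0.0108 \approx (0.009,\ 0.051).
\end{align*}
```

After rounding, this agrees with the reported interval of (0.01, 0.05).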
Obesity and all-cause mortality
Winter et al.21 meta-analysed eight longitudinal studies to investigate the extent to which the effects of body mass index (BMI) on mortality differed by age (<65 vs ≥65 years). The authors estimated that being obese vs normal weight (BMI < 25) was associated with increased mortality among participants aged <65 years [HR = 1.42 (95% CI: 1.22, 1.65)], but not among participants aged ≥65 years [HR = 1.04 (95% CI: 0.91, 1.19)]. Confounding control in the eight studies was quite limited: e.g. five studies did not control for comorbid health conditions, two studies did not control for smoking, five did not control for physical activity, and none controlled for diet or caloric intake.
On the multiplicative scale, we estimated an effect heterogeneity of 1.37 (95% CI: 1.12, 1.67; P = 0.002), i.e. the ratio of the HR for participants aged <65 years to the HR for participants aged ≥65 years. E-values can be applied to meta-analysis point estimates, in which case they represent average confounding strengths across studies.13,22 Without making assumptions on the direction of uncontrolled confounding bias for each age group, the E-values for this estimate and its lower confidence interval limit were thus 1.61 and 1.30, respectively. If we assume that uncontrolled confounding operated in the same direction for each age group, then the E-values for the point estimate and its lower confidence interval limit increase to 2.07 and 1.48, respectively. Given these studies’ limited control of confounders whose associations with both obesity and mortality may be quite strong, it may be plausible that uncontrolled confounders could have the strengths of association indicated by the E-values. Furthermore, it may be plausible that the strength of confounding bias could differ by age group. For example, older individuals might be less physically resilient to comorbid conditions than younger individuals, such that having comorbid conditions might be more strongly associated with BMI or with mortality among older individuals. We could not conduct analyses on the additive scale given the statistics reported in the meta-analysis.21
Practical interpretation and reporting of E-values
Elsewhere, we have provided recommendations on reporting E-values for total effects4,7 and provided caveats about potential misinterpretations.11,23,24 These considerations apply to the present analogues as well; we comment here on only a subset of these considerations that are particularly pertinent to the setting of effect heterogeneity.
Assessing the plausibility that confounding strengths attain the E-value
As we have emphasized in the context of total effects, E-values must be interpreted in light of the quality of a study’s existing control for confounding: it will be more plausible that there exists uncontrolled confounding as strong as is indicated by the E-value in a poorly controlled study than a well-controlled study.4,7 As a limitation of sensitivity analyses in general, it can be challenging to assess plausible strengths of association of uncontrolled confounder(s). This may be particularly so in the context of effect heterogeneity, for which the relevant confounding strengths are defined within the strata of Z rather than marginally.
In the context of total effects, we have suggested listing specific variables that are thought to be uncontrolled confounders and, to help benchmark intuitions, reporting measured confounders’ strengths of association with X and with Y. Because E-values consider joint associations produced by potentially multiple uncontrolled confounder(s), above and beyond controlled confounder(s), it may be particularly informative to report confounding strengths for each measured confounder as well as for all measured confounders jointly.6 However, these empirical benchmarks must be interpreted carefully to avoid common misconceptions.7,24
When applying E-values for effect heterogeneity, one could similarly report measured confounding strengths within strata of Z; e.g. one could report the associations of each measured confounder, and of all measured confounders jointly, with X, and with Y for each stratum of Z. We acknowledge, though, that comparing the E-value to such benchmarks may be difficult because confounders, whether measured or not, can differ substantially in their associations with the exposure and outcome as well as in the strengths of these associations in each stratum. When additional information about uncontrolled confounders is available, the E-value could be supplemented or replaced by more precise sensitivity analyses; we return to this point in the next section.
Interpreting E-values in light of their mathematical conservatism
As noted above, like the standard E-value for total effects,4 our proposed analogues avoid making assumptions about the prevalence or distribution of uncontrolled confounder(s) by not incorporating any sensitivity parameters regarding these properties. Therefore, E-values might understate the amount of confounding required to explain away the effect or effect heterogeneity. For example, if Uz is a binary variable, then given its confounding strengths, the bias it produces in stratum Z = z would be maximized only at specific, extreme prevalences of Uz in the two exposure groups. The E-value is conservative in that it allows for this extreme, and sometimes implausible, possibility.
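A small numerical illustration may help show how the bias from a binary confounder depends on its prevalence. The sketch below uses the classical bias formula for a binary confounder whose RR with the outcome is the same in both exposure groups, and compares it with the worst-case bounding factor. The formula, function names and parameter values are assumptions brought in for illustration; they are not the article's own expressions, which are missing from this text.

```python
def bounding_factor(rr_xu: float, rr_uy: float) -> float:
    """Worst-case multiplicative bias for confounding strengths (rr_xu, rr_uy)."""
    return rr_xu * rr_uy / (rr_xu + rr_uy - 1.0)


def bias_binary_u(p1: float, p0: float, rr_uy: float) -> float:
    """Multiplicative bias from a binary confounder U.

    p1, p0 : prevalence of U = 1 among the exposed and the unexposed
    rr_uy  : RR relating U to the outcome, assumed equal in both exposure groups
    """
    return (1.0 + (rr_uy - 1.0) * p1) / (1.0 + (rr_uy - 1.0) * p0)


rr_xu, rr_uy = 2.0, 3.0  # hypothetical confounding strengths within one stratum
bound = bounding_factor(rr_xu, rr_uy)

biases = {}
for p1 in (0.2, 0.5, 0.8, 1.0):
    p0 = p1 / rr_xu  # keeps the exposure-confounder RR fixed at rr_xu
    biases[p1] = bias_binary_u(p1, p0, rr_uy)
    print(f"P(U=1|X=1) = {p1:.1f}: bias = {biases[p1]:.3f} (bound = {bound:.3f})")
```

With the confounding strengths held fixed, the actual bias stays below the bound for moderate prevalences and reaches it only in the extreme case where every exposed individual has the confounder.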
In some settings, one might wish to conduct a sensitivity analysis for specific uncontrolled confounder(s) whose prevalence conditional on X or conditional on Z is known. One could then obtain a more precise sensitivity analysis by applying methods that do incorporate information about prevalences. For effect heterogeneity, one could apply existing sensitivity analyses for uncontrolled confounding that incorporate prevalences or other external information (reviewed in 25) to bound the bias in each stratum of Z separately. Analogous bounds incorporating prevalences exist for causal interaction.3
Of course, the disadvantage of incorporating prevalences is that, if the specified prevalences are incorrect or if there exist other uncontrolled confounder(s) with prevalences other than those specified, bounds obtained by incorporating prevalences may give a false impression of robustness to uncontrolled confounding. Additionally, methods that involve specifying numerous sensitivity parameters could introduce additional ‘researcher degrees of freedom’, such that researchers could potentially search for combinations of parameters that produce attractive results of sensitivity analyses.4
For these reasons, in the context of total effects, we suggested that when external information is available regarding uncontrolled confounder(s), one might first report the E-value, because despite its conservatism, the E-value might nevertheless be large enough to suggest robustness to uncontrolled confounding, even without making use of external information.25 One could then supplement this mathematically conservative analysis with additional sensitivity analyses that do incorporate external information.25 The same considerations also apply when considering effect heterogeneity.
A second form of conservatism arises specifically in the context of E-values for effect heterogeneity. Namely, as noted above, the general bound in Equation (1) is attained when the effects in each stratum of Z have bias of the same magnitude, but in opposite directions. In some scientific contexts, it might not be plausible that uncontrolled confounder(s) could in fact produce bias in opposite directions, and in these settings, the general bound in Equation (1) might again understate the amount of confounding required to explain away the effect heterogeneity. However, in other contexts, it might be quite plausible that confounding bias could operate in different directions, because either the uncontrolled confounder(s) themselves, or alternatively the direction of their effects on X or on Y, could differ between strata. Consider a hypothetical study examining effect heterogeneity between men and women (Z) in the effects of smoking (X) on all-cause mortality (Y). Suppose the study excluded heavy drinkers but did not control for moderate vs low alcohol consumption (Uz)—a variable that is associated with increased smoking for both sexes. Moderate alcohol consumption is thought to reduce the risk of cardiovascular disease and diabetes, especially among individuals with predisposing risk factors, but is thought to increase the risk of cancer and other chronic diseases.26 In some populations, then, moderate consumption could plausibly have a protective net effect on all-cause mortality for men (with their relatively higher burden of cardiovascular and metabolic diseases and related risk factors), yet could have a detrimental net effect for women.27 If this is the case, alcohol consumption could produce confounding bias in different directions for men vs women.
Conclusion
Reporting these proposed E-values could help characterize the robustness of effect heterogeneity estimates to potential uncontrolled confounding. These results apply to effect heterogeneity rather than causal interaction; the Supplementary material Section 4 (available as Supplementary data at IJE online) provides E-values for causal interaction that are approximate, ‘weak’ bounds in a sense detailed there. The above E-values for effect heterogeneity could also be applied for causal interaction if one exposure is assumed to be unconfounded (e.g. because it was randomized).
Ethics approval
Not applicable.
Data Availability
All code, materials and data required to reproduce the applied examples are publicly available and documented (https://osf.io/79scv/). Code for the R package EValue and for the online tool is open-source (https://github.com/mayamathur/evalue_package and https://github.com/mayamathur/evalue_website).
Supplementary data
Supplementary data are available at IJE online.
Author contributions
M.B.M. conceived of the research, led theoretical developments, led writing, analysed the applied example and wrote code for the R package. L.H.S. contributed to theoretical developments. L.H.S. and all other authors contributed critical intellectual content to writing.
Funding
M.B.M. and T.V.W. were supported by R01 CA222147 and R01 LM013866. M.B.M. was supported by: (i) the NIH-funded Biostatistics, Epidemiology and Research Design (BERD) Shared Resource of Stanford University’s Clinical and Translational Education and Research (UL1TR003142); (ii) the Biostatistics Shared Resource (BSR) of the NIH-funded Stanford Cancer Institute (P30CA124435); and (iii) the Quantitative Sciences Unit through the Stanford Diabetes Research Center (P30DK116074). K.Y. was supported by Brigham and Women’s Hospital Department of Medicine Fellowship Award and K23AR076453 (NIAMS). The funders had no role in the design, conduct or reporting of this research.
Acknowledgements
None.
Conflict of interest
None declared.