Practice of Epidemiology Sensitivity Analysis on Odds Ratios


In the classic 1959 paper (1), Cornfield et al. dealt a deathblow to Fisher's constitutional hypothesis that the observed association between cigarette smoking and lung cancer is noncausal, and gave birth to the field of sensitivity analysis (2). The key idea derived from Cornfield's work is that if a strong observed association were merely due to confounding, the confounding variable would need to be strongly associated with both the exposure and the outcome.
Precisely, let the risk ratio (RR) between binary random variables X and Y be defined as

$$\mathrm{RR}_{XY} = \frac{P(Y \mid X)}{P(Y \mid \bar{X})}.$$

Make the assumption that X and Y are independent conditional on a third binary random variable Z, which is not negatively associated with either X or Y:

Assumption 1. $X \perp Y \mid Z$, and $\mathrm{RR}_{XZ} \geq 1$, $\mathrm{RR}_{ZY} \geq 1$.

Note that the second assumption can always be satisfied by recoding the variables X or Y, and that as a consequence $\mathrm{RR}_{XY} \geq 1$ as well. The following result is often referred to as the classical Cornfield inequalities (3), despite only the first half being proven by Cornfield (1,4).

Theorem 1. Under Assumption 1,

$$\mathrm{RR}_{XY} \leq \mathrm{RR}_{XZ}, \qquad \mathrm{RR}_{XY} \leq \mathrm{RR}_{ZY}. \quad (1)$$

Even though Theorem 1 is stated in terms of $\mathrm{RR}_{XZ}$ and $\mathrm{RR}_{ZY}$, X does not need to be a cause of Z or Z of Y, because the risk ratio is a measure of association, not causation. Assumption 1 is valid in many causal graphs; for example, in a fork where Z is a common cause of X and Y, or in a chain where Z is a mediator between X and Y (or between Y and X). It might feel more "natural" to use $\mathrm{RR}_{ZX}$ instead of $\mathrm{RR}_{XZ}$ in a fork, for example, but a result like Theorem 1 where $\mathrm{RR}_{XY}$ is bounded by a function of $\mathrm{RR}_{ZX}$ is not possible. The reason is that no matter how large $\mathrm{RR}_{XY}$ is, $\mathrm{RR}_{ZX}$ can be arbitrarily close to 1.
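The Cornfield inequalities can be checked numerically. The following is a minimal sketch, not part of the original analysis: it draws random conditional probabilities satisfying Assumption 1, computes the 3 risk ratios by the law of total probability, and records the largest amount by which RR XY exceeds min(RR XZ, RR ZY). The function names and the sampling ranges are illustrative assumptions.

```python
import random


def risk_ratios(p_z_x, p_z_notx, p_y_z, p_y_notz):
    """Return (RR_XY, RR_XZ, RR_ZY) when X and Y are independent given Z."""
    # Law of total probability over Z, using P(Y | X, Z) = P(Y | Z).
    p_y_x = p_y_z * p_z_x + p_y_notz * (1.0 - p_z_x)
    p_y_notx = p_y_z * p_z_notx + p_y_notz * (1.0 - p_z_notx)
    return p_y_x / p_y_notx, p_z_x / p_z_notx, p_y_z / p_y_notz


def random_scenario(rng):
    """Draw conditional probabilities with RR_XZ >= 1 and RR_ZY >= 1."""
    p_z_notx = rng.uniform(0.01, 0.99)
    p_z_x = rng.uniform(p_z_notx, 0.99)
    p_y_notz = rng.uniform(0.01, 0.99)
    p_y_z = rng.uniform(p_y_notz, 0.99)
    return p_z_x, p_z_notx, p_y_z, p_y_notz


def max_violation(trials=10_000, seed=0):
    """Largest observed excess of RR_XY over min(RR_XZ, RR_ZY); 0.0 if none."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        rr_xy, rr_xz, rr_zy = risk_ratios(*random_scenario(rng))
        worst = max(worst, rr_xy - min(rr_xz, rr_zy))
    return worst


if __name__ == "__main__":
    print(max_violation())
```

A random search of this kind cannot prove the inequalities, but a nonzero result beyond floating-point noise would indicate a coding or recoding error in the setup.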
Another way to formulate Theorem 1 is $\mathrm{RR}_{XY} \leq \min(\mathrm{RR}_{XZ}, \mathrm{RR}_{ZY})$. A natural question is to ask how the maximum function behaves, and the answer is that it is an upper bound of the so-called E-value (5):

$$\mathrm{RR}_{XY} + \sqrt{\mathrm{RR}_{XY}(\mathrm{RR}_{XY} - 1)} \leq \max(\mathrm{RR}_{XZ}, \mathrm{RR}_{ZY}).$$

That is, under Assumption 1 not only do both $\mathrm{RR}_{XZ}$ and $\mathrm{RR}_{ZY}$ have to be at least $\mathrm{RR}_{XY}$; in addition, at least one of them has to be even bigger. A second natural question is to ask whether the minimum function is optimal, and the answer is negative. Ding and VanderWeele (6) have shown that the sharp bound of $\mathrm{RR}_{XY}$ using both variables $\mathrm{RR}_{XZ}$ and $\mathrm{RR}_{ZY}$ is

$$\mathrm{RR}_{XY} \leq \frac{\mathrm{RR}_{XZ}\,\mathrm{RR}_{ZY}}{\mathrm{RR}_{XZ} + \mathrm{RR}_{ZY} - 1}. \quad (2)$$

Another popular measure of the strength of association between binary variables is the odds ratio (OR):

$$\mathrm{OR}_{XY} = \frac{P(Y \mid X)\,P(\bar{Y} \mid \bar{X})}{P(Y \mid \bar{X})\,P(\bar{Y} \mid X)}.$$
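The relation between the E-value and the sharp 2-ratio bound can be illustrated with a short sketch. It assumes the standard forms RR + sqrt(RR(RR − 1)) for the E-value and RR XZ · RR ZY / (RR XZ + RR ZY − 1) for the Ding and VanderWeele bound; the function names are illustrative.

```python
import math


def e_value(rr_xy):
    """E-value of an observed risk ratio RR_XY >= 1."""
    if rr_xy < 1.0:
        raise ValueError("recode the variables so that RR_XY >= 1")
    return rr_xy + math.sqrt(rr_xy * (rr_xy - 1.0))


def bounding_factor(rr_xz, rr_zy):
    """Sharp upper bound on RR_XY given RR_XZ >= 1 and RR_ZY >= 1."""
    return rr_xz * rr_zy / (rr_xz + rr_zy - 1.0)


if __name__ == "__main__":
    rr = 2.0
    e = e_value(rr)
    # If both confounder associations equal the E-value, the bound is exactly RR_XY.
    print(e, bounding_factor(e, e))
```

For an observed risk ratio of 2, the E-value is 2 + √2, about 3.41: if both confounder associations were weaker than that, the sharp bound would fall below 2 and confounding alone could not explain the association.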
The odds ratio arises naturally in the context of logistic regression or case-control design. In some situations where the outcome is rare, the risk ratio and the odds ratio are very close, and a square root conversion has been suggested for other situations (7). To do a precise conversion, however, one needs some additional parameter, such as the prevalence among the nonexposed. The additional parameter can introduce additional uncertainty, in underdiagnosed disease in particular. It would therefore be preferable to use results explicitly designed for the odds ratio rather than converting to risk ratios and using Cornfield's inequalities or inequality (2).
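To make the dependence on the additional parameter concrete, the sketch below converts an odds ratio to a risk ratio for several values of the outcome prevalence among the nonexposed, p0 = P(Y | not X), and prints the square root approximation alongside. The exact conversion RR = OR / (1 − p0 + p0 · OR) follows from the definitions of the 2 measures; the function names and the example values are illustrative assumptions.

```python
import math


def or_to_rr(odds_ratio, p0):
    """Exact risk ratio implied by an odds ratio and p0 = P(Y | not X)."""
    return odds_ratio / (1.0 - p0 + p0 * odds_ratio)


def sqrt_approximation(odds_ratio):
    """Square root conversion suggested for outcomes that are not rare."""
    return math.sqrt(odds_ratio)


if __name__ == "__main__":
    odds_ratio = 1.63  # illustrative value
    for p0 in (0.01, 0.1, 0.3, 0.5):
        print(p0, round(or_to_rr(odds_ratio, p0), 3),
              round(sqrt_approximation(odds_ratio), 3))
```

The exact conversion moves from nearly the odds ratio itself (rare outcome) toward 1 as p0 grows, which is why an uncertain prevalence translates directly into an uncertain risk ratio.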
In this work, we derive an odds ratio analogue of the classical Cornfield inequalities using the mediant inequality dating back to ancient Alexandria. We also compile a complete collection of 2-variable bounds analogous to inequality (2), where some of the 3 risk ratios are odds ratios instead. One such bound by Lee (8) already exists in the literature. Many combinations of risk and odds ratios do not admit a bound, and there is also symmetry in play, so we present 4 original results (2 of which are almost trivial corollaries).

Bounding with a single ratio
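The proof of Theorem 1 will serve as a template for original results.

By the law of total probability and Assumption 1,

$$\mathrm{RR}_{XY} = \frac{P(Y \mid X)}{P(Y \mid \bar{X})} = \frac{P(Y \mid Z)\,P(Z \mid X) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid X)}{P(Y \mid Z)\,P(Z \mid \bar{X}) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}. \quad (3)$$

Cornfield calls the first inequality of Theorem 1 "rather obvious" and derives it algebraically (1, Appendix A) from Ding and VanderWeele (3). We wish to give this classical inequality a classical proof.

Lemma 2 (mediant inequality). For positive numbers a, b, c, and d,

$$\min\left(\frac{a}{c}, \frac{b}{d}\right) \leq \frac{a + b}{c + d} \leq \max\left(\frac{a}{c}, \frac{b}{d}\right).$$

Special cases of the mediant inequality were already known by Plato and Euclid, but the first known formal proof is given by Pappus of Alexandria (9,10). The validity of the mediant inequality is easy to see geometrically: The slope of the sum vector is between the slopes of the summands. Applying the mediant inequality to equation 3 immediately gives

$$\mathrm{RR}_{XY} \leq \max\left(\mathrm{RR}_{XZ}, \mathrm{RR}_{X\bar{Z}}\right) = \mathrm{RR}_{XZ},$$

where $\mathrm{RR}_{X\bar{Z}} = P(\bar{Z} \mid X) / P(\bar{Z} \mid \bar{X}) \leq 1$ because $\mathrm{RR}_{XZ} \geq 1$.

Theorem 3. Under Assumption 1,

$$\mathrm{OR}_{XY} \leq \mathrm{OR}_{XZ}, \qquad \mathrm{OR}_{XY} \leq \mathrm{OR}_{ZY}.$$

Proof. By the law of total probability and Assumption 1,

$$\mathrm{OR}_{XY} = \frac{P(Y \mid X)\,P(\bar{Y} \mid \bar{X})}{P(Y \mid \bar{X})\,P(\bar{Y} \mid X)} = \frac{P(Y \mid Z)\,P(Z \mid X) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid X)}{P(Y \mid Z)\,P(Z \mid \bar{X}) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})} \times \frac{P(\bar{Y} \mid Z)\,P(Z \mid \bar{X}) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}{P(\bar{Y} \mid Z)\,P(Z \mid X) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid X)}. \quad (4)$$

Using the mediant inequality (Lemma 2) to both fractions yields

$$\mathrm{OR}_{XY} \leq \max\left(\mathrm{RR}_{XZ}, \mathrm{RR}_{X\bar{Z}}\right) \times \max\left(\mathrm{RR}_{\bar{X}Z}, \mathrm{RR}_{\bar{X}\bar{Z}}\right) = \mathrm{RR}_{XZ}\,\mathrm{RR}_{\bar{X}\bar{Z}} = \mathrm{OR}_{XZ}.$$

Pairing the same 4 sums differently,

$$\mathrm{OR}_{XY} = \frac{P(Y \mid Z)\,P(Z \mid X) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid X)}{P(\bar{Y} \mid Z)\,P(Z \mid X) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid X)} \times \frac{P(\bar{Y} \mid Z)\,P(Z \mid \bar{X}) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}{P(Y \mid Z)\,P(Z \mid \bar{X}) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})},$$

and using Lemma 2 on both fractions again yields $\mathrm{OR}_{XY} \leq \mathrm{OR}_{ZY}$.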
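The mediant inequality itself is easy to check with exact arithmetic. The sketch below uses Python's `fractions` module to confirm, for small positive integers, that (a + b)/(c + d) lies between a/c and b/d; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product


def mediant_between(a, b, c, d):
    """True if (a + b)/(c + d) lies between a/c and b/d (positive integers)."""
    mediant = Fraction(a + b, c + d)
    low, high = sorted([Fraction(a, c), Fraction(b, d)])
    return low <= mediant <= high


if __name__ == "__main__":
    # Exhaustive check over a small grid of positive integers.
    grid = range(1, 8)
    print(all(mediant_between(a, b, c, d)
              for a, b, c, d in product(grid, repeat=4)))
```

For example, the mediant of 1/2 and 2/3 is 3/5, which lies between them. In the proof of the Cornfield inequality, the 2 fractions being combined are the ratios of the matching terms in the numerator and denominator of the total-probability expansion.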

Bounding with 2 ratios
We set out to complete the works of Ding and VanderWeele (6) and Lee (8) with an exhaustive collection of formulae bounding a risk or odds ratio between X and Y with any pair of risk or odds ratios between X and Z and between Z and Y. Not all combinations admit a bound, however.

Bounding a risk ratio
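It is not possible to bound $\mathrm{RR}_{XY}$ with $\mathrm{RR}_{ZX}$ or $\mathrm{RR}_{YZ}$, so we only have 4 combinations, 2 of which are existing results (Theorems 4 and 6) and 2 their immediate corollaries (Theorems 5 and 7). The bounds are sharp.

Theorem 4. By Ding and VanderWeele (6). Under Assumption 1,

$$\mathrm{RR}_{XY} \leq \frac{\mathrm{RR}_{XZ}\,\mathrm{RR}_{ZY}}{\mathrm{RR}_{XZ} + \mathrm{RR}_{ZY} - 1}.$$

Theorem 5. Under Assumption 1,

$$\mathrm{RR}_{XY} \leq \frac{\mathrm{RR}_{XZ}\,\mathrm{OR}_{ZY}}{\mathrm{RR}_{XZ} + \mathrm{OR}_{ZY} - 1}.$$

Proof. Define and differentiate

$$f(x) = \frac{\mathrm{RR}_{XZ}\,x}{\mathrm{RR}_{XZ} + x - 1}, \qquad f'(x) = \frac{\mathrm{RR}_{XZ}(\mathrm{RR}_{XZ} - 1)}{(\mathrm{RR}_{XZ} + x - 1)^2}.$$

The claim follows from $\mathrm{RR}_{XZ} > 1$, Theorem 4, and $\mathrm{RR}_{ZY} < \mathrm{OR}_{ZY}$.

Theorem 6. By Lee (8). Under Assumption 1,

$$\mathrm{RR}_{XY} \leq \left(\frac{\sqrt{\mathrm{OR}_{XZ}\,\mathrm{RR}_{ZY}} + 1}{\sqrt{\mathrm{OR}_{XZ}} + \sqrt{\mathrm{RR}_{ZY}}}\right)^2.$$

Theorem 7. Under Assumption 1,

$$\mathrm{RR}_{XY} \leq \left(\frac{\sqrt{\mathrm{OR}_{XZ}\,\mathrm{OR}_{ZY}} + 1}{\sqrt{\mathrm{OR}_{XZ}} + \sqrt{\mathrm{OR}_{ZY}}}\right)^2.$$

Proof. Define and differentiate

$$g(x) = \frac{\sqrt{\mathrm{OR}_{XZ}}\,x + 1}{\sqrt{\mathrm{OR}_{XZ}} + x}, \qquad g'(x) = \frac{\mathrm{OR}_{XZ} - 1}{(\sqrt{\mathrm{OR}_{XZ}} + x)^2}.$$

The claim follows from $\sqrt{\mathrm{OR}_{XZ}} > 1$, Theorem 6, and $\sqrt{\mathrm{RR}_{ZY}} < \sqrt{\mathrm{OR}_{ZY}}$.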
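The sharpness of Theorem 4 and the corollary step to Theorem 5 can be illustrated on a single scenario. This sketch assumes that Theorem 4 has the Ding and VanderWeele form RR XZ · RR ZY / (RR XZ + RR ZY − 1) and that Theorem 5 is the same expression with OR ZY in place of RR ZY; the function names and the chosen probabilities are hypothetical.

```python
def joint_bound(rr_xz, assoc_zy):
    """Assumed bound: RR_XZ * A / (RR_XZ + A - 1), with A = RR_ZY or OR_ZY."""
    return rr_xz * assoc_zy / (rr_xz + assoc_zy - 1.0)


def rr_xy(p_z_x, p_z_notx, p_y_z, p_y_notz):
    """RR_XY under conditional independence of X and Y given Z."""
    p_y_x = p_y_z * p_z_x + p_y_notz * (1.0 - p_z_x)
    p_y_notx = p_y_z * p_z_notx + p_y_notz * (1.0 - p_z_notx)
    return p_y_x / p_y_notx


def odds_ratio(p1, p0):
    """Odds ratio comparing probability p1 with probability p0."""
    return p1 * (1.0 - p0) / (p0 * (1.0 - p1))


if __name__ == "__main__":
    # Extreme scenario: every exposed subject has Z, so P(Z | X) = 1.
    p_z_x, p_z_notx, p_y_z, p_y_notz = 1.0, 0.5, 0.4, 0.1
    observed = rr_xy(p_z_x, p_z_notx, p_y_z, p_y_notz)
    rr_xz = p_z_x / p_z_notx
    rr_zy = p_y_z / p_y_notz
    or_zy = odds_ratio(p_y_z, p_y_notz)
    print(observed, joint_bound(rr_xz, rr_zy), joint_bound(rr_xz, or_zy))
```

In this scenario the observed risk ratio attains the risk ratio bound, while the bound using the larger OR ZY is looser, as the monotonicity argument in the proof of Theorem 5 requires.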

Bounding an odds ratio
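It is not possible to bound $\mathrm{OR}_{XY}$ with risk ratios between X and Z and between Z and Y. Theorem 8 uses a familiar function of $\mathrm{OR}_{XZ}$ and $\mathrm{OR}_{ZY}$. Theorem 9 provides a bound as a function of $\mathrm{RR}_{XZ}$ and $\mathrm{OR}_{ZY}$, and by the symmetry of the odds ratio can also be used for $\mathrm{OR}_{XZ}$ and $\mathrm{RR}_{YZ}$. For the combination of $\mathrm{OR}_{XZ}$ and $\mathrm{RR}_{ZY}$, and by symmetry $\mathrm{RR}_{ZX}$ and $\mathrm{OR}_{ZY}$, the bound of Theorem 3 is already sharp.

Theorem 8. Under Assumption 1,

$$\mathrm{OR}_{XY} \leq \left(\frac{\sqrt{\mathrm{OR}_{XZ}\,\mathrm{OR}_{ZY}} + 1}{\sqrt{\mathrm{OR}_{XZ}} + \sqrt{\mathrm{OR}_{ZY}}}\right)^2.$$

Proof. Denote $P(Z \mid \bar{X}) = r$ and $P(Y \mid \bar{Z}) = s$, which implies

$$P(Z \mid X) = \frac{\mathrm{OR}_{XZ}\,r}{1 + (\mathrm{OR}_{XZ} - 1)\,r}, \qquad P(Y \mid Z) = \frac{\mathrm{OR}_{ZY}\,s}{1 + (\mathrm{OR}_{ZY} - 1)\,s}.$$

Plugging this into equation 4 gives [...]. By the chain rule, [...]. Because $g_s(r, s) = (1 - r)(\mathrm{OR}_{ZY} - 1) > 0$, the partial derivative $f_s(r, s)$ is zero on the curve $g(r, s) = \sqrt{\mathrm{OR}_{XZ}\,\mathrm{OR}_{ZY}}\,r$, where $f(r, s)$ is actually a constant: [...]

Theorem 9. Under Assumption 1,

$$\mathrm{OR}_{XY} \leq \frac{1 + \mathrm{OR}_{ZY}(\mathrm{RR}_{XZ} - 1)}{\mathrm{RR}_{XZ}}.$$

Proof. Denote $P(Z \mid \bar{X}) = r$ and $P(Y \mid \bar{Z}) = s$, which implies

$$P(Z \mid X) = \mathrm{RR}_{XZ}\,r, \qquad P(Y \mid Z) = \frac{\mathrm{OR}_{ZY}\,s}{1 + (\mathrm{OR}_{ZY} - 1)\,s}.$$

According to equation 4 we can write [...] where [...]. It follows that $f_r(r, s) > 0$ and so we set $r = 1/\mathrm{RR}_{XZ}$. To maximize over s we differentiate [...]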

APPLICATION
As an illustrative example we use a systematic review by Owen et al. (11) on the effect of breastfeeding on type 2 diabetes; see also Ip et al. (12). Owen et al. report an aggregate odds ratio of 0.61 (95% confidence interval: 0.44, 0.85, P value = 0.003) between ever breastfed vs. exclusively formula fed and later type 2 diabetes of the child. The meta-analysis comprises 7 individual studies on Europeans and Americans, one of which has a case-control design, with the others based on a cohort. Because type 2 diabetes is common and underdiagnosed, converting odds ratios into risk ratios introduces complications for both cohort and case-control designs, so basing sensitivity analysis on Cornfield's inequalities (Theorem 1) or Theorems 4-7 is not advised. Instead we can use Theorems 3, 8, and 9.
To satisfy Assumption 1, we switch the order of the 2 feeding groups so that the aggregate odds ratio becomes 1/0.61 = 1.63. We then ask: Can some third unmeasured binary variable Z fully explain the observed association between breastfeeding and the child's type 2 diabetes status? According to Theorem 3, the association between Z and breastfeeding and the association between Z and the child's type 2 diabetes would both need to have an odds ratio of at least 1.63. Moreover, if one of the associations is weak, the other has to compensate by being strong. Say Z is type 2 diabetes of the mother (diagnosed before or after childbirth). According to Meigs et al. (13), the association between the mother's and the child's type 2 diabetes has an odds ratio of 3.4. By solving for $\mathrm{RR}_{XZ}$ using Theorem 9 with $\mathrm{OR}_{XY} = 1.63$, $\mathrm{OR}_{ZY} = 3.4$, we see that the risk of type 2 diabetes among mothers who do not lactate would have to be at least 1.35... times the risk among mothers who do. Diabetics and future diabetics can have on average more trouble lactating, and lactating can have a protective effect against diabetes. A recent meta-analysis by Pinho-Gomes et al. (14) estimates that never vs. ever lactation is associated with maternal type 2 diabetes with a risk ratio of 1/0.73 = 1.36... (We should point out that there is a slight discrepancy because a woman can have several children and only breastfeed some of them.) A noncausal association between breastfeeding and a child's type 2 diabetes, entirely confounded by maternal type 2 diabetes, looks plausible at the moment.
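The threshold for the risk ratio can be reproduced with a short sketch. It takes the bound of Theorem 9 in the form OR XY ≤ (1 + OR ZY (RR XZ − 1)) / RR XZ, which is an assumption of this sketch, as are the function names. The sketch also evaluates OR XY directly from the total-probability expansion in the extreme configuration used in the proof (P(Z | not X) = 1/RR XZ, with P(Y | not Z) close to 1) to show that the assumed bound is approached from below.

```python
def or_xy(r, s, rr_xz, or_zy):
    """OR_XY under conditional independence; r = P(Z | not X), s = P(Y | not Z)."""
    p_z_x = rr_xz * r
    p_y_z = or_zy * s / (1.0 + (or_zy - 1.0) * s)
    p_y_x = p_y_z * p_z_x + s * (1.0 - p_z_x)
    p_y_notx = p_y_z * r + s * (1.0 - r)
    return p_y_x * (1.0 - p_y_notx) / (p_y_notx * (1.0 - p_y_x))


def assumed_theorem9_bound(rr_xz, or_zy):
    """Assumed form of the bound on OR_XY given RR_XZ and OR_ZY."""
    return (1.0 + or_zy * (rr_xz - 1.0)) / rr_xz


def required_rr_xz(or_xy_observed, or_zy):
    """Smallest RR_XZ for which the assumed bound reaches the observed OR_XY."""
    if or_xy_observed >= or_zy:
        raise ValueError("no RR_XZ suffices: Theorem 3 requires OR_XY <= OR_ZY")
    return (or_zy - 1.0) / (or_zy - or_xy_observed)


if __name__ == "__main__":
    rr_min = required_rr_xz(1.63, 3.4)
    print(rr_min)                                  # threshold on RR_XZ
    print(or_xy(1.0 / rr_min, 0.999, rr_min, 3.4)) # near-extreme configuration
```

With the values used in the text (OR XY = 1.63, OR ZY = 3.4), the threshold comes out at about 1.356, consistent with the figure of 1.35... quoted above.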
However, from the proof of Theorem 9, we can see that the bound is sharpest when $P(Z \mid \bar{X})$, the prevalence of type 2 diabetes among women who ever lactated, is large (close to 0.73). We can reasonably argue that it should not be more than the prevalence among men, which should be safe to bound under 30%. Thus, the association reported by Pinho-Gomes et al. (14) has an odds ratio no more than 1.63, where we have used the conversion formula

$$\mathrm{OR}_{XZ} = \frac{\left(1 - P(Z \mid \bar{X})\right)\mathrm{RR}_{XZ}}{1 - P(Z \mid \bar{X})\,\mathrm{RR}_{XZ}}$$

and the crude estimate $P(Z \mid \bar{X}) \leq 0.3$. If we plug $\mathrm{OR}_{XY} = 1.63$, $\mathrm{OR}_{ZY} = 3.4$ into Theorem 8 and solve for $\mathrm{OR}_{XZ}$, we see that the odds ratio would have to be at least 5.7 if confounding by maternal type 2 diabetes were to fully explain the observed breastfeeding-diabetes association. Indeed, 3 of the 7 studies included in the meta-analysis by Owen et al. (11) used a number of relevant covariates, including parental type 2 diabetes, and when restricting to those 3 studies the aggregate odds ratio was similar (1/0.55 = 1.81) with and without covariates.
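The 2 numbers in this paragraph can be reproduced as follows. The sketch assumes that Theorem 8 has the form OR XY ≤ ((√(OR XZ · OR ZY) + 1) / (√OR XZ + √OR ZY))², and inverts it in closed form for OR XZ; the function names are illustrative, and the inputs (1/0.73, 0.3, 1.63, 3.4) are the values quoted in the text.

```python
import math


def rr_to_or(rr, p0):
    """Odds ratio implied by a risk ratio and the baseline risk p0 = P(Z | not X)."""
    if rr * p0 >= 1.0:
        raise ValueError("rr * p0 must be below 1")
    return rr * (1.0 - p0) / (1.0 - rr * p0)


def assumed_theorem8_bound(or_xz, or_zy):
    """Assumed form of the bound on OR_XY given OR_XZ and OR_ZY."""
    a, b = math.sqrt(or_xz), math.sqrt(or_zy)
    return ((a * b + 1.0) / (a + b)) ** 2


def required_or_xz(or_xy, or_zy):
    """Smallest OR_XZ for which the assumed bound reaches the observed OR_XY."""
    c, b = math.sqrt(or_xy), math.sqrt(or_zy)
    if c >= b:
        raise ValueError("no OR_XZ suffices: Theorem 3 requires OR_XY <= OR_ZY")
    # Solve (a*b + 1) / (a + b) = c for a = sqrt(OR_XZ).
    a = (c * b - 1.0) / (b - c)
    return a * a


if __name__ == "__main__":
    print(rr_to_or(1.0 / 0.73, 0.3))   # upper limit for the reported association
    print(required_or_xz(1.63, 3.4))   # odds ratio needed to explain OR_XY
```

Under these assumptions the reported association converts to an odds ratio of about 1.63 at most, well short of the roughly 5.7 that full confounding would require.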
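$$\mathrm{RR}_{XY} = \frac{P(Y \mid X)}{P(Y \mid \bar{X})} = \frac{P(Y \mid Z)\,P(Z \mid X) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid X)}{P(Y \mid Z)\,P(Z \mid \bar{X}) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}. \quad (3)$$

$$\mathrm{OR}_{XZ} = \frac{\left(1 - P(Z \mid \bar{X})\right)\mathrm{RR}_{XZ}}{1 - P(Z \mid \bar{X})\,\mathrm{RR}_{XZ}}$$

and the crude estimate $P(Z \mid \bar{X}) \leq 0.3$. If we plug $\mathrm{OR}_{XY} = 1.63$, $\mathrm{OR}_{ZY} = 3.4$ into Theorem 8 and solve for $\mathrm{OR}_{XZ}$,

immediately gives

$$\mathrm{RR}_{XY} \leq \max\left(\mathrm{RR}_{XZ}, \mathrm{RR}_{X\bar{Z}}\right) = \mathrm{RR}_{XZ}.$$

$$\mathrm{OR}_{XY} \leq \max\left(\mathrm{RR}_{XZ}, \mathrm{RR}_{X\bar{Z}}\right) \times \max\left(\mathrm{RR}_{\bar{X}Z}, \mathrm{RR}_{\bar{X}\bar{Z}}\right) = \mathrm{OR}_{XZ}.$$

$$\mathrm{OR}_{XY} = \frac{P(Y \mid X)\,P(\bar{Y} \mid \bar{X})}{P(Y \mid \bar{X})\,P(\bar{Y} \mid X)} = \frac{P(Y \mid Z)\,P(Z \mid X) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid X)}{P(Y \mid Z)\,P(Z \mid \bar{X}) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})} \times \frac{P(\bar{Y} \mid Z)\,P(Z \mid \bar{X}) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}{P(\bar{Y} \mid Z)\,P(Z \mid X) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid X)}. \quad (4)$$

Using the mediant inequality (Lemma 2) to both fractions yields

$$\mathrm{OR}_{XY} = \frac{P(Y \mid Z)\,P(Z \mid X) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid X)}{P(\bar{Y} \mid Z)\,P(Z \mid X) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid X)} \times \frac{P(\bar{Y} \mid Z)\,P(Z \mid \bar{X}) + P(\bar{Y} \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}{P(Y \mid Z)\,P(Z \mid \bar{X}) + P(Y \mid \bar{Z})\,P(\bar{Z} \mid \bar{X})}.$$

$$\mathrm{OR}_{XY} \leq \mathrm{OR}_{XZ}, \qquad \mathrm{OR}_{XY} \leq \mathrm{OR}_{ZY}.$$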