Commentary: The formal approach to quantitative causal inference in epidemiology: misguided or misrepresented?

Two

Suppose we are interested in the potential causal relationship between urinary tract infections (UTIs) in the 2nd and 3rd trimesters of pregnancy and subsequent low birthweight (LBW) in newborn babies. Suppose also that existing epidemiological evidence suggests a clinically meaningful causal effect, but not definitively. A large prospective cohort study is therefore designed, with laboratory assessment of the exposure in the second and third trimesters of pregnancy.
In this appendix, we outline how some of the key concepts from the FACE might be applied to the analysis of this cohort study and to the interpretation of its results. We do not aim to cover all possible concerns here; in any real example there will of course be additional specific features and complexities to consider. Rather, we aim to highlight the thinking and the assumptions as they relate to a particular simple example. Above all, we aim to show how these assumptions link the target causal estimand to the data in a few simple mathematical steps. We recommend that readers wanting more realistic examples, with numerous additional complexities, turn to the many substantive papers cited in the main manuscript, e.g. [10][11][12][13][43].
As with any epidemiological investigation, we would start by establishing exactly what the question of interest is, ideally when designing the study rather than only when starting the analysis. This involves asking:
A. What is the population of interest? All pregnant women living in the particular region/country to be studied? Perhaps non-singleton births should be excluded? And so on.
B. What exactly is the exposure of interest? Are all subtypes of UTIs to be included? What about those experiencing more than one UTI during pregnancy? What about the timing of acquiring the UTI(s)? etc.
C. How will the outcome be classified in pre-term births? etc.
D. Do we simply want to quantify the magnitude of the effect of maternal UTIs on the prevalence of LBW in the population of interest? Or are we interested in particular subgroups, e.g. particular racial groups, or mothers with gestational diabetes? Are we interested in investigating possible synergistic effects of other infections? And so on.
Let us assume for simplicity that the population of interest is all full-term singleton live-born babies from a particular region in a particular period. We also assume that the exposure is simplified to ever having experienced a UTI in the second and/or third trimester of pregnancy, versus never in these two trimesters. Let X be the binary exposure (maternal UTI ever, coded 1, or never, coded 0) and Y the binary outcome (LBW newborn, yes or no). Let $C = (C_1, \ldots, C_p)$ be a set of potential confounders, e.g. maternal age, comorbidities (such as diabetes), other maternal infections and socio-economic factors.
The following features would characterise a FACE approach to this example:

Potential Outcomes
Causal questions involve not just features of the data at hand, but also a notion of how things would have been had something been different. Expressing them mathematically therefore requires an extension of the traditional statistical language of expectations, variances, probabilities, odds, etc. The extension most commonly used in the FACE is that of potential outcomes.
For each infant in the population of interest, let Y(0) denote the outcome that would have been observed for that infant had the mother, possibly counter to fact, not been exposed during pregnancy. Let Y(1) denote the outcome that would have been observed had the mother been exposed.
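To make the notation concrete, the minimal Python sketch below stores both potential outcomes for each infant in a small hypothetical cohort. This is possible only in a simulation. In real data, at most one of Y(0) and Y(1) is ever observed for any infant. The class name `Infant`, its fields and the example values are illustrative assumptions, not data from any study. The `y` property builds in the link between observed and potential outcomes that is formalised below as the consistency assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Infant:
    """One infant in a hypothetical, simulated cohort.

    Both potential outcomes are stored only because the data are simulated;
    in a real study at most one of them is ever observed.
    """
    x: int   # maternal UTI in 2nd/3rd trimester: 1 = ever, 0 = never
    y0: int  # Y(0): LBW (1) or not (0) had the mother not been exposed
    y1: int  # Y(1): LBW (1) or not (0) had the mother been exposed

    @property
    def y(self) -> int:
        """Observed outcome: the potential outcome for the exposure actually received."""
        return self.y1 if self.x == 1 else self.y0


cohort = [
    Infant(x=1, y0=0, y1=1),
    Infant(x=1, y0=0, y1=0),
    Infant(x=0, y0=1, y1=1),
    Infant(x=0, y0=0, y1=1),
]
print([infant.y for infant in cohort])  # [1, 0, 1, 0]
```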

Possible Estimands
A population-level causal effect is then expressed as a contrast between the distributions of Y(0) and Y(1). For example, a marginal causal risk difference is

$$E\{Y(1)\} - E\{Y(0)\},$$

where expectations are taken over the population of interest. This estimand is interpreted as the difference in the prevalence of LBW if, hypothetically, all pregnant women were exposed, versus if no pregnant woman were exposed.
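As a sketch of what this estimand means, the Python snippet below computes it directly from a made-up table of potential outcomes for eight infants. The list `potential_outcomes` and the helper `mean` are hypothetical and serve only as illustration.

```python
# Hypothetical potential outcomes (Y(0), Y(1)) for eight infants.
# Both are known here only because the numbers are made up.
potential_outcomes = [
    (0, 1), (0, 0), (1, 1), (0, 1),
    (0, 0), (1, 1), (0, 0), (0, 1),
]


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


risk_0 = mean(y0 for y0, _ in potential_outcomes)  # E{Y(0)}
risk_1 = mean(y1 for _, y1 in potential_outcomes)  # E{Y(1)}
marginal_rd = risk_1 - risk_0                      # E{Y(1)} - E{Y(0)}
print(risk_0, risk_1, marginal_rd)  # 0.25 0.625 0.375
```

In a real cohort only one entry of each pair is observed, so this direct calculation is unavailable. Assumptions such as those discussed below are what allow an estimand of this kind to be linked to the observed data.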
If we were interested in effect modification by maternal age, we could compare the following conditional risk difference

$$E\{Y(1) \mid A = a\} - E\{Y(0) \mid A = a\}$$

for different values of a, where A is the age of the mother.
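A minimal sketch of this comparison follows. It again uses hypothetical potential outcomes, with maternal age coarsened into two made-up bands. The function `conditional_risk_differences` is illustrative and not part of any library.

```python
from collections import defaultdict

# Hypothetical records: (maternal age band A, Y(0), Y(1)); made-up values.
records = [
    ("<35", 0, 0), ("<35", 0, 1), ("<35", 0, 0), ("<35", 1, 1),
    (">=35", 0, 1), (">=35", 1, 1), (">=35", 0, 1), (">=35", 0, 0),
]


def conditional_risk_differences(rows):
    """Return {a: E{Y(1) | A = a} - E{Y(0) | A = a}} for each age band a."""
    by_age = defaultdict(list)
    for age, y0, y1 in rows:
        by_age[age].append((y0, y1))
    result = {}
    for age, pairs in by_age.items():
        n = len(pairs)
        risk_1 = sum(y1 for _, y1 in pairs) / n
        risk_0 = sum(y0 for y0, _ in pairs) / n
        result[age] = risk_1 - risk_0
    return result


print(conditional_risk_differences(records))  # {'<35': 0.25, '>=35': 0.5}
```

Different values across the age bands, as in this made-up example, would indicate effect modification by maternal age on the risk difference scale.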
Suppose that a different type of maternal infection, MIB, were believed to have a possible synergistic effect with UTI on LBW. We would then redefine our potential outcomes to include MIB too. Y(0, 0), Y(1, 0), Y(0, 1) and Y(1, 1) could be used to denote, respectively, the potential value that Y would take were both infections absent, only UTI present, only MIB present, or both infections present. A synergistic effect of the two exposures would be present if

$$E\{Y(1,1)\} - E\{Y(0,1)\} > E\{Y(1,0)\} - E\{Y(0,0)\},$$

i.e. if the effect of UTI, measured on the risk difference scale, is more pronounced when combined with MIB.
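The inequality above amounts to a positive additive interaction contrast. The short Python sketch below computes this contrast from four risks, E{Y(x, m)}, with x indexing UTI and m indexing MIB. The risk values and the function name `interaction_contrast` are hypothetical and chosen only to illustrate the arithmetic.

```python
def interaction_contrast(r00, r10, r01, r11):
    """Additive interaction contrast for two binary exposures.

    r_xm is the risk E{Y(x, m)}, with x indexing UTI and m indexing MIB.
    A positive value means the risk difference for UTI is larger when MIB
    is present than when it is absent.
    """
    effect_uti_with_mib = r11 - r01
    effect_uti_without_mib = r10 - r00
    return effect_uti_with_mib - effect_uti_without_mib


# Hypothetical risks, chosen only to illustrate the arithmetic.
contrast = interaction_contrast(r00=0.05, r10=0.08, r01=0.07, r11=0.15)
print(round(contrast, 2))  # 0.05
```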
The estimand can also be chosen to reflect an interest in the excess prevalence of LBW that is attributable to UTIs, i.e.

$$E(Y) - E\{Y(0)\}. \tag{1}$$
The possibilities are endless, but the resulting considerations are broadly similar. For simplicity, in what follows we therefore assume that this final estimand is the one of interest. It is the difference between the actual prevalence of LBW and the hypothetical prevalence of LBW that would be seen if UTIs could be prevented for all pregnant women in the relevant population of interest.
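The Python sketch below computes estimand (1) from a small hypothetical cohort for which both potential outcomes are known. It assumes that the observed outcome is the potential outcome for the exposure actually received. All values are made up.

```python
# Hypothetical cohort: (X, Y(0), Y(1)) for each infant (made-up values).
cohort = [
    (1, 0, 1), (1, 1, 1), (1, 0, 0), (0, 0, 1),
    (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1),
]


def observed_outcome(x, y0, y1):
    """Observed Y: the potential outcome for the exposure actually received."""
    return y1 if x == 1 else y0


n = len(cohort)
actual_prevalence = sum(observed_outcome(*row) for row in cohort) / n  # E(Y)
prevalence_if_prevented = sum(y0 for _, y0, _ in cohort) / n          # E{Y(0)}
excess_prevalence = actual_prevalence - prevalence_if_prevented        # estimand (1)
print(actual_prevalence, prevalence_if_prevented, excess_prevalence)  # 0.375 0.25 0.125
```

Only exposed infants can contribute to this difference, because Y = Y(0) for unexposed infants. Here the single exposed infant with Y = 1 but Y(0) = 0 accounts for the whole excess of 1/8. This is also why, for this estimand, only Y(0) needs to be well specified.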

Specificity
As we discuss in the main manuscript, for this estimand to be well understood, the potential outcome involved in it should ideally be close to well specified. This is not too problematic with an exposure such as infection. The hypothetical world in which no pregnant woman is infected could be one in which a hypothetical preventative treatment exists, has no (good or bad) side effects, and is given to all pregnant women. It may also be useful to add that this hypothetical treatment is free for all pregnant women. Then, in the hypothetical world in which all pregnant women are vaccinated, there are no knock-on effects on, say, their diet due to resources having been re-allocated towards acquiring the treatment.
For our chosen estimand, a sufficiently well-specified Y(0) would suffice. However, had we chosen an alternative estimand, we would also need to consider the meaning of Y(1). This could be a little trickier in this setting, since the UTIs in the observed data could have been acquired for many different reasons. For simplicity, suppose there are two reasons: (1) high glucose levels (e.g. due to poorly controlled or undiagnosed diabetes), and (2) incomplete voiding of the bladder. Subject-matter knowledge may suggest that the effect of maternal UTI on infant birthweight is the same regardless of the reason for its contraction. In that case, the intervention (contract a UTI by some means) would be sufficiently well specified. Suppose instead that this were not the case, for example if the UTIs acquired due to reason (1) were more severe than those acquired due to reason (2). Then, for reasons that will become apparent below, the hypothetical intervention that we should imagine as giving rise to Y(1) is a compound intervention (see [41]). In this intervention, the infection is contracted for reason (1) with probability π and for reason (2) with probability 1 − π, where π is the proportion of those in the observed data who actually contracted the infection due to reason (1). To be as specific as possible, it would be good to try to infer π from the data. If this is not possible, the ambiguity should be made explicit.
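A compound intervention of this kind corresponds to a mixture. The risk of LBW under it is π times the risk were the UTI contracted for reason (1), plus 1 − π times the risk were it contracted for reason (2). The Python sketch below estimates π as the proportion of exposed mothers whose UTI arose for reason (1) and then forms this mixture. The reason labels, the reason-specific risks and both function names are hypothetical. In practice the reason-specific risks are themselves counterfactual quantities requiring their own identification; here they are simply supplied as numbers.

```python
def estimate_pi(reasons):
    """Proportion of exposed mothers whose UTI arose for reason (1).

    `reasons` holds one label per exposed mother: 1 for high glucose,
    2 for incomplete bladder voiding (hypothetical coding).
    """
    return sum(1 for r in reasons if r == 1) / len(reasons)


def compound_risk(pi, risk_reason_1, risk_reason_2):
    """Risk of LBW under the compound intervention:
    reason (1) with probability pi, otherwise reason (2)."""
    return pi * risk_reason_1 + (1 - pi) * risk_reason_2


reasons_among_exposed = [1, 2, 2, 1, 2, 2, 2, 2]  # hypothetical labels
pi = estimate_pi(reasons_among_exposed)           # 0.25
print(pi, compound_risk(pi, risk_reason_1=0.20, risk_reason_2=0.10))
```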

2. Assumptions: Depending on the analytic approach to be taken, different assumptions may be invoked. For example, instrumental variable analyses rely on a different set of assumptions from analyses based on confounder adjustment. Suppose that we proceed by confounder adjustment. The assumptions under which the causal estimand (1) above can be identified (i.e. re-written as a function of the distribution of the observed data) are then as follows.
(a) The consistency assumption, which for this example states that

$$\text{if } X = 0, \text{ then } Y = Y(0).$$

In words, for infants whose mothers were in reality not exposed, the outcome in reality equals what it would have been in the hypothetical world in which no mother was exposed. For this assumption to hold, the observed data need to be relevant to the hypothetical intervention that we have in mind as giving rise to Y(0). In other words, the hypothetical intervention should be non-invasive in the sense described in the main manuscript. When applied to prevent UTIs, it should not change the outcome of pregnant women who would in reality not have been exposed from the outcome that was in fact observed. Recall the caveats above that the hypothetical treatment should have no (good or bad) side effects and should be free. These caveats help to make the consistency assumption more plausible, since they concern precisely the non-invasiveness of the hypothetical intervention.
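The role of non-invasiveness can be illustrated with a check that is possible only in a simulation, where Y(0) is known for everyone. The Python sketch below counts unexposed infants whose observed Y differs from Y(0). Two hypothetical versions of Y(0) are compared: one under a free, side-effect-free treatment, and one under a treatment that, say, worsens maternal diet. The function name and all values are illustrative assumptions.

```python
def consistency_violations(records):
    """Count unexposed infants whose observed Y differs from Y(0).

    Each record is (x, y, y0). Consistency requires y == y0 whenever x == 0,
    so a non-zero count signals that the hypothetical intervention is invasive.
    """
    return sum(1 for x, y, y0 in records if x == 0 and y != y0)


# Hypothetical: Y(0) under a free, side-effect-free preventative treatment.
non_invasive = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
# Hypothetical: Y(0) under a costly treatment that, say, worsens maternal diet,
# turning one unexposed infant's outcome from 0 to 1.
invasive = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
print(consistency_violations(non_invasive), consistency_violations(invasive))  # 0 1
```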
(b) The conditional exchangeability assumption, which for this example states that

$$Y(0) \perp\!\!\!\perp X \mid C.$$

This states that, having adjusted for the observed set of potential confounders C, the exposure and the potential outcome Y(0) should be independent. In this example, Y(0) denotes whether or not an infant would have been LBW had his/her mother, possibly counter to fact, not been exposed. It can therefore be viewed as a relevant summary of all the other determinants of LBW (observed or unobserved) apart from UTIs. Suppose the exposure were associated with these other determinants even after conditioning on all the measured confounders. There would then be residual confounding. For instance, UTIs might be more common amongst those with a higher risk of LBW even in the absence of UTIs, even after allowing for all the measured confounders. In that case, adjusting for the measured confounders would not lead us to a causally interpretable estimate of (1). Causal diagrams are often advocated as a tool to help us decide how plausible the conditional exchangeability assumption is, based on existing knowledge (or plausible conjectures) regarding the causal structure of X, Y and C. The more complex the situation (e.g. when variables contained in C are not simply common causes of X and Y), the more useful causal diagrams are.
(c) The positivity assumption. For our estimand, if all confounders C are discrete, the relevant version is

$$P(X = 0 \mid C = c) > 0 \quad \text{for all } c \text{ such that } P(C = c) > 0.$$

In words, this says that, for a sufficiently large sample size, there should be unexposed mothers observed at every observed value of the confounders. For continuous confounders, a similar condition can be expressed using densities.
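Unlike exchangeability, positivity concerns observable quantities, so it can be partly checked in the data. The Python sketch below lists the observed strata of discrete confounders that contain no unexposed mothers. The function name `positivity_gaps` and the stratum coding (age band, diabetes status) are hypothetical. An empty stratum in a finite sample does not prove a violation in the population, since it may be a chance zero.

```python
def positivity_gaps(rows):
    """Return observed confounder strata c that contain no unexposed mothers.

    Each row is (c, x), where c is a hashable tuple of discrete confounder
    values and x is the exposure (1 = UTI, 0 = no UTI). Positivity for
    estimand (1) requires at least one x == 0 in every observed stratum.
    """
    all_strata = {c for c, _ in rows}
    strata_with_unexposed = {c for c, x in rows if x == 0}
    return sorted(all_strata - strata_with_unexposed)


# Hypothetical strata: (age band, diabetes status).
rows = [
    (("<35", "no"), 0), (("<35", "no"), 1),
    (("<35", "yes"), 0), (("<35", "yes"), 1),
    ((">=35", "yes"), 1), ((">=35", "yes"), 1),
]
print(positivity_gaps(rows))  # [('>=35', 'yes')]
```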
3. Identification: Under the assumptions above, we can re-write our causal estimand of interest (1) in terms of aspects of the joint distribution of the observed data. It is useful to follow these mathematical steps, to appreciate why we need the assumptions above. First, we re-write our estimand using iterated expectations as

$$E(Y) - E\left[E\{Y(0) \mid C\}\right],$$

where the outer expectation is taken over the distribution of C. Then, under the conditional exchangeability assumption, Y(0) is independent of X given C. This gives us licence to insert X = 0 on the right-hand side of the conditioning line in the second term, so that our estimand is re-written as

$$E(Y) - E\left[E\{Y(0) \mid X = 0, C\}\right].$$

Then, under the consistency assumption, we can replace Y(0) by Y on the left-hand side of the conditioning line in the same term, giving us

$$E(Y) - E\left\{E(Y \mid X = 0, C)\right\}.$$
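For discrete C, the final expression equals E(Y) − Σ_c P(C = c) E(Y | X = 0, C = c). It can be estimated by standardisation, i.e. by plugging in sample proportions, which is one simple form of the g-formula. The Python sketch below does this for a hypothetical cohort with a single binary confounder (maternal diabetes), using exact fractions. The made-up data are constructed so that conditional exchangeability holds exactly within each stratum. For this reason, the standardised estimate matches the true value of (1), which is computable here only because Y(0) is simulated. With real data, only approximate agreement could be expected, subject to sampling error and to the assumptions holding. The sketch also reports the unadjusted analogue E(Y) − E(Y | X = 0). That quantity is biased here because, in these made-up data, diabetic mothers are both more often exposed and at higher risk of LBW in the absence of UTI.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical cohort: (c, x, y, y0), where c = maternal diabetes (1 = yes),
# x = maternal UTI, y = observed LBW and y0 = Y(0). Y(0) is listed only because
# the data are simulated; the estimator below never uses it.
rows = [
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 1, 0),
    (1, 0, 1, 1), (1, 0, 0, 0), (1, 1, 1, 1), (1, 1, 1, 1),
    (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0),
]


def standardised_excess_prevalence(data):
    """Plug-in estimate of E(Y) - sum_c P(C = c) E(Y | X = 0, C = c)."""
    n = len(data)
    mean_y = Fraction(sum(y for _, _, y, _ in data), n)
    stratum_size = defaultdict(int)
    unexposed_outcomes = defaultdict(list)
    for c, x, y, _ in data:
        stratum_size[c] += 1
        if x == 0:
            unexposed_outcomes[c].append(y)
    standardised_mean_y0 = Fraction(0)
    for c, size in stratum_size.items():
        outcomes = unexposed_outcomes[c]
        if not outcomes:  # empirical positivity violation: E(Y | X = 0, C = c) undefined
            raise ValueError(f"no unexposed mothers in stratum {c!r}")
        standardised_mean_y0 += Fraction(size, n) * Fraction(sum(outcomes), len(outcomes))
    return mean_y - standardised_mean_y0


n = len(rows)
truth = Fraction(sum(y for _, _, y, _ in rows) - sum(y0 for _, _, _, y0 in rows), n)
unexposed_y = [y for _, x, y, _ in rows if x == 0]
crude = Fraction(sum(y for _, _, y, _ in rows), n) - Fraction(sum(unexposed_y), len(unexposed_y))
print(truth, standardised_excess_prevalence(rows), crude)  # 1/4 1/4 23/60
```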