Abstract

Marginal structural models (MSMs) are causal models designed to adjust for time-dependent confounding in observational studies of time-varying treatments. MSMs are powerful tools for assessing causality with complicated, longitudinal data sets but have not been widely used by practitioners. The objective of this paper is to illustrate the fitting of an MSM for the causal effect of iron supplement use during pregnancy (time-varying treatment) on odds of anemia at delivery in the presence of time-dependent confounding. Data from pregnant women enrolled in the Iron Supplementation Study (Raleigh, North Carolina, 1997–1999) were used. The authors highlight complexities of MSMs and key issues epidemiologists should recognize before and while undertaking an analysis with these methods and show how such methods can be readily interpreted in existing software packages, including SAS and Stata. The authors emphasize that if a data set with rich information on confounders is available, MSMs can be used straightforwardly to make robust inferences about causal effects of time-dependent treatments/exposures in epidemiologic research.

Received for publication May 14, 2003; accepted for publication November 18, 2003.

In observational studies, estimation of the causal effect of an exposure on an outcome may be biased because of confounding, where covariates associated with treatment may also be associated with potential response, so that observed response differences cannot be attributed directly to exposure. Proper estimation of causal effects must account for confounding. In a point-exposure study, this is traditionally done by modeling the probability of disease as a function of exposure and pretreatment covariates. However, with a time-varying exposure, these traditional methods may be biased if time-varying covariates are simultaneously confounders and intermediates—that is, covariates are predictors of outcome and also predict subsequent exposure, and past exposure history predicts resulting covariate level (1). Such covariates are called time-dependent confounders (1), and they pose unique analytical challenges requiring specialized methods.

Marginal structural models (MSMs), developed by Robins et al. (14), allow proper adjustment for time-dependent confounding. Although MSMs are relatively simple to implement, they have been used almost exclusively by methodologists (510), not practicing epidemiologists. In this paper, we describe the application of MSMs to estimation of the causal effect of iron supplementation during pregnancy, a time-varying exposure, on odds of anemia at delivery. Compliance with use of iron supplements does not cause anemia, so this example is a good test case for causal methods. In this example, subjects and clinicians made decisions to change iron treatment over time based on subjects’ attributes, including hemoglobin and serum ferritin concentrations (blood markers of iron status) and treatment side effects. Thus, these time-dependent covariates not only are independent predictors of anemia risk but also predict subsequent iron treatment (i.e., they are confounders). These covariates are also affected by prior iron treatment (i.e., they are intermediates). Without proper adjustment, iron supplementation will probably appear harmful rather than protective. Therefore, comparing adjustment methods yields useful insights into the advantages of MSMs. In this paper, we compare results obtained using MSMs with those obtained using ordinary logistic regression, discuss the strengths and weaknesses of MSMs, and highlight key issues epidemiologists should recognize before and while undertaking such an analysis.

MATERIALS AND METHODS

To estimate the causal effect of iron supplementation on anemia at delivery, we used data from the Iron Supplementation Study, a randomized trial of prenatal iron supplementation (11). Women were recruited from a public prenatal clinic in Raleigh, North Carolina, that serves a mostly low-socioeconomic-status population (1997–1999). Women who were less than 20 weeks’ pregnant at the initial visit (median gestational age, 12.0 weeks) were recruited in 1997–1999 (n = 867). Baseline hemoglobin and serum ferritin concentrations were used to randomize women into groups receiving a daily prenatal vitamin containing 0, 30, or 60 mg of iron. Women with a serum ferritin level less than or equal to 40 µg/liter and a hemoglobin level greater than 90 g/liter were randomized to receive prenatal supplements containing 30 or 60 mg of iron. Women with a serum ferritin level greater than 40 µg/liter and a hemoglobin level greater than 110 g/liter were randomized to receive supplements containing 0 or 30 g of iron. Women were treated from baseline to 24–29 weeks’ gestation. Women were asked to return their study pill bottles and to complete questionnaires regarding compliance and side effects. Pharmacy data were also collected on the dispensing of iron-containing supplements.

A midgestation assessment of hemoglobin and serum ferritin concentrations was made during a visit in the interval of 24–29 weeks’ gestation, after which the standard clinic protocol was implemented. Typically, women with good iron status (the definition was left up to the clinician’s discretion) received a supplement containing 30 mg of iron in a standard prenatal vitamin through delivery, while women with poorer iron status were prescribed higher-dose supplements (60–325 mg/day). Supplement use from 24–29 weeks’ gestation to delivery was assessed with abstracted pharmacy data.

After delivery, medical records were abstracted for ascertainment of hemoglobin concentration no more than 7 days before delivery, sociodemographic data, and pregnancy events. Prenatal iron status was rarely measured at visits other than enrollment, 24–29 weeks, and delivery. Maternal anemia at delivery was defined by gestational-age-specific hemoglobin concentration cutpoints after adjustment of hemoglobin level for smoking (12).

Treatment

From enrollment to 24–29 weeks’ gestation, 41 percent of subjects did not pick up their assigned supplements, and 85 percent of women who returned pill bottles had less-than-perfect adherence. Thus, cumulative dose of iron was estimated using a combination of pill-count and pharmacy data. We estimated the percentage of compliance on the basis of pill-count data, since a validation study found these data to be more accurate than questionnaires in assessing compliance. For each woman, percentage of compliance was estimated as (32 pills per bottle – number of pills remaining)/(number of days between dispensing and refill) × 100. However, only 31 percent of the women returned their pill bottles, so we estimated pill counts for the remaining women using data on pharmacy prescription pickups, since these data were available for all of the women and were moderately correlated with pill-count compliance (r = 0.5). Percentage of compliance using pharmacy data was estimated as (number of pills dispensed/number of days between dispensing and refill) × 100. We used simple regression imputation with percentage of pharmacy compliance as the main independent variable to predict pill-count compliance for women who did not return pill bottles (13).

For each supplement, we calculated the therapeutic dose of iron as (mg/day in the supplement × days between dispensing and refill × percent compliance) and summed these doses to estimate cumulative dose between enrollment and 24–29 weeks and between 24–29 weeks and delivery. Finally, we categorized cumulative dose in each time period into four groups reflecting iron intake relative to the 30-mg/day dose recommended for nonanemic pregnant women (12), as follows. For the period from the initial visit to 24–29 weeks (hereafter called time period 0), 30 mg/day corresponded to a cumulative dose of 2.0–3.5 g, since most women had the opportunity for 10–16 weeks of treatment. Therefore, we identified four groups, (0, 1, 2, 3), which corresponded to iron intakes of 0 g, 0.1–1.9 g (less than the prescribed amount), 2.0–3.5 g (approximately the prescribed amount), and >3.5 g (more than the prescribed amount), respectively. For the period from 24–29 weeks to delivery (hereafter called time period 1), most women had 10–14 weeks of supplementation, so the four groups (0, 1, 2, 3) corresponded to 0 g, 0.1–1.9 g (less than the prescribed amount), 2.0–3.0 g (approximately the prescribed amount), and >3.0 g (more than the prescribed amount), respectively.

Our objective was to estimate the causal effect of all 16 combinations, or regimes, of iron treatment throughout pregnancy on the odds of anemia at delivery, while adjusting for time-dependent and -independent confounders.

Potential confounders

Potentially confounding maternal characteristics were maternal ethnicity/race (non-Hispanic White, non-Hispanic Black, or other); education (≤12 years vs. >12 years); age (<20, 20–29, or ≥30 years); parity (1, 2, or ≥3); marital status (married vs. unmarried); anemia at baseline (dichotomous variable); serum ferritin concentration at baseline (continuous variable); gestational age at entry (continuous variable); smoking status (smoker vs. nonsmoker); gestational age at delivery (continuous variable); and prepregnancy body mass index (weight (kg)/height (m)2; <19.8, 19.8–26.0, 26.1–29.0, or >29.0 (14)). Symptoms that may interfere with supplement use (stomach discomfort, nausea, vomiting, gas, constipation, diarrhea, appetite loss, or cramps in the past month; dichotomous variable) were self-reported on the questionnaire.

After 24–29 weeks’ gestation, data were available on the presence of severe nausea and vomiting from the medical record, serum ferritin concentration at 24–29 weeks (continuous variable), and hemoglobin concentration at 24–29 weeks (<100, 100–109, or ≥110 g/liter).

Because collection of data on covariates was not central to the study’s goal, certain women were missing some of these data. Since our objective was to provide a pedagogic demonstration of MSMs, we used data from 426 women with all measured covariates required for fitting the models described in the equations presented below. Forty-three women were censored between baseline and 24–29 weeks, leaving 383 with full exposure data at 24–29 weeks. Of these, 81, 138, 135, and 29 women fell into treatment groups 0, 1, 2, and 3, respectively. Fifty-five women were censored between 24–29 weeks and delivery, leaving 328 with full exposure data through delivery. Of these, 40, 139, 97, and 52 women fell into treatment groups 0, 1, 2, and 3, respectively. Lastly, there were 234 uncensored women with complete exposure, covariate, and outcome data, 119 of whom were anemic at delivery. Censoring was taken into account according to techniques described by Robins et al. (1) and discussed below.

Causal inference and MSMs

Ideally, for estimation of the causal effect of supplementation on anemia, a large sample of women should be randomized to different treatment regimes at enrollment, with perfect adherence ensured and no censoring; here, the assumption of no confounding would be reasonable. Although, in our study, women were randomized to treatment initially, noncompliance was common, and decisions were made in each period to modify or terminate treatment that may be associated with both subject attributes and previous treatment. Side effects from a high iron dose may have caused a health-care provider to lower the iron dose during the second period. Poor iron status from an initially low iron dose may have caused a provider to increase the iron dose during the second period. Thus, time-dependent confounding is to be expected in this situation, and estimation of causal effects based on the observed exposures is likely to yield biased estimates.

Borrowing the notation of Robins et al. (1), let A0 equal observed cumulative iron dose from baseline (time 0 = treatment initiation) to 24–29 weeks, and let L0 denote a vector of measured confounders that may be associated with A0 (ethnicity/race, baseline hemoglobin level, etc.). Let A1 equal observed cumulative iron dose from 24–29 weeks to delivery (time 1 = 24–29 weeks’ gestation), and let L1 denote measured confounders possibly associated with A1 (e.g., 24- to 29-week hemoglobin level measured before treatment modification at time 1). A0 and A1 have four possible values, (0, 1, 2, 3). Because we use a cumulative treatment to account for noncompliance with assigned treatment, our notation differs slightly from that of Robins et al. (1). However, we maintain their notation here for consistency. Let C0 indicate loss to follow-up (censoring) before 24–29 weeks (C0 = 1 if censored and 0 otherwise). Because cumulative dose from entry to 24–29 weeks may be determined only if a woman is not censored before 24–29 weeks, A0 is not always observed for these women. Similarly, let C1 indicate censoring before delivery for a woman whose A0 and L1 are observed but A1 is not necessarily determined (C1 = 1 if censored and 0 otherwise). Women who had therapeutic or spontaneous abortions were considered censored, whereas women with stillbirths were uncensored if they had undergone hemoglobin measurement before delivery. It is possible that there are vectors of unmeasured confounders at times 0 and 1; denote these by U0 and U1, respectively. Figure 1 shows a possible directed acyclic graph (causal diagram) (1517) that explicates the temporal ordering of all variables, assuming no unmeasured confounding.

Let graphic = (A0, A1), graphic = (C0, C1), and graphic = (L0, L1) denote the observed “histories” of treatment, censoring, and measured confounders, respectively. Let graphic = (a0, a1) be a possible value of graphic ; there are 16 such graphic ’s representing all possible combinations of dose categories for the two time periods. Let graphic denote the potential binary response (indicator of anemia at delivery) we would see if a subject were to follow history graphic . In the study, we see only the observed response Y = graphic if the subject actually followed graphic and was not censored ( graphic = graphic and graphic = (0, 0)). There are 16 possible graphic ’s for each subject, called “counterfactuals” because only one can be observed, with the rest “counter to the facts.” If P denotes probability, then P( graphic = 1) is the probability of anemia at delivery if the entire population were to follow graphic and not be censored. If these probabilities vary for different values of graphic , then the difference must be “caused” by treatment combination; hence, comparing these probabilities or associated odds for different graphic ’s addresses a valid causal question.

Our goal is to estimate the probabilities P( graphic = 1) for all 16 possible graphic ’s to compare treatment regimes and estimate causal effects. In an ideal randomized study (described above) with women assigned to the 16 combinations, this would be possible. However, with our sample, we cannot estimate these probabilities directly; we can only estimate the probability of having anemia at delivery (Y = 1) among those with observed treatment histories graphic = graphic , denoted P(Y = 1| graphic = graphic ).

An MSM is a model for P( graphic = 1). If graphic contains all confounders for treatment and outcome (i.e., no unmeasured confounders exist), it is possible to fit this model unbiasedly (1). This is accomplished by fitting a model to the observed data (i.e., P(Y = 1| graphic = graphic )) but weighting the contribution from each uncensored subject by the inverse of, approximately, the joint probability of having her treatment and censoring histories as a function of her covariate history (1). The weighting creates a “pseudopopulation” in which confounders are no longer associated with treatment (see the paper by Robins et al. (1) for a simple example of inverse weighting). Of course, we cannot determine from the data whether there are unmeasured confounders; thus, as in any causal analysis, the method is predicated on the assumption that there are none.

To demonstrate, we propose the MSM

graphic

Define logit P = log(P/1 – P). Here, the logit of the probability of anemia at delivery depends on graphic through indicator variables corresponding to the treatment combination involved in graphic ; (dose10, dose20, dose30) are indicators for the treatment categories (1, 2, 3) during time period 0, and (dose11, dose21, dose31) are defined similarly for time period 1. For simplicity, we consider a model that includes only main effects for treatments in each time period, though more complex models are possible. Thus, in model 1, β10, β11, etc., have a causal interpretation: graphicis the causal anemia odds ratio that would result if all women received 0.1–1.9 g of iron during time period 0 and 0 g during time period 1 relative to the reference group of all women receiving no iron during either period; graphic is the odds ratio that would result if the population of women received 0.1–0.9 g of iron during each period relative to the reference group; etc. While we cannot fit model 1 directly, we can fit

graphic

Models 1 and 2 are different; model 1 is a model for the population’s probability of anemia if everyone received graphic , relevant for causal inference, while model 2 describes the probability of anemia for those observed to have history graphic . Thus, graphic , etc., in general. From the paper by Robins et al. (1), if L0 and L1 contain all confounders associated with subsequent treatment, censoring, and potential response, we can unbiasedly estimate graphic in model 1 by estimating graphic in model 2 using the inverse probability weighting. The model is fitted to all women whose response Y is observed (with graphic = (0, 0) and complete data on graphic (n = 234)), where the weights are estimated on the basis of all women (n = 426) as discussed below. Because such weighting can lead to unstable fitting, a “stabilized” modification of the weights is recommended (1), the construction of which we illustrate below. (Doubly robust estimators weighted by the inverse probability of treatment have also been developed to significantly decrease the influence of extreme weights (18) but will not be presented here.)

Calculating weights

The stabilized weight for the ith woman with an observed response is formed by the product of two factors, one accounting for treatment history and the other for censoring. If the ith woman is observed to have A0 = a0i, A1 = a1i, L0 = l0i, and L1 = l1i, then her treatment history weight is

swi = {P(A0 = a0i|C0 = 0) × P(A1 = a1i|A0 = a0i, C1 = 0, C0 = 0)}/ {P(A0 = a0i|L0 = l0i, C0 = 0) × P(A1 = a1i|A0 = a0i, L0 = l0i, L1 = l1i, C1 = 0, C0 = 0)}.

Each component of model 3 represents a probability for observed data. For instance, P(A1 = a1i|A0 = a0i, L0 = l0i, L1 = l1i, C1 = 0, C0 = 0) is the probability of receiving a1ifor subjects observed to receive a0i having measured confounders l0i and l1i who are not censored. The denominator is approximately the probability of having the subject’s observed treatment and confounder history, given that she was not lost to follow-up. Her censoring history weight is

swi† = {P(C0 = 0) × P(C1 = 0|A0 = a0i, C0 = 0}/{P(C0 = 0)|L0 = l0i) × P(C1 = 0|A0 = a0i, L0 = l0i, L1 = l1i, C0 = 0)}.

For example, P(C1 = 0|A0 = a0i, L0 = l0i, L1 = l1i, C0 = 0) is the probability of not being censored in time period 1 for subjects observed to receive a0i having measured confounders l0i and l1i who are not censored in time period 0. The denominator is roughly the probability of being uncensored with the given treatment/confounder history. The full stability weight is the product of the predicted probabilities that come from models 3 and 4: msmwti = swi × swi†. The probabilities in the denominators must be positive for all possible values of A0, A1, L0, and L1; that is, it must be possible for a subject to receive all of the available treatments at each time point.

For calculation of msmwti, each probability in models 3 and 4 is modeled and fitted on the basis of the observed data from all women. These fits are then used to assign each uncensored woman a set of predicted probabilities, which may be substituted in models 3 and 4 to obtain her estimated swi and swi†.

We estimated P(A0 = a0i|C0 = 0) in model 3 by the simple frequency for each possible a0i value, (0, 1, 2, 3), from women among the 426 who were not censored in the first time period; the predicted probability for each uncensored woman was the frequency corresponding to her observed value a0i. We modeled the remaining probabilities by ordinal logistic regression models:

graphic ;

graphic ;

graphic ,

where (dose10, dose20, dose30) have values corresponding to the value of a0 and where l0 and l1 are vectors of measured confounders. Models including interaction terms could also be used. For models 5 and 7, the fit was based on uncensored subjects (i.e., C0 = 0, C1 = 0). For model 6, the fit was based on all women not censored in the first time period (i.e., C0 = 0). For each woman, we then substituted her treatment and confounder values a0i, l0i, and l1i in the right-hand sides of models 5–7 and obtained her predicted probabilities by taking j = a1iin models 5 and 7 and k = a0i in model 6. From model 4, we estimated P(C0 = 0) by the simple frequency of women uncensored in the first period among the 426 and similarly obtained predicted probabilities to substitute in model 4 from fits of the following logistic regression models to the observed data on censoring:

graphic

logit P(C0 = 0|L0 = l0) = α0 + α1l0;

graphic

In hopes of satisfying the assumption of no unmeasured confounders, for models 5–10 we included the full list of potential confounders described above for l0 and l1; no attempt was made to simplify these models. Robins has argued that such a conservative approach may be preferable to the risk of mistakenly eliminating covariates that are true confounders (J. M. Robins, Harvard University, personal communication, 2002).

From these fits, we calculated for each woman the full stability weight msmwti, estimated the coefficients in model 1 by fitting model 2 using msmwti to weight each observation using SAS PROC GENMOD (19) and Stata (20) (see Appendix), and obtained confidence intervals using “robust” methods (1). The software treats the weights as fixed instead of estimated, which leads to conservative intervals guaranteed to have at least a 95 percent coverage probability.

For comparison with the inferences obtained from fitting the MSM in model 1, we fitted two additional models directly to the observed data on the 234 uncensored women. First, we fitted a model for the probability of anemia at delivery as a function of treatment history—for example, a crude ordinary logistic regression model (i.e., model 2 without inverse weighting), where we would expect iron to appear harmful, since we have not properly adjusted for confounding. This would be the correct model if all women had been randomized to one of the 16 treatment regimes at baseline and had not deviated from their assigned treatment. Second, we fitted a model for the probability of anemia at delivery as a function of treatment and covariate history—for example, an ordinary logistic regression model adjusting for confounder history by inclusion of all covariates in the model as regressors:

graphic

Although including L0 in this model would adequately control for confounding by pretreatment variables, addition of the L1 variables causes biased estimates (1). We fitted model 11 to show the effect that “traditional” (but incorrect) adjustment has on inference.

RESULTS

Estimates of the parameters in models 1 and 2 without weighting and model 11 are shown in table 1. Table 2 shows this information in terms of odds ratios associated with each treatment combination relative to graphic = (0, 0) based on the main effects shown in table 1, which some readers may find easier to interpret. The MSM coefficients for nearly all parameters were negative (table 1), and correspondingly, the odds ratios indicate that in comparison with no iron treatment throughout pregnancy, almost all treatment combinations were highly protective against anemia at delivery (table 2). If we had conducted the definitive substantive analysis, these data would suggest, for example, that after appropriate adjustment for confounding by time-dependent and -independent covariates in the MSM, 2–3.5 g in time period 0 and 0.1–1.9 g in time period 1 caused a reduction in the odds of anemia by 93 percent in comparison with no iron in either period. In comparison, the adjusted ordinary logistic regression model suggested that this treatment is associated with a 4.3-fold increase in the odds of anemia in comparison with no prenatal iron treatment. Inverse weighting of the MSM reversed nearly all of the apparent adverse affects of iron seen in the crude ordinary logistic regression model.

The MSM findings were intuitive except for treatments of >3.0 g after 24–29 weeks, which appeared to increase the odds of anemia at delivery, especially for low doses before 24 weeks. These counterintuitive findings result from limitations in our data set. Exposure misclassification was probably a major source of bias. Women who took little or no iron before 24–29 weeks had low hemoglobin concentrations at 24–29 weeks, were at high risk of anemia at delivery, and consequently were prescribed high doses of iron after 24–29 weeks. We hypothesize that these high doses caused side effects that led to poor compliance, but our reliance on pharmacy data prohibited us from more accurate classification. Additionally, women who returned pill bottles were used for estimation of cumulative dose, and these women were possibly unrepresentative. Few women received treatment combinations involving 0 g at time 1, resulting in imprecise estimates. Potential bias due to model misspecification in our imputation procedure and the inherent uncertainty involved may have led to inaccurate doses for some women. Furthermore, we lacked data on potential confounders: additional treatment side effects from 24–29 weeks through delivery, income, health beliefs, and genetic predisposition. Sampling variability may also have contributed to our counterintuitive findings. These limitations highlight the reasons why our analysis is not intended for substantive interpretation.

An important consideration for MSMs is the effect of possibly “influential” individuals, since data from persons with very large msmwti may play a major role in dictating results. Because the weights involve probabilities, they are positive and thus are likely to have skewed distributions, so some fraction of influential subjects is expected. Mean values, median values, and standard deviations for estimated swi, swi, and msmwti were (4.75, 0.92, 21.08), (1.09, 0.97, 0.39), and (4.76, 0.93, 19.56), respectively, reflecting the skewness of swi and msmwti. However, it is important to examine the estimated msmwti to understand whether data from persons with potentially unusual observed covariate histories are driving results. For msmwti, the 95th percentile was 17.99 with a maximum of 225.58. We examined the influence of the 11 women with msmwti in this range in two ways: 1) by deleting these women and refitting models 5–10 and MSM model 1, which potentially involves selection bias, and 2) by maintaining these women but truncating msmwti to equal 20 when fitting the MSM. In both instances, qualitative results were similar to those shown in tables 1 and 2, indicating that, despite large weights, these women did not appear to drive conclusions. Covariate patterns for these women did not seem unusual relative to the rest of the sample.

We included all covariates in the fits of models 5–10, so inclusion of variables that are actually not confounders could have resulted in reduced estimation precision of these models and the MSM. To gain a sense of this, we refitted the MSM using only covariates with p < 0.30 in models 5–10; the results did not meaningfully change. Finally, we refitted the models using alternative specifications for confounders in models 5–10 (i.e., categorized variables that were previously continuous, etc.); the MSM results did not meaningfully change.

DISCUSSION

Our objective in this paper was to give an accessible account of an application of MSMs to a real analytical problem to encourage greater use of this approach among practicing epidemiologists. MSMs handle different types of exposure and outcome data, allow adjustment for selection bias due to censoring, and may be fitted using standard software. Additionally, MSMs can be generalized for assessment of effect modification by L0 variables (1), provided that the investigator is more interested in determining the causal effect of treatment within levels of L0 rather than the entire source population. This information would be useful for clinicians who wish to make treatment decisions based on pretreatment characteristics.

MSMs are limited to the study of nondynamic treatment regimes (regimes that are “fixed in advance”)—for example, treatment with 30 mg of iron per day throughout pregnancy regardless of intervening events before delivery. Thus, MSMs inform us about the difference in the probability of anemia between two fixed regimes a health-care provider could specify at the first prenatal visit. Dynamic regimes, in contrast, allow treatment to change over time based on history (21)—for example, treatment with 30 mg/day from the start of care to 24–29 weeks, then measurement of hemoglobin concentration, and then an increase in the dose to 60 mg/day for the remainder of pregnancy if the hemoglobin level is less than 105 g/liter but maintenance at 30 mg/day otherwise. Dynamic regimes are more consistent with clinical practice, since providers rarely specify fixed regimes at the beginning of treatment. Modeling and causal inference for dynamic regimes can be accomplished with structural nested models or G-estimation (21).

MSMs and other models for causal inference from complex longitudinal data (1, 3, 22) are important, since costs and ethical concerns can rule out the conduct of randomized trials. If observational data with little missingness and sufficiently rich information on confounders for exposure and censoring can be collected, MSMs are an accessible tool for causal inference.

ACKNOWLEDGMENTS

This research was partially funded by cooperative agreements S0454 and S1326 from the Association of Schools of Public Health and the Centers for Disease Control and Prevention; a postdoctoral traineeship from the National Institutes of Health; a Woodrow Wilson Johnson & Johnson dissertation grant in women’s health; and grants R37 AI031789, R01 CA51692, and R01 CA085848 from the National Institutes of Health.

The authors thank the numerous persons who contributed thoughtful comments on early drafts of this paper, most notably Drs. Cande Ananth, Mary Cogswell, Irva Hertz-Piccioto, Jay Kaufman, and Charles Poole.

APPENDIX

The SAS code (version 8.0 (19)) for obtaining estimators weighted by the inverse probability of treatment using stabilized weights is as follows:

proc genmod descending data = work;

class id;

model anemia = dose0_1 dose0_2 dose0_3

dose1_1 dose1_2 dose1_3/ link = logit dist = bin;

weight msmwt;

repeated subject = id/type = ind;

run;

The Stata code (version 7.0 (20)) for obtaining estimators weighted by the inverse probability of treatment using stabilized weights is as follows:

logit anemia dose0_1 dose0_2 dose0_3 dose1_1 dose1_2 dose1_3 [pweight = msmwt], robust cluster(id)

Correspondence to Dr. Lisa Bodnar, Magee-Women’s Research Institute, 204 Craft Avenue, Pittsburgh, PA 15213 (e-mail: bodnar@mwri.magee.edu).

FIGURE 1. Directed acyclic graph (causal diagram) (15–17) for the Iron Supplementation Study (Raleigh, North Carolina, 1997–1999), corresponding to the assumption of no unmeasured confounders given control for measured factors. A0 and A1 represent observed cumulative iron doses for the period from entry into care to 24–29 weeks (time period 0) and the period from 24–29 weeks to delivery (time period 1), respectively. L0 and L1 denote vectors of measured confounders that may be associated with A0 and A1, respectively. C0 and C1 are indicators of loss to follow-up in time periods 0 and 1, respectively. U0 and U1 represent vectors of unmeasured covariates in time periods 0 and 1, respectively. Arrows (directed edges) represent causal effects. The lack of directed edges from U0 and U1 to A0 and A1 and to C0 and C1 reflects the (unverifiable) assumption that, conditional on the measured covariates L0 and L1, there are no unmeasured covariates associated with treatment exposure or loss to follow-up in time periods 0 and 1. If there were directed edges from U0 and U1, these variables would be considered unmeasured confounders. (This directed acyclic graph is the most general graph possible coinciding with the assumption of no unmeasured confounders. In our study, it may be that A1 is caused primarily by factors other than treatment during the first period (A0), for example, compliance behavior. Under this belief, the directed acyclic graph could be simplified by removing the arrow running from A0 to A1.)

FIGURE 1. Directed acyclic graph (causal diagram) (15–17) for the Iron Supplementation Study (Raleigh, North Carolina, 1997–1999), corresponding to the assumption of no unmeasured confounders given control for measured factors. A0 and A1 represent observed cumulative iron doses for the period from entry into care to 24–29 weeks (time period 0) and the period from 24–29 weeks to delivery (time period 1), respectively. L0 and L1 denote vectors of measured confounders that may be associated with A0 and A1, respectively. C0 and C1 are indicators of loss to follow-up in time periods 0 and 1, respectively. U0 and U1 represent vectors of unmeasured covariates in time periods 0 and 1, respectively. Arrows (directed edges) represent causal effects. The lack of directed edges from U0 and U1 to A0 and A1 and to C0 and C1 reflects the (unverifiable) assumption that, conditional on the measured covariates L0 and L1, there are no unmeasured covariates associated with treatment exposure or loss to follow-up in time periods 0 and 1. If there were directed edges from U0 and U1, these variables would be considered unmeasured confounders. (This directed acyclic graph is the most general graph possible coinciding with the assumption of no unmeasured confounders. In our study, it may be that A1 is caused primarily by factors other than treatment during the first period (A0), for example, compliance behavior. Under this belief, the directed acyclic graph could be simplified by removing the arrow running from A0 to A1.)

TABLE 1.

Relations between iron supplement intake in two time periods during pregnancy and anemia at delivery using a marginal structural model, a crude ordinary logistic regression model, and an adjusted ordinary logistic regression model, based on data from the Iron Supplementation Study (n = 234), Raleigh, North Carolina, 1997–1999

Time-varying exposure Marginal structural model  Crude OLR* model  Adjusted OLR model† 
β (SE*) p value  β (SE) p value  β (SE) p value 
Cumulative iron intake (g) in time period 0‡         
0.0§   0.0§   0.0§  
0.1–1.9 –1.13 (0.61) 0.06  –0.76 (0.47) 0.11  –1.19 (0.61) 0.09 
2.0–3.5 –1.75 (0.59) 0.003  –1.07 (0.49) 0.03  –0.94 (0.63) 0.49 
>3.5 –2.27 (1.16) 0.04  –0.75 (0.63) 0.24  –0.72 (0.82) 0.96 
Cumulative iron intake (g) in time period 1¶         
0.0§   0.0§   0.0§  
0.1–1.9 –0.86 (1.21) 0.47  1.22 (0.74) 0.09  2.39 (1.03) 0.04 
2.0–3.0 –2.16 (1.25) 0.08  0.18 (0.74) 0.81  0.59 (1.01) 0.81 
>3.0 1.50 (1.24) 0.23  2.09 (0.82) 0.01  2.02 (1.11) 0.30 
Time-varying exposure Marginal structural model  Crude OLR* model  Adjusted OLR model† 
β (SE*) p value  β (SE) p value  β (SE) p value 
Cumulative iron intake (g) in time period 0‡         
0.0§   0.0§   0.0§  
0.1–1.9 –1.13 (0.61) 0.06  –0.76 (0.47) 0.11  –1.19 (0.61) 0.09 
2.0–3.5 –1.75 (0.59) 0.003  –1.07 (0.49) 0.03  –0.94 (0.63) 0.49 
>3.5 –2.27 (1.16) 0.04  –0.75 (0.63) 0.24  –0.72 (0.82) 0.96 
Cumulative iron intake (g) in time period 1¶         
0.0§   0.0§   0.0§  
0.1–1.9 –0.86 (1.21) 0.47  1.22 (0.74) 0.09  2.39 (1.03) 0.04 
2.0–3.0 –2.16 (1.25) 0.08  0.18 (0.74) 0.81  0.59 (1.01) 0.81 
>3.0 1.50 (1.24) 0.23  2.09 (0.82) 0.01  2.02 (1.11) 0.30 

* OLR, ordinary logistic regression; SE, standard error.

† Adjusted for confounder history through inclusion of covariates in the model as regressors.

‡ Time period 0 represents the period from the start of prenatal care to 24–29 weeks’ gestation.

§ Reference category.

¶ Time period 1 represents the period from 24–29 weeks’ gestation to delivery.

TABLE 2.

Relations between different iron supplementation regimes during pregnancy and anemia at delivery using a marginal structural model, a crude ordinary logistic regression model, and an adjusted ordinary logistic regression model, based on data from the Iron Supplementation Study (n = 234), Raleigh, North Carolina, 1997–1999

Cumulative iron intake (g) in time period 0* Cumulative iron intake (g) in time period 1† Marginal structural model  Crude OLR‡ model  Adjusted OLR model§ 
OR‡ 95% CI‡  OR 95% CI  OR 95% CI 
1.0   1.0   1.0  
0.1–1.9 0.32 0.10, 1.06  0.47 0.19, 1.18  0.31 0.09, 1.01 
2.0–3.5 0.17 0.05, 0.56  0.34 0.13, 0.89  0.39 0.11, 1.35 
>3.5 0.10 0.01, 1.00  0.47 0.14, 1.64  0.49 0.10, 2.44 
          
0.1–1.9 0.42 0.04, 4.54  3.39 0.79, 14.4  10.9 1.43, 82.7 
0.1–1.9 0.1–1.9 0.14 0.01, 1.37  1.58 0.43, 5.87  3.32 0.55, 19.9 
2.0–3.5 0.1–1.9 0.07 0.01, 0.67  1.16 0.31, 4.31  4.25 0.70, 25.9 
>3.5 0.1–1.9 0.04 0.01, 0.78  1.60 0.34, 7.53  5.30 0.68, 42.1 
          
2.0–3.0 0.12 0.01, 1.35  1.20 0.28, 5.12  1.81 0.25, 13.2 
0.1–1.9 2.0–3.0 0.04 0.003, 0.38  0.56 0.15, 2.12  0.55 0.09, 3.35 
2.0–3.5 2.0–3.0 0.02 0.01, 0.21  0.41 0.11, 1.56  0.71 0.12, 4.26 
>3.5 2.0–3.0 0.01 0.001, 0.27  0.57 0.12, 2.66  0.88 0.11, 6.59 
          
>3.0 4.49 0.39, 51.3  8.06 1.61, 40.3  7.59 0.86, 67.3 
0.1–1.9 >3.0 1.44 0.10, 20.3  3.78 0.87, 16.3  2.31 0.33, 16.4 
2.0–3.5 >3.0 0.78 0.06, 9.73  2.76 0.65, 11.8  2.97 0.42, 21.1 
>3.5 >3.0 0.46 0.02, 11.2  3.82 0.74, 19.8  3.70 0.43, 31.8 
Cumulative iron intake (g) in time period 0* Cumulative iron intake (g) in time period 1† Marginal structural model  Crude OLR‡ model  Adjusted OLR model§ 
OR‡ 95% CI‡  OR 95% CI  OR 95% CI 
1.0   1.0   1.0  
0.1–1.9 0.32 0.10, 1.06  0.47 0.19, 1.18  0.31 0.09, 1.01 
2.0–3.5 0.17 0.05, 0.56  0.34 0.13, 0.89  0.39 0.11, 1.35 
>3.5 0.10 0.01, 1.00  0.47 0.14, 1.64  0.49 0.10, 2.44 
          
0.1–1.9 0.42 0.04, 4.54  3.39 0.79, 14.4  10.9 1.43, 82.7 
0.1–1.9 0.1–1.9 0.14 0.01, 1.37  1.58 0.43, 5.87  3.32 0.55, 19.9 
2.0–3.5 0.1–1.9 0.07 0.01, 0.67  1.16 0.31, 4.31  4.25 0.70, 25.9 
>3.5 0.1–1.9 0.04 0.01, 0.78  1.60 0.34, 7.53  5.30 0.68, 42.1 
          
2.0–3.0 0.12 0.01, 1.35  1.20 0.28, 5.12  1.81 0.25, 13.2 
0.1–1.9 2.0–3.0 0.04 0.003, 0.38  0.56 0.15, 2.12  0.55 0.09, 3.35 
2.0–3.5 2.0–3.0 0.02 0.01, 0.21  0.41 0.11, 1.56  0.71 0.12, 4.26 
>3.5 2.0–3.0 0.01 0.001, 0.27  0.57 0.12, 2.66  0.88 0.11, 6.59 
          
>3.0 4.49 0.39, 51.3  8.06 1.61, 40.3  7.59 0.86, 67.3 
0.1–1.9 >3.0 1.44 0.10, 20.3  3.78 0.87, 16.3  2.31 0.33, 16.4 
2.0–3.5 >3.0 0.78 0.06, 9.73  2.76 0.65, 11.8  2.97 0.42, 21.1 
>3.5 >3.0 0.46 0.02, 11.2  3.82 0.74, 19.8  3.70 0.43, 31.8 

* Time period 0 represents the period from the start of prenatal care to 24–29 weeks’ gestation.

† Time period 1 represents the period from 24–29 weeks’ gestation to delivery.

‡ OLR, ordinary logistic regression; OR, odds ratio; CI, confidence interval.

§ Adjusted for confounder history through inclusion of covariates in the model as regressors.

References

1.
Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology.
Epidemiology
 
2000
;
11
:
550
–60.
2.
Robins JM. Marginal structural models. In: 1997 proceedings of the Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association, 1998:1–10.
3.
Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran E, Berry D, eds. Statistical models in epidemiology: the environment and clinical trials. New York, NY: Springer Verlag, 1999:95–134.
4.
Robins JM. Correction for non-compliance in equivalence trials.
Stat Med
 
1998
;
17
:
269
–302.
5.
Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men.
Epidemiology
 
2000
;
11
:
561
–70.
6.
Cook NR, Cole SR, Hennekens CH. Use of a marginal structural model to determine the effect of aspirin on cardiovascular mortality in the Physicians’ Health Study.
Am J Epidemiol
 
2002
;
155
:
1045
–53.
7.
Choi HK, Hernán MA, Seeger JD, et al. Methotrexate and mortality in patients with rheumatoid arthritis: a prospective study.
Lancet
 
2002
;
359
:
1173
–7.
8.
Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the joint causal effect of nonrandomized treatments.
J Am Stat Assoc
 
2001
;
96
:
440
–8.
9.
Hernán MA, Brumback BA, Robins JM. Estimating the causal effect of zidovudine on CD4 count with a marginal structural model for repeated measures.
Stat Med
 
2002
;
21
:
1689
–709.
10.
Cole SR, Hernán MA, Robins JM, et al. Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models.
Am J Epidemiol
 
2003
;
158
:
687
–94.
11.
Siega-Riz AM, Harzema A, Thorp J, et al. Selective vs. universal iron supplementation during pregnancy: prevention of third-trimester anemia. (Abstract).
FASEB J
 
2001
;
15(suppl)
:
A974
.
12.
Centers for Disease Control and Prevention. Recommendations to prevent and control iron deficiency in the United States.
MMWR Recomm Rep
 
1998
;
47
:
1
–29.
13.
Little RJ, Rubin DB. Statistical analysis with missing data. Hoboken, NJ: Wiley-Interscience, 2002.
14.
Institute of Medicine. Nutrition during pregnancy. Washington, DC: National Academy Press, 1990.
15.
Pearl J. Causal diagrams for empirical research.
Biometrika
 
1995
;
82
:
669
–88.
16.
Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research.
Epidemiology
 
1999
;
10
:
37
–48.
17.
Robins JM. Data, design, and background knowledge in etiologic inference.
Epidemiology
 
2001
;
12
:
313
–20.
18.
van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York, NY: Springer-Verlag, 2003.
19.
SAS Institute, Inc. SAS/STAT user’s guide. Version 8. Cary, NC: SAS Institute, Inc, 1999.
20.
Stata Corporation. Stata statistical software: release 7.0. College Station, TX: Stata Corporation, 1999.
21.
Murphy SA, van der Laan MJ, Robins JM, et al. Marginal mean models for dynamic regimes.
J Am Stat Assoc
 
2001
;
96
:
1410
–24.
22.
Robins JM. Structural nested failure time models. In: Armitage P, Colton T, eds. The encyclopedia of biostatistics. Chichester, United Kingdom: John Wiley and Sons Ltd, 1998:4372–89.