Abstract

In non-experimental research, a sensitivity analysis helps determine whether a causal conclusion could be easily reversed in the presence of hidden bias. A new approach to sensitivity analysis based on weighting extends and supplements propensity score weighting methods for identifying the average treatment effect for the treated (ATT). In essence, the discrepancy between a new weight that adjusts for the omitted confounders and an initial weight that omits them captures the role of the confounders. This strategy is appealing for a number of reasons, including that, regardless of how complex the data generation functions are, the number of sensitivity parameters remains small and their forms never change. A graphical display of the sensitivity parameter values facilitates a holistic assessment of the dominant potential bias. An application to the well-known LaLonde data lays out the implementation procedure and illustrates its broad utility. The data offer a prototypical example of non-experimental evaluations of the average impact of job training programmes for the participant population.

1 INTRODUCTION

In causal evaluations that rely on statistical adjustment for covariates that confound the treatment effect on the outcome, omitted confounding is always a major concern. Confounders are covariates whose distributions differ between the treated group and the untreated group and that also predict the outcome. For example, in the well-known debate about whether smoking causes lung cancer, even after analysts have adjusted for a host of observed individual and environmental factors, critics have nonetheless raised the possibility that some unknown biological agents, presumably more prevalent among smokers, might both trigger smoking and increase one's susceptibility to lung cancer. In the presence of such omitted confounding, the lung cancer rate among smokers had they counterfactually not smoked cannot be assumed identical to the lung cancer rate among non-smokers, even though the two groups share the same observed pre-treatment characteristics after statistical adjustment. The remaining difference in the lung cancer rate between the two groups is the hidden bias due to the omitted confounding. This is illustrated in Figure 1. After a host of observed characteristics has been adjusted for, the difference in the lung cancer rate between smokers and non-smokers (i.e. A–C) is the sum of the true effect (i.e. A–B) and the hidden bias (i.e. B–C). A causal conclusion is likely invalid if the hidden bias is large relative to the true effect; if the hidden bias is relatively small, its existence may not threaten the causal validity of the initial conclusion.

FIGURE 1 Causal effect and hidden bias

In accordance with this logic, Cornfield et al. (1959) stressed the crucial role of sensitivity analysis in evaluating past evidence. Having conducted a comprehensive review of the existing studies, they reasoned that the causal claim that cigarette smoking is a cause of lung cancer was not highly sensitive to hidden bias: ‘The magnitude of the excess lung-cancer risk among cigarette smokers is so great that the results can not be interpreted as arising from an indirect association of cigarette smoking with some other agent or characteristic, since this hypothetical agent would have to be at least as strongly associated with lung cancer as cigarette use; no such agent has been found or suggested’ (p. 173). This conclusion has continued to hold over the past six decades.

Simply put, a sensitivity analysis (SA) quantifies the amount of hidden bias associated with omitted confounders and evaluates whether removing such a bias would qualitatively change the initial analytic conclusion. It addresses two questions: How large would a hidden bias need to be to alter the initial conclusion? And is such a hidden bias scientifically plausible? (See Bross, 1966, for one of the earliest illustrations.) Conclusions that are harder to alter by a scientifically plausible hidden bias are expected to add a higher value to knowledge about causality.

There are two distinct criteria for judging whether the removal of a hidden bias would lead to a qualitative change: the first refers to a change in practical significance as quantified by effect size, whereas the second refers to a change in statistical significance that is typically determined by the p value in reference to a certain significance level (e.g. 0.05 is a conventional threshold adopted in the social and behavioural sciences). Notably, on the one hand, a change in the statistical significance of an analytic result (such as a change in p value from 0.051 to 0.049) may have little practical significance if the corresponding change in effect size is minimal; on the other hand, a great amount of change in effect size may not necessarily move the p value over an arbitrarily determined conventional threshold yet may have considerable implications for decision-making. For this reason, the sensitivity analysis literature has primarily focused on the former. Following this rationale, we prioritize the assessment of sensitivity in terms of a potential change in practical significance rather than in statistical significance. Nonetheless, we propose in the discussion section an additional procedure for assessing the latter.

Although researchers in the social, behavioural and health sciences have a long tradition of rightfully voicing scepticism towards conclusions drawn from non-experimental research, most published studies have demonstrated little effort to empirically investigate potential sources of confounding and assess their impact. This is likely because, with few exceptions, discussions of SA techniques have been largely confined to the statistics community. To our knowledge, research societies and funding agencies generally have not included SA as a required element in their guidelines for grant applications or journal publications.

In analysing non-experimental data, popular strategies for reducing selection bias include analysis of covariance, multiple regression and various propensity score-based techniques such as matching, sub-classification and weighting. Identification typically relies on the strong assumption that, after adjusting for a set of observed pre-treatment covariates, there are no omitted confounders. This is known as the strong ignorability assumption (Rosenbaum & Rubin, 1983b). Confounding covariates may be omitted because they are unobserved, because their functional forms are unknown or because under the constraint of sample size, the analyst must exclude a subset of observed covariates to avoid model overfitting.

Different SA approaches have been proposed in the past corresponding to different statistical adjustment methods employed in the initial analysis. Every SA approach represents the hidden bias associated with an omitted confounder or a set of omitted confounders as a function of a number of sensitivity parameters. In general, a causal conclusion is considered to be sensitive if relatively small values of the sensitivity parameters could lead to a qualitative change in inference; a causal conclusion is insensitive if only extreme values of the sensitivity parameters could alter the inference (Rosenbaum, 1995).

However, as Greenland (2005) pointed out, the complexity of the dependence of analytic results on sensitivity parameters that are often high dimensional due to multiple bias sources ‘render sensitivity analyses difficult to present without drastic (and potentially misleading) simplifications’ (p. 271). Most existing SA strategies have been developed under simplified setups or require additional assumptions that are necessary for reducing the number of sensitivity parameters to make the analysis feasible. The simplified setups inevitably limit the applicability of a particular SA strategy; meanwhile, violations of the simplifying assumptions in real data applications would likely invalidate the SA results. A major concern is that SA conclusions might themselves be sensitive to potential violations of the additional simplifying assumptions. This would lead to a tail-chasing game that might never end.

Moreover, in a study with a given set of covariates, an estimator that makes use of propensity score-based weighting will generally differ from a regression estimator or a matching estimator in the amount of bias that each contains. Hence, sensitivity analysis methods developed under the linear regression framework, for example, generally are not suitable when an initial analysis has been conducted through weighting.

This paper introduces a new weighting-based approach for evaluations of the average treatment effect for the treated (ATT). ATT is of theoretical interest when the treated population is a subset of the untreated population prior to the treatment and when the treatment has little relevance to many members of the untreated population. For example, a job training programme typically has no immediate relevance to individuals who have job security; nor is it relevant to those who are not participating in the labour force. A research question about the impact of the programme is applicable only to the unemployed or the underemployed, who are arguably represented by the population of individuals who actually participate in the training programme. In non-experimental research, identification of ATT is subject to selection bias because the treated group and the untreated group may lack comparability, in numerous respects, prior to the treatment.

The new weighting-based SA approach builds on past SA research and offers a sensible and flexible alternative with broad applicability to non-experimental ATT evaluations that employ weighting. In a nutshell, the discrepancy between a new weight that adjusts for an omitted confounder and an initial weight that omits the confounder captures the role of the confounder that contributes to the bias. We will show that the weighting-based SA extends and supplements the propensity score weighting methods for identifying ATT. Importantly, unlike most of the existing SA methods, the new weighting-based SA strategy does not require additional simplifying assumptions.

We demonstrate the utility of the new SA strategy in the context of a re-evaluation of a job training programme. Non-experimental evaluations of job training programmes have frequently been used by econometricians and statisticians to illustrate new methods for causal inference. A study by LaLonde (1986), however, famously revealed the inadequacy of non-experimental analyses. LaLonde innovatively combined the treated group from an experimental study with comparison groups drawn from large-scale surveys. The combined data represent a typical non-experimental study. He then evaluated a number of non-experimental econometric procedures and concluded that many of them failed to remove selection bias. Dehejia and Wahba (1999) analysed a subset of these data with propensity score stratification or matching. They additionally conducted a sensitivity analysis showing that the results were sensitive to the selection of pre-treatment covariates for each adjustment.

We re-analyse the non-experimental data from the LaLonde (1986) study because this application example is well known and, more importantly, because it exhibits typical threats to causal inference associated with omitted confounders. This application study assesses, for male participants in the National Supported Work (NSW) programme in the mid-1970s, the programme impact on earnings. Our non-experimental sample includes the participants in the original NSW-treated group and a comparison group drawn from the Current Population Survey (CPS). After presenting the new SA strategy, we conceptualize different sources of bias in the non-experimental analysis and then walk the reader through the concrete steps of using the weighting-based SA strategy to quantify each source of bias as well as providing a holistic illustration of multiple bias sources. In addition, we provide an example of setting the bounds for sensitivity parameters on the basis of scientific reasoning. We restrict the discussion to binary treatments as these are the most common setup for evaluating ATT.

2 SELECTION BIAS IN ATT EVALUATIONS

Under the potential outcomes framework (Holland, 1986; Neyman, 1935; Rubin, 1978), ATT is defined as

ATT = E[Y(1) − Y(0) | Z = 1].

Here Z is the treatment indicator. In our application example, Z takes value 1 if an individual receives job training and 0 otherwise. Y(z) for z = 0, 1 is the potential outcome a year later corresponding to treatment condition z: Y(1) denotes an individual's potential earnings if receiving training; Y(0) denotes the same individual's potential earnings if not receiving training. Assuming that one's potential earnings cannot be affected by whether other people have received job training (known as the stable unit treatment value assumption, SUTVA; see Rubin, 1986, 1990), the difference between these two potential outcomes defines the causal effect of job training on earnings for a treated person. This difference may vary from person to person because individuals may not receive the same amount of benefit from the job training. ATT is the average of this individual-specific causal effect over all individuals in the treated population. This conditional average is denoted by E[· | Z = 1]. A causal effect in a population needs to be identifiable with observable population quantities and subsequently estimated from sample data (Heckman & Vytlacil, 2007; Hong, 2015). For a treated person, the observed outcome Y is equal to the first potential outcome Y(1); however, the person's second potential outcome Y(0) is counterfactual and must be inferred from data in the untreated group.
In non-experimental evaluations of ATT, selection bias may arise if the analyst makes a simple comparison between the average observed outcome of the treated group, E[Y | Z = 1], and that of the untreated group, E[Y | Z = 0]. The selection bias is the difference between this bias-inflicted parameter and the causal effect of interest:

E[Y | Z = 1] − E[Y | Z = 0] − ATT = E[Y(0) | Z = 1] − E[Y(0) | Z = 0].

The equality holds because Y and Y(1) are equal in expectation for the treated and, similarly, Y and Y(0) are equal in expectation for the untreated; moreover, the mean difference conditional on Z = 1 is equal to the difference between the two conditional means. According to the above result, the bias arises if there is a pre-existing difference in the mean of Y(0) between the treated population and the untreated population. In the job training example, this bias is the average difference between the treated group and the untreated group in potential earnings if counterfactually neither group had access to training.
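To make this decomposition concrete, the following simulation (a hypothetical setup of our own; all variable names are ours) generates a confounder U that drives both treatment take-up and the no-treatment outcome, and then verifies that the naive contrast equals ATT plus the selection bias:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data generation: a confounder U raises both the chance of
# treatment and the no-treatment outcome Y(0).
u = rng.normal(size=n)
z = rng.binomial(1, 1 / (1 + np.exp(-u)))     # treatment assignment
y0 = 2.0 * u + rng.normal(size=n)             # potential outcome without treatment
y1 = y0 + 1.0                                 # constant true effect of 1.0
y = np.where(z == 1, y1, y0)                  # observed outcome

naive = y[z == 1].mean() - y[z == 0].mean()   # bias-inflicted contrast
att = (y1 - y0)[z == 1].mean()                # true ATT (1.0 by construction)
selection_bias = y0[z == 1].mean() - y0[z == 0].mean()

# The naive contrast decomposes into ATT plus the selection bias.
print(abs(naive - (att + selection_bias)) < 1e-9)
```

Because U is more prevalent among the treated, the selection bias is positive here and the naive contrast overstates the true effect.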

Once the analyst has adjusted for a set of observed pre-treatment covariates X, some selection bias will remain if a difference still exists between the treated and the untreated in the average potential earnings without training even though they have had the same distribution of X. This is analogous to ‘B – C’ in Figure 1.

3 EXISTING SA STRATEGIES FOR ATT EVALUATIONS

This section reviews several popular approaches to ATT evaluations and the accompanying SA strategies. These include regression, propensity score matching, propensity score stratification and propensity score weighting. We briefly explain the rationale of each SA strategy and highlight the additional simplifying assumptions that are often required in each case for implementation. Nearly all the existing SA strategies share a common consideration: the hidden bias due to an unmeasured confounder U is related to at least two factors. One is the association between U and the treatment assignment Z; the other is the association between U and the outcome Y under either or both treatment conditions. Different representations of these two factors seem to be appealing to applied researchers in different fields. For example, regression coefficients (Rosenbaum, 1986), correlations (Frank, 2000) and partial R2 (Imbens, 2003) are familiar to social scientists who operate within the regression framework, while risk ratios (Cornfield et al., 1959; Ding & VanderWeele, 2016) are particularly meaningful to researchers in epidemiology or medicine. (Readers may also refer to a review by Liu and colleagues (2013) of SA methods for ATE evaluations when the treatment, the outcome and the omitted confounder are all binary. Zhang et al. (2018) provide an up-to-date summary of a wide range of SA methods for ATE evaluation, categorized according to whether information on unmeasured confounders is unavailable, is available within the study data or is available in the literature or in other data sources.)

3.1 Regression

In non-experimental research, a typical regression analysis regresses an observed outcome Y on a treatment indicator Z and a vector of observed pre-treatment covariates X. In the absence of omitted confounders, the coefficient for Z identifies ATE, which is averaged over the entire population rather than the treated population. Applying the same regression model to a causal analysis of ATT, the analyst must assume that the treatment effect is equal on average for the treated population and the untreated population. ATT and ATE are expected to be unequal when the treatment effect is heterogeneous and is a function of confounding covariates. If a confounding covariate U is omitted, the bias in ATT and that in ATE are expected to be unequal as well (see Appendix A for an illustration). Thus, in general, SA strategies for evaluations of ATE do not directly apply to evaluations of ATT.

Regression-based SA has been developed for assessing the potential consequences of omitted confounding. Rosenbaum (1986) proposed an SA approach in a setup of linear models with continuous outcomes when U and X are conditionally independent under each treatment condition. The simplifying assumptions include (a) that the outcome Y under each treatment condition is strictly a linear function of an omitted confounder U after adjustment for the observed covariates X, (b) that the relationship between U and Y does not differ across the treatment conditions and (c) that it does not differ across levels of X. ATT and ATE do not differ under these simplifying assumptions. The confounding impact of U can be represented as a product of two sensitivity parameters: the first is the linear association between U and Y under a given treatment condition and the second is the average difference in U between the two treatment groups after the adjustment for X. Yet, when the additivity assumption (b) does not hold, as Marcus (1997) has shown, two additional sensitivity parameters are required for the evaluation of ATE: one is the effect of the interaction between U and Z in predicting Y; the other is the mean of U in the treated group. This leads to a total of four sensitivity parameters. Following a similar logic, Lin and colleagues (1998) extended the results to binary outcomes and censored survival time data.

Others have utilized the same logic when all variables are standardized in a linear regression (Mauro, 1990). Frank (2000) further proposed a sensitivity index as the product of two conditional correlations, one between U and Y and the other between U and Z. On the continuum of this index, an impact threshold indicates the magnitude of a confounding impact necessary to change the causal conclusion. The reference distribution of the sensitivity index allows one to assess the probability of observing such an impact. A parallel development in econometrics involves specifying both an outcome model and a model for the latent treatment selection. The correlation between the two error terms is associated with the direction and the magnitude of bias (Copas & Li, 1997; Heckman, 1979). An alternative strategy, proposed by Imbens (2003), quantifies the proportion of unexplained variation in the treatment and that in the outcome attributable to U, each represented as a partial R2 value. If the partial R2 values required for altering the initial empirical conclusion are much greater than the reasonable values associated with the observed covariates, then the initial result is judged to be insensitive to omitted confounding. This strategy allows for a vector of unmeasured confounders; however, it does not lead to a closed-form bias function.

To our knowledge, most regression-based SA strategies were developed for ATE rather than ATT evaluations. One exception is the SA strategy proposed by Carnegie, Harada and Hill (2016), who simulated hypothetical values of a potential confounder based on functional and distributional assumptions. In general, regression-based SA strictly requires that the functional form of the outcome model be known to the analyst (Frank, 2000; Imbens, 2003; Lin et al., 1998; Marcus, 1997; Rosenbaum, 1986; VanderWeele & Arah, 2011). Therefore, the SA results themselves are potentially sensitive to outcome model misspecification. Finally, the number of regression-based sensitivity parameters and their forms typically depend on the number of omitted confounders and the assumed data generation functions.

3.2 Bounds for relative risks

Cornfield et al. (1959) were pioneers in deriving conditions under which the estimated effect of a treatment obtained from a non-experimental study could be completely explained away by an unmeasured confounder U when the treatment, the outcome and the confounder are all binary. In this setup, the treatment effect on the outcome and the two sensitivity parameters, namely the association between U and Z and that between U and Y, are all represented by risk ratios. The classical Cornfield conditions determine the minimal values of the two sensitivity parameters that are necessary for altering the initial causal conclusion. Following this tradition, Lee (2011) made an extension to a categorical U; Poole (2010) and Ding and VanderWeele (2014) considered risk differences in addition to risk ratios. Allowing the association between U and Y to differ across the treatment conditions, Ding and VanderWeele (2016) derived a bounding factor (i.e. a bias index or a sensitivity threshold) as a function of the sensitivity parameters, where U could be a single confounder or a vector. Following the same logic as that proposed by Manski (1990) and Frank (2000), VanderWeele and Ding (2017) further suggested that applied researchers calculate an E-value for the case in which U has associations of the same magnitude with the treatment and with the outcome. However, this strategy cannot accommodate the case in which a null finding produced in the initial analysis may be altered. In fact, many null findings suppressed due to ‘publication bias’ are likely sensitive to omitted confounding. Furthermore, when the treatment effect differs between the treated and the untreated, these bounding strategies designed for evaluating ATE are not suitable for ATT.
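The E-value of VanderWeele and Ding (2017) has a simple closed form on the risk-ratio scale, which makes it easy to sketch (the function name is ours):

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio RR: the minimum strength of
    association, on the risk-ratio scale, that an unmeasured confounder
    would need with both the treatment and the outcome to fully explain
    away the observed RR."""
    if rr < 1:              # for protective effects, invert the risk ratio first
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# e.g. an observed risk ratio of 2 gives an E-value of about 3.41
print(round(e_value(2.0), 2))
```

A risk ratio of 1 yields an E-value of 1, reflecting that a null estimate needs no confounding at all to be 'explained', which is precisely why this device cannot assess whether a null finding might be reversed.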

3.3 Propensity score matching

For non-experimental research, Rosenbaum and Rubin (1983a) proposed summarizing multivariate pre-treatment information in a unidimensional propensity score. In the example of job training, individuals with a relatively high propensity score are predicted to have a relatively high predisposition to participate in a job training programme. The propensity score can be estimated through logistic regression as a function of observed pre-treatment covariates. Propensity score matching is particularly suitable for ATT analysis when there is a large reservoir of sampled individuals in the untreated group that supplies potential matches to the treated individuals. The average within-pair difference in the observed outcome over all the matched pairs estimates the ATT. Under the simplifying assumption that an unmeasured confounder U is a near-perfect predictor of a binary outcome and is within a finite range, Rosenbaum (1995) proposed an SA approach that involves a sensitivity index Γ. It denotes the odds ratio of receiving one treatment rather than another between two individuals within a matched pair and hence is related to the predictive relationship between U and Z. In expectation, Γ = 1 if the treatment has been randomized. Because any given value of Γ corresponds to a range of treatment assignment probabilities, a sensitivity analysis computes the bounds for the test statistic and p values. These are known as ‘Rosenbaum bounds’. As Γ increases, however, the bounds become wider and therefore less informative, reflecting the increasing uncertainty about U. To consider a more general case in which U imperfectly predicts the outcome, an additional sensitivity index was introduced to represent the odds ratio of displaying the outcome value of interest between two individuals within a matched pair (Gastwirth et al., 1998). See Greenland (1996) for a review and Harding (2003) for an application.
Ichino, Mealli, and Nannicini (2008) instead simulated a binary U in accordance with the characterization of its distribution, its association with Z and its association with Y. They included the simulated U in a set of matching covariates and then re-estimated the ATT.

3.4 Propensity score stratification

One may use the estimated propensity score to sort and sub-classify a sample. For an ATT evaluation, the sample may be divided into S strata such that there is an equal number of treated individuals in each stratum. The treated group and the untreated group in the same stratum share similar values of the estimated propensity score. The within-stratum mean difference in the outcome, averaged over all the strata, estimates the ATT. Considering a binary unobserved confounder U and a binary outcome to simplify the setup, Rosenbaum and Rubin (1983b) highlighted three sensitivity parameters within each propensity stratum s: the distribution of U, the association between U and Z, and the association between U and Y, which was assumed constant across the two treatment conditions. Different assumptions about U lead to different values of these sensitivity parameters; the maximum likelihood estimate of the causal effect changes accordingly. However, in a sample that has been divided into S propensity strata, there can be as many as 3S sensitivity parameters. To limit the size of the SA for practical reasons, they proposed a further simplifying assumption that each sensitivity parameter be constant across all S strata.
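A minimal sketch of the stratified ATT estimator described above (the data generation and names are ours, and the propensity score is taken as known so the sketch can focus on the stratification step):

```python
import numpy as np

def expit(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(4)
n = 200_000

x = rng.normal(size=n)
e = expit(x)                             # propensity score, assumed known here
z = rng.binomial(1, e)
y = x + 1.5 * z + rng.normal(size=n)     # hypothetical true effect of 1.5

# Cut the sample at quintiles of the *treated* group's propensity scores so
# that each stratum holds roughly equal numbers of treated individuals.
cuts = np.quantile(e[z == 1], [0.2, 0.4, 0.6, 0.8])
s = np.digitize(e, cuts)

# Average the within-stratum treated-untreated contrasts, weighting each
# stratum by its share of the treated group.
est = sum(
    (y[(s == j) & (z == 1)].mean() - y[(s == j) & (z == 0)].mean())
    * ((s == j) & (z == 1)).sum() / (z == 1).sum()
    for j in range(5)
)
print(round(est, 1))  # close to the true effect of 1.5
```

With only five strata, a small residual bias remains because the propensity score still varies within each stratum; finer stratification shrinks it further.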

3.5 Propensity score weighting

Propensity score-based weighting adjustment for observed confounding, known as inverse-probability-of-treatment weighting (IPTW) in marginal structural models, was initially proposed by Robins (1987) and Rosenbaum (1987). This method allows the analyst to estimate the population average potential outcome associated with each treatment one at a time before taking mean contrasts. An ATT analysis and an ATE analysis differ in how the weight is constructed. As we will formally show in Section 4, an ATT analysis requires a weight that transforms only the untreated group to approximate the distribution of the potential outcome of the treated group under the counterfactual control condition. In contrast, an ATE analysis requires a transformation of both the treated group and the untreated group.
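The distinction between the two weighting schemes can be sketched numerically (the setup and names are ours, with the propensity score e(x) assumed known): an ATE analysis weights treated units by 1/e(x) and untreated units by 1/(1 − e(x)), whereas an ATT analysis leaves the treated group unweighted and, up to a normalizing constant, weights untreated units by e(x)/(1 − e(x)):

```python
import numpy as np

def expit(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(5)
n = 300_000

x = rng.normal(size=n)
e = expit(x)                         # propensity score, assumed known
z = rng.binomial(1, e)
y0 = x + rng.normal(size=n)
y = y0 + 2.0 * z * (x > 0)           # effect only for x > 0, so ATT != ATE

t, u = z == 1, z == 0

# ATE: transform BOTH groups toward the full population.
ate = np.average(y[t], weights=1 / e[t]) - np.average(y[u], weights=1 / (1 - e[u]))

# ATT: leave the treated group as-is; transform only the untreated group.
att = y[t].mean() - np.average(y[u], weights=e[u] / (1 - e[u]))

print(att > ate)  # the treated are concentrated where the effect exists
```

Because treatment take-up is concentrated among those with x > 0, who are the only ones who benefit in this construction, the ATT exceeds the ATE, and the two weighting schemes recover the two different estimands.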

Robins and colleagues (Brumback et al., 2004; Robins, 2000) developed a weighting-based SA approach that involves a non-identifiable, user-supplied confounding bias function. The function contains a single sensitivity parameter and requires no consideration of whether an unmeasured confounder U is univariate or multivariate, continuous or discrete. However, various forms of the bias function correspond to different types of unmeasured confounding, and the analyst receives little guidance in choosing an appropriate functional form. As shown in an application (Brumback et al., 2004), the SA results are apparently sensitive to the form of the bias function, which is specified somewhat arbitrarily by the analyst.

Ridgeway and colleagues (McCaffrey et al., 2004; Ridgeway, 2006) proposed a different SA strategy for weighting-based ATT analysis. These researchers considered, for each sampled individual, the ratio of a propensity score-based weight that makes additional adjustment for an omitted confounder to the initial weight that does not. For an omitted covariate to contribute to selection bias, this ratio of the two weights must also be correlated with the outcome. After making an assumption about the parametric distribution of this ratio, the analyst simulates data by drawing random values from the assumed distribution and then obtains bounds for the sensitivity results.

4 WEIGHTING-BASED BIAS FORMULA FOR EVALUATING ATT

This article introduces an alternative weighting-based SA approach for ATT evaluations. Rather than considering a ratio of two weights, we use the discrepancy between a new weight that adjusts for the omitted confounders and an initial weight that omits them to capture the role of the confounders. Our SA strategy does not involve any distributional assumptions about these weights or about their discrepancy. We simply derive the effect size of a hidden bias as a product of two sensitivity parameters: one is the standard deviation of the weight discrepancy and the other is the correlation between the weight discrepancy and the outcome. We will show that these two sensitivity parameters are meaningful and easy to work with in real data applications.

4.1 Weighting-based sensitivity parameters

In an initial analysis that adjusts for a set of observed pre-treatment covariates denoted by X, the analyst who employs propensity score-based weighting applies a constant weight of 1.0 to the treated group members and applies the following weight to the untreated group members with observed pre-treatment covariate values x:

W = [pr(Z = 0)/pr(Z = 1)] × [pr(Z = 1 | X = x)/pr(Z = 0 | X = x)].  (1)

In the numerator, pr(Z = 1 | X = x) is an individual's conditional probability (i.e. propensity) of receiving the treatment as a function of the individual's pre-treatment characteristics x; in the denominator, pr(Z = 0 | X = x) is simply equal to 1 − pr(Z = 1 | X = x). This weighting scheme transforms the joint distribution of X in the untreated group to resemble that in the treated group. The ratio pr(Z = 0)/pr(Z = 1), the proportion untreated divided by the proportion treated in the combined population, is a constant that normalizes the weight. Under the assumption that the treated group and the untreated group have the same distribution of Y(0) within levels of X = x, it is easy to prove that E[Y(0) | Z = 1] = E[WY | Z = 0] (see Appendix B for the proof); hence d_ATT = E[Y | Z = 1] − E[WY | Z = 0] identifies ATT without bias.
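A minimal numerical sketch of this weighting scheme (the data generation and names are ours, and the propensity score is taken as known so that estimation error does not obscure the identity):

```python
import numpy as np

def expit(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(1)
n = 200_000

x = rng.normal(size=n)
e_x = expit(x)                        # pr(Z = 1 | X = x), assumed known
z = rng.binomial(1, e_x)
y0 = x + rng.normal(size=n)           # Y(0) depends only on the observed X
y = np.where(z == 1, y0 + 2.0, y0)    # true ATT is 2.0 by construction

# Weight of equation (1) for untreated members; the normalizing constant
# pr(Z = 0)/pr(Z = 1) cancels when we take a weighted average.
u = z == 0
w = e_x[u] / (1 - e_x[u])

d_att = y[z == 1].mean() - np.average(y[u], weights=w)
print(round(d_att, 1))  # recovers the true ATT of 2.0 (up to sampling error)
```

Only the untreated group is reweighted; the treated group's outcomes enter the contrast unmodified, exactly as the text describes.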
However, if the identification assumption requires conditioning on another set of pre-treatment covariates P in addition to X, then the propensity score is a function of not only x but also p. ATT will instead be identified by d_P.ATT = E[Y | Z = 1] − E[W_P Y | Z = 0], where for untreated group members with pre-treatment covariates X = x and P = p,

W_P = [pr(Z = 0)/pr(Z = 1)] × [pr(Z = 1 | X = x, P = p)/pr(Z = 0 | X = x, P = p)].  (2)

The discrepancy between the weight that includes P and the weight that omits P for each individual, W_P − W, is a random variable that plays a key role in determining how much bias is associated with the omission of P. This bias, d_ATT − d_P.ATT, can be quantified as the covariance between the weight discrepancy and the outcome in the untreated group.

Theorem 1
In an evaluation of ATT that employs propensity score-based weighting to remove selection bias associated with a set of observed pre-treatment covariates X, the remaining bias due to the omission of pre-treatment covariates P is

Bias_ATT = d_ATT − d_P.ATT = cov(W_P − W, Y | Z = 0).

Appendix C gives the proof.
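Theorem 1 can be checked numerically. In the hypothetical setup below (names and data generation ours), a binary confounder P is omitted from the initial weight; with both weights normalized to mean 1 in the untreated sample, the bias d_ATT − d_P.ATT coincides with the sample covariance between the weight discrepancy and the outcome:

```python
import numpy as np

def expit(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(2)
n = 100_000

# X is adjusted for; the binary confounder P is omitted from the initial
# weight W but included in the new weight WP.
x = rng.normal(size=n)
p = rng.binomial(1, 0.5, size=n)
e_xp = expit(x + p)                           # pr(Z = 1 | X, P)
e_x = 0.5 * expit(x) + 0.5 * expit(x + 1)     # pr(Z = 1 | X), P marginalised out
z = rng.binomial(1, e_xp)
y = x + p + rng.normal(size=n)                # no true treatment effect

u = z == 0
w = e_x[u] / (1 - e_x[u])                     # initial weight, as in eq. (1)
wp = e_xp[u] / (1 - e_xp[u])                  # new weight, as in eq. (2)
w, wp = w / w.mean(), wp / wp.mean()          # normalise to mean 1 in-sample

d_att = y[z == 1].mean() - np.mean(w * y[u])
d_p_att = y[z == 1].mean() - np.mean(wp * y[u])
bias = d_att - d_p_att
cov = np.mean((wp - w) * (y[u] - y[u].mean()))
print(np.isclose(bias, cov))                  # Theorem 1: the two coincide
```

The in-sample normalization makes the discrepancy W_P − W average exactly to zero among the untreated, which is why the covariance and the bias agree to floating-point precision rather than only asymptotically.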

Theorem 2

In an evaluation of ATT that employs propensity score-based weighting to remove selection bias associated with a set of observed pre-treatment covariates X, the bias represented as cov(W_P − W, Y | Z = 0) is equal to zero if a set of omitted covariates P does not predict the treatment given X, or if P predicts the treatment but does not predict the outcome in the untreated group given X.

See Appendix D for the proof. This result is consistent with the well-known logic that the omission of P will contribute a bias in an ATT evaluation only if P predicts the treatment and also predicts the outcome in the untreated group given X. However, our theoretical result does not require that the predictive relationship between P and Z or that between P and Y take any pre-specified functional form.

Following the convention that uses the standard deviation of the outcome in the untreated group, σ_y|0 = √var(Y|Z=0), as the scaling unit, we divide Bias_ATT by σ_y|0 and obtain the effect size (ES) of Bias_ATT as a product of two sensitivity parameters:

ES(Bias_ATT) = Bias_ATT / σ_y|0 = σ × ρ.

The first sensitivity parameter is the standard deviation of the weight discrepancy in the untreated group, σ = √var(W_P − W|Z=0); the second is the correlation between the weight discrepancy and the outcome in the untreated group, ρ = corr(W_P − W, Y|Z=0). Because σ is non-negative, the sign of ρ determines the direction of the bias. When neither σ nor ρ is zero, the effect size of the bias increases with σ as the association between P and Z strengthens; it also increases in magnitude with ρ as the association between P and Y under the untreated condition strengthens. The next subsection illustrates these relationships.
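In code, the two sensitivity parameters and the ES of the bias can be computed directly from the weight discrepancy. This sketch assumes the analyst has already formed the two weights for the untreated group (the array names are hypothetical):

```python
import numpy as np

def bias_effect_size(w, w_p, y):
    """Weighting-based sensitivity parameters, computed in the untreated group.

    w   : initial weight omitting P
    w_p : new weight that also adjusts for P
    y   : observed outcome (all arrays restricted to Z = 0)
    """
    d = w_p - w                      # weight discrepancy W_P - W
    sigma = d.std()                  # sd of the discrepancy
    rho = np.corrcoef(d, y)[0, 1]    # correlation with the outcome
    return sigma, rho, sigma * rho   # ES of bias = cov(d, y)/sd(y) = sigma * rho
```

The identity in the last line holds because cov(W_P − W, Y|Z=0) = σ × σ_y|0 × ρ, so dividing the bias by σ_y|0 leaves σρ.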

4.2 Interpretation of the weighting-based sensitivity parameters

We reveal the meaning of each sensitivity parameter first in a relatively simple setup in which the potential outcomes and the treatment indicator are generated under a linear system that users tend to be most familiar with. Subsequently, we reveal the same pattern in a more complex setup involving non-linear data generation functions. We will show that, in both cases, the weighting-based sensitivity parameters are inherently related to the key parameters that generate the data.

Suppose that each potential outcome is a linear function of two predictors P and X and that both the X-Y relationship and the P-Y relationship may differ between the two treatment conditions. In other words, the treatment effect is a function of both P and X. The two potential outcomes may be generated as follows:
(3)   Y1 = μ1 + βx1(X − μ1X) + βp1(P − μ1P) + ε1,
      Y0 = μ0 + βx0(X − μ1X) + βp0(P − μ1P) + ε0,

where E(ε1|Z=1) = E(ε0|Z=1) = 0 in the absence of other omitted confounders. The predictors X and P are centred on their respective means in the treated population, that is, μ1X = E(X|Z=1) and μ1P = E(P|Z=1). Hence we have that

ATT = E(Y1 − Y0|Z=1)
    = (μ1 − μ0) + (βx1 − βx0)E(X − μ1X|Z=1) + (βp1 − βp0)E(P − μ1P|Z=1)
    = μ1 − μ0.

The first equation is obtained when we replace Y1 and Y0 with the respective models specified in Equation (3). The second equation holds because E(X − μ1X|Z=1) = E(X|Z=1) − μ1X = 0 and E(P − μ1P|Z=1) = E(P|Z=1) − μ1P = 0.
Which of the two potential outcomes would become observed for a given individual depends on the value of the treatment indicator Z. To keep it simple, we first let the probability of treatment assignment be a linear function of P and X:
(4)   pr(Z=1|X, P) = α0 + αx(X − μX) + αp(P − μP).

Here μX = E(X) and μP = E(P) are the means of X and P, respectively, in the combined population of the treated and the untreated.

In Appendix  E, we reveal the mathematical connections between the weighting-based sensitivity parameters and the data generation parameters. Below is a summary of the results.

  • 1.

    σ increases in magnitude with the strength of the predictive relationship between the omitted confounder and the treatment assignment, which is represented by the data generation parameter αp in model (4). Regardless of whether αp > 0 or αp < 0, σ always increases with the magnitude of αp. When αp = 0, P no longer predicts Z, and σ attains its minimum value of zero. The pattern is illustrated in Figure 2.

  • 2.

    When σ is fixed at a non-zero value, ρ increases in magnitude with the strength of the predictive relationship between the omitted confounder and the potential outcome when untreated, which is represented by βp0 in model (3). On the basis of the derivations in Appendix E, we conclude that ρ = 0 when βp0 = 0. The sign of a non-zero ρ is determined by the sign of the product αpβp0 and is consistent with the direction of the bias: ρ > 0 indicates a positive bias, which arises when αp and βp0 are both positive or both negative; ρ < 0 indicates a negative bias, which arises when αp and βp0 have opposite signs. Moreover, when αp is fixed, an increase in the magnitude of ρ corresponds to an increase in the magnitude of βp0 as well as an increase in the magnitude of the bias. Figure 3 shows these patterns.
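The first pattern is easy to verify numerically. Under the linear assignment model (4), if P is independent of X, then pr(Z=1|X) = α0 + αx(X − μX), so both weights have closed forms. A small simulation with made-up parameter values (not from the paper) illustrates the monotone relationship:

```python
# Simulated check: the sd of the weight discrepancy among the untreated,
# sigma, grows with |alpha_p| under model (4) and equals zero when alpha_p = 0.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1, 1, n)
p = rng.uniform(-1, 1, n)   # independent of X, so E(P|X) = mu_P = 0

def sigma_linear(a_p, a0=0.4, a_x=0.1):
    pr_full = a0 + a_x * x + a_p * p   # pr(Z=1|X,P); stays inside (0, 1)
    pr_x = a0 + a_x * x                # pr(Z=1|X) after integrating out P
    z = rng.binomial(1, pr_full)
    c = (1 - z.mean()) / z.mean()      # normalizing constant pr(Z=0)/pr(Z=1)
    w_p = pr_full / (1 - pr_full) * c  # weight adjusting for P
    w = pr_x / (1 - pr_x) * c          # weight omitting P
    return (w_p - w)[z == 0].std()

sigmas = [sigma_linear(a_p) for a_p in (0.0, 0.1, 0.2)]
# sigma is exactly zero at alpha_p = 0 and increases with |alpha_p|
```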

FIGURE 2

Correspondence between σ and αp under model (4). σ and αp both indicate the predictive relationship between the omitted confounder and the treatment assignment. σ increases with the magnitude of αp when αp ≠ 0; σ = 0 when αp = 0

FIGURE 3

Correspondence between ρ and βp0 under models (3) and (4). βp0 indicates the predictive relationship between the omitted confounder and the potential outcome when untreated. The sign of ρ is determined by the sign of αpβp0. When αp is fixed, ρ is a monotonic function of βp0; ρ = 0 when βp0 = 0

Importantly, the above relationships continue to hold when the probability of treatment assignment fits a commonly used logistic regression model. Let the logit of the probability of treatment assignment be a linear function of P and X:
(5)   logit{pr(Z=1|X, P)} = α0 + αx(X − μX) + αp(P − μP).

Under models (3) and (5), closed-form relationships between the weighting-based sensitivity parameters and the data generation parameters are not available. However, their connections can be learned through numerical studies. Figures 4 and 5 show the monotonic relationships between σ and αp and between ρ and βp0.
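Such a numerical study can be sketched as follows. Under the logistic model (5), the propensity given X alone, pr(Z=1|X) = E[expit(α0 + αxX + αpP) | X], has no closed form, but with P independent of X it can be approximated by averaging over draws of P (the parameter values below are made up, not those used for Figures 4 and 5):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
p = rng.normal(size=n)   # independent of X

def expit(u):
    return 1 / (1 + np.exp(-u))

def sigma_logit(a_p, a0=-0.5, a_x=0.5):
    pr_full = expit(a0 + a_x * x + a_p * p)   # pr(Z=1|X,P)
    # Monte Carlo marginalization over the draws of P to get pr(Z=1|X)
    pr_x = expit(a0 + a_x * x[:, None] + a_p * p[None, :]).mean(axis=1)
    z = rng.binomial(1, pr_full)
    c = (1 - z.mean()) / z.mean()
    w_p = pr_full / (1 - pr_full) * c
    w = pr_x / (1 - pr_x) * c
    return (w_p - w)[z == 0].std()

sigmas = [sigma_logit(a_p) for a_p in (0.0, 0.5, 1.0)]
# sigma is zero at alpha_p = 0 and grows with |alpha_p|, as in Figure 4
```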

FIGURE 4

Correspondence between σ and αp under the non-linear treatment assignment model (5). σ increases with the magnitude of αp when αp ≠ 0; σ = 0 when αp = 0

FIGURE 5

Correspondence between ρ and βp0 under models (3) and (5). The sign of ρ is determined by the sign of αpβp0. When αp is fixed, ρ is a monotonic function of βp0; ρ = 0 when βp0 = 0

Although the weighting-based sensitivity parameters σ and ρ are inherently connected to familiar regression coefficients under the above data generation setups, the weighting-based SA strategy does not require these simplifying assumptions. This is because weighting-based adjustments in their basic forms do not involve specifying the covariate-outcome relationships, the distribution of the omitted covariate, the distribution of the errors or the distribution of the weight. In contrast, other SA strategies typically require that the analyst choose among different model-based assumptions, distributional assumptions or assumptions about the form of an unknown sensitivity function; these analytic decisions are potentially consequential. Moreover, unlike many SA methods that are restricted to a binary outcome and a binary unmeasured confounder, this new SA strategy shares an important feature with the other propensity score weighting-based SA strategies: it conveniently assesses bias associated with one or more omitted confounders and places no constraints on the measurement scales of unmeasured confounders. Finally, the number of weighting-based sensitivity parameters and their forms do not change regardless of how the data are generated.

The new weighting-based SA strategy is consistent with all weighting-based approaches to causal inference and can be naturally incorporated as a logical step in an ATT evaluation. It assesses the potential remaining bias when the initial analysis has also been conducted through propensity score-based weighting. Weighting-based SA does not apply when the initial analysis has been conducted through other approaches such as regression or propensity score matching. Similarly, sensitivity analysis methods developed for statistical adjustment methods other than weighting are not suitable when an initial analysis has been conducted through weighting. This is because, in general, we do not expect the bias in the regression estimator or the matching estimator to be the same as the bias in the IPTW estimator of a causal effect. The mathematical results in Appendix E can be used to clarify this point. In a scenario of data generation even simpler than the models specified in Equation (3), let βp1 = βp0 = βp. The effect size of bias in a regression estimator due to the omission of P is αpβp/σ_y|0. In contrast, in a weighting-based ATT analysis, the effect size of bias is σρ, where σ is a non-linear function of αp and ρ is a non-linear function of βp. In general, we do not expect αpβp/σ_y|0 and σρ to be equal.

5 APPLICATION

We use the LaLonde (1986) data to illustrate the new weighting-based SA strategy for ATT evaluations. Programme participation occurred in 1976–1977. The experimental result suggested that the training programme likely increased earnings in 1978. Our non-experimental sample includes the 297 male participants in the original NSW-treated group and a non-equivalent comparison group of 15,992 males from the Current Population Survey (CPS).

5.1 Initial weighted ATT analysis

A number of pre-treatment characteristics, including age, education, black, Hispanic, marital status, high school completion and earnings in 1975, were measured for all the participants and for those in the comparison group. These are arguably some of the most important predictors of the outcome. On average, the treated group tended to be younger, less educated, more likely to be black or Hispanic, unmarried and without a high school degree (i.e. high school dropouts), and earned less than the comparison group in 1975. With the standard deviation of the outcome in the CPS comparison group as the scaling unit ($8628), a naïve comparison shows that the treated group earned about one standard deviation less than the CPS comparison group on average.

To identify the ATT, we estimate for each individual in the CPS comparison group the IPTW weight defined in Equation (1) as a function of the above covariates. Using the minimum and maximum values of the estimated propensity score in the treated group as the lower and upper bounds of the common support, we exclude from the subsequent analysis 7984 comparison group members whose estimated propensity scores fall below the minimum propensity score in the treated group. These individuals in the comparison group apparently had an almost zero probability of participating in the NSW programme. Applying this weight to the data within the common support, we obtain an estimate of the effect size of ATT of −0.050 with a confidence interval of (−0.159, 0.059), which indicates a small negative effect of the job training programme on earnings. We now investigate whether this non-experimental result is sensitive to several different types of omitted potential confounders.
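The common-support restriction described above reduces to a simple filter on the estimated propensity scores. A sketch (the function and variable names are hypothetical, not from the released code):

```python
import numpy as np

def common_support_mask(e_hat, z):
    """Keep all treated units, plus untreated units whose estimated
    propensity score lies within the treated group's [min, max] range."""
    lo, hi = e_hat[z == 1].min(), e_hat[z == 1].max()
    return (z == 1) | ((e_hat >= lo) & (e_hat <= hi))
```

In the analysis above, a filter of this kind is what removes the 7984 CPS members whose estimated propensity scores fall below the smallest value observed among the treated.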

5.2 Potential sources of omitted confounding

In general, two types of omitted confounding are likely in an ATT analysis:

Type A: observed covariates omitted from the initial adjustment due to concerns about model overfitting, including omitted non-linear or other higher-order functions of the observed covariates;

Type B: unobserved covariates.

Type A omission may exist when the number of observed pre-treatment covariates in a study is relatively large and exceeds 10% of the sample size in either the treated group or the untreated group. To avoid model overfitting, which would introduce excessive noise into the estimation, the analyst is usually advised to select a subset of the pre-treatment covariates on theoretical grounds. Some of the unselected covariates may then contribute to the remaining bias. This is not an issue in the analysis of the LaLonde data, which contain a very limited number of observed covariates. However, because the functional form of the true propensity score model is unknown to the analyst, non-linear or other higher-order functions of the observed covariates are typically omitted from the user-specified propensity score model. These may include, for example, polynomial terms of the continuous covariates or multi-way interactions between the covariates. A close inspection of the data may further reveal non-linear patterns that do not fit polynomial functions, in which case non-parametric step functions may provide a useful alternative.

Type B omission can hardly be ruled out in the analyses of non-experimental data. In the current application, critics may easily point out some unmeasured individual characteristics that may predict both programme participation and future earnings even after the analyst has adjusted for all the observed covariates. Additional confounders may include, for example, social-emotional skills, social network resources and various stressors in life. Theoretical reasoning and evidence from other existing datasets may provide information on how these unobserved covariates may be compared to the observed ones. Such information is necessary for determining the range of plausible values of the sensitivity parameters.

5.3 Weighting-based SA procedure

An analytic procedure that implements the proposed weighting-based SA strategy includes five major steps:

  • Step 1. Obtain the initial weight and the effect size of the ATT estimate;

  • Step 2. Obtain a new weight according to the type of omission;

  • Step 3. Estimate σ and ρ each as a function of the weight discrepancy;

  • Step 4. Estimate the effect size of bias σρ;

  • Step 5. Obtain an adjusted estimate of the effect size of ATT.

To take into account uncertainty in the estimation of the sensitivity parameters, we repeat the above procedure over 1000 bootstrapped samples. This procedure, however, does not assess whether statistical inferences are sensitive to plausible hidden bias, a topic that we leave to the discussion section.

In Step 2, for type A omission, a new weight can easily be obtained by including in the propensity score model one or more observed covariates that have been omitted or a non-linear function of the observed covariates. For type B omission, we follow the logic that if two independent covariates have similar confounding impacts, omitting either one would lead to similar bias. Therefore, if the confounding impact of an unobserved covariate is arguably comparable to that of an observed covariate, the analyst may use the latter to generate referent sensitivity parameter values for the former (e.g. Carnegie et al., 2016; Imbens, 2003). The discrepancy between a weight that includes the referent covariate and an alternative weight that excludes it is computed to help quantify the bias attributable to an unobserved covariate comparable to the referent covariate. By inspecting a range of potential omissions, the analyst obtains a sense of plausible values of bias in the worst conceivable scenarios.
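Steps 3-5, wrapped in the bootstrap loop, might be sketched as follows (illustrative Python rather than the authors' released R code; a full implementation would re-estimate both propensity models within each bootstrap sample instead of resampling fixed weights):

```python
import numpy as np

def sa_steps_3_to_5(y0, w, w_p, d_att, sd_y0):
    d = w_p - w                                    # Step 3: weight discrepancy
    sigma = d.std()
    rho = np.corrcoef(d, y0)[0, 1] if sigma > 0 else 0.0
    es_bias = sigma * rho                          # Step 4: ES of bias
    es_adjusted = d_att / sd_y0 - es_bias          # Step 5: adjusted ES of ATT
    return sigma, rho, es_bias, es_adjusted

def sa_bootstrap(y0, w, w_p, d_att, sd_y0, B=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y0)
    draws = []
    for _ in range(B):
        i = rng.integers(0, n, n)                  # resample untreated units
        draws.append(sa_steps_3_to_5(y0[i], w[i], w_p[i], d_att, sd_y0))
    return np.array(draws)   # B rows of (sigma, rho, es_bias, es_adjusted)
```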

5.4 Display of weighting-based SA results

Table 1 presents the SA results for assessing bias due to type A omission. We use U1–U8 to represent eight different sets of omitted forms of the observed covariates. These include the step functions for age, education and earnings in 1975, and interactions between the categorical covariates. For example, the standard deviation of the discrepancy between the new weight that includes the step functions for age and the initial weight is 4.205, while the correlation between the weight discrepancy and the outcome is 0.007. The product of σ̂ and ρ̂ is 0.030, which estimates the effect size of the bias due to misspecifying a non-linear function of age as linear. Once this bias is removed, the effect size of the adjusted estimate of ATT becomes −0.080. Among the pairs of σ̂ and ρ̂ obtained from 1000 bootstrapped samples, only 6.9% correspond to an estimated effect size of bias more negative than −0.05. In other words, the majority of the estimated values of the effect size of bias associated with this omission are not great enough to reverse the sign of the initial ATT estimate. These results are listed in the first row, corresponding to U1. Similarly, we estimate the effect size of the bias associated with each of the other omitted terms. These omissions lead to either a positive or a negative bias; yet none of them contributes a bias large enough to be consequential.

TABLE 1

Assessing bias due to type A omission

Omitted covariate                   σ̂        ρ̂        Estimated     ES of adjusted   % Bootstrapped results
                                                        ES of bias    estimate         consequential
U1: step functions for Age          4.205     0.007     0.030        −0.080            6.9
U2: step functions for Education    0.890     0.013     0.012        −0.052            7.9
U3: step functions for re75         0.336    −0.016    −0.005        −0.048           13.7
U4: black × married                 0.059     0.040     0.002        −0.052           11.0
U5: hispanic × married              0.038     0.018     0.001        −0.051           11.1
U6: black × nodegree                0.104    −0.038    −0.004        −0.046           14.6
U7: hispanic × nodegree             0.072     0.026     0.002        −0.052           10.1
U8: nodegree × married              0.250     0.030     0.008        −0.058            7.8

Note: re75 is the variable name for earnings in 1975. The last column displays the percentage of bootstrapped samples that produce a negative estimate of the effect size of bias large enough to change the sign of the estimated ATT.

Table 2 lists the SA results for assessing bias due to type B omission. We use each of the seven observed covariates in the initial propensity score model, denoted by X1–X7, as referents for comparable unobserved covariates. Among them, earnings in 1975 (variable name: re75) is the most important confounder. An omitted covariate comparable to re75 in its confounding impact would contribute a negative bias with an effect size as great as −0.256. Further removal of such a potential bias would lead to a positive estimate of the effect size of ATT equal to 0.206. In this particular case, all the pairs of σ̂ and ρ̂ obtained from the 1000 bootstrapped samples correspond to an estimated effect size of bias more negative than −0.05. Unobserved covariates comparable to most of the other observed covariates, except for the indicator for being black, would not contribute a bias great enough to qualitatively change the initial conclusion.

TABLE 2

Assessing bias due to type B omission

Omitted covariate    σ̂        ρ̂        Estimated     ES of adjusted   % Bootstrapped results
                                         ES of bias    estimate         consequential
X1: age              0.720     0.002     0.001        −0.051           12.8
X2: education        0.378     0.026     0.010        −0.060            8.7
X3: black            3.743    −0.025    −0.094         0.044           84.9
X4: hispanic         0.470     0.015     0.007        −0.057            9.0
X5: married          1.051     0.008     0.009        −0.059            9.9
X6: nodegree         1.367     0.002     0.003        −0.053           11.9
X7: re75             1.426    −0.180    −0.256         0.206          100.0

According to the SA results listed in Tables 1 and 2, an omitted confounder or a set of omitted confounders comparable to either black (X3) or re75 (X7) in confounding impact would introduce a negative bias large enough to change the direction of the ATT estimate from negative to positive. In particular, removing a negative bias comparable to the confounding impact of re75 would further change the practical significance of the causal conclusion, as it would indicate that the job training programme increased earnings by about one-fifth of a standard deviation. The plausible values of bias due to most other actual or potential omissions are small in magnitude and differ in sign; hence the cumulative impact of these other omissions is likely negligible.

Figure 6 provides a holistic graphical display of the SA results in this application. Here σ has a lower bound of 0 and a finite upper bound; ρ takes values between −1 and 1. The estimated effect size of bias is zero when σ̂ = 0 or ρ̂ = 0. In the current application, the effect size of the initial ATT estimate is −0.05. The result would become positive if the effect size of bias associated with an omitted confounder were negative with a magnitude greater than 0.05. The dotted curve in Figure 6 corresponds to this threshold bias value. The referent covariates X3 and X7 provide two examples of negative bias values great enough to be consequential, as both are located to the left of the threshold.

FIGURE 6

Graphical display of sensitivity analysis results. Note: The contours associated with non-zero values of σ and ρ each represent a fixed amount of bias equal to σρ, the value of which is marked at the end of a contour. The dotted curve corresponds to the threshold value −0.05. Removing a bias beyond this threshold value would change the sign of the ATT estimate from negative to positive. A possible omission would lead to an amount of bias as indicated in this figure. Each ‘*’ corresponds to a covariate or a set of covariates that, if omitted, would contribute a bias great enough to change the analytic conclusion; each ‘Δ’ corresponds to a covariate or a set of covariates the omission of which would be inconsequential. These covariates are listed in Tables 1 and 2

5.5 Sensitivity bounds informed by theory and evidence

How plausible is it that an omitted covariate would be comparable to earnings in 1975 in its confounding impact? An answer to this question requires careful scientific reasoning. For example, citing the economics literature (Ashenfelter & Card, 1985), Dehejia and Wahba (2002) pointed out that many applicants for job training programmes had suffered from a sudden drop in earnings in the previous year but not necessarily the year before. In other words, for the treated individuals who had become unemployed or underemployed in 1975, their earnings in 1974 might provide additional predictive information about their potential earnings in 1978. Because selection bias may originate in the longitudinal trend of earnings, additional adjustment for between-group differences in earnings in 1974 is arguably necessary. A challenge, however, is that the experimental study failed to collect information on earnings in 1974 from nearly 40% of the participants. Due to the large amount of missing data in this covariate in the treated group, one cannot make direct statistical adjustment for earnings in 1974. Therefore, this covariate was omitted in the econometric analyses of the non-experimental data by LaLonde (1986). The exclusion of these 40% of the observations by Dehejia and Wahba in order to incorporate this additional variable into their propensity score model, as Smith and Todd (2001) pointed out, had ‘a strong effect on their results’ (p. 113). Our goal is to assess the potential consequence of this omission without excluding any observations. For simplicity, we use the original variable names re74, re75 and re78 to represent earnings in 1974, 1975 and 1978, respectively.

We have shown in Section 4 of this article that the sensitivity parameter ρ is monotonically related to the predictive relationship between the omitted covariate and the outcome under the untreated condition. Given that re74 and re75 are highly correlated, our first goal is to assess the predictive relationship between re74 and re78 in the untreated group once re75 has been adjusted for. The estimated partial correlation between re74 and re78 conditioning on re75 is 0.17; that between re75 and re78 conditioning on re74 is 0.29. Hence we reason that the value of ρ for re74 conditioning on re75 is likely smaller in magnitude than the value of ρ for re75 conditioning on re74, which in turn is expected to be smaller in magnitude than the value of ρ for re75 when re74 is omitted. The estimate of the latter, −0.180 as shown in Table 2, is therefore adopted as the lower bound for the value of ρ for re74 in the sensitivity analysis.

Section 4 has also shown that the sensitivity parameter σ is monotonically related to the predictive relationship between the omitted covariate and the treatment assignment. Because re74 is only partially observed in the treated group, we are unable to determine the value of σ for this covariate directly. Yet the predictive relationship between re75 and the treatment assignment is stronger among those whose re74 was observed than among those whose re74 was missing, and we reason that a similar pattern would likely hold for re74. In other words, if we could obtain a new weight that additionally adjusts for re74, the value of σ for re74 would likely be smaller with re74 observed for all the treated individuals than with the analysis restricted to the subsample whose re74 was actually observed. In the latter case, we obtain an estimate of σ equal to 0.247 for re74. Hence it seems reasonable to adopt 0.247 as the upper bound of σ for re74 in this application.

We now bound σ for re74 between 0 and 0.247 and ρ for re74 between −0.180 and 0. The effect size of bias associated with re74 is therefore bounded between −0.044 and 0. Once this bias is removed, the effect size of the ATT estimate is bounded between −0.050 and −0.006. This result suggests that, although the confounding impact of re74 may not be as great as that of re75, the omission of re74 has likely contributed a non-trivial negative bias to the initial ATT estimate.
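The interval arithmetic behind these bounds is simple to check: since the product σρ is monotone in each argument over these ranges, evaluating the corners of the (σ, ρ) rectangle suffices.

```python
# Bias bounds for re74: sigma in [0, 0.247] and rho in [-0.180, 0].
sigma_bounds = (0.0, 0.247)
rho_bounds = (-0.180, 0.0)

corners = [s * r for s in sigma_bounds for r in rho_bounds]
bias_lo, bias_hi = min(corners), max(corners)  # ES of bias in [-0.044, 0]

att_initial = -0.050
adj_lo = att_initial - bias_hi   # removing no bias leaves -0.050
adj_hi = att_initial - bias_lo   # removing the most negative bias gives about -0.006
```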

In many applied studies, an important confounder may be partially observed or entirely unobserved in a given dataset. Yet the current data or other existing secondary data may provide information enabling the analyst to set up informative bounds for its sensitivity parameters. We underscore the necessity of bringing scientific reasoning to bear on the use of such information.

6 CONCLUSION AND DISCUSSION

Sensitivity analysis is an indispensable step in evaluating ATT when the data are non-experimental. This is because statistical adjustment for observed pre-treatment covariates is often inadequate for removing all the selection bias. This article introduces a new weighting-based SA approach that provides a useful alternative to the existing ones. The new approach applies when the initial ATT analysis has been conducted through propensity score-based weighting. Although this paper primarily considers a binary treatment and a continuous outcome, the same rationale can be extended to studies of a treated group with multiple untreated comparison groups and to categorical outcomes. A similar logic has been applied to weighting-based SA in causal mediation analysis (Hong, Qin, & Yang, 2018; Qin et al, 2019).

This new weighting-based SA strategy is relatively straightforward and intuitive, and it demonstrates important advantages. First, unlike many existing SA strategies that invoke additional simplifying assumptions which tend to be overly strong and restrictive, this new strategy does not rely on additional assumptions. Even though an analyst may choose to invoke parametric assumptions in specifying the propensity score model for treatment assignment, such assumptions are not required by the new SA procedure, because the analyst may freely choose to estimate the propensity score through semi-parametric or non-parametric methods. Second, regardless of how complex the data generation functions are, the number of weighting-based sensitivity parameters remains small and their forms never change in ATT evaluations. Third, weighting-based SA is unconstrained by the measurement scales of the omitted confounders. Fourth, weighting-based SA can conveniently assess the aggregate bias associated with multiple omitted confounders. Finally, a graphical display provides a holistic view that may suggest whether the bias due to omitted confounding is predominantly positive or negative and the extent to which positive and negative biases may cancel out. We make available online the application data along with the R code for the analysis reported in Section 5.

This paper has focused on the weighting-based SA strategy for ATT evaluation. We are conducting research that extends this weighting-based SA framework to ATE evaluation. In a non-experimental evaluation of ATT, bias arises when the treated group and the control group differ in the mean potential outcome associated with the control condition only; in an ATE evaluation, a second bias arises when the treated group and the control group additionally differ in the mean potential outcome associated with the experimental condition. Hence, the number of sensitivity parameters will increase from 2 to 4. Nonetheless, the same advantages of weighting-based SA will remain salient.

Similar to most existing SA methods, this weighting-based SA strategy focuses on assessing the contribution of omitted confounders to bias in identification rather than on the impact of such omissions on efficiency in estimation. This is particularly the case with type B omission, in which theoretically important covariates are not directly observed in a given application. Also consistent with the bulk of the SA literature, emphasis is placed on qualitative changes in practical significance rather than in statistical significance. Yet in general, additional adjustment for a confounding covariate may change not only the expectation but also the variance of the ATT estimator. The sensitivity of statistical inference is important in its own right, especially when decision-making is contingent on the statistical significance of an analytic result. For type A omissions, once an important source of bias has been detected among the observed covariates or among their non-linear functions omitted from the initial analysis, a modified ATT analysis that additionally adjusts for 'the bias contributor' will generate a new point estimate and its standard error. A closed form of the asymptotic variance of the ATT estimator that takes into account the estimation uncertainty in the weight can be obtained by simultaneously solving a set of mean-zero estimating equations for the ATT and for the coefficients in the propensity score models (Newey, 1984; Robins et al., 1995; Schafer & Kang, 2008). Alternatively, bootstrapping may be employed to quantify the estimation uncertainty empirically. For type B omissions, researchers have proposed simulating an unmeasured confounder by taking random draws from its conditional distribution given its conditional associations with the treatment and the outcome. The analyst may then adjust for the simulated confounder and obtain an adjusted estimate of the causal effect along with its variance (Carnegie, Harada & Hill, 2016; Ichino, Mealli & Nannicini, 2008). This simulation strategy may be incorporated into the new weighting-based SA procedure.
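A minimal sketch of this simulation strategy is given below. It is a simplified rendering of the Ichino et al. / Carnegie et al. idea, not the paper's procedure: the data, the sensitivity parameters `delta` and `gamma`, and the conditional model for the simulated confounder are all our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data; names and models are illustrative only
n = 2000
X = rng.normal(size=n)
Z = rng.binomial(1, 1 / (1 + np.exp(-X)))
Y = 2.0 * Z + X + rng.normal(size=n)

# Analyst-chosen sensitivity parameters: association of the simulated
# binary confounder U with treatment (delta) and with the outcome (gamma)
delta, gamma = 1.0, 1.0

# Draw U from an assumed conditional distribution given Z and Y
p_u = 1 / (1 + np.exp(-(delta * (Z - 0.5) + gamma * (Y - Y.mean()) / Y.std())))
U = rng.binomial(1, p_u)

# Re-estimate the treatment effect after adjusting for the simulated confounder
D = np.column_stack([np.ones(n), Z, X, U])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
att_adjusted = beta[1]  # adjusted treatment coefficient
```

Repeating the draw of `U` many times and pooling the adjusted estimates and their variances would quantify how the conclusion shifts under the chosen sensitivity parameter values.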

Weighting-based SA shares the limitations of weighting-based causal inference strategies in general. The propensity score weighting-based method for ATT evaluations generally does not require outcome model specification; it does, however, require specifying the functional form of the propensity score model. It is well known that parametric weighting is vulnerable to misspecification of this functional form: a misspecified propensity score model may lead to biased estimates of the propensity scores and of the weights (Schafer & Kang, 2008). When the relationship between an omitted confounder and the treatment assignment indicator is misspecified in the propensity score model, we anticipate that parametric weighting-based SA may become problematic as well. In such cases, semi-parametric or non-parametric weighting may provide a remedy. One form of semi-parametric weighting, analogous to post-stratification in survey methodology, is relatively easy to implement: the analyst may stratify the sample on the estimated propensity score and then re-estimate the propensity score according to the observed frequency distribution of treatment assignment within each stratum (Hong, 2012, 2015; Huang et al., 2005). Alternatively, the analyst may estimate propensity scores non-parametrically by fitting generalized boosted models (McCaffrey, Ridgeway & Morral, 2004). Past research has shown that semi-parametric and non-parametric weighting results are often robust even when the propensity score models are misspecified. We hypothesize that this robustness may characterize semi-parametric weighting-based SA as well.
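The post-stratification step can be sketched as follows. The data-generating model, the deliberately misspecified working score, and the quintile cut points are all our own illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda v: 1 / (1 + np.exp(-v))

# Illustrative data: the true assignment model is non-linear in X
n = 5000
X = rng.normal(size=(n, 2))
Z = rng.binomial(1, sigmoid(0.5 * X[:, 0] + np.abs(X[:, 1]) - 0.8))

# Step 1: a deliberately misspecified parametric propensity score
# (linear in X, omitting the non-linear term)
ps = sigmoid(0.4 * X[:, 0] + 0.3 * X[:, 1])

# Step 2: stratify on the estimated score, then re-estimate the propensity
# within each stratum from the observed treatment frequencies
strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
ps_semi = np.empty(n)
for s in np.unique(strata):
    ps_semi[strata == s] = Z[strata == s].mean()

# ATT weights: 1 for the treated; odds of treatment for the untreated
w = np.where(Z == 1, 1.0, ps_semi / (1 - ps_semi))
```

Within each stratum the re-estimated score is simply the observed treatment rate, so the resulting weights no longer depend on the parametric form assumed in step 1.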

ACKNOWLEDGMENT

The authors thank Ken Frank, Xiao-Li Meng, Stephen Raudenbush, participants of the Department of Public Health Sciences Colloquium and the Education Workshop at the University of Chicago, participants of the biostatistics seminar at the University of Colorado Denver Department of Biostatistics and Informatics, and participants of the University of Notre Dame Department of Applied and Computational Mathematics and Statistics Seminar for their comments on earlier versions of this article. We are particularly grateful for the many valuable suggestions offered by the Associate Editor and the two reviewers. This study was supported by a grant from the National Science Foundation (SES 1659935). In addition, the third author received a Quantitative Methods in Education and Human Development Research Predoctoral Fellowship from the University of Chicago and a National Academy of Education/Spencer Foundation Dissertation Fellowship. This article reflects the views of the authors and does not represent the opinions of the funding agencies.

REFERENCES

Ashenfelter, O. & Card, D. (1985) Using the longitudinal structure of earnings to estimate the effect of training programs. Review of Economics and Statistics, 67(4), 648–660.

Bross, I.D.J. (1966) Spurious effects from an extraneous variable. Journal of Chronic Disease, 19, 637–647.

Brumback, B.A., Hernan, M.A., Haneuse, S.J.P.A. & Robins, J.M. (2004) Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in Medicine, 23, 749–767.

Carnegie, N.B., Harada, M. & Hill, J.L. (2016) Assessing sensitivity to unmeasured confounding using a simulated potential confounder. Journal of Research on Educational Effectiveness, 9(3), 395–420.

Copas, J.B. & Li, H.G. (1997) Inference for non-random samples. Journal of the Royal Statistical Society, Series B (Methodological), 59(1), 55–95.

Cornfield, J., Haenszel, W., Hammond, E.C., Lilienfeld, A.M., Shimkin, M.B. & Wynder, E.L. (1959) Smoking and lung cancer: Recent evidence and a discussion of some questions. Journal of the National Cancer Institute, 22(4), 173–203.

Dehejia, R.H. & Wahba, S. (1999) Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448), 1053–1062.

Dehejia, R.H. & Wahba, S. (2002) Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics, 84(1), 151–161.

Ding, P. & VanderWeele, T.J. (2014) Generalized Cornfield conditions for the risk difference. Biometrika, 101(4), 971–977.

Ding, P. & VanderWeele, T.J. (2016) Sensitivity analysis without assumptions. Epidemiology, 27(3), 368–377.

Frank, K.A. (2000) Impact of a confounding variable on the inference of a regression coefficient. Sociological Methods and Research, 29(2), 147–194.

Gastwirth, J., Krieger, A. & Rosenbaum, P. (1998) Dual and simultaneous sensitivity analysis for matched pairs. Biometrika, 85, 907–920.

Greenland, S. (1996) Basic methods for sensitivity analysis of biases. International Journal of Epidemiology, 25, 1107–1116.

Greenland, S. (2005) Multiple-bias modeling for analysis of observational data. Journal of the Royal Statistical Society, Series A, 168, 267–291.

Harding, D. (2003) Counterfactual models of neighborhood effects: The effect of neighborhood poverty on dropping out and teenage pregnancy. The American Journal of Sociology, 109, 676–719.

Heckman, J.J. (1979) Sample selection bias as a specification error. Econometrica, 47, 153–161.

Heckman, J.J. & Vytlacil, E.J. (2007) Econometric evaluation of social programs, part I: Causal models, structural models and econometric policy evaluation. In: Heckman, J.J. and Leamer, E. (Eds.) Handbook of econometrics, vol. 6B. Amsterdam: Elsevier, pp. 4779–4874.

Holland, P. (1986) Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960.

Hong, G. (2010) Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data. Journal of Educational and Behavioral Statistics, 35(5), 499–531.

Hong, G. (2012) Marginal mean weighting through stratification: A generalized method for evaluating multi-valued and multiple treatments with non-experimental data. Psychological Methods, 17(1), 44–60.

Hong, G. (2015) Causality in a social world: Moderation, mediation and spill-over. West Sussex, UK: John Wiley & Sons Ltd.

Hong, G., Qin, X. & Yang, F. (2018) Weighting-based sensitivity analysis in causal mediation studies. Journal of Educational and Behavioral Statistics, 43(1), 32–56.

Huang, I.-C., Frangakis, C., Dominici, F., Diette, G.B. & Wu, A.W. (2005) Approach for risk adjustment in profiling multiple physician groups on asthma care. Health Services Research, 40, 253–278.

Ichino, A., Mealli, F. & Nannicini, T. (2008) From temporary help jobs to permanent employment: What can we learn from matching estimators and their sensitivity? Journal of Applied Econometrics, 23(3), 305–327.

Imbens, G.W. (2003) Sensitivity to exogeneity assumptions in program evaluation. American Economic Review, 93(2), 126–132.

LaLonde, R. (1986) Evaluating the econometric evaluations of training programs. American Economic Review, 76(4), 604–620.

Lee, W.C. (2011) Bounding the bias of unmeasured factors with confounding and effect-modifying potentials. Statistics in Medicine, 30, 1007–1017.

Lin, D.Y., Psaty, B.M. & Kronmal, R.A. (1998) Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics, 54, 948–963.

Liu, W., Kuramoto, S.J. & Stuart, E.A. (2013) An introduction to sensitivity analysis for unobserved confounding in nonexperimental prevention research. Prevention Science, 14(6), 570–580.

Manski, C.F. (1990) Nonparametric bounds on treatment effects. The American Economic Review, 82(2), 319–323.

Marcus, S.M. (1997) Using omitted variable bias to assess uncertainty in the estimation of an AIDS education treatment effect. Journal of Educational and Behavioral Statistics, 22(2), 193–201.

Mauro, R. (1990) Understanding L.O.V.E. (left out variables error): A method of estimating the effects of omitted variables. Psychological Bulletin, 108, 314–329.

McCaffrey, D.F., Ridgeway, G. & Morral, A.R. (2004) Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9, 403–425.

Newey, W.K. (1984) A method of moments interpretation of sequential estimators. Economics Letters, 14, 201–206.

Neyman, J., with cooperation of K. Iwaskiewicz and St. Kolodziejczyk. (1935) Statistical problems in agricultural experimentation (with discussion). Supplement to the Journal of the Royal Statistical Society, Series B, 2(2), 107–180.

Poole, C. (2010) On the origin of risk relativism. Epidemiology, 21, 3–9.

Qin, X., Hong, G., Deutsch, J. & Bein, E. (2019) Multisite causal mediation analysis in the presence of complex sample and survey designs and non-random nonresponse. Journal of the Royal Statistical Society, Series A, 182(Part 4), 1343–1370.

Ridgeway, G. (2006) Effect of race bias in post-traffic stop outcomes using propensity scores. Journal of Quantitative Criminology, 22(1), 1–29.

Robins, J.M. (1987) Addendum to "A new approach to causal inference in mortality studies with sustained exposure periods—application to control of the healthy worker survivor effect". Computers and Mathematics with Applications, 14, 923–945.

Robins, J.M. (2000) Marginal structural models versus structural nested models as tools for causal inference. In: Halloran, M.E. and Berry, D. (Eds.) Statistical models in epidemiology, the environment, and clinical trials, vol. 116. New York: Springer, pp. 95–134.

Robins, J.M., Rotnitzky, A. & Zhao, L.P. (1995) Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.

Rosenbaum, P.R. (1986) Dropping out of high school in the United States: An observational study. Journal of Educational Statistics, 11, 207–224.

Rosenbaum, P.R. (1987) Model based direct adjustment. Journal of the American Statistical Association, 82(398), 387–394.

Rosenbaum, P.R. (1995) Observational studies. New York: Springer.

Rosenbaum, P.R. & Rubin, D.B. (1983a) The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

Rosenbaum, P.R. & Rubin, D.B. (1983b) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Series B, 45, 212–218.

Rubin, D.B. (1978) Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6(1), 34–58.

Rubin, D.B. (1986) Statistics and causal inference: Comment: Which ifs have causal answers. Journal of the American Statistical Association, 81(396), 961–962.

Rubin, D.B. (1990) [On the application of probability theory to agricultural experiments. Essay on the principles. Section 9.] Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4), 472–480.

Schafer, J.L. & Kang, J. (2008) Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13(4), 279–313.

Smith, J.A. & Todd, P.E. (2001) Reconciling conflicting evidence on the performance of propensity-score matching methods. American Economic Review, 91(2), 112–118.

VanderWeele, T.J. & Arah, O.A. (2011) Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology, 22, 42–52.

VanderWeele, T.J. & Ding, P. (2017) Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4), 268–275.

Zhang, X., Faries, D.E., Li, H., Stamey, J.D. & Imbens, G.W. (2018) Addressing unmeasured confounding in comparative observational research. Pharmacoepidemiology and Drug Safety, 27, 373–382.

APPENDIX A

We show that, when the treatment effect is a function of an omitted confounder $U$, the bias differs between the regression-based ATE evaluation and the ATT evaluation. We specify each potential outcome model as a function of $U$ and an observed covariate $X$ as follows:
 
Here $E[\varepsilon_1] = E[\varepsilon_1 \mid Z=z] = 0$ and $E[\varepsilon_0] = E[\varepsilon_0 \mid Z=z] = 0$, which indicates that there are no omitted confounders other than $U$. According to the above specification, the predictive relationship between $U$ and $Y$ depends on the treatment condition. We can derive ATE and ATT according to their respective definitions:
 
When $U$ is omitted, the analytic model in a regression analysis is
It is easy to show that
Let $U$ and $X$ be independent under each treatment condition, in which case we have that $\beta_1 = \alpha_1$. The above can be simplified as
In the ATE evaluation, the bias is
In the ATT evaluation, the bias is

Clearly, the bias in the ATE evaluation and that in the ATT evaluation are unequal when $\beta_{U1}$ and $\beta_{U0}$ are unequal.

APPENDIX B

We show that $E[Y(0) \mid Z=1] = E[WY \mid Z=0]$ under the assumption that the treatment assignment is independent of $Y(0)$ within levels of the observed pre-treatment covariates $X = x$.
where $W = \dfrac{\mathrm{pr}(Z=1 \mid X=x)}{\mathrm{pr}(Z=0 \mid X=x)} \cdot \dfrac{\mathrm{pr}(Z=0)}{\mathrm{pr}(Z=1)}$.
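The derivation itself did not survive extraction; a sketch consistent with the stated independence assumption (our reconstruction) is:

```latex
\begin{aligned}
E[WY \mid Z=0]
  &= \int W(x)\, E[Y \mid X=x, Z=0]\, f(x \mid Z=0)\, dx \\
  &= \int E[Y(0) \mid X=x, Z=0]\, f(x \mid Z=1)\, dx
     && \text{since } W(x) = \tfrac{f(x \mid Z=1)}{f(x \mid Z=0)} \\
  &= \int E[Y(0) \mid X=x, Z=1]\, f(x \mid Z=1)\, dx
     && \text{by } Z \perp Y(0) \mid X \\
  &= E[Y(0) \mid Z=1].
\end{aligned}
```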

APPENDIX C

Here we prove the weighting-based bias formula in identifying ATT as summarized in Theorem 1.
The last equation holds because $E[W_P - W \mid Z=0] = 0$, which is proved as follows:
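The key fact is that both weights average to 1 in the untreated group. As a numerical illustration, the identity can be checked exactly on a discrete toy distribution of our own construction (not the paper's example):

```python
# Discrete toy example (our construction): X, P binary, with assumed
# joint distribution fXP and selection probabilities pr(Z=1 | x, p)
pZ  = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.7}
fXP = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.25}

pr_Z1 = sum(pZ[xp] * fXP[xp] for xp in fXP)
# pr(Z=1 | x), with P marginalized out
pr_Z1_x = {x: sum(pZ[(x, p)] * fXP[(x, p)] for p in (0, 1)) /
              sum(fXP[(x, p)] for p in (0, 1)) for x in (0, 1)}

def W(x):       # initial ATT weight, omitting P
    return pr_Z1_x[x] / (1 - pr_Z1_x[x]) * (1 - pr_Z1) / pr_Z1

def W_P(x, p):  # new weight, adjusting for P
    return pZ[(x, p)] / (1 - pZ[(x, p)]) * (1 - pr_Z1) / pr_Z1

# f(x, p | Z=0) by Bayes' rule
f0 = {xp: (1 - pZ[xp]) * fXP[xp] / (1 - pr_Z1) for xp in fXP}

# E[W_P - W | Z=0]: zero up to floating point, because each weight
# integrates to 1 over the untreated group
diff = sum((W_P(x, p) - W(x)) * f0[(x, p)] for (x, p) in fXP)
```

Any other choice of `pZ` and `fXP` gives the same null result, since the identity holds for all distributions.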

APPENDIX D

Here we prove Theorem 2: the bias, represented as a function of $W_P - W$, becomes zero if a set of omitted covariates $P$ does not predict the treatment assignment or if $P$ does not predict the outcome in the untreated group.

The proof is provided under the worst-case scenario in which, under each treatment condition, $P$ is independent of the set of covariates $X$ that has already been adjusted for through the initial weight $W$. The result generalizes to the case in which $P$ and $X$ are not independent because, in this latter case, $P$ can be partitioned into two orthogonal pieces, $\hat{P}_x$ and $P - \hat{P}_x$. Here $\hat{P}_x$ is strictly a function of $x$ and does not introduce additional confounding once $X$ has been adjusted for, while $P - \hat{P}_x$ is independent of $X$ and is to be considered in a sensitivity analysis.

To prove Theorem 2, first of all, we can easily show that, when $P$ does not predict $Z$ given $X$, then $\mathrm{pr}(Z=z \mid X=x, P=p) = \mathrm{pr}(Z=z \mid X=x)$ for $z = 0, 1$, in which case $W_P = W$ and hence $\mathrm{Bias}_{ATT} = \mathrm{cov}(W_P - W, Y \mid Z=0) = 0$.

We next show that, when $P$ and $X$ are independent given $Z=z$ for $z = 0, 1$,
 
 
 
This derivation has made clear that $W = f(x \mid Z=1)/f(x \mid Z=0)$ and hence
As we have shown in Appendix C, the weighting-based bias formula can alternatively be written as
Let $\tilde{Y} = WY$. The weighting-based bias formula is then equal to
 
where $h(p) = f(p \mid Z=1)/f(p \mid Z=0)$. Because $P$ is independent of $X$ and $Y$ in the untreated group, $h(P)$ is independent of $Y$ given $Z=0$, and hence
We can prove that $E[h(P) \mid Z=0] = 1$. This is because

Therefore, we have proved that, when $P$ does not predict $Y$ under the control condition, $\mathrm{Bias}_{ATT} = 0$.

APPENDIX E

This appendix summarizes our derivations showing that, when data are generated under a linear system as shown in models (3) and (4), the sensitivity parameters $\sigma$ and $\rho$ in identifying ATT can easily be related to the data generation parameters. Under model (3), $\beta_{p0}$ represents the linear predictive relationship between an unmeasured covariate $P$ and the potential outcome of the untreated condition, $Y(0)$, given the observed covariates $X$. Under model (4), $\alpha_p$ represents the linear predictive relationship between $P$ and the treatment indicator $Z$. Also from model (4), we have that $\mathrm{pr}(Z=1) = \alpha_0$, $\mathrm{pr}(Z=1 \mid X=x) = \alpha_0 + \alpha_1(x - \mu_X)$, and $\mathrm{pr}(Z=1 \mid X=x, P=p) = \alpha_0 + \alpha_1(x - \mu_X) + \alpha_p(p - \mu_P)$. To simplify the derivation, we let $P$ and $X$ be marginally independent and let $P$ be binary, in which case $\mu_P = \mathrm{pr}(P=1)$.

Taking the derivative of $\sigma$ with respect to $\alpha_p$, we obtain
where $K_\sigma = \dfrac{1-\alpha_0}{\alpha_0^2\, \sigma}\, E\!\left[\dfrac{(P-\mu_P)^2\, \mathrm{pr}(Z=0 \mid X=x, P=p)^2}{\mathrm{pr}(Z=0 \mid X=x)}\right]$. When $\sigma$ is non-zero, $K_\sigma$ is always positive. Hence there is a monotonic relationship between $\sigma$ and $\alpha_p$; the relationship is positive when $\alpha_p > 0$ and negative when $\alpha_p < 0$.
Taking the derivative of $\rho$ with respect to $\beta_{p0}$, we obtain
where $K_\rho = \dfrac{\mu_P(1-\mu_P)}{\alpha_0\, \sigma_{Y|0}\, \sigma}\, E\!\left[\dfrac{1}{\mathrm{pr}(Z=0 \mid X=x)}\right]$. Similarly, when $\sigma$ is non-zero, $K_\rho$ is always positive. Hence $\rho$ and $\beta_{p0}$ have a monotonic relationship once $\alpha_p$ is fixed; the relationship is positive when $\alpha_p$ and $\beta_{p0}$ have the same sign and negative when they have opposite signs.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)