Guanglei Hong, Fan Yang, Xu Qin, Did you Conduct a Sensitivity Analysis? A New Weighting-Based Approach for Evaluations of the Average Treatment Effect for the Treated, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 184, Issue 1, January 2021, Pages 227–254, https://doi.org/10.1111/rssa.12621
Abstract
In non-experimental research, a sensitivity analysis helps determine whether a causal conclusion could be easily reversed in the presence of hidden bias. A new approach to sensitivity analysis on the basis of weighting extends and supplements propensity score weighting methods for identifying the average treatment effect for the treated (ATT). In its essence, the discrepancy between a new weight that adjusts for the omitted confounders and an initial weight that omits them captures the role of the confounders. This strategy is appealing for a number of reasons including that, regardless of how complex the data generation functions are, the number of sensitivity parameters remains small and their forms never change. A graphical display of the sensitivity parameter values facilitates a holistic assessment of the dominant potential bias. An application to the well-known LaLonde data lays out the implementation procedure and illustrates its broad utility. The data offer a prototypical example of non-experimental evaluations of the average impact of job training programmes for the participant population.
1 INTRODUCTION
In causal evaluations that rely on statistical adjustment for covariates that confound the treatment effect on the outcome, omitted confounding is always a major concern. Confounders are covariates that predict the outcome and are differently distributed across the treated group and the untreated group. For example, in the well-known debate about whether smoking causes lung cancer, even after analysts have adjusted for a host of observed individual and environmental factors, critics have nonetheless raised the possibility that some unknown biological agents, presumably more prevalent among smokers, might both trigger smoking and increase one's susceptibility to lung cancer. In the presence of such omitted confounding, the lung cancer rate among smokers had they counterfactually not smoked cannot be assumed identical to the lung cancer rate among non-smokers, even though the two groups share the same observed pre-treatment characteristics after statistical adjustment. The remaining difference in the lung cancer rate between the two groups is the hidden bias due to the omitted confounding. This is illustrated in Figure 1. After a host of observed characteristics has been adjusted for, the difference in lung cancer rate between smokers and non-smokers (i.e. A–C) is the sum of the true effect (i.e. A–B) and the hidden bias (i.e. B–C). A causal conclusion is likely invalid if a hidden bias is large relative to the true effect; if a hidden bias is relatively small, the existence of such a bias may not threaten the causal validity of the initial conclusion.

In accordance with this logic, Cornfield et al. (1959) stressed the crucial role of sensitivity analysis in evaluating past evidence. Having conducted a comprehensive review of the existing studies, they reasoned that the causal claim that cigarette smoking is a cause of lung cancer was not highly sensitive to hidden bias: ‘The magnitude of the excess lung-cancer risk among cigarette smokers is so great that the results can not be interpreted as arising from an indirect association of cigarette smoking with some other agent or characteristic, since this hypothetical agent would have to be at least as strongly associated with lung cancer as cigarette use; no such agent has been found or suggested’ (p.173). This conclusion has continued to hold over the past six decades.
Simply put, a sensitivity analysis (SA) quantifies the amount of hidden bias associated with omitted confounders and evaluates whether removing such a bias would qualitatively change the initial analytic conclusion. It addresses two questions: How large would a hidden bias need to be to alter the initial conclusion? And is such a hidden bias scientifically plausible? (See Bross, 1966, for one of the earliest illustrations.) Conclusions that are harder to alter by a scientifically plausible hidden bias are expected to add a higher value to knowledge about causality.
There are two distinct criteria for judging whether the removal of a hidden bias would lead to a qualitative change: the first refers to a change in practical significance as quantified by effect size, whereas the second refers to a change in statistical significance that is typically determined by the p value in reference to a certain significance level (e.g. 0.05 is a conventional threshold adopted in the social and behavioural sciences). Notably, on the one hand, a change in the statistical significance of an analytic result (such as a change in p value from 0.051 to 0.049) may have little practical significance if the corresponding change in effect size is minimal; on the other hand, a great amount of change in effect size may not necessarily move the p value over an arbitrarily determined conventional threshold yet may have considerable implications for decision-making. For this reason, the sensitivity analysis literature has primarily focused on the former. Following this rationale, we prioritize the assessment of sensitivity in terms of a potential change in practical significance rather than in statistical significance. Nonetheless, we propose in the discussion section an additional procedure for assessing the latter.
Although researchers in the social, behavioural and health sciences have a long tradition of rightfully voicing scepticism towards conclusions drawn from non-experimental research, most published studies have demonstrated little effort to empirically investigate potential sources of confounding and assess their impact. This is likely because, with a few exceptions, discussions about SA techniques have been primarily limited to the statistics community. To our knowledge, research societies or funding agencies generally have not included SA as a required element in their guidelines for grant application or journal publication.
In analysing non-experimental data, popular strategies for reducing selection bias include analysis of covariance, multiple regression and various propensity score-based techniques such as matching, sub-classification and weighting. Identification typically relies on the strong assumption that, after adjusting for a set of observed pre-treatment covariates, there are no omitted confounders. This is known as the strong ignorability assumption (Rosenbaum & Rubin, 1983b). Confounding covariates may be omitted because they are unobserved, because their functional forms are unknown or because under the constraint of sample size, the analyst must exclude a subset of observed covariates to avoid model overfitting.
Different SA approaches have been proposed in the past corresponding to different statistical adjustment methods employed in the initial analysis. Every SA approach represents the hidden bias associated with an omitted confounder or a set of omitted confounders as a function of a number of sensitivity parameters. In general, a causal conclusion is considered to be sensitive if relatively small values of the sensitivity parameters could lead to a qualitative change in inference; a causal conclusion is insensitive if only extreme values of the sensitivity parameters could alter the inference (Rosenbaum, 1995).
However, as Greenland (2005) pointed out, analytic results often depend on sensitivity parameters that are high dimensional due to multiple bias sources, which can ‘render sensitivity analyses difficult to present without drastic (and potentially misleading) simplifications’ (p. 271). Most existing SA strategies have been developed under simplified setups or require additional assumptions that are necessary for reducing the number of sensitivity parameters to make the analysis feasible. The simplified setups inevitably limit the applicability of a particular SA strategy; meanwhile, violations of the simplifying assumptions in real data applications would likely invalidate the SA results. A major concern is that SA conclusions might themselves be sensitive to potential violations of the additional simplifying assumptions. This would lead to a tail-chasing game that might never end.
Moreover, in a study with a given set of covariates, an estimator that makes use of propensity score-based weighting will generally differ from a regression estimator or a matching estimator in the amount of bias that each contains. Hence, sensitivity analysis methods developed under the linear regression framework, for example, generally are not suitable when an initial analysis has been conducted through weighting.
This paper introduces a new weighting-based approach for evaluations of the average treatment effect for the treated (ATT). Average treatment effect for the treated is of theoretical interest when the treated population is a subset of the untreated population prior to the treatment and when the treatment has little relevance to many members of the untreated population. For example, a job training programme typically has no immediate relevance to individuals who have job security; nor is it relevant to those who are not participating in the labour force. A research question about the impact of the programme is applicable only to the unemployed or the underemployed who are arguably represented by the population of individuals who actually participate in the training programme. In non-experimental research, identification of ATT is subject to selection bias due to possible lack of comparability between the treated group and the untreated group prior to the treatment in numerous ways.
The new weighting-based SA approach builds on past SA research and offers a sensible and flexible alternative with broad applicability to non-experimental ATT evaluations that employ weighting. In a nutshell, the discrepancy between a new weight that adjusts for an omitted confounder and an initial weight that omits the confounder captures the role of the confounder that contributes to the bias. We will show that the weighting-based SA extends and supplements the propensity score weighting methods for identifying ATT. Importantly, unlike most of the existing SA methods, the new weighting-based SA strategy does not require additional simplifying assumptions.
We demonstrate the utility of the new SA strategy in the context of a re-evaluation of a job training programme. Non-experimental evaluations of job training programmes have frequently been used by econometricians and statisticians to illustrate new methods for causal inference. A study by LaLonde (1986), however, famously revealed the inadequacy of non-experimental analyses. LaLonde innovatively combined the treated group from an experimental study with comparison groups drawn from large-scale surveys. The combined data represent a typical non-experimental study. He then evaluated a number of non-experimental econometric procedures and concluded that many of them failed to remove selection bias. Dehejia and Wahba (1999) analysed a subset of these data with propensity score stratification or matching. They additionally conducted a sensitivity analysis showing that the results were sensitive to the selection of pre-treatment covariates for each adjustment.
We re-analyse the non-experimental data from the LaLonde (1986) study because this application example is well known and, more importantly, because it exhibits typical threats to causal inference associated with omitted confounders. This application study assesses, for male participants in the National Supported Work (NSW) programme in the mid-1970s, the programme impact on earnings. Our non-experimental sample includes the participants in the original NSW-treated group and a comparison group drawn from the Current Population Survey (CPS). After presenting the new SA strategy, we conceptualize different sources of bias in the non-experimental analysis and then walk the reader through the concrete steps of using the weighting-based SA strategy to quantify each source of bias as well as providing a holistic illustration of multiple bias sources. In addition, we provide an example of setting the bounds for sensitivity parameters on the basis of scientific reasoning. We restrict the discussion to binary treatments as this is the most common setup for evaluating ATT.
2 SELECTION BIAS IN ATT EVALUATIONS
Once the analyst has adjusted for a set of observed pre-treatment covariates X, some selection bias will remain if the treated and the untreated still differ in the average potential earnings without training even though the two groups share the same distribution of X. This is analogous to ‘B – C’ in Figure 1.
3 EXISTING SA STRATEGIES FOR ATT EVALUATIONS
This section reviews several popular approaches to ATT evaluations and the accompanying SA strategies. These include regression, propensity score matching, propensity score stratification and propensity score weighting. We briefly explain the rationale of each SA strategy and highlight the additional simplifying assumptions that are often required in each case for implementation. Nearly all the existing SA strategies share a common consideration—that is, the hidden bias due to an unmeasured confounder U is related to at least two factors: one is the association between U and the treatment assignment Z; and the other is the association between U and the outcome Y under either or both treatment conditions. Different representations of these two factors seem to be appealing to applied researchers in different fields. For example, regression coefficients (Rosenbaum, 1986), correlations (Frank, 2000) and partial R2 (Imbens, 2003) are familiar to social scientists who operate within the regression framework, while risk ratios (Cornfield et al, 1959; Ding & VanderWeele, 2016) are particularly meaningful to researchers in epidemiology or medicine. (Readers may also refer to a review of SA methods for ATE evaluations by Liu and colleagues, 2013, when the treatment, the outcome and the omitted confounder are all binary. Zhang et al. (2018) provide an up-to-date summary of a wide range of SA methods for ATE evaluation that are categorized according to whether information on unmeasured confounders is unavailable, is available within the study data or is available in the literature or in other data sources.)
3.1 Regression
In non-experimental research, a typical regression analysis regresses an observed outcome Y on a treatment indicator Z and a vector of observed pre-treatment covariates X. In the absence of omitted confounders, the coefficient for Z identifies the ATE that is averaged over the entire population rather than the treated population. Applying the same regression model to a causal analysis of ATT, the analyst must assume that the treatment effect is equal on average for the treated population and the untreated population. ATT and ATE are expected to be unequal when the treatment effect is heterogeneous and is a function of confounding covariates. If a confounding covariate U is omitted, the bias in ATT and that in ATE are expected to be unequal as well (see Appendix A for an illustration). Thus in general, SA strategies for evaluations of ATE do not directly apply to evaluations of ATT.
Regression-based SA has been developed for assessing the potential consequences of omitted confounding. Rosenbaum (1986) proposed an SA approach in a setup of linear models with continuous outcomes when U and X are conditionally independent under each treatment condition. The simplifying assumptions include (a) that the outcome Y under each treatment condition is strictly a linear function of an omitted confounder U after adjustment for the observed covariates X, (b) that the relationship between U and Y does not differ across the treatment conditions and (c) that it does not differ across levels of X. ATT and ATE do not differ under these simplifying assumptions. The confounding impact of U can be represented as a product of two sensitivity parameters: the first is the linear association between U and Y under a given treatment condition and the second is the average difference in U between the two treatment groups after the adjustment for X. Yet, when the additivity assumption (b) does not hold, as Marcus (1997) has shown, two additional sensitivity parameters are required for the evaluation of ATE: one is the effect of the interaction between U and Z in predicting Y; and the other is the mean of U in the treated group. This will lead to a total of four sensitivity parameters. Following a similar logic, Lin and colleagues (1998) extended the results to binary outcomes and censored survival time data.
Others have utilized the same logic when all variables are standardized in a linear regression (Mauro, 1990). Frank (2000) further proposed a sensitivity index as the product of two conditional correlations, one between U and Y and the other between U and Z. On the continuum of this index, an impact threshold indicates the magnitude of a confounding impact necessary to change the causal conclusion. The reference distribution of the sensitivity index allows one to assess the probability of observing such an impact. A parallel development in econometrics involves specifying both an outcome model and a model for the latent treatment selection. The correlation between the two error terms is associated with the direction and the magnitude of bias (Copas & Li, 1997; Heckman, 1979). An alternative strategy, proposed by Imbens (2003), quantifies the proportion of unexplained variation in the treatment and that in the outcome attributable to U, each represented as a partial R2 value. If the partial R2 values required for altering the initial empirical conclusion are much greater than the reasonable values associated with the observed covariates, then the initial result is judged to be insensitive to omitted confounding. This strategy allows for a vector of unmeasured confounders; however, it does not lead to a closed form bias function.
To our knowledge, most regression-based SA strategies were developed for ATE rather than ATT evaluations. One exception is the SA strategy proposed by Carnegie, Harada and Hill (2016), who simulated hypothetical values of a potential confounder based on functional and distributional assumptions. In general, regression-based SA strictly requires that the functional form of the outcome model be known to the analyst (Frank, 2000; Imbens, 2003; Lin et al, 1998; Marcus, 1997; Rosenbaum, 1986; VanderWeele & Arah, 2011). Therefore, the SA results themselves are potentially sensitive to outcome model misspecifications. Finally, the number of regression-based sensitivity parameters and their forms typically depend on the number of omitted confounders and the assumed data generation functions.
3.2 Bounds for relative risks
Cornfield et al. (1959) were pioneers in deriving conditions under which the estimated effect of a treatment obtained from a non-experimental study could be completely explained away by an unmeasured confounder U when the treatment, the outcome and the confounder are all binary. In this setup, the treatment effect on the outcome and the two sensitivity parameters—the association between U and Z, and that between U and Y—are all represented by risk ratios. The classical Cornfield conditions determine the minimal values of the two sensitivity parameters that are necessary for altering the initial causal conclusion. Following this tradition, Lee (2011) made an extension to a categorical U; Poole (2010) and Ding and VanderWeele (2014) considered risk differences in addition to risk ratios. Allowing the association between U and Y to differ across the treatment conditions, Ding and VanderWeele (2016) derived a bounding factor (i.e. a bias index or a sensitivity threshold) as a function of the sensitivity parameters where U could be a single confounder or a vector. Following the same logic as that proposed by Manski (1990) and Frank (2000), VanderWeele and Ding (2017) further suggested that applied researchers calculate an E-value for the case in which U has associations of equal magnitude with the treatment and with the outcome. However, this strategy cannot accommodate the case in which a null finding produced in the initial analysis may be altered. In fact, many null findings suppressed due to ‘publication bias’ are likely sensitive to omitted confounding. Furthermore, when the treatment effect differs between the treated and the untreated, these bounding strategies designed for evaluating ATE are not suitable for ATT.
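For readers who want to see the bounding logic in computable form, the sketch below implements the published E-value formula of VanderWeele and Ding (2017) for an observed risk ratio; the function name e_value is ours, and the snippet is offered as an illustration of the reviewed method rather than as part of the present paper's approach.

```r
# E-value of VanderWeele and Ding (2017): the minimum strength of association,
# on the risk ratio scale, that an unmeasured confounder would need with both
# the treatment and the outcome to fully explain away an observed risk ratio.
e_value <- function(rr) {
  if (rr < 1) rr <- 1 / rr        # protective estimates are inverted first
  rr + sqrt(rr * (rr - 1))        # the published closed-form expression
}

e_value(3.0)  # an observed risk ratio of 3 yields an E-value of about 5.45
```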
3.3 Propensity score matching
For non-experimental research, Rosenbaum and Rubin (1983a) proposed summarizing multivariate pre-treatment information in a unidimensional propensity score. In the example of job training, individuals with a relatively high propensity score are predicted to have a relatively high predisposition of participating in a job training programme. The propensity score can be estimated through logistic regression as a function of observed pre-treatment covariates. Propensity score matching is particularly suitable for ATT analysis when there is a large reservoir of sampled individuals in the untreated group that supplies potential matches to the treated individuals. The average within-pair difference in the observed outcome over all the matched pairs estimates the ATT. Under the simplifying assumption that an unmeasured confounder U is a near perfect predictor of a binary outcome and is within a finite range, Rosenbaum (1995) proposed an SA approach that involves a sensitivity index Γ. It denotes the odds ratio of receiving one treatment rather than another between two individuals within a matched pair and hence is related to the predictive relationship between U and Z. In expectation, Γ = 1 if the treatment has been randomized. Because any given value of Γ corresponds to a range of treatment assignment probabilities, a sensitivity analysis computes the bounds for the test statistic and p values. These are known as ‘Rosenbaum bounds’. As Γ increases, however, the bounds become wider and therefore less informative, reflecting the increasing uncertainty about U. To consider a more general case in which U imperfectly predicts the outcome, an additional sensitivity index Δ was introduced to represent the odds ratio of displaying the outcome value of interest between two individuals within a matched pair (Gastwirth et al, 1998). See Greenland (1996) for a review and Harding (2003) for an application. Ichino, Mealli, and Nannicini (2008) instead simulated a binary U in accordance with the characterization of its distribution, its association with Z and its association with Y. They included the simulated U in a set of matching covariates and then re-estimated the ATT.
3.4 Propensity score stratification
One may use the estimated propensity score to sort and sub-classify a sample. For an ATT evaluation, the sample may be divided into S strata such that there will be an equal number of treated individuals across all these strata. The treated group and the untreated group in the same stratum share similar values of the estimated propensity score. The within-stratum mean difference in the outcome, averaged over all the strata, estimates the ATT. In a simplified setup with a binary unobserved confounder U and a binary outcome, Rosenbaum and Rubin (1983b) highlighted three sensitivity parameters within each propensity stratum s: the distribution of U, the association between U and Z, and the association between U and Y, which was assumed constant across the two treatment conditions. Different assumptions about U lead to different values of these sensitivity parameters; the maximum likelihood estimate of the causal effect will change accordingly. However, in a sample that has been divided into S propensity strata, there can be as many as 3S sensitivity parameters. To limit the size of the SA for practical reasons, they proposed another simplifying assumption that each sensitivity parameter be constant across all S strata.
3.5 Propensity score weighting
Propensity score-based weighting adjustment for observed confounding, known as inverse-probability-of-treatment weighting (IPTW) in marginal structural models, was initially proposed by Robins (1987) and Rosenbaum (1987). This method allows the analyst to estimate the population average potential outcome associated with each treatment one at a time before taking mean contrasts. An ATT analysis and an ATE analysis differ in how the weight is constructed. As we will formally show in Section 4, an ATT analysis requires a weight that transforms only the untreated group to approximate the distribution of the potential outcome of the treated group under the counterfactual control condition. In contrast, an ATE analysis requires a transformation of both the treated group and the untreated group.
Robins and colleagues (Brumback et al, 2004; Robins, 2000) developed a weighting-based SA approach that involves a non-identifiable user-supplied confounding bias function. The function contains a single sensitivity parameter and requires no consideration of whether an unmeasured confounder U is univariate or multivariate, continuous or discrete. However, various forms of the bias function correspond to different types of unmeasured confounding, and the analyst receives little guidance in choosing an appropriate functional form. As shown in an application (Brumback et al, 2004), the SA results are apparently sensitive to the form of the bias function specified somewhat arbitrarily by the analyst.
Ridgeway and colleagues (McCaffrey et al, 2004; Ridgeway, 2006) proposed a different SA strategy for weighting-based ATT analysis. These researchers considered, for each sampled individual, the ratio of a propensity score-based weight that makes additional adjustment for an omitted confounder to the initial weight that does not. For an omitted covariate to contribute to selection bias, this ratio of the two weights must be correlated with the outcome as well. After making an assumption about the parametric distribution of this ratio, the analyst simulates data by drawing random values from the assumed distribution and then obtains bounds for the sensitivity results.
4 WEIGHTING-BASED BIAS FORMULA FOR EVALUATING ATT
This article introduces an alternative weighting-based SA approach for ATT evaluations. Rather than considering a ratio of two weights, we use the discrepancy between a new weight that adjusts for the omitted confounders and an initial weight that omits them to capture the role of the confounders. Our SA strategy does not involve any distributional assumptions about these weights or about their discrepancy. We simply derive the effect size of a hidden bias as a product of two sensitivity parameters: one is the standard deviation of the weight discrepancy, denoted σ hereafter, and the other is the correlation between the weight discrepancy and the outcome, denoted ρ. We will show that these two sensitivity parameters are meaningful and easy to work with in real data applications.
4.1 Weighting-based sensitivity parameters
The discrepancy between the weight that includes P and the weight that omits P for each individual, ΔW = W_{X,P} − W_X, is a random variable that plays a key role in determining how much bias is associated with the omission of P. This bias can be quantified as the covariance between the weight discrepancy and the outcome in the untreated group.
Theorem 1. In an evaluation of ATT that employs propensity score-based weighting to remove selection bias associated with a set of observed pre-treatment covariates X, the remaining bias due to the omission of pre-treatment covariates P is Bias = cov(ΔW, Y | Z = 0), where ΔW = W_{X,P} − W_X denotes the individual-level weight discrepancy. Appendix C gives the proof.
Theorem 2. In an evaluation of ATT that employs propensity score-based weighting to remove selection bias associated with a set of observed pre-treatment covariates X, the bias represented as cov(ΔW, Y | Z = 0) is equal to zero if a set of omitted covariates P does not predict the treatment given X, or if P predicts the treatment but does not predict the outcome in the untreated group given X.
See Appendix D for the proof. This result is consistent with the well-known logic that the omission of P will contribute a bias in an ATT evaluation only if P predicts the treatment and also predicts the outcome in the untreated group given X. However, our theoretical result does not require that the predictive relationship between P and Z and that between P and Y take any pre-specified functional forms.
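To make Theorem 1 concrete, the following minimal R sketch computes the weight discrepancy and the implied bias. It assumes a hypothetical data frame d with a 0/1 treatment Z, outcome Y, observed covariates x1 and x2 and an assessed covariate p; all function and variable names are ours, not from the authors' released code.

```r
# ATT weight: 1 for the treated; the propensity odds for the untreated,
# normalized to mean 1 within the untreated group.
att_weight <- function(ps, z) {
  w <- ifelse(z == 1, 1, ps / (1 - ps))
  w[z == 0] <- w[z == 0] / mean(w[z == 0])
  w
}

ps0 <- fitted(glm(Z ~ x1 + x2,     family = binomial, data = d))  # omits p
ps1 <- fitted(glm(Z ~ x1 + x2 + p, family = binomial, data = d))  # includes p

dW <- att_weight(ps1, d$Z) - att_weight(ps0, d$Z)  # weight discrepancy
u  <- d$Z == 0                                     # untreated subsample

sigma   <- sd(dW[u])           # sensitivity parameter 1: SD of discrepancy
rho     <- cor(dW[u], d$Y[u])  # sensitivity parameter 2: correlation with Y
bias_es <- sigma * rho         # effect size of bias; the raw-scale bias is
                               # cov(dW, Y | Z = 0) = sigma * rho * sd(Y[u])
```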
4.2 Interpretation of the weighting-based sensitivity parameters
We reveal the meaning of each sensitivity parameter first in a relatively simple setup in which the potential outcomes and the treatment indicator are generated under a linear system that users tend to be most familiar with. Subsequently, we reveal the same pattern in a more complex setup involving non-linear data generation functions. We will show that, in both cases, the weighting-based sensitivity parameters are inherently related to the key parameters that generate the data.
In Appendix E, we reveal the mathematical connections between the weighting-based sensitivity parameters and the data generation parameters. Below is a summary of the results.
- 1.
σ increases in magnitude with the predictive relationship between the omitted confounder and the treatment assignment. The latter is represented by the data generation parameter α in model (4). Regardless of whether α > 0 or α < 0, σ always increases with the magnitude of α. When α = 0, P no longer predicts Z, in which case σ reaches its minimum of zero and ρ becomes zero as well. The pattern is illustrated in Figure 2.
- 2.
When σ is fixed at a non-zero value, ρ increases in magnitude with the predictive relationship between the omitted confounder and the potential outcome when untreated. The latter is represented by β in model (3). On the basis of the derivations in Appendix E, we conclude that ρ = 0 when β = 0. The sign of a non-zero ρ is determined by the sign of the product of α and β and is consistent with the direction of the bias. That is, ρ > 0 indicates a positive bias, which arises when α and β are both positive or both negative; ρ < 0 indicates a negative bias when α and β have opposite signs. Moreover, when σ is fixed, an increase in the magnitude of β corresponds to an increase in the magnitude of ρ as well as an increase in the magnitude of the bias. Figure 3 shows these patterns.

Figure 2. Correspondence between σ and α under model (4). σ and α both indicate the predictive relationship between the omitted confounder and the treatment assignment. σ increases with the magnitude of α when α ≠ 0; σ = 0 when α = 0

Figure 3. Correspondence between ρ and β under models (3) and (4). β indicates the predictive relationship between the omitted confounder and the potential outcome when untreated. The sign of ρ is determined by the sign of αβ. When α is fixed, ρ is a monotonic function of β; ρ = 0 when β = 0
Under models (3) and (5), closed form relationships between the weighting-based sensitivity parameters and the data generation parameters are not obtained. However, their connections can be learned through numerical studies. Figures 4 and 5 show the monotonic relationships between σ and α and between ρ and β.

Figure 4. Correspondence between σ and α under the non-linear treatment assignment model (5). σ increases with the magnitude of α when α ≠ 0; σ = 0 when α = 0

Figure 5. Correspondence between ρ and β under models (3) and (5). The sign of ρ is determined by the sign of αβ. When α is fixed, ρ is a monotonic function of β; ρ = 0 when β = 0
Although the weighting-based sensitivity parameters σ and ρ are inherently connected to the familiar regression coefficients under the above setups of data generation, the weighting-based SA strategy does not require these simplifying assumptions. This is because weighting-based adjustments in their basic forms do not involve specifying the covariate-outcome relationships, the distribution of the omitted covariate, the distribution of the errors or the distribution of the weight. In contrast, other SA strategies typically require that the analyst choose among different model-based assumptions, distributional assumptions or assumptions about the form of an unknown sensitivity function; and these analytic decisions are potentially consequential. Moreover, unlike many of the SA methods that are restricted to a binary outcome and a binary unmeasured confounder, this new SA strategy shares an important feature with the other propensity score weighting-based SA strategies: it conveniently assesses bias associated with one or more omitted confounders and places no constraints on the measurement scales of unmeasured confounders. Finally, the number of weighting-based sensitivity parameters and their forms do not change regardless of how the data are generated.
The new weighting-based SA strategy is consistent with all weighting-based approaches to causal inference and can be naturally incorporated as a logical step in an ATT evaluation. It assesses the potential remaining bias when the initial analysis has also been conducted through propensity score-based weighting. Weighting-based SA does not apply when the initial analysis has been conducted through other approaches such as regression or propensity score matching. Similarly, sensitivity analysis methods developed for statistical adjustment methods other than weighting are not suitable when an initial analysis has been conducted through weighting. This is because, in general, we do not expect the bias in the regression estimator or the matching estimator to be the same as the bias in the IPTW estimator of a causal effect. The mathematical results in Appendix E can be used to clarify this point. In a scenario of data generation even simpler than the models specified in Equation (3), suppose the outcome depends on a single omitted confounder U through a linear coefficient β that is constant across treatment conditions. The effect size of bias in a regression estimator due to the omission of U is then proportional to β multiplied by the conditional mean difference in U between the treatment groups. In contrast, in a weighting-based ATT analysis, the effect size of bias is ρσ, where σ is a non-linear function of α and ρ is a non-linear function of β. In general, we do not expect the two biases to be equal.
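The point that the two estimators need not carry the same bias can also be verified numerically. The simulation below is our own illustration under an assumed linear-logistic data generation process, not an analysis from the paper; the coefficient values are arbitrary.

```r
# An illustrative simulation: both estimators omit the confounder U, adjust
# for the observed covariate X, and the true treatment effect is 0.
set.seed(1)
n <- 1e5
x <- rnorm(n); u <- rnorm(n)                    # X observed, U omitted
z <- rbinom(n, 1, plogis(0.5 * x + 0.8 * u))    # U predicts treatment
y <- 0 * z + 0.5 * x + 0.5 * u + rnorm(n)       # U predicts outcome

bias_reg <- coef(lm(y ~ z + x))["z"]            # bias of the regression estimator

ps <- fitted(glm(z ~ x, family = binomial))     # propensity score omitting U
w  <- ps[z == 0] / (1 - ps[z == 0])             # ATT weight for the untreated
bias_iptw <- mean(y[z == 1]) - weighted.mean(y[z == 0], w)  # bias of IPTW/ATT

c(bias_reg, bias_iptw)                          # generally unequal
```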
5 APPLICATION
We use the LaLonde (1986) data to illustrate the new weighting-based SA strategy for ATT evaluations. Programme participation occurred in 1976–1977. The experimental result suggested that the training programme likely increased earnings in 1978. Our non-experimental sample includes the 297 male participants in the original NSW-treated group and a non-equivalent comparison group of 15,992 males from the Current Population Survey (CPS).
5.1 Initial weighted ATT analysis
A number of pre-treatment characteristics, including age, education, black, Hispanic, marital status, high school completion and earnings in 1975, were measured for all the participants and for those in the comparison group. These are arguably some of the most important predictors of the outcome. On average, the treated group tended to be younger, less educated, more likely to be black or Hispanic, unmarried, without a high school degree (i.e. high school dropouts), and earned less than the comparison group in 1975. With the standard deviation of the outcome in the CPS comparison group as the scaling unit ($8628), a naïve comparison shows that the treated group earned about one standard deviation less than the CPS comparison group on average.
To identify the ATT, we estimate for each individual in the CPS comparison group an IPTW weight, defined in Equation (1), as a function of the above covariates. Using the minimum and maximum values of the estimated propensity score in the treated group as the lower and upper bounds of the common support, we exclude from the subsequent analysis 7984 comparison group members whose estimated propensity scores fell below those of all the treated group members. These individuals in the comparison group apparently had an almost zero probability of participating in the NSW programme. Applying this weight to the data within the common support, we obtain an estimated effect size of ATT of −0.050 with confidence interval (−0.159, 0.059), which suggests a small negative effect of the job training programme on earnings. We now investigate whether this non-experimental result is sensitive to several different types of omitted potential confounders.
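The sketch below reproduces the logic of this initial analysis under stated assumptions: nsw is a hypothetical data frame stacking the treated and CPS comparison groups, with a 0/1 indicator treat, the outcome re78 and the covariate names used in Tables 1 and 2.

```r
# Initial weighted ATT analysis: propensity model, common support, ATT weight.
ps_fit <- glm(treat ~ age + education + black + hispanic + married +
                nodegree + re75, family = binomial, data = nsw)
nsw$ps <- fitted(ps_fit)

# Common support: keep comparison members within the range of treated scores.
lo <- min(nsw$ps[nsw$treat == 1]); hi <- max(nsw$ps[nsw$treat == 1])
cs <- nsw[nsw$treat == 1 | (nsw$ps >= lo & nsw$ps <= hi), ]

# ATT weight: 1 for the treated; the propensity odds for the comparison group.
cs$w <- ifelse(cs$treat == 1, 1, cs$ps / (1 - cs$ps))

att <- with(cs, mean(re78[treat == 1]) -
              weighted.mean(re78[treat == 0], w[treat == 0]))
att_es <- att / sd(nsw$re78[nsw$treat == 0])  # effect size in comparison-
att_es                                        # group standard deviations
```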
5.2 Potential sources of omitted confounding
In general, two types of omitted confounding are likely in an ATT analysis:
A. Observed covariates omitted from the initial adjustment due to concerns about model overfitting, including omitted non-linear or other higher-order functions of the observed covariates;
B. Unobserved covariates.
Type A omission may exist when the number of observed pre-treatment covariates in a study is relatively large and exceeds 10% of the sample size in either the treated group or the untreated group. To avoid model overfitting as it would introduce excessive noise to the estimation, the analyst is usually advised to select a subset of the pre-treatment covariates on theoretical grounds. It is likely that some of the unselected covariates may contribute to the remaining bias. This is not an issue in the analysis of the LaLonde data which contain a very limited number of observed covariates. However, because the functional form of a true propensity score model is unknown to the analyst, non-linear or other higher-order functions of the observed covariates are typically omitted from the user-specified propensity score model. These may include, for example, polynomial terms of the continuous covariates or multi-way interactions between the covariates. A close inspection of the data may further reveal non-linear patterns that do not fit polynomial functions, in which case non-parametric step functions may provide a useful alternative.
Type B omission can hardly be ruled out in the analyses of non-experimental data. In the current application, critics may easily point out some unmeasured individual characteristics that may predict both programme participation and future earnings even after the analyst has adjusted for all the observed covariates. Additional confounders may include, for example, social-emotional skills, social network resources and various stressors in life. Theoretical reasoning and evidence from other existing datasets may provide information on how these unobserved covariates may be compared to the observed ones. Such information is necessary for determining the range of plausible values of the sensitivity parameters.
5.3 Weighting-based SA procedure
An analytic procedure that implements the proposed weighting-based SA strategy includes five major steps:
Step 1. Obtain the initial weight and the effect size of the ATT estimate;
Step 2. Obtain a new weight according to the type of omission;
Step 3. Estimate σ and ρ, each as a function of the weight discrepancy;
Step 4. Estimate the effect size of bias, ρσ;
Step 5. Obtain an adjusted estimate of the effect size of ATT.
To take into account uncertainty in the estimation of the sensitivity parameters, we repeat the above procedure over 1000 bootstrapped samples. This procedure, however, does not assess whether statistical inferences are sensitive to plausible hidden bias, a topic that we leave to the discussion section.
In Step 2, for type A omission, a new weight can be easily obtained by including in the propensity score model one or more observed covariates that have been omitted or a non-linear function of the observed covariates. For type B omission, we follow the logic that if two independent covariates have similar confounding impacts, omitting either one or the other would lead to similar bias. Therefore, if the confounding impact of an unobserved covariate is arguably comparable to that of an observed covariate, the analyst may use the latter to generate referent sensitivity parameter values for the former (e.g. Carnegie et al, 2016; Imbens, 2003). The discrepancy between a weight that includes the referent covariate and an alternative weight that excludes the referent covariate is computed to help quantify the bias attributable to an unobserved covariate arguably comparable to the referent covariate. By inspecting a range of potential omissions, the analyst would obtain a sense of plausible values of bias in some worst conceivable scenarios.
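Continuing the sketch from the initial analysis above (objects cs and att_es), the following illustrates Steps 1–5 for a type A omission, with an added quadratic term I(age^2) standing in for an omitted non-linear function of age; all names remain hypothetical.

```r
w_old <- ifelse(cs$treat == 1, 1, cs$ps / (1 - cs$ps))              # Step 1
ps_new <- fitted(glm(treat ~ age + I(age^2) + education + black +
                       hispanic + married + nodegree + re75,
                     family = binomial, data = cs))
w_new <- ifelse(cs$treat == 1, 1, ps_new / (1 - ps_new))            # Step 2

u  <- cs$treat == 0
dW <- w_new[u] / mean(w_new[u]) - w_old[u] / mean(w_old[u])   # discrepancy
sigma <- sd(dW); rho <- cor(dW, cs$re78[u])                         # Step 3
bias_es <- sigma * rho                                              # Step 4
att_adj <- att_es - bias_es     # Step 5: adjusted ATT = initial - bias

# For uncertainty, repeat the above over bootstrapped samples, e.g.
# replicate(1000, { i <- sample(nrow(cs), replace = TRUE); ... })
```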
5.4 Display of weighting-based SA results
Table 1 presents the SA results for assessing bias due to type A omission. We use U1–U8 to represent eight different sets of omitted forms of the observed covariates. These include the step functions for age, education and earnings in 1975, and interactions between the categorical covariates. For example, the standard deviation of the discrepancy between the new weight that includes the step function for age and the initial weight is σ = 4.205, while the correlation between the weight discrepancy and the outcome is ρ = 0.007. The product of σ and ρ is equal to 0.030, which estimates the effect size of the bias due to misspecifying a non-linear function of age as linear. Once this bias is removed, the effect size of the adjusted estimate of ATT becomes −0.080. Among the pairs of σ and ρ obtained from 1,000 bootstrapped samples, only 6.9% correspond to a negative bias greater than 0.05 in magnitude. In other words, the majority of the estimated values of the effect size of bias associated with this omission are not great enough to reverse the sign of the initial ATT estimate. These results are listed in the first row corresponding to U1. Similarly, we estimate the effect size of the bias associated with each of the other omitted terms. These omissions lead to either a positive or a negative bias; yet, none of them contributes a bias large enough to be consequential.
Table 1. SA results for assessing bias due to type A omission

| Omitted covariate | σ | ρ | Estimated ES of bias | ES of adjusted estimate | % Bootstrapped results consequential |
|---|---|---|---|---|---|
| U1: step functions for age | 4.205 | 0.007 | 0.030 | −0.080 | 6.9 |
| U2: step functions for education | 0.890 | 0.013 | 0.012 | −0.052 | 7.9 |
| U3: step functions for re75 | 0.336 | −0.016 | −0.005 | −0.048 | 13.7 |
| U4: black × married | 0.059 | 0.040 | 0.002 | −0.052 | 11.0 |
| U5: hispanic × married | 0.038 | 0.018 | 0.001 | −0.051 | 11.1 |
| U6: black × nodegree | 0.104 | −0.038 | −0.004 | −0.046 | 14.6 |
| U7: hispanic × nodegree | 0.072 | 0.026 | 0.002 | −0.052 | 10.1 |
| U8: nodegree × married | 0.250 | 0.030 | 0.008 | −0.058 | 7.8 |

Note: re75 is the variable name for earnings in 1975. The last column displays the percentage of bootstrapped samples that produce a negative estimate of the effect size of bias large enough to change the sign of the estimated ATT.
Table 2 lists the SA results for assessing bias due to type B omission. We use each of the seven observed covariates in the initial propensity score model, denoted by X1–X7, as referents for comparable unobserved covariates. Among them, earnings in 1975 (variable name: re75) is the most important confounder. An omitted covariate comparable to re75 in its confounding impact would contribute a negative bias with an effect size as great as −0.256. Further removal of such a potential bias would lead to a positive estimate of the effect size of the ATT equal to 0.206. In this particular case, all the pairs of σ and ρ obtained from the 1000 bootstrapped samples correspond to a negative bias greater than 0.05 in magnitude. Unobserved covariates comparable to most of the other observed covariates, except for the indicator for being black, would not contribute a bias great enough to qualitatively change the initial conclusion.
Table 2. SA results for assessing bias due to type B omission

| Omitted covariate | σ | ρ | Estimated ES of bias | ES of adjusted estimate | % Bootstrapped results consequential |
|---|---|---|---|---|---|
| X1: age | 0.720 | 0.002 | 0.001 | −0.051 | 12.8 |
| X2: education | 0.378 | 0.026 | 0.010 | −0.060 | 8.7 |
| X3: black | 3.743 | −0.025 | −0.094 | 0.044 | 84.9 |
| X4: hispanic | 0.470 | 0.015 | 0.007 | −0.057 | 9.0 |
| X5: married | 1.051 | 0.008 | 0.009 | −0.059 | 9.9 |
| X6: nodegree | 1.367 | 0.002 | 0.003 | −0.053 | 11.9 |
| X7: re75 | 1.426 | −0.180 | −0.256 | 0.206 | 100.0 |
According to the SA results listed in Tables 1 and 2, an omitted confounder or a set of omitted confounders comparable to either black (X3) or re75 (X7) in confounding impact would introduce a negative bias large enough to change the direction of the ATT estimate from negative to positive. In particular, removing a negative bias comparable to the confounding impact of re75 would further change the practical significance of the causal conclusion, as it would indicate that the job training programme increased earnings by about one-fifth of a standard deviation. The plausible values of bias due to most other actual or potential omissions are rather small in magnitude and differ in sign; hence, the cumulative impact of these other omissions is likely negligible.
Figure 6 provides a holistic graphical display of the SA results in this application. Here σ has a lower bound of 0 and a finite upper bound; ρ takes values between −1 and 1. The estimated effect size of bias is zero when σ = 0 or when ρ = 0. In the current application, the effect size of the initial ATT estimate is −0.05. The result would become positive if the effect size of bias associated with an omitted confounder were negative with a magnitude greater than 0.05. The dotted curve in Figure 6 corresponds to this threshold bias value. The referent covariates X3 and X7 provide two examples of negative bias values great enough to be consequential, as both are located to the left of the threshold.

Figure 6. Graphical display of sensitivity analysis results. Note: The contours associated with non-zero values of σ and ρ each represent a fixed amount of bias equal to ρσ, the value of which is marked at the end of a contour. The dotted curve corresponds to the threshold value −0.05. Removing a bias beyond this threshold value would change the sign of the ATT estimate from negative to positive. A possible omission would lead to an amount of bias as indicated in this figure. Each ‘*’ corresponds to a covariate or a set of covariates that, if omitted, would contribute a bias great enough to change the analytic conclusion; each of the remaining markers corresponds to a covariate or a set of covariates whose omission would be inconsequential. These covariates are listed in Tables 1 and 2
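A display of this kind can be sketched in a few lines of base R. The contour levels and the plotted points for X3 and X7 below are taken from Tables 1 and 2, while the axis ranges and styling choices are ours.

```r
# Contours of constant bias rho * sigma in the (sigma, rho) plane.
sigma <- seq(0.01, 4.5, length.out = 200)
rho   <- seq(-0.25, 0.25, length.out = 200)
bias  <- outer(sigma, rho)                 # effect size of bias = sigma * rho

contour(sigma, rho, bias, levels = c(-0.25, -0.1, 0.05, 0.1, 0.25),
        xlab = expression(sigma), ylab = expression(rho))
contour(sigma, rho, bias, levels = -0.05, lty = 3, add = TRUE)  # threshold
points(c(3.743, 1.426), c(-0.025, -0.180), pch = 8)  # X3, X7: consequential
```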
5.5 Sensitivity bounds informed by theory and evidence
How plausible is it that an omitted covariate would be comparable to earnings in 1975 in its confounding impact? An answer to this question requires careful scientific reasoning. For example, citing the economics literature (Ashenfelter & Card, 1985), Dehejia and Wahba (2002) pointed out that many applicants for job training programmes had suffered a sudden drop in earnings in the year immediately preceding participation but not necessarily in the year before that. In other words, for the treated individuals who had become unemployed or underemployed in 1975, their earnings in 1974 might provide additional predictive information about their potential earnings in 1978. Because selection bias may originate in the longitudinal trend of earnings, additional adjustment for between-group differences in earnings in 1974 is arguably necessary. A challenge, however, is that the experimental study failed to collect information on earnings in 1974 from nearly 40% of the participants. Due to the large amount of missing data in this covariate in the treated group, one cannot make direct statistical adjustment for earnings in 1974. Therefore, this covariate was omitted in the econometric analyses of the non-experimental data by LaLonde (1986). The exclusion of these 40% of the observations by Dehejia and Wahba in order to incorporate this additional variable into their propensity score model, as Smith and Todd (2001) pointed out, had ‘a strong effect on their results’ (p. 113). Our goal is to assess the potential consequence of this omission without excluding any observations. For simplicity, we use the original variable names re74, re75 and re78 to represent earnings in 1974, 1975 and 1978, respectively.
We have shown in Section 4 that the sensitivity parameter ρ is monotonically related to the predictive relationship between the omitted covariate and the outcome under the untreated condition. Given that re74 and re75 are highly correlated, our first goal is to assess the predictive relationship between re74 and re78 in the untreated group when re75 has already been adjusted for. The estimated partial correlation between re74 and re78 conditioning on re75 is 0.17; that between re75 and re78 conditioning on re74 is 0.29. Hence we reason that the value of ρ for re74 conditioning on re75 is likely smaller in magnitude than the value of ρ for re75 conditioning on re74; and the latter is expected to be smaller in magnitude than the value of ρ for re75 when re74 is omitted, the estimate of which is −0.180 as shown in Table 2. We therefore adopt −0.180 as the lower bound for the value of ρ for re74 in the sensitivity analysis.
Section 4 has also shown that the sensitivity parameter σ is monotonically related to the predictive relationship between the omitted covariate and the treatment assignment. Because re74 is only partially observed in the treated group, we are unable to directly determine the value of σ for this covariate. Yet, the predictive relationship between re75 and the treatment assignment is stronger among those whose re74 was observed than among those whose re74 was missing. We reason that a similar pattern would likely hold for re74 in predicting the treatment assignment. In other words, should we obtain a new weight that makes an additional adjustment for re74, the value of σ for re74 would likely be smaller if re74 had been observed for all the treated individuals than if the analysis were restricted to the subsample of individuals whose re74 was actually observed. In the latter case, we obtain an estimate of σ equal to 0.247 for re74. Hence, it seems reasonable to adopt 0.247 as the upper bound of σ for re74 in this application.
We now bound σ for re74 between 0 and 0.247 and bound ρ for re74 between −0.180 and 0. The effect size of bias associated with re74 is therefore bounded between −0.044 and 0. Once this bias is removed, the effect size of the ATT estimate will be bounded between −0.050 and −0.006. This result suggests that, although the confounding impact of re74 may not be as great as that of re75, the omission of re74 has plausibly contributed a non-trivial negative bias to the initial ATT estimate, though not one large enough by itself to reverse its sign.
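The bounding arithmetic can be reproduced as follows, assuming a hypothetical comparison-group data frame comp containing re74, re75 and re78; the partial correlations are computed from regression residuals.

```r
# Partial correlations in the untreated group (about 0.17 and 0.29 here).
pc_74 <- with(comp, cor(resid(lm(re78 ~ re75)), resid(lm(re74 ~ re75))))
pc_75 <- with(comp, cor(resid(lm(re78 ~ re74)), resid(lm(re75 ~ re74))))

rho_bounds   <- c(-0.180, 0)   # rho for re74 bounded by rho for re75
sigma_bounds <- c(0, 0.247)    # sigma bound from the re74-observed subsample
bias_bounds  <- range(outer(sigma_bounds, rho_bounds))  # about -0.044 to 0
att_adj_bounds <- -0.050 - rev(bias_bounds)             # about -0.050 to -0.006
```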
In many application studies, an important confounder may be partially observed or entirely unobserved in a given dataset. Yet, the current data or other existing secondary data may provide useful information enabling the analyst to set up informative bounds for its sensitivity parameters. We underscore the necessity of bringing scientific reasoning to bear on the utilization of such information.
6 CONCLUSION AND DISCUSSION
Sensitivity analysis is an indispensable step in evaluating ATT when the data are non-experimental. This is because statistical adjustment for observed pre-treatment covariates is often inadequate for removing all the selection bias. This article introduces a new weighting-based SA approach that provides a useful alternative to the existing ones. The new approach applies when the initial ATT analysis has been conducted through propensity score-based weighting. Although this paper primarily considers a binary treatment and a continuous outcome, the same rationale can be extended to studies of a treated group with multiple untreated comparison groups and to categorical outcomes. A similar logic has been applied to weighting-based SA in causal mediation analysis (Hong, Qin, & Yang, 2018; Qin et al, 2019).
This new weighting-based SA strategy is relatively straightforward and intuitive, and demonstrates important advantages. First of all, unlike many of the existing SA strategies that often invoke additional simplifying assumptions that tend to be overly strong and restrictive, this new strategy does not rely on additional assumptions. Even though an analyst may choose to invoke parametric assumptions in specifying the propensity score model for treatment assignment, such assumptions are not required by this new SA procedure. This is because the analyst may freely choose to estimate the propensity score through semi-parametric or non-parametric weighting. Second, regardless of how complex the data generation functions are, the number of weighting-based sensitivity parameters remains small and their forms never change in ATT evaluations. Third, weighting-based SA is unconstrained by the measurement scales of the omitted confounders. Fourth, weighting-based SA can conveniently assess the aggregate bias associated with multiple omitted confounders. And finally, a graphical display provides a holistic view that may suggest whether the bias due to omitted confounding is predominantly positive or negative and the extent to which positive and negative biases may cancel out. We make available online the application data along with the R code for the analysis reported in Section 5.
This paper has focused on the weighting-based SA strategy for ATT evaluation. We are conducting research that extends this framework to ATE evaluation. In a non-experimental evaluation of ATT, bias arises only when the treated group and the control group differ in the mean potential outcome associated with the control condition; in an ATE evaluation, a second bias arises when the two groups additionally differ in the mean potential outcome associated with the experimental condition. Hence, the number of sensitivity parameters will increase from two to four. Nonetheless, the same advantages of weighting-based SA will remain salient.
Similar to most existing SA methods, this weighting-based SA strategy focuses on assessing the contribution of omitted confounders to bias in identification rather than the impact of such omissions on efficiency in estimation. This is particularly the case with type B omission, when theoretically important covariates are not directly observed in a given application. Also consistent with the bulk of the SA literature, emphasis is placed on qualitative changes in practical significance rather than in statistical significance. Yet in general, additional adjustment for a confounding covariate may change not only the expectation but also the variance of the ATT estimator. The sensitivity of statistical inference is important in its own right, especially when decision-making is contingent on the statistical significance of an analytic result. For type A omissions, once an important source of bias has been detected among the observed covariates or among their non-linear functions omitted from the initial analysis, a modified ATT analysis that makes additional adjustment for ‘the bias contributor’ will generate a new point estimate and its standard error. A closed form of the asymptotic variance of the ATT estimator that takes into account the estimation uncertainty in the weight can be obtained by simultaneously solving a set of mean-zero estimating equations for the ATT and for the coefficients in the propensity score models (Newey, 1984; Robins et al., 1995; Schafer & Kang, 2008). Alternatively, bootstrapping may be employed to quantify the estimation uncertainty empirically. For type B omissions, researchers have proposed simulating an unmeasured confounder by taking random draws from its conditional distribution given its conditional associations with the treatment and the outcome. The analyst may then adjust for the simulated confounder and obtain an adjusted estimate of the causal effect along with its variance (Carnegie, Harada, & Hill, 2016; Ichino, Mealli, & Nannicini, 2008). This simulation strategy may be incorporated into the new weighting-based SA procedure.
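As a concrete illustration of the bootstrap route, the sketch below re-estimates the propensity score weight within each resample so that the resulting standard error reflects the estimation uncertainty in the weight. The data frame `d` and the covariate names `x1` and `x2` are hypothetical placeholders, and a logistic propensity score model is assumed.

```r
# Bootstrap the weighted ATT estimator, refitting the propensity score model in
# each resample so that the SE reflects estimation uncertainty in the weight.
# Assumes a hypothetical data frame d with treatment z (0/1), outcome y,
# and covariates x1 and x2.
att_weighted <- function(d) {
  ps <- fitted(glm(z ~ x1 + x2, family = binomial, data = d))
  w  <- ps / (1 - ps)   # ATT weight (propensity odds), applied to the controls
  mean(d$y[d$z == 1]) - weighted.mean(d$y[d$z == 0], w[d$z == 0])
}

set.seed(1)
boots <- replicate(2000, att_weighted(d[sample(nrow(d), replace = TRUE), ]))
c(estimate = att_weighted(d), boot_se = sd(boots))
```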
Weighting-based SA shares the limitations of weighting-based causal inference strategies in general. The propensity score weighting-based method for ATT evaluations generally does not require outcome model specification, but it does require specifying the functional form of the propensity score model. It is well known that parametric weighting is vulnerable to misspecification of this functional form: a misspecified propensity score model may lead to biased estimates of the propensity scores and of the weights (Schafer & Kang, 2008). When the relationship between an omitted confounder and the treatment assignment indicator is misspecified in the propensity score model, we anticipate that parametric weighting-based SA may become problematic as well. In such cases, semi-parametric or non-parametric weighting may provide a remedy. One form of semi-parametric weighting, analogous to post-stratification in survey methodology, is relatively easy to implement: the analyst may stratify the sample on the estimated propensity score and then re-estimate the propensity score as the observed frequency of treatment assignment within each stratum (Hong, 2012, 2015; Huang et al., 2005). Alternatively, the analyst may estimate propensity scores non-parametrically by fitting generalized boosted models (McCaffrey, Ridgeway, & Morral, 2004). Past research has shown that semi-parametric and non-parametric weighting results are often robust even when the propensity score models are misspecified. We hypothesize that this robustness may characterize semi-parametric weighting-based SA as well.
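A minimal sketch of that stratification step, under the same hypothetical data frame `d` used above and with propensity score quintiles as the strata:

```r
# Semi-parametric weighting via stratification on the estimated propensity score:
# replace each unit's parametric score with the observed treatment frequency
# in its stratum (here, quintiles of the estimated score).
ps_hat  <- fitted(glm(z ~ x1 + x2, family = binomial, data = d))
stratum <- cut(ps_hat, quantile(ps_hat, 0:5 / 5), include.lowest = TRUE)
ps_semi <- ave(d$z, stratum)              # within-stratum proportion treated
w_semi  <- ps_semi / (1 - ps_semi)        # ATT weight for the untreated units
# Note: each stratum must contain both treated and untreated units for the
# re-estimated scores to stay strictly between 0 and 1.
```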
ACKNOWLEDGMENT
The authors thank Ken Frank, Xiao-Li Meng, Stephen Raudenbush, participants of the Department of Public Health Sciences Colloquium and the Education Workshop at the University of Chicago, participants of the biostatistics seminar at the University of Colorado Denver Department of Biostatistics and Informatics, and participants of the University of Notre Dame Department of Applied and Computational Mathematics and Statistics Seminar for their comments on earlier versions of this article. We are particularly grateful for the many valuable suggestions offered by the Associate Editor and the two reviewers. This study was supported by a grant from the National Science Foundation (SES 1659935). In addition, the third author received a Quantitative Methods in Education and Human Development Research Predoctoral Fellowship from the University of Chicago and a National Academy of Education/Spencer Foundation Dissertation Fellowship. This article reflects the views of the authors and does not represent the opinions of the funding agencies.
REFERENCES
APPENDIX A
Clearly, the bias in the ATE evaluation and that in the ATT evaluation are unequal whenever the treated group and the untreated group differ in the mean potential outcome associated with the experimental condition, that is, whenever E{Y(1) | Z = 1} and E{Y(1) | Z = 0} are unequal.
APPENDIX B
APPENDIX C
APPENDIX D
Here we prove Theorem 2: the bias, represented as a function of the discrepancy between the new weight and the initial weight, will become zero if a set of omitted covariates P does not predict the treatment assignment or if P does not predict the outcome in the untreated group.
The proof is provided under the worst-case scenario in which, under each treatment condition, P is independent of the set of covariates X that has already been adjusted for through the initial weight. The result is generalizable to the case in which P and X are not independent because, in this latter case, P can be partitioned into two orthogonal pieces, E(P | X) and P − E(P | X). The former is strictly a function of X and does not introduce additional confounding once X has been adjusted for, while the latter is independent of X and is what a sensitivity analysis needs to consider.
To prove Theorem 2, first of all, we can easily show that, when P does not predict Z given X, then Pr(Z = 1 | X, P) = Pr(Z = 1 | X) for every value of P, in which case the new weight coincides with the initial weight, and hence the weight discrepancy, and with it the bias, is zero.
Therefore, we have proved that, when P does not predict Y under the control condition, the weight discrepancy is uncorrelated with the outcome among the untreated, so ρ = 0 and the bias is again zero.
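The first claim of the theorem is easy to check numerically. The following sketch, using arbitrary simulated data rather than anything from the paper, generates a P that is independent of X and does not enter the treatment assignment; the estimated new and initial weights then coincide up to sampling error, so the weight discrepancy, and with it the bias, vanishes.

```r
# Numerical illustration of Theorem 2 (first claim): if P does not predict Z
# given X, the new weight equals the initial weight and the bias is zero.
set.seed(2)
n <- 1e5
x <- rnorm(n)
p <- rbinom(n, 1, 0.5)                    # P independent of X
z <- rbinom(n, 1, plogis(-0.5 + x))       # Z depends on X only
e_x  <- fitted(glm(z ~ x,     family = binomial))  # initial propensity score
e_xp <- fitted(glm(z ~ x + p, family = binomial))  # score adjusting for P
w_init <- (e_x  / (1 - e_x))[z == 0]      # initial ATT weights, untreated units
w_new  <- (e_xp / (1 - e_xp))[z == 0]     # new ATT weights, untreated units
summary(w_new - w_init)                   # concentrates around zero
```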
APPENDIX E
This appendix summarizes our derivations showing that, when data are generated under a linear system as shown in models (3) and (4), the sensitivity parameters σ and ρ in identifying ATT can be easily related to the data generation parameters. Under model (3), the coefficient of P represents the linear predictive relationship between an unmeasured covariate P and the potential outcome of the untreated condition Y(0) given the observed covariates X. Under model (4), the coefficient of P represents the linear predictive relationship between P and the treatment indicator Z. Model (4) also determines the propensity score and hence the initial weight and the new weight. To simplify the derivation, we let P and X be marginally independent and let P be binary, in which case, writing π = Pr(P = 1), we have E(P) = π and var(P) = π(1 − π).
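Because models (3) and (4) are not reproduced here, the following simulation uses stand-in specifications, a linear outcome model and a logistic treatment model with a binary P independent of X, purely to illustrate the relationships summarized above: σ is driven by the coefficient of P in the treatment model, ρ by the coefficient of P in the outcome model, and their product recovers the effect-size bias of the initial estimator. The definitions of σ and ρ below, as the standard deviation of the normalized weight discrepancy among the untreated and its correlation with the outcome, are our reading of the paper's parameters.

```r
# Illustrative simulation with stand-in specifications (not models (3)-(4)).
set.seed(3)
n      <- 1e5
prob_p <- 0.4
x <- rnorm(n)
p <- rbinom(n, 1, prob_p)                 # binary P, independent of X
c(mean(p), var(p), prob_p * (1 - prob_p)) # E(P) and var(P) = pi * (1 - pi)

z <- rbinom(n, 1, plogis(-0.5 + 0.8 * x + 0.6 * p))  # treatment model
y <- 1 + 0.5 * x + 0.7 * p + 0.3 * z + rnorm(n)      # linear outcome model

e_x  <- fitted(glm(z ~ x,     family = binomial))
e_xp <- fitted(glm(z ~ x + p, family = binomial))
w0 <- (e_x  / (1 - e_x))[z == 0]          # initial ATT weights, untreated units
w1 <- (e_xp / (1 - e_xp))[z == 0]         # new ATT weights, untreated units
y0 <- y[z == 0]

# Discrepancy between the weights, each normalized to mean one among untreated
eps   <- w1 / mean(w1) - w0 / mean(w0)
sigma <- sd(eps)                          # grows with P's coefficient in the z model
rho   <- cor(eps, y0)                     # grows with P's coefficient in the y model
bias  <- (weighted.mean(y0, w1) - weighted.mean(y0, w0)) / sd(y0)
c(product = sigma * rho, direct = bias)   # agree up to sampling error
```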