## Abstract

How a treatment causes a particular outcome is a focus of inquiry in political science. When treatment data are either nonrandomly assigned or missing, the analyst will often invoke ignorability assumptions: that is, both the treatment and missingness are assumed to be as if randomly assigned, perhaps conditional on a set of observed covariates. But what if these assumptions are wrong? What if the analyst does not know why—or even if—a particular subject received a treatment? Building on Manski, Molinari offers an approach for calculating nonparametric identification bounds for the average treatment effect of a binary treatment under general missingness or nonrandom assignment. To make these bounds substantively more informative, Molinari’s technique permits adding monotonicity assumptions (e.g., assuming that treatment effects are weakly positive). Given the potential importance of these assumptions, we develop a new Bayesian method for performing sensitivity analysis regarding them. This sensitivity analysis allows analysts to interpret the assumptions’ consequences quantitatively and visually. We apply this method to two problems in political science, highlighting the method’s utility for applied research.

## 1 Introduction

How does job loss affect vote choice? Can democracy impede war onset? Do single member districts impact political party structure? Whether using experimental data with random assignment or observational data with unknown assignment mechanisms, political scientists seek to infer the causal effect of treatments, *z*, on outcomes, *y*. Unfortunately, in many, if not most, studies involving observational data, the treatment is not randomly assigned or, even worse, is missing for some observations.

Nonrandom treatment assignment is the defining feature of observational data (Cochran and Rubin 1973; Rubin 2006). Without random assignment, variables that affect *y*, other than *z*, may be distributed differently across the treated and control groups in ways that cannot be statistically determined. In other words, in observational data the treatment is generally confounded with other covariates, and only some of the potential confounders may be observable. Consequently, “it is virtually impossible in many practical circumstances to be convinced that the estimates of the effects of treatments are in fact unbiased” (Cochran and Rubin 1973, 30). Nevertheless, analysts must sometimes rely on observational data, particularly when random treatment assignment would be impossible (e.g., the impact of a coup d’état on democratization, or studies based on historical data).

Missing treatment data are also a pervasive problem, particularly in surveys. For example, Sharp and Joslyn (2001), in attempting to identify the impact of viewing sexually explicit material on support for regulation of pornography, note that respondents chose not to report the true amount of pornography they viewed in the past year. Similarly, Molinari (2010) highlights a study of the effect of drinking on birth outcomes, where a question asking about drinking behavior during pregnancy had a nonresponse rate of 6–14% by female respondents. Given a societal stigma associated with drinking while pregnant, nonresponses in the survey are probably related to beliefs and behaviors that affect birth outcomes.

When faced with these two limitations—nonrandom assignment and missing treatment data—analysts frequently turn to modeling assumptions. Most typically, the analyst makes an ignorability assumption using observable covariates. This assumption implies that, conditional on a covariate profile, treatment assignment can be treated as if randomly assigned and missingness is at random or ignorable. Since ignorability is a strong and, in many cases, implausible assumption, analysts may attempt to model the treatment assignment and missingness mechanisms, as with inverse probability weighting methods (Glynn and Quinn 2010).

We advocate an alternative and, in some respects, complementary approach: to embrace uncertainty. In particular, Molinari (2010) offers an approach for identifying and calculating the interval in which the average treatment effect (ATE) of a binary treatment must be contained even when treatment data are nonrandom and missing.^{1} This approach requires no assumptions of ignorability and, beginning with the same basic setup as Manski’s derivation of bounds for the ATE, is completely nonparametric (Manski 1990, 1995, 1997).

Like other approaches that compute identification bounds for treatment effects, the bounds, though sharp, are not always substantively informative.^{2} Consequently, Molinari proposes adding two monotonicity assumptions when appropriate. One assumption holds that when the treatment has a nonzero effect, the effect is only and always in one direction: the effect is assumed to be either weakly positive (positive monotonicity) or weakly negative (negative monotonicity). For instance, in the above pregnancy example, one might assume that drinking only negatively affects birth weight. The other assumption is that the average potential outcomes of those treated will be either weakly greater (positive monotonicity) or weakly lesser (negative monotonicity) than those of the untreated. Continuing with the pregnancy example, the assumption might be that women who drink more were likely to have lower birthweight children for other reasons.

Including monotonicity assumptions can shrink the bounds, but such assumptions are purely theoretical; the motivation for these assumptions must be developed *a priori*. Hence the analyst may wish to determine the extent to which these assumptions drive the results. We develop a technique for determining the sensitivity of the bounds to the monotonicity assumptions by considering that there may be subpopulations in which there are different kinds of effects.

With motivations somewhat different from ours, others have critically studied sensitivity analyses (Rosenbaum 2002; Quinn 2009; Imai, Keele, and Yamamoto 2010; Imai et al. 2011) and bounds (Yamamoto 2012) in the assessment of causal effects. For instance, Quinn (2009) presents a Bayesian analysis of the effect of confounders on what are essentially Manski bounds for complete data without monotonicity assumptions, whereas Imai et al. (2011) develop a sensitivity analysis for key assumptions underlying mediation analysis. For the most part we do not attempt to measure the consequences of confounders nor to discriminate direct and indirect effects. Others have applied these techniques (Blattman 2009; Poast 2012). For example, Poast (2012) uses Rosenbaum sensitivity analysis to identify the extent to which unobserved confounders drive his finding that trade linkage offers increase the probability of states ending an alliance negotiation in agreement. Although confounders are always a challenge to inference with observational data, bounds can show the range of effect estimates that may be found when the data are analyzed with a model.

This article has four goals. First, given the potential for both nonignorable treatment assignment and missing treatment data in many settings, we introduce to political scientists Molinari’s technique for computing the identification bounds on the ATE. We consider the case of a binary outcome.^{3} This technique offers a relatively simple and nonparametric way to check the robustness of effects estimated with any given statistical model. Second, we develop a technique for determining the sensitivity of the bounds to the monotonicity assumptions. Our innovation is to treat the population of interest as being composed of two separate subpopulations—one in which a given monotonicity assumption is met and another in which the assumption is not met—even though it is unknown to which subpopulation any particular observation belongs. This innovation allows the analyst to compute regions of bounds that explicitly incorporate uncertainty and determine the extent to which these assumptions drive the results. Third, we consider complications associated with having data from a random sample and not the full population. Finally, to show how substantive assumptions are involved when using Molinari’s technique and our sensitivity analysis, we apply both to two problems in political science.

## 2 Manski and Molinari Identification Bounds

When faced with nonrandom treatment assignment and missing treatment data, obtaining precise estimates may require rather strong assumptions, perhaps stronger than what one is actually willing to defend. Manski (1990, 1995) offers an approach for defining identification bounds for a treatment effect: the interval between such bounds corresponds to the set of values that are logically compatible with the data and any explicitly stated assumptions. These assumptions are often extremely weak and, at a minimum, require only the laws of probability. Using bounds in this way lays bare the influence that assumptions have on the ability to estimate an effect.

### 2.1 Identification Bounds without Missing Treatment Data

Manski (1990, 1995) computes identification bounds for the treatment effect by assuming a binary treatment represented by 0 or 1.^{4} Manski lets $y(1)$ be the outcome associated with a unit receiving treatment 1 and lets $y(0)$ be the outcome associated with a unit receiving treatment 0. Thus, $P[y(1) \mid x]$ denotes the distribution of outcomes that would be realized if all units with characteristics $x$ received treatment 1 and $P[y(0) \mid x]$ denotes the distribution of outcomes that would be realized if all units with characteristics $x$ received treatment 0. If each member of the population receives one of the two treatments, let the binary variable $z$ indicate which of the two treatments a unit received, where $z = 1$ or $z = 0$ and $P(z \mid x)$ is the distribution of the treatment variable among all units with characteristics $x$. For each unit, one observes the value of $x$, the value of $z$, and the outcome resulting from the treatment, $y$, which takes the value $y(1)$ if $z = 1$ or $y(0)$ if $z = 0$. For parsimony in presentation, we suppress notation indicating that we are conditioning on the covariates represented by $x$. Henceforth we write $P[y(1)]$ in place of $P[y(1) \mid x]$, $P[y(0)]$ in place of $P[y(0) \mid x]$, $P(z)$ in place of $P(z \mid x)$, and so forth.^{5}

The quantity of interest is the difference $P[y(1) = 1] - P[y(0) = 1]$, which is the ATE. By the law of total probability, $P[y(1) = 1]$ can be reexpressed as

$$P[y(1) = 1] = P[y(1) = 1 \mid z = 1]\,P(z = 1) + P[y(1) = 1 \mid z = 0]\,P(z = 0)$$

and $P[y(0) = 1]$ can be reexpressed as

$$P[y(0) = 1] = P[y(0) = 1 \mid z = 1]\,P(z = 1) + P[y(0) = 1 \mid z = 0]\,P(z = 0).$$

We have no information about $P[y(1) = 1 \mid z = 0]$ and $P[y(0) = 1 \mid z = 1]$, other than that each lies between 0 and 1. Thus, by Manski (1995, 39–41),

$$P(y = 1 \mid z = 1)\,P(z = 1) \le P[y(1) = 1] \le P(y = 1 \mid z = 1)\,P(z = 1) + P(z = 0),$$

$$P(y = 1 \mid z = 0)\,P(z = 0) \le P[y(0) = 1] \le P(y = 1 \mid z = 0)\,P(z = 0) + P(z = 1).$$

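Without further information, sharp bounds for the ATE are defined by

$$P(y = 1 \mid z = 1)\,P(z = 1) - P(y = 1 \mid z = 0)\,P(z = 0) - P(z = 1) \;\le\; \text{ATE} \;\le\; P(y = 1 \mid z = 1)\,P(z = 1) + P(z = 0) - P(y = 1 \mid z = 0)\,P(z = 0).$$

### 2.2 Identification Bounds with Missing Treatment Data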

Molinari (2010) extends Manski bounds to develop a procedure for identifying treatment effects when treatment data are missing.^{6} The notation is the same as above, except Molinari introduces a new variable, , which is a binary variable equal to 1 if the treatment received by a unit is observed, 0 otherwise. The analyst knows two distributions: , the probability that the treatment is observed, and , the conditional distribution of realized treatments given realized outcomes and the observability of the realized treatment.

Given this setup, what can be learned about the ATE, ? Using the law of total probability, Molinari shows^{7} that the sharp bounds for the treatment effect in the absence of knowledge of are , where

These bounds are always at least 100 percentage points wide and always include 0. We can narrow the identification interval with further assumptions but can never rule out an ATE of 0. Although perhaps unsatisfactory for some analysts, this possibility represents a fundamental characteristic of the data and the assumptions that we are imposing. It is for this reason that Manski (1995, 8) calls for analysts to have “a greater tolerance for ambiguity.”^{8}
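The width claim can be checked with a short worked derivation. This is a sketch under assumed notation: the symbol $d$ for the observability indicator and the cell-probability form below are choices made here for illustration, and may differ from the form in which Molinari states the bounds. With a binary outcome and no restrictions across units, a unit's effect can be as high as $+1$ if it is observed treated with $y = 1$, observed untreated with $y = 0$, or has a missing treatment; it can be as low as $-1$ in the two remaining observed cells or when the treatment is missing. Summing these worst cases gives upper and lower bounds $U$ and $L$:

```latex
\begin{aligned}
U &= P(y = 1, z = 1, d = 1) + P(y = 0, z = 0, d = 1) + P(d = 0), \\
L &= -\bigl[ P(y = 0, z = 1, d = 1) + P(y = 1, z = 0, d = 1) + P(d = 0) \bigr], \\
U - L &= P(d = 1) + 2\,P(d = 0) = 1 + P(d = 0) \ge 1 .
\end{aligned}
```

Because $L \le 0 \le U$ by construction, 0 always lies inside the interval, and the width exceeds one by exactly the share of units with missing treatment data.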

### 2.3 Monotonicity Assumptions

To identify the ATE more precisely, Molinari, following Manski (1997) and Manski and Pepper (2000), suggests assuming monotonicity in treatment response and treatment selection, referred to, respectively, as the *Weak Monotone Treatment Response* (MTR) assumption and the *Weak Monotone Treatment Selection* (MTS) assumption. MTR implies that the outcomes associated with treatments for each unit satisfy a particular ordering. MTS implies that units that have particular kinds of outcomes select (or are selected) to have particular kinds of treatments.

The MTR assumption refers to the treatment effect for each unit. With MTR, one assumes that the treatment has a monotone effect on the outcome. This means that the lower (upper) bound of the treatment effect is 0 when one assumes positive (negative) MTR. Formally, positive MTR implies that for each unit . MTR refers to the different possible outcome values that exist for each unit.

The MTS assumption relies not just on unit level characteristics but describes a property of sets of units. With MTS, one assumes that if the population of observed units is divided into two groups according to the treatment, then the average potential outcome of the group that received the treatment is greater than the average potential outcome of the group that did not receive the treatment (Morgan and Winship 2007, 177). Formally, following Molinari (2007, 10), positive MTS is the assumption that and , . MTS refers to the outcomes for different units: units that receive the treatment valued tend to have higher outcomes than those who receive treatment even if, counterfactually, both types of units receive the same kind of treatment. A classic application of this assumption is found in Manski and Pepper (2000, 1003), to identify the causal effect of education on wages. MTS implies we assume that individuals with higher education would have, on average, received higher wages than individuals with lower education even if both had, in a counterfactual world, the same levels of education. Even though the treatment variable distinguishes those with higher outcomes from those with lower outcomes, the treatment variable may not be the reason for the outcome difference.

These monotonicity assumptions can be positive or negative. For the sake of simplicity, we will show the results only for weakly positive monotonicity. Following Molinari (2007), if the analyst first assumes only MTR, then the bounds become

Under the combination of the weak MTR and weak MTS assumptions, the bounds become

These assumptions together imply a narrower identification interval.

The data sometimes rule out both MTR and MTS holding (Manski and Pepper 2000). If

then it is logically impossible for the outcomes for units under treatment to be as large as those for units under control (for observed units, on average). Since MTR implies that the treatment potential outcomes are at least as large as the control potential outcomes, invoking both MTR and MTS in such a case implies a logical contradiction.

## 3 Extending Molinari Bounds: Sensitivity Analysis

A limitation of Molinari bounds is that while MTR and MTS are critical for obtaining informative bounds, analysts may not know how sensitive their results are to violations of the assumptions. We generalize Molinari bounds by incorporating uncertainty over the MTR and MTS assumptions. It is worth noting that all sensitivity analyses developed in this section may also be applied to Manski bounds through application to a population with no missing treatment data.

What if the analyst is not confident about the MTR or MTS assumptions, or both, for some proportion of the data? Our innovation is to consider that the population of interest (not just the missing treatment data) is composed of two separate subpopulations—one in which a given monotonicity assumption is met and another in which the assumption is not met—but it is not known which observations are in which subpopulation. This is one way to assess how robust inferences are to the two assumptions (cf. Arnborg 2006). One computes bounds that explicitly incorporate the analyst’s uncertainty.

### 3.1 MTR Assumption

We begin with the case where the analyst is unsure of applying the MTR assumption on the population of data, but applies no MTS assumption. A vector of five parameters is observable:

is a realization of a random variable that is distributed with some probability over the standard 4-simplex .^{9}measures the configuration of the observable data. We add the parameter . If unit follows the MTR assumption, , if not, . Define a 5-element vector corresponding to , which is a realization of a random variable that is distributed over the unit 5-cube .

^{10}represents the conditional distribution of given the configuration of the observable data. The unconditional probability that a unit does not follow the MTR assumption is the inner product

Given , it is straightforward to compute identification bounds. The upper bound is unchanged by the MTR assumption, so the upper bound associated with any is . The MTR assumption changes the lower bound, with its value depending on the components of . The lower bound is a convex combination of the bounds for the data that follow the MTR assumption and for the data that do not follow the MTR assumption. The former bound is 0; therefore, given and , the lower bound is

where denotes the first element of .^{11}

But what if we only had firm knowledge of , the proportion of the data that do not follow the MTR assumption, but not any particular ? Prior to the data and being observed, knowledge of does not provide any constraints on . However, once is observed, knowledge of does imply constraints on , through equation (4). Equation (4) can imply a likelihood function for any given : using to denote the indicator function,

Given and , we may rule out many possibilities for . By expressing our prior uncertainty over the components in as a prior probability distribution, we may construct expected values and posterior intervals for the lower bound. One option, which we adopt, is to assume a prior where is distributed uniformly over , such that all values in its support are equally likely.^{12} The posterior probability of given is

### 3.2 MTS Assumption

The logic behind assessing the MTS assumption directly follows from that of the MTR assumption. We now assume that the entire population follows the MTR assumption (), but that some proportion of the population does not follow the MTS assumption. We add a parameter such that if unit follows the MTS assumption and if not. Let . Define a vector of five elements,

which is a realization of a random variable that is also distributed over . Now, . The method for articulating bounds is fundamentally unchanged from that described in Section 3.1, and we will therefore avoid much of the exposition from Section 3.1. For all , since the MTR assumption holds, the lower bound is . That is, by assumption, the lower bound is 0. The upper bound is a convex combination of the upper bounds for the data that do not follow the MTS assumption and for the data that follow the MTS assumption. Assuming , this gives^{13}

Much like the MTR case, a likelihood function identifies plausible values of :

Recalling (3), the condition reflects the fact that the observed treatment outcomes must be, on average, larger for the observed treated outcomes for the MTS-respecting subpopulation if MTR and MTS jointly hold. If this condition is violated, the posterior probability on is 0.

### 3.3 Computation for Sensitivity Analysis

We now discuss computation of salient quantities from the posterior distribution. To estimate the posterior distributions of bounds, we perform a procedure analogous to rejection sampling. To perform a sensitivity analysis for the MTR assumption, we sample a random value of from the uniform prior distribution on , then compute and . For the MTS assumption, we sample a random draw of from the prior distribution, reject the draw if , and, if not rejected, we record and . We repeat these procedures for a large number of replications.
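The MTR half of this sampling step might look roughly like the sketch below. It rests on several assumptions made here for illustration and is not the source's exact procedure: it works with six observed cells rather than the five-parameter vector of Section 3.1, the cell shares in `SHARES` are hypothetical, and the lower-bound formula comes from unit-by-unit worst-case reasoning (units following MTR contribute 0, the rest contribute their worst case). All names are hypothetical.

```python
import random

# Hypothetical cell shares keyed by (y, z); z is None when the treatment is
# missing. These values are illustrative and are not taken from the source.
SHARES = {
    (1, 1): 0.10, (0, 1): 0.25,
    (1, 0): 0.05, (0, 0): 0.45,
    (1, None): 0.05, (0, None): 0.10,
}

# Cells in which a unit's effect could be -1 when MTR is not imposed
# (an assumption derived from worst-case reasoning, not the source's formula).
CAN_BE_NEGATIVE = {(0, 1), (1, 0), (1, None), (0, None)}


def draw_mtr_lower_bound(rng):
    """Draw per-cell shares of units not following MTR from a uniform prior."""
    rho = {cell: rng.random() for cell in SHARES}
    # Overall share of units not following MTR: inner product of rho and shares.
    pi = sum(rho[cell] * SHARES[cell] for cell in SHARES)
    # MTR followers contribute a lower bound of 0; non-followers contribute -1
    # only in cells where a negative effect is logically possible.
    lower = -sum(rho[cell] * SHARES[cell] for cell in CAN_BE_NEGATIVE)
    return pi, lower


def simulate(n_draws=5000, seed=1):
    """Repeat the draw many times; returns a list of (pi, lower_bound) pairs."""
    rng = random.Random(seed)
    return [draw_mtr_lower_bound(rng) for _ in range(n_draws)]


draws = simulate()
```

Each pair in `draws` links an overall share of non-followers to the lower bound it implies, which is the raw material for the curve fitting described next. The MTS case would add a rejection step for draws that violate the compatibility condition.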

We use the fact that and are continuous and smooth functions of arguments and , respectively, to compute expected values and quantiles of the posterior distributions. The expectations and quantiles are also continuous and smooth functions of the arguments, so we may estimate the expected values and quantiles for each or using nonparametric curve fitting. To estimate an expectation for the lower bound at or the upper bound at , we use smoothing splines in a generalized additive model (Hastie 1992).^{14} To estimate quantiles, we use neural network quantile regression (Taylor 2000). The predicted values from these fits can then be presented graphically or examined numerically.

## 4 Inference under Random Sampling

If we have sample survey data and not population data, then the probabilities or must be estimated instead of simply observed. Bounds on the ATE may be estimated using the sample analog of the population bounds, but construction of confidence intervals (CIs) is somewhat more involved. We describe the method, based on a bootstrap, that is developed by Molinari (2010) and gives an identification interval that asymptotically has probability $1 - \alpha$ of including the population identification interval. One might describe this as a $100(1 - \alpha)$% CI.

In the case of our sensitivity analysis, sample inference needs to be combined with the Bayesian inference that underpins our assessment of the two monotonicity assumptions. We combine Molinari’s bootstrap method with a method for estimating the sampling uncertainty in the spline and quantile curve estimates. At each of a large number of values of , we compute a CI for the identification interval for the ATE. The same theory that motivates Molinari’s CIs shows that the closure of the union of all such CIs gives a confidence region that asymptotically has probability of including the corresponding region for the population.

### 4.1 Estimating CIs

Citing results from Beresteanu and Molinari (2008) for set-valued random variables, Molinari (2010, 91) emphasizes that the focus of identification is on the identification interval between lower bound and upper bound , not on the bounds per se. To estimate a CI for this identification interval using sample data, the theory of Beresteanu and Molinari (2008) allows the sample inference problem to focus on the bounds. Consistent estimates for the bounds are based on the sample expected values of the probabilities. Added to these initial estimates are quantities determined by a bootstrap method. If , , , and are, respectively, population and estimated lower and upper bounds for the interval , and if for a sequence of sample sizes

then is an estimator for that may be obtained using a bootstrap, such that a CI that asymptotically covers with probability is (Molinari 2010, 91).^{15}
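A bootstrap of this kind could be sketched as follows. The sketch makes assumptions that the text above does not confirm: the critical value is taken to be the bootstrap $1 - \alpha$ quantile of $\sqrt{n}$ times the Hausdorff distance between the resampled and estimated intervals, the interval is widened by that value divided by $\sqrt{n}$ at each end, and complete-data worst-case bounds stand in for the bounds of interest. Function names and the toy sample are hypothetical.

```python
import math
import random


def manski_bounds(sample):
    """Worst-case ATE bounds from (y, z) pairs with binary y and binary z."""
    n = len(sample)
    could_be_positive = sum(1 for y, z in sample if y == z)  # (1,1) or (0,0)
    could_be_negative = n - could_be_positive                # (0,1) or (1,0)
    return -could_be_negative / n, could_be_positive / n


def bootstrap_ci(sample, alpha=0.05, reps=500, seed=0):
    """Widen the estimated interval by a bootstrap critical value (assumed form)."""
    rng = random.Random(seed)
    n = len(sample)
    lower_hat, upper_hat = manski_bounds(sample)
    scaled = []
    for _ in range(reps):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        lower_b, upper_b = manski_bounds(resample)
        # Hausdorff distance between two intervals is the larger endpoint gap.
        gap = max(abs(lower_b - lower_hat), abs(upper_b - upper_hat))
        scaled.append(math.sqrt(n) * gap)
    scaled.sort()
    index = min(reps - 1, math.ceil((1 - alpha) * reps) - 1)
    c_hat = scaled[index]
    return lower_hat - c_hat / math.sqrt(n), upper_hat + c_hat / math.sqrt(n)


# Toy sample of (y, z) pairs, purely illustrative.
sample = [(1, 1)] * 30 + [(0, 1)] * 20 + [(1, 0)] * 10 + [(0, 0)] * 40
lower_hat, upper_hat = manski_bounds(sample)
ci_low, ci_high = bootstrap_ci(sample)
```

The resulting interval `[ci_low, ci_high]` always contains the estimated interval, since the critical value is nonnegative; the target of coverage is the whole identification interval rather than a point.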

### 4.2 Confidence Regions for Sensitivity Analysis

Our method for sample data combines our Bayesian method for weakening the monotonicity assumptions with the asymptotic sample-based method derived by Beresteanu and Molinari (2008) (a bootstrap). The combination can be considered analogous to a fully Bayesian method for a set of special cases. First, we implicitly invoke an assumption that units are exchangeable, given , in the way we use the same exchangeable prior uniform distributions for and with all observations. Moreover, we assume observations are exchangeable in the way we make our sample inferences. Because we have exchangeable priors, treat the samples as simple random samples, and use only symmetric statistics (like proportions and functions of proportions), the posterior distributions are independent of the sampling design (Scott and Smith 1973, 59). If we had informative sampling designs, or perhaps if we knew more about how the sample data we are using were produced, we might gain from using different sample inference methods (Sugden and Smith 1984). As it is, the independence assumptions and the fact that all possible values of the variables are observed can support a Bayesian motivation for a bootstrap (Rubin 1981).

The complication that stands in the way of a fully Bayesian treatment is the fact that, following Beresteanu and Molinari (2008), we are dealing with set-valued random variables. Although Molinari (2010) considers CIs for identification intervals, we consider confidence regions for identification regions (defined immediately below). Bayesian bootstrap methods such as those that Newton and Raftery (1994) and Efron (2011) discuss have so far considered only scalar- and vector-valued random variables. We say more about this complication after defining the identification regions and confidence regions we use.

We associate and with four identification regions to connect with Molinari’s bootstrap method: , where denotes the Bayesian expectation of evaluated at each ; , where denotes the Bayesian expectation of evaluated at each ; , where denotes the Bayesian -quantile of evaluated at each ; and , where denotes the Bayesian -quantile of evaluated at each .^{16} So for MTR, for instance, we are interested in using the sample data to estimate the region between the spline curve that represents along and the line associated with the upper bound point.

Using arguments from Wang and Wahba (1994) and Beresteanu and Molinari (2008), we define a bootstrap based on elements , , that gives a confidence region that asymptotically with probability covers the region that would arise if we had population data. Let the smoothing spline model be written for bound values , function ,^{17} design points , and disturbance , so that the smoothing spline in the original sample is for estimate of smoothing parameter (Wang and Wahba 1994). We can implement Molinari’s bootstrap corrections in our extension of her model by using bootstrap samples to compute replicates of the original sample estimate , , and then to estimate smoothing splines using from each of the bootstrap replicates (with the draws of or from the prior and the design points held constant) to give bootstrap estimates . Values can then be estimated at each design point: call each of the estimated values , where each is identified with its design point, ; if and denote the upper and lower bound values estimated using the original sample at each design point, and denote estimates computed using the th bootstrap resample and and , then is the order statistic of .^{18} Across all the values estimated at each design point give bootstrap CI estimates, , that have the average coverage probability property that Wang and Wahba (1994) demonstrate for several kinds of bootstrap intervals for smoothing splines. The “confidence collection” result of Beresteanu and Molinari (2008, 775) then implies that the confidence region that is the closure of the union of these CIs has asymptotically the desired coverage probability. Denote the interval estimated at each design point after effecting the closure operation by . Not pointwise but on average across all , asymptotically covers with probability the corresponding population interval .

For instance, for MTR, we find a % confidence region that covers the region that would arise with population data between the spline curve over that represents and the line . The quantiles may be bootstrapped similarly (Molinari 2005; supplementary appendix B). In this case, we find confidence regions that asymptotically with probability cover the set of 95% posterior intervals for . To ensure that the confidence region for expectations is compatible with the region for the quantiles, the bootstrap method produces values that give conservative confidence regions that simultaneously cover both regions: asymptotically with probability at least , the confidence regions cover the posterior expectation and quantile regions that would arise if we had population data.^{19}

Viewed as a conventional estimation problem, Wang and Wahba (1994) suggest that the bootstrap method we use produces CIs comparable to and as good as the Bayesian “confidence intervals” derived for smoothing splines by Wahba (1983). From that perspective, the posterior expectations we compute are likely similar to the ones we would derive if we used a nominally Bayesian “confidence interval.” Of course we do not have a conventional estimation problem, because Molinari’s bootstrap method is connected to the set-valued inference problem considered by Beresteanu and Molinari (2008), and the spline estimates concern only edges of the identification region that is of actual interest to us.

## 5 Applications

To demonstrate the applicability of Molinari bounds, we consider two examples. We use these examples because (1) they are emblematic of data used in political science; (2) the treatment is not randomly assigned; and (3) some values for the treatment variable are missing. The first example draws from observational data on government torture and civil war, using population data. The second example utilizes random samples collected as part of the popular World Values Survey (WVS).

### 5.1 Effect of Torture on Civil War

A growing segment of civil-war scholars focuses on state repression as a causal factor (Lichbach 1987; Tarrow 1989; Petersen 2002; Sambanis and Zinn 2006; Davenport 2007a, 2007b). State repression includes acts such as direct rule of ethnic minorities, extrajudicial killings, and torture. According to Sambanis and Zinn (2006, 8), government repression can lead to civil war via a number of channels. For one, state repression lowers the opportunity cost of violent actions against the state (if one is going to be repressed anyhow, there is less to lose by attacking the state). Alternatively, state repression bolsters the resolve of society’s more radical elements. In either case, repression can motivate the taking up of arms against the state.

To test these claims in a systematic way, scholars draw upon extensive data sets that measure civil war onset and state repression. The Cingranelli–Richards (CIRI) Human Rights Dataset (Cingranelli and Richards 2010) includes as an indicator of state repression the variable *Torture*, which records whether torture is practiced frequently, occasionally, or has not occurred in a given year within a given country from 1981 to 2006. The UCDP/PRIO Armed Conflict Dataset (Gleditsch et al. 2002) measures instances of civil war onset from 1946 to 2004. We examine the effect that torture in one year has on the onset of civil war in the following year during the years these data sets overlap: 1981–2004.

We address two of the major impediments standing in the way of testing the causal relationship between torture and civil war onset. First, the existence of torture within countries is not randomly assigned. Political leaders systematically assess the relative costs and benefits associated with repression, the availability of other options, and the likelihood of success (Davenport 2007b, 490). Second, data on the incidence of torture are sometimes missing, and such missing data are likely not missing at random. Table 1, which presents the joint distribution of the *Torture* and *Civil War* variables, shows that 541 of the 3886 values for the *Torture* variable are missing. In contrast, there are no missing values for the *Civil War* dependent variable. We let (meaning we observe the treatment) when *Torture* has a value of either 0 or 1, and for those observations for which *Torture* is coded as missing.

| Torture in year t−1 | Civil war in year t: No (0) | Civil war in year t: Yes (1) | Total |
|---|---|---|---|
| No (0) | 1840 | 158 | 1998 |
| Yes (1) | 948 | 399 | 1347 |
| Missing | 474 | 67 | 541 |
| Total | 3262 | 624 | 3886 |

We begin our analysis by computing the Manski bounds based on ignoring the missing torture data. Because Table 1 shows that 541 observations have missing treatment data, these observations are excluded. The Manski bounds are displayed in Table 2. Because these bounds ignore the missing observations, they describe identification bounds for the causal effect in a substantively meaningless subpopulation: country-years with nonmissing torture data.

Table 2. Effect of torture on civil war

| Method | ATE lower bound | ATE upper bound |
|---|---|---|
| Manski (excluding missing data) | −0.33 | 0.67 |
| Molinari | −0.42 | 0.72 |
| Molinari (MTR) | 0 | 0.72 |
| Molinari (MTR and MTS) | 0 | 0.33 |

Because the population of interest includes both missing and nonmissing data, we compute Molinari bounds. The Molinari bounds for the ATE, reported in Table 2, are [−0.42, 0.72]. Because they incorporate greater uncertainty, the Molinari bounds are wider than the Manski bounds.
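The widening can be illustrated with a short sketch. It assumes that nothing restricts either the treatment or the potential outcomes of units with missing treatment, so each such unit can contribute fully to either extreme. This closed form is an assumption of the sketch rather than a transcription of the article's equations, and the function name `molinari_ate_bounds` is hypothetical. Applied to Table 1, it reproduces the 0.72 upper bound reported in Table 2.

```python
def molinari_ate_bounds(n00, n01, n10, n11, n_missing):
    """No-assumption ATE bounds when some treatment values are missing.

    n_zy counts units with observed treatment z and outcome y;
    n_missing counts units whose treatment is unobserved.
    Assumption of this sketch: missing units are entirely unrestricted.
    """
    n = n00 + n01 + n10 + n11 + n_missing
    # Each missing unit may sit in whichever cell is worst for the bound,
    # so all of them are added to both extremes.
    lower = -(n10 + n01 + n_missing) / n
    upper = (n11 + n00 + n_missing) / n
    return lower, upper


# Table 1 counts, now keeping the 541 country-years with missing torture data.
lower, upper = molinari_ate_bounds(
    n00=1840,
    n01=1998 - 1840,  # derived from the row total
    n10=948,
    n11=399,
    n_missing=541,
)
print(round(lower, 2), round(upper, 2))  # -0.42 0.72
```

Under this construction the width of the interval is 1 plus the share of observations with missing treatment, which makes explicit how much the missing data cost in identifying power.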

We may achieve narrower bounds through additional assumptions. Because theory suggests that torture will have either a positive or no impact on the probability of civil war, it is plausible to apply a weakly positive MTR assumption. Alone, this assumption merely raises the lower bound on the treatment effect from −0.42 to 0. A further assumption that may be plausible is that countries in which torture occurs are more likely to experience civil war (MTS) regardless of whether torture actually occurs. Given the MTR assumption, this amounts to the weakly positive MTS assumption. One argument that can support the MTS assumption is based on the likelihood that governments apply torture when there are individuals or groups that hold grievances against the government. If the presence of such groups importantly contributes to civil war onset (Cederman, Weidmann, and Gleditsch 2011), then the MTS assumption is plausible. Table 2 shows that making both monotonicity assumptions implies that torture increases the probability of civil war by no more than 0.33.

Because we may have uncertainty about the MTR and MTS assumptions, it is useful to assess how sensitive the apparent causal effect is to deviations from the MTR and MTS assumptions. Figure 1 displays results from the sensitivity analysis. The vertical axis of Fig. 1 shows the values of the upper and lower bounds on the ATE: the value of the upper bound is shown above 0; the value of the lower bound is shown below 0. The top horizontal axis reports the values of , whereas the bottom horizontal axis reports the values of . In Fig. 1, the solid black lines represent expected values of bounds assuming uniform priors for and over , and the solid gray lines represent the equal-tailed 95% posterior intervals.

The graph shows how the identification bounds for the ATE are sensitive to the monotonicity assumptions.^{20} The part of the graph below 0 shows the lower bound assuming that proportion of all the data do not follow the MTR assumption (without making any MTS assumption). As rises, the posterior expected value first gradually declines, and the posterior 95% intervals associated with gradually expand then gradually narrow as approaches 1. The lower bound of the posterior 95% interval () is only slightly greater than when . The upper bound of the 95% posterior interval () does not vary much for , but declines steadily as increases beyond that. The part of the graph above 0 shows the upper bounds when MTR is assumed for all the data () and proportion of the data do not satisfy the MTS assumption. The expected value of the upper bound gradually rises with . For example, when , the expected value is 0.55. As increases, the 95% posterior intervals associated with gradually expand to admit a wide range of ATE values, and as the probability of the MTS assumption being true approaches 0, the bounds gradually shrink to admit a narrower range of values. The upper bound of the 95% interval () is near for .

Considering both sets of bounds, inference about the ATE may not depend all that critically on the monotonicity assumptions. Fairly weak assumptions about MTR suffice to give similar results: the lower bound of the ATE decreases gradually as the proportion of the data believed to satisfy MTR decreases; the lower bound does not vary much for . The upper bound ATE results are slightly more sensitive to the MTS assumption: the lower bound of the 95% interval for is greater than the upper bound of the 95% interval for .

### 5.2 Effect of Familial Immigrants on Immigration Policy Attitudes

Several recent studies consider the policy implications of attitudes toward immigrant workers (Hainmueller and Hiscox 2007, 2010). Focusing primarily on how skill level affects a respondent’s views toward immigration policy, these studies also consider how having friends who are immigrants affects the respondent’s views (Hainmueller and Hiscox 2007, 430). In a probit regression model estimated using survey data, Hainmueller and Hiscox (2007, 432) find that individuals who have immigrant friends tend to hold more favorable views toward immigration.

To examine the relationship between having immigrant friends or family and views on immigration policy, we use data from the 2005 wave of the WVS, a comprehensive survey of political and sociocultural attitudes covering more than 50 countries (World Values Survey Association 2009). We consider separately WVS data from Germany and from Italy. To measure the outcome variable, attitudes toward immigrants, we use question 45 of the survey, which asks whether respondents agree, disagree, neither, or could provide no response to the question “When jobs are scarce, employers should give priority to [British] people over immigrants,” where the nationality of citizens of the referent country is used in place of the word “[British].” We restrict our population to respondents who either agreed or disagreed with question 45.

The treatment variable is whether the respondent has an immigrant parent, measured using question 215 (“Is your father an immigrant of this country or not?”) and question 216 (“Is your mother an immigrant of this country or not?”). To each of these questions the respondent could provide answers of “not an immigrant,” “immigrant,” or “don’t know/could not provide an answer.” A respondent who answered yes to either was recorded as having an immigrant parent, and a respondent who answered no to both was recorded as not having an immigrant parent. A respondent who did not answer yes to either, but had at least one missing value, was recorded as having missing treatment data.

Analysts may have two concerns regarding their ability to draw causal inferences in this application. First, the immigration status of a respondent’s parents is not randomly assigned. Second, some individuals might be unwilling to divulge such information (perhaps, e.g., out of fear of deportation). As highlighted by a recent study on how vague legal status affects the well-being of children in Canada, children living with an undocumented parent viewed teachers, nurses, police officers, and other authority figures with distrust. The children feared that such figures could have their parents detained or deported, and, if that occurred, “they will never be reunited with their parents” (Bernhard et al. 2007, 103). Such motivations might lead to nonresponses to the parental immigration status questions. This concern appears supported by the WVS data. Table 3 shows that in each country 13 of the individuals who responded to question 45 have missing treatment data, even though these individuals provided other demographic information. To identify the effect of having immigrant parents on an individual’s immigration policy views, one must account for both the nonrandom assignment of the treatment and the missing treatment data.

Table 3. Support for anti-immigrant policies by immigrant-parent status, Germany and Italy

| Immigrant parent | Germany: No (0) | Germany: Yes (1) | Germany: Total | Italy: No (0) | Italy: Yes (1) | Italy: Total |
|---|---|---|---|---|---|---|
| No (0) | 450 | 1053 | 1503 | 184 | 608 | 792 |
| Yes (1) | 78 | 65 | 143 | 3 | 9 | 12 |
| Missing | 5 | 8 | 13 | 1 | 12 | 13 |
| Total | 533 | 1126 | 1659 | 188 | 629 | 817 |

As with the previous example, we begin by computing the Manski bounds, which exclude the observations with missing treatment data. Estimates of the Manski bounds and 95% CIs (based on bootstrap replications) are reported in Table 4. ATE bounds are similar for both Germany and Italy, but the CI is wider for Italy ([−0.81, 0.29]) than for Germany ([−0.72, 0.35]), a reflection of the difference in sample sizes. To account for the missing treatment data, we compute the Molinari bounds (Table 4). The Molinari bounds are very slightly wider, because they incorporate greater uncertainty, and again they are very similar between the two countries. The CI is again wider for Italy ([−0.81, 0.30]) than for Germany ([−0.73, 0.36]).
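To give a feel for why the smaller Italian sample yields a wider interval, the following sketch applies a plain percentile bootstrap to the Manski bounds computed from the nonmissing cells of Table 3. This is a deliberately simplified stand-in: it is not the Beresteanu and Molinari (2008) procedure used for the CIs in the article, so its intervals need not match the reported ones. The function names, the number of replications, and the seed are all illustrative assumptions.

```python
import random
from collections import Counter


def manski_ate_bounds(n00, n01, n10, n11):
    """Worst-case ATE bounds from cell counts n_zy (treatment z, outcome y)."""
    n = n00 + n01 + n10 + n11
    return -(n10 + n01) / n, (n11 + n00) / n


def percentile_bootstrap(cells, reps=500, level=0.95, seed=0):
    """Percentile interval covering both bounds (a simplification).

    cells maps the labels "00", "01", "10", "11" (treatment, outcome)
    to observed counts. Each replication resamples n units with
    replacement from the empirical cell distribution.
    """
    rng = random.Random(seed)
    labels = list(cells)
    weights = [cells[k] for k in labels]
    n = sum(weights)
    lowers, uppers = [], []
    for _ in range(reps):
        counts = Counter(rng.choices(labels, weights=weights, k=n))
        lo, hi = manski_ate_bounds(
            counts["00"], counts["01"], counts["10"], counts["11"]
        )
        lowers.append(lo)
        uppers.append(hi)
    lowers.sort()
    uppers.sort()
    tail = int(reps * (1 - level) / 2)
    # Lower tail of the lower bound, upper tail of the upper bound.
    return lowers[tail], uppers[reps - 1 - tail]


# Nonmissing cells of Table 3: immigrant parent (z) by policy support (y).
germany = {"00": 450, "01": 1053, "10": 78, "11": 65}
italy = {"00": 184, "01": 608, "10": 3, "11": 9}

germany_ci = percentile_bootstrap(germany)
italy_ci = percentile_bootstrap(italy)
print("Germany:", germany_ci)
print("Italy:  ", italy_ci)
```

Because the point bounds have width 1 in both countries, any difference in interval width in this sketch comes from sampling variability alone, which is larger with fewer observations.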

Table 4. Effect of immigrant parent on support for anti-immigrant policies

| Method | Germany: ATE bounds | Germany: 95% CI | Italy: ATE bounds | Italy: 95% CI |
|---|---|---|---|---|
| Manski (excluding missing data) | [−0.69, 0.31] | [−0.72, 0.35] | [−0.76, 0.24] | [−0.81, 0.29] |
| Molinari | [−0.69, 0.32] | [−0.73, 0.36] | [−0.76, 0.25] | [−0.81, 0.30] |
| Molinari (MTR) | [−0.69, 0] | [−0.73, 0] | [−0.76, 0] | [−0.81, 0] |
| Molinari (MTR and MTS) | [−0.25, 0] | [−0.38, 0] | [−0.03, 0] | [−0.46, 0] |

We may again achieve narrower bounds through additional assumptions. Because we expect that having an immigrant parent can only make the respondent *less* likely to support government policies that discriminate against immigrants, we apply a weakly negative MTR assumption. Alone, this assumption merely reduces the upper bound on the ATE, using the CI estimates, from 0.36 (Germany) or 0.30 (Italy) to 0. To add the corresponding negative MTS assumption, one must assume that, on average, individuals with at least one immigrant parent would have been less likely to support government policies that favor nationals even if they did not have that immigrant parent. This counterfactual may be more difficult to comprehend, particularly if one thinks of an individual’s parentage as an attribute of the individual (Holland 1986), but it is at least conceivable to conjecture that, even in the absence of an immigrant parent, a counterfactual respondent who was otherwise in the same social position as the respondent would be more likely to have connections to immigrants and this exposure would reduce support for policies that discriminate against immigrants.

The intervals for the Molinari bounds shrink considerably once we invoke the MTR and MTS assumptions. Making both of these weakly negative monotonicity assumptions and using the CI estimates, Table 4 shows that having immigrant parents decreases the probability of supporting government policies that favor nationals by no more than 0.38 in Germany and no more than 0.46 in Italy. Accounting for sampling error is particularly important. The ATE lower bound estimate ignoring sampling error is −0.25 in Germany and −0.03 in Italy, dramatically different from the CI lower bounds of −0.38 and −0.46. Clearly the MTS assumption makes a big difference, but not taking sampling error into account would lead to substantially overestimating the difference.

The results from using sensitivity analysis to reflect uncertainty about the MTR and MTS assumptions appear in Fig. 2. Figure 2 presents the sets of bounds associated with the conservative % confidence regions defined for the sensitivity analysis, with . Adapting previously defined notation to refer to the negative monotonicity case, the four regions we consider when estimating confidence regions are and , pertaining to MTR, and and , pertaining to MTR-MTS. As in Fig. 1, the vertical axis of Fig. 2 shows the values of the upper and lower bounds on the ATE: the value of the upper bound is shown above 0; the value of the lower bound is shown below 0. Because negative and not positive monotonicity assumptions are of interest, the upper part of Fig. 2 now reports deviations from MTR, whereas the lower part reports deviations from MTS. Therefore, unlike Fig. 1, the top horizontal axis now reports the values of whereas the bottom horizontal axis reports the values of . As with Fig. 1, the solid black lines represent expected values of bounds assuming uniform priors for and over , and the solid gray lines represent the equal-tailed 95% posterior intervals.

On the left in Fig. 2—for Germany—the estimated bounds on the ATE show a gradual response to changes in beliefs about the MTR assumption. The upper bound first jumps rapidly up from 0. This reflects the effect of sampling: the upper bound at is exactly 0, but for every value of the confidence region’s boundary increases by as a consequence of sampling uncertainty. After the estimated expected value of the upper bound, , initially jumps to a value of at , both and the estimated upper boundary of the 95% posterior interval, , gradually increase as increases, whereas the estimated lower boundary of the 95% posterior interval, , slightly decreases from its post-jump value of . increases until , when , then remains relatively constant until at which point resumes increasing up to . increases up to , where , after which it remains relatively level up to . continues decreasing down to , when , then steadily increases up to . Necessarily, . The posterior expected value and quantiles are nearly constant for , change gradually outside that range, and the initial jump up right above 0 is not all that large.

In contrast, the graph on the right in Fig. 2, for Italy, shows that ATE bounds respond suddenly to changes in beliefs about the MTR assumption. The jump in the upper bound just above is large: jumps to , and then remains relatively constant until , when it starts to decline; but at the point where reaches its minimum, at , we find , not much smaller than the post-jump value . Moreover, , again not much bigger than . The 95% posterior intervals expand around gradually as increases from 0 and converge back gradually to as approaches 1. The distance between and remains fairly constant for , with the maximum of being and the minimum of being . So, the jump from the upper bound of 0 at to the value of the estimated upper bound at is large relative to the magnitude the upper bound has throughout the range of . Slightly weakening the MTR assumption would produce almost as large an effect on the upper bound as giving it up entirely. The MTR assumption would be more influential—and the consequent analysis less robust—for Italy than for Germany.

Given MTR, the implication of invoking MTS seems sudden in both countries, albeit sudden in a different sense: uncertainty about the location of the lower bound increases substantially and immediately as increases above 0. For Germany, referring to the left part of Fig. 2 below 0, the estimated expected value, , first rises from through to , then declines steadily down to . But at , the estimated upper boundary of the posterior 95% interval, , is —which is near the maximum value —and the estimated lower boundary of the posterior 95% interval, , is , not that far from the minimum of .

Uncertainty about the lower bound increases rapidly and substantially in both countries as grows even slightly greater than 0, but in Italy even changes substantially for slight departures from MTS. For Italy, referring to the right part of Fig. 2, the estimated expected value, , first rises from through to , then declines steadily down to . But at is —near the maximum value of —and is , well on the way toward the minimum of .^{21}

Even though CI estimates for the ATE bounds do not differ all that much between the two countries, assessing the effect of weakening the MTR and MTS assumptions shows the countries to be very different. For Germany, slight departures from a homogeneous MTR assumption would not affect inferences about the ATE all that much, whereas for Italy such departures would have a sudden and large effect. Given MTR, in both countries slight departures from a homogeneous MTS assumption would immediately widen the scope for modeling assumptions to affect ATE estimates. In light of the posterior 95% intervals, assuming MTS prevails throughout almost all of the population would not be much more restrictive than assuming MTS does not hold at all.

## 6 Discussion

Causal inference is the product of both assumptions and data. We introduced a variation of Manski identification bounds for the ATE developed by Molinari (2010) that offers a relatively simple and nonparametric way to check the robustness of an estimated effect when treatment data are both nonrandomly assigned and missing. Because monotonicity assumptions may be critical for making Molinari bounds informative, we also developed a technique for determining the sensitivity of the bounds to these assumptions. This provides a systematic approach for determining the extent to which the monotonicity assumptions drive the results.

The implications of several notions of monotonicity have been studied in situations when treatments or outcomes are not binary (Manski 1997), but even with merely binary treatments and outcomes, combinations of assumptions other than MTR or MTR-MTS may be considered. For instance, Manski and Pepper (2000, 1001) demonstrate bounds when MTS alone is assumed. One can imagine other patterns in which the monotonicity assumptions might be weakened besides the ones considered here. For example, within the context of the torture and civil war application, this might amount to assuming that the government applies torture when there are individuals or groups that hold grievances against the government, that these groups importantly contribute to civil war onset, but that torture can either effectively reduce the probability of civil war (perhaps because it provides usable intelligence to the government) or increase the probability of civil war. The substantive scientific basis for such alternative combinations would need to be clarified in order to motivate these investigations.

Referring to the monotonicity assumptions we consider, Manski (1997) presents a thorough discussion of MTR. One point worth noting is that MTR seems inherently more plausible than the assumption used in common practice with, for instance, regression analysis, in which one commits to the idea that an effect is the same for every observation but then consults the data to learn the sign and magnitude of the effect. If the previous supporting science is strong enough to show an effect is the same for every observation, then it ought to be strong enough to say whether the homogeneous effect is positive or negative. MTR does not entail any commitment to homogeneity, but using it does imply that prior relevant science can pin down an effect’s sign.

A basic challenge to inferences about treatment effects with observational data is the near certainty that there is confounding, meaning that other covariates besides the treatment variable of interest are associated with both the treatment and the outcome. Indeed, this basic problem arises whenever treatments are not randomly assigned. The goal of the identification methods we have considered is to show the range of all possible effects that might be found with a model that takes relevant confounders into account. The goal is not to assess the effects of such confounders. The fact that identification regions can be computed after conditioning on a covariate profile (recall footnote 5) means that, if data are sufficiently extensive, observable distinctions in identification regions can be addressed. An illustration of this occurs in our analysis of World Values Survey data, when we separate the data by country. The range of effects in Italy does appear importantly different from the effects in Germany: the MTR assumption seems less brittle in Germany.

Another concern may be that dependencies between observations are being ignored. In the civil war example, for instance, one may think there is temporal or spatial dependence in civil war, in torture practices, or in both. In the other example, beliefs about immigration policy may be contagious in communities such that some people’s policy beliefs depend on the immigrant-parent status of their neighbors. To some extent these possibilities are ruled out by the ITR assumption. If that assumption is dubious, treatment effects are much more difficult to assess (Manski 2011; Aronow and Samii 2012; Bowers, Fredrickson, and Panagopoulos 2012).

Relying on Wahba (1990) and Wang and Wahba (1994), our method for randomly sampled data can be considered Bayesian in the same sense as Wahba’s (1983) Bayesian “confidence intervals” with their average coverage property, but some may nonetheless object to using the bootstrap in this way. Is it possible to avoid use of the bootstrap while retaining Beresteanu and Molinari’s (2008) approach of associating identification regions with set-valued random variables? One approach may be to use credal set theory (Levi 1983; Karlsson 2010) to combine methods for weakening monotonicity assumptions with empirical updates from the sample data. A credal set is a convex set of probability functions or of likelihood functions (Karlsson 2010, 22), and so it is framing information with a topology comparable to the convex sets studied in Beresteanu and Molinari (2008)—the kinds of intervals and regions that arise when there is partial identification. Bayesian updating can dominate credal operator updating when central tendencies are in focus (Karlsson, Johansson, and Andler 2011), but with partial identification one is concerned with the entirety of identification regions—with their convex hulls in Beresteanu and Molinari’s (2008) “confidence collection” result—so the “second-order” properties of credal sets (Karlsson, Johansson, and Andler 2011) are likely to be important. Such developments are likely to have a practically important effect with sample data only when the sampling design is informative or when observations are not treated as exchangeable. Developing such ideas goes beyond the scope of our work here.

*Authors’ note*: Thanks to Peter Aronow for many important contributions. Comments from Don Green, Arthur Spirling, Allison Sovey, Rory Truex, and the participants of the 2011 MPSA Annual National Conference and the 2011 Annual Meeting of the Society for Political Methodology are greatly appreciated. Supplementary materials for this article are available on the *Political Analysis* Web site. Replication materials are in study hdl:1902.1/19368 at IQSS Dataverse Network.

^{1}Many of the results in Molinari (2010) hold if there are multiple treatments, and the results we use can be adapted if notions of monotonicity can be defined (cf. Manski 1997). For instance, perhaps pairwise comparisons of treatments are of interest and there are reasons to think monotonicity holds between pairs.

^{2}“Sharp” describes the largest interval that is consistent with all available information.

^{3}Molinari (2010) considers outcomes in a bounded set with known bounds. Using such outcomes adds technical complexity but no new insights relative to our purpose to develop methods for assessing monotonicity assumptions. To get a sense of the technical differences involved in using outcomes in a bounded set instead of binary outcomes, compare our equation (2) to the equation in Proposition 4 in Molinari (2010, 86).

^{4}Supplementary appendix A formally describes the Manski framework, including a discussion of the assumption of “individualistic treatment response,” an assumption analogous to Rubin’s (1978) stable unit treatment value assumption.

^{5}All of the calculations in this article may be made conditional on a covariate profile without any meaningful complications to the mathematics. The variable would represent the covariates capturing the characteristics of all members of the population under study. For all probabilities, , we would write in its place.

^{6}Molinari’s method for treatment effect identification with missing treatment data assumes Individualistic Treatment Response (ITR): the unit’s treatment response is a function only of the value of its own treatment and not of the treatment realized for any other unit, which is a close analog to the stable-unit-treatment-value-assumption (SUTVA) in the Rubin (1978, 1991) Causal Model. Molinari assumes that each member of a population has “a specific response function mapping treatments into outcomes” (Molinari 2010, 83) and so focuses on individualistic effects. Molinari’s method and the sensitivity tests we introduce may still be useful in the absence of ITR if the data can be arranged so that an assessment of individualistic treatment effects is meaningful. In this case, all of will not be identifiable.

^{7}For a fuller discussion, see supplementary appendix B.

^{8}To help intuition, supplementary appendix C works through a simple example of calculating Manski and Molinari bounds.

^{9}contains all sets of real numbers such that and, for all , .

^{10}contains all sets of real numbers such that, for all , .

^{11}Derivations for the sensitivity analyses for the MTR assumptions are in supplementary appendix D.

^{12}One could use other distributions if there were appropriate prior information, but in general we believe no such information would exist.

^{13}Derivations are in supplementary appendix E.

^{14}Wahba (1990, 16–8) discusses smoothing splines as Bayes estimates.

^{15}denotes and denotes .

^{16}Bounds are appropriately reversed if the assumptions concern negative monotonicity. See Section 5.2.

^{17}where (Wang and Wahba 1994).

^{18}See Beresteanu and Molinari (2008, 773, Algorithm 2.1). Our function corresponds to Beresteanu and Molinari’s .

^{19}To ensure that the confidence region for posterior expectations is compatible with the region for the posterior quantiles, the bootstraps for the two regions are combined. Let denote bootstrap differences for expectations and differences for the corresponding quantiles. We use values given by the order statistics of to obtain confidence regions that have asymptotically probability at least of covering the population regions.

^{20}Smoothing spline and quantile estimates are based on 25,000 draws from the prior distributions of and .

^{21}Sensitivity analysis figures are based on 25,000 draws from the prior distributions. Using instead 100,000 prior draws produces a slightly smoother-looking graph for Italy that covers the same regions except the stairstep pattern in for is smoothed out. See supplementary appendix F.

## References

World Values Survey Association. 2009. *World Values Survey 2005 Official Data File v.20090901*. Aggregate File Producer: ASEP/JDS, Madrid.