Summary

This study provides a template for multisite causal mediation analysis using a comprehensive weighting-based analytic procedure that enhances external and internal validity. The template incorporates a sample weight to adjust for complex sample and survey designs, adopts an inverse probability of treatment weight to adjust for differential treatment assignment probabilities, employs an estimated non-response weight to account for non-random non-response and utilizes a propensity-score-based weighting strategy to decompose flexibly not only the population average but also the between-site heterogeneity of the total programme impact. Because the identification assumptions are not always warranted, a weighting-based balance checking procedure assesses the remaining overt bias, whereas a weighting-based sensitivity analysis further evaluates the potential bias related to omitted confounding or to propensity score model misspecification. We derive the asymptotic variance of the estimators for the causal effects that account for the sampling uncertainty in the estimated weights. The method is applied to a reanalysis of the data from the National Job Corps Study.

1 Introduction

Many programme evaluations simply report the estimated average treatment effects without explicitly testing the theories explaining how a programme produces its intended effect. One way to test specific theories about programme mechanisms is mediation analysis that, in its simplest form, decomposes the total programme impact into an indirect effect—transmitted through a hypothesized focal mediator—and a direct effect—attributable to all other possible pathways. Multisite randomized trials, in which individuals are randomly assigned to treatment and control groups within each site, offer unique opportunities for further testing programme theories across a wide range of settings in which a programme is implemented. Just as treatment effects may vary across sites, causal mechanisms may differ across sites because of differences in local contexts, in participant composition and in treatment implementation (Weiss et al., 2014). Hence, assessing between-site variation in the causal mechanisms may generate important information for understanding heterogeneity in the total programme impact, may reveal a need to revisit the programme theory and may suggest specific site level modifications of the intervention practice. However, because of some important constraints of existing analytic tools, analysts have rarely investigated between-site heterogeneity of mediation mechanisms in multisite programme evaluations.

In a single-site study, the population of individuals residing at the site is naturally the target of inference. The causal parameter of interest is generally the treatment effect averaged over all the individuals in this site-specific population. In a multisite study, however, there are two potential targets of inference: the population of sites and the overall population of individuals, which is the union of all the site-specific subpopulations (Raudenbush and Bloom, 2015; Raudenbush and Schwartz, 2018). When researchers are primarily interested in how a programme is implemented at the site level and whether the programme impact depends on local settings, the population of sites clearly becomes the target of inference. In such a case, the population-average treatment effect is defined as the average of the site-specific average effect over all the sites. Henceforth we call this the average effect for the population of sites. Moreover, the between-site variance of the site-specific average effect indicates the extent to which the programme impact is generalizable across the sites. In contrast, when researchers are primarily interested in the overall population of individuals who are served by a particular programme, the population-average treatment effect is simply an average over the individuals in the overall population regardless of their site membership. We call this the average effect for the population of individuals. The average effect for the population of sites and that for the population of individuals become equivalent only when the size of the site-specific subpopulation of individuals is the same across all the sites or if the effect does not vary across sites. In this study, with a primary interest in the between-site heterogeneity of the programme impacts and of the mediation mechanisms, we focus on the population of sites rather than the overall population of individuals.
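The distinction between the two estimands can be made concrete with a small numerical sketch (the site sizes and effects below are hypothetical, not NJCS figures): the average effect for the population of sites weights every site equally, whereas the average effect for the population of individuals weights each site by the size of its subpopulation.

```python
# Hypothetical site-specific average effects and subpopulation sizes
# (illustrative numbers only, not NJCS estimates).
site_effects = [10.0, 20.0, 60.0]  # site-specific average treatment effects
site_sizes = [100, 100, 800]       # individuals in each site-specific subpopulation

# Average effect for the population of sites: unweighted mean over sites.
avg_sites = sum(site_effects) / len(site_effects)

# Average effect for the population of individuals: size-weighted mean.
avg_individuals = sum(e * n for e, n in zip(site_effects, site_sizes)) / sum(site_sizes)

print(avg_sites)        # 30.0
print(avg_individuals)  # 51.0
```

With equal subpopulation sizes, or with a constant site-specific effect, the two averages coincide, as noted above.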

The methodological development in this study is motivated by a reanalysis of the multisite experimental data evaluating Job Corps, which is the largest federal programme designed to promote economic wellbeing among disadvantaged youths in the USA who are unemployed and not in school. Intensive education and vocational training are the central elements of the programme. Moreover, unlike most other training programmes that have generally been found ineffective because participants tend to ‘have more trouble in their lives than the programs could correct’, Job Corps is unique in its provision of a comprehensive array of support services including residential living, supervision, behavioural counselling, social skills training, physical and mental healthcare, and drug and alcohol treatment. According to a nationwide evaluation of all the Job Corps centres in the mid-1990s, known as the National Job Corps Study (NJCS), Job Corps was the only federal programme shown to increase earnings of disadvantaged youths; the programme also improved educational attainment and employment and reduced criminal involvement (Flores and Flores-Lagunes, 2013; Frumento et al., 2012; Lee, 2009; Schochet et al., 2006, 2008; Zhang et al., 2009).

However, no attempt has been made to test the Job Corps programme theory formally. The programme is intended to improve disadvantaged youths’ economic wellbeing not only through education and training that form conventional human capital (Becker, 1964; Card, 1999) but also through comprehensive support services for reducing risk exposures and risk behaviours. Given the comprehensiveness of the programme and given that support services tend to be lacking under the control condition, another interesting theoretical question is whether education and training obtained through Job Corps generated a greater effect on earnings on average than education and training obtained under the control condition. Therefore, we ask how much of the Job Corps effect on earnings is mediated by education and training and whether Job Corps enhanced the economic returns to education and training for disadvantaged youth.

Moreover, with their primary interest in the population of individuals served by Job Corps, most researchers have simply ignored the role of individual Job Corps centres in their analyses. Yet a recent study (Weiss et al., 2017) reported considerable variation in the programme impact on earnings across the sites, with one Job Corps centre at each site. This result coincides with findings from a qualitative process analysis (Johnson et al., 1999) revealing important discrepancies between the intended programme and the implemented programme in service provision at some centres.

In our reanalysis of the NJCS data, we intend to test the Job Corps programme theory that focuses on education and training without overlooking the role of support services. Moreover, we shall examine how the theory plays out differently at different sites that may explain between-site heterogeneity in the programme impact. Given our interest in generating empirical evidence to inform Job Corps operation at the site level, the target of inference in this study is the population of sites rather than the overall population of individuals.

We highlight some challenges in such research endeavours.

  • (a)

    Potential sampling bias due to differential sampling probabilities: the NJCS drew a probability sample of individuals representative of the overall population of eligible applicants to be assigned to each of the Job Corps centres. An individual’s probability of being sampled was a function of baseline characteristics. If the analyst overlooks the differential probabilities of sample selection, sample estimates of the average programme impacts and of their between-site variance would contain sampling bias.

  • (b)

    Potential treatment selection bias due to differential probabilities of treatment assignment: rather than assigning all sampled individuals with an equal probability to either the programme group or the control group, NJCS researchers let the probabilities of treatment assignment differ by personal and site level characteristics. Ignoring the differential probabilities of treatment assignment would pose a threat to internal validity and lead to treatment selection bias.

  • (c)

    Potential non-response bias due to differential probabilities of response: in the NJCS, some sampled youths were lost to attrition or failed to provide information on education and training or on earnings, whereas some were not assigned to a specific centre before random assignment. We define all these individuals as non-respondents. The sample estimates would contain non-response bias if non-random non-response changes the representativeness of the sample of individuals in longitudinal follow-ups or if the remaining sample shows systematic differences between the programme group and the control group.

  • (d)

    Potential mediator selection bias due to differential probabilities of mediator value assignment: even if a randomized experiment does not suffer from non-random non-response, mediator values are typically generated through a natural process rather than being experimentally manipulated. As a result, individuals displaying different mediator values tend to differ systematically in many other aspects that would confound the causal mediation analysis and result in mediator selection bias.

  • (e)

    Potential bias due to model misspecification: path analysis and structural equation modelling (Alwin and Hauser, 1975; Baron and Kenny, 1986; Duncan, 1966; Sobel, 1982; Wright, 1934) have been the primary techniques for mediation analysis over the past several decades, with recent extensions to multisite data analysis (Bauer et al., 2006; Kenny et al., 2003; Krull and MacKinnon, 2001). These regression-based methods, however, rely heavily on correct specification of both the mediator model and the outcome model (Hong, 2017). Recent advances in single-site causal mediation analysis (e.g. Imai et al. (2010 a,b), Pearl (2001), Petersen et al. (2006), Valeri and VanderWeele (2013), VanderWeele and Vansteelandt (2009, 2010) and van der Laan and Petersen (2008)) have focused on accommodating treatment-by-mediator interactions within the linear structural equation modelling framework, whereas challenges involving the functional forms of covariates remain in model specification.

Challenges (a), (b) and (c) are common in evaluation studies and are often addressed via sampling weights, inverse probability of treatment weights (IPTW) and non-response weights respectively. We adapt these weighting adjustments to the context of mediation analysis by combining them with the ratio of mediator probability weighting (RMPW) strategy, which unpacks the causal mechanism and reduces mediator selection bias. RMPW was initially proposed by Hong (2010, 2015) and others (Bein et al., 2018; Hong et al., 2011, 2015; Hong and Nomi, 2012; Huber, 2014; Lange et al., 2012; Tchetgen Tchetgen and Shpitser, 2012) and was recently extended to multisite studies by Qin and Hong (2017). This strategy, without invoking functional form assumptions for the outcome model, is particularly flexible for accommodating treatment-by-mediator interactions and is suitable for both discrete and continuous mediators and outcomes. We assess the remaining overt bias due to possible misspecification of propensity score models through a weighting-based balance checking procedure, and we adopt a novel weighting-based sensitivity analysis strategy for assessing hidden bias with minimal simplifying assumptions (Hong et al., 2018a,b). This series of strategies constitutes a systematic and coherent template for multisite causal mediation analysis. We also address challenges to estimation and statistical inference that arise when multiple weights are unknown and must be estimated from sample data.
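The logic of combining the four adjustments can be sketched as a product of component weights, each the inverse of a probability, except the RMPW component, which is a ratio of mediator probabilities. This is only an illustrative sketch with hypothetical probability values: the weight shown is the one applied to a treated respondent to recover the counterfactual mean of Y{1, M(0)}, and the actual estimation procedure, including how each probability is modelled, is developed in Section 5.

```python
def combined_weight(p_sample, p_treat, p_respond, p_m_control, p_m_treat):
    """Illustrative combined weight for a treated respondent whose observed
    mediator value is m (all probabilities hypothetical and assumed positive).

    p_sample    : probability of selection into the sample
    p_treat     : probability of assignment to the treatment received
    p_respond   : probability of responding, given sampling and treatment
    p_m_control : P(M = m | T = 0, X), mediator probability under control
    p_m_treat   : P(M = m | T = 1, X), mediator probability under treatment
    """
    sample_w = 1.0 / p_sample           # sampling weight
    iptw = 1.0 / p_treat                # inverse probability of treatment weight
    nonresponse_w = 1.0 / p_respond     # non-response weight
    rmpw = p_m_control / p_m_treat      # RMPW: shifts the treated group's mediator
                                        # distribution to its control counterpart
    return sample_w * iptw * nonresponse_w * rmpw

# A treated respondent sampled with probability 0.5, assigned to treatment with
# probability 0.5 and responding with probability 0.8, whose mediator value is
# half as likely under control (0.3) as under treatment (0.6):
w = combined_weight(0.5, 0.5, 0.8, 0.3, 0.6)  # 2.0 * 2.0 * 1.25 * 0.5 = 2.5
```

Multiplying the weights presumes that the corresponding ignorability assumptions hold jointly; Section 4 states these assumptions explicitly.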

We organize the paper as follows. Section 2 introduces the NJCS sample and data. Section 3 defines the causal parameters under the counterfactual causal framework. Section 4 clarifies the identification assumptions and presents our identification strategy. Section 5 outlines our approaches to estimation, statistical inference, balance checking and sensitivity analysis. Section 6 reports the analytic results. Section 7 concludes and discusses extensions. In addition, we provide an R package ‘MultisiteMediation’ (http://cran.r-project.org/web/packages/MultisiteMediation) that implements the proposed template for multisite causal mediation analysis.

The data that are analysed in the paper and the programs that were used to analyse them can be obtained from

https://academic.oup.com/jrsssb/issue/

2 The National Job Corps Study sample and data

NJCS researchers identified about 80000 eligible applicants nationwide in the mid-1990s (Schochet et al., 2001). Through a stratified sampling procedure, more than 15000 eligible applicants were randomly selected into a nationally representative research sample and were assigned at random to either the programme group or the control group. Programme group members could enrol in Job Corps soon after random assignment, whereas control group members were barred from enrolling in Job Corps for 3 years. Applicants who were initially assigned to the same Job Corps centre, regardless of their subsequent treatment assignments, constitute the sample of individuals at the given site. Participants in the study were interviewed at baseline and at 12, 30 and 48 months after randomization. By design, the probability of selection for each follow-up survey differed across individuals.

We perform our analysis on the random sample of 14125 youths who were targeted for the 48-month interview. The mediator, which was collected at the 30-month follow-up, indicates whether a youth had obtained an education credential—typically a general educational development certificate—or a vocational certificate (or both) since the randomization. The outcome is weekly earnings in the fourth year after randomization. Our sample contains 8818 respondents (3491 control group members and 5327 programme group members) and 5307 non-respondents (2235 control group members and 3072 programme group members).

3 A theoretical model of multisite causal mediation process

We investigate the following research questions in relation to Job Corps.

  • (a)

    To what extent did Job Corps increase earnings through improving educational and vocational attainment?

  • (b)

    To what extent did Job Corps increase earnings through other pathways?

  • (c)

    Did the improvement in educational and vocational attainment produce a greater increase in earnings under Job Corps than under the control condition?

  • (d)

    Were Job Corps centres equally effective in increasing earnings through improving educational and vocational attainment?

  • (e)

    Were Job Corps centres equally effective in increasing earnings through other pathways?

  • (f)

    Did Job Corps enhance the economic returns to education and training in some centres but not in others?

  • (g)

    Did Job Corps centres that increased earnings through improving educational and vocational attainment also tend to be successful in increasing earnings through other pathways?

Here we present a theoretical model that summarizes key information characterizing the multisite causal mediation process. We define the causal parameters under the potential outcomes framework (Holland, 1986, 1988; Neyman et al., 1935; Rubin, 1978) that has previously been extended to causal mediation research (Pearl, 2001; Robins and Greenland, 1992). The extension focuses on the intermediate process in which one’s mediator value is a potential natural response to the treatment assigned; and hence mediator values may naturally vary among individuals under the same treatment.

3.1 Potential mediators and potential outcomes

Let Tij denote the treatment assignment of individual i at site j. It takes values t = 1 for an assignment to Job Corps and t = 0 for the control group. Let Mij denote the focal mediator and Yij denote the outcome. For individual i at site j, educational and vocational attainment is a function of the treatment assignment t. Hence, we use Mij(1) to represent the individual’s potential attainment if assigned to Job Corps and we use Mij(0) for the potential attainment if the same person was assigned to the control group. For each individual, only one of these two potential mediators is observable after the treatment assignment. Under treatment condition t, the individual might obtain a credential by the 30-month follow-up (Mij(t)=1) or might fail to do so (Mij(t)=0).

The individual’s weekly earnings in the fourth year after randomization also depend on the treatment assignment. The convention is to use Yij(1) and Yij(0) to represent the potential earnings that are associated with an assignment to Job Corps and to the control group respectively. Alternatively, one may view the potential outcome as a function of both the treatment assignment and the corresponding potential mediator and denote it with Yij{t,Mij(t)} for t = 0, 1. When Mij(t)=m, where m = 0, 1, the individual’s potential outcome value associated with treatment t can be written as Yij(t, m). Again, only one of the two potential outcomes is observable for each individual given the treatment assignment.

In causal mediation analysis, two additional counterfactual outcomes play indispensable roles: Yij{1,Mij(0)} is one’s potential earnings if assigned to Job Corps yet counterfactually having the same attainment status as he or she would have under the control condition; and Yij{0,Mij(1)} is the potential earnings if one was assigned to the control group yet counterfactually having the same attainment status as he or she would have under Job Corps. Because Mij(0) is counterfactual for programme group members and Mij(1) is counterfactual for control group members, neither Yij{1,Mij(0)} nor Yij{0,Mij(1)} is directly observable for any individual.

The above potential mediators and potential outcomes are defined under the stable unit treatment value assumption (SUTVA) (Rubin, 1980, 1986, 1990). In a single site, the SUTVA implies

  • (a)

    that an individual’s potential mediators are not functions of the treatment assignments of other individuals,

  • (b)

    that an individual’s potential outcomes are not functions of the treatment assignments and the mediator values of other individuals and

  • (c)

    that an individual’s potential mediators and potential outcomes do not depend on which programme agents (e.g. instructors or counsellors) one would encounter, which is also known as ‘treatment version irrelevance’.

This assumption would be violated, for example, in the presence of peer influence or if programme agents were not equally effective (Hong, 2015). In a multisite study, the SUTVA further requires ‘no interference between sites’ (Hong and Raudenbush, 2006; Hudgens and Halloran, 2008). Because applicants are usually assigned to Job Corps centres that are relatively close to their original residences and because Job Corps centres are sparsely located, between-site interference seems unlikely.

3.2 Individual-specific causal effects

Under the SUTVA, for individual i at site j, the intention-to-treat (ITT) effect of the treatment on the mediator, i.e. the effect of the treatment assignment on the mediator, is defined as

Mij(1) − Mij(0);

the ITT effect of the treatment on the outcome, which is also known as the total effect, is defined as

Yij{1, Mij(1)} − Yij{0, Mij(0)}.

The individual-specific natural indirect effect (NIE) of the treatment on the outcome transmitted through the mediator (Pearl, 2001) is defined as

Yij{1, Mij(1)} − Yij{1, Mij(0)}.

It represents the Job Corps effect on earnings that is attributable to the programme-induced change in the individual’s attainment from Mij(0) to Mij(1) under Job Corps. This was called ‘the total indirect effect’ by Robins and Greenland (1992), who distinguished it from the individual-specific ‘pure indirect effect’ (PIE)

Yij{0, Mij(1)} − Yij{0, Mij(0)}.

It represents the effect on earnings when the individual’s attainment is changed from Mij(0) to Mij(1) under the control condition.

The individual-specific natural direct effect (NDE) of the treatment on the outcome is defined as

Yij{1, Mij(0)} − Yij{0, Mij(0)}.

It represents the Job Corps effect on earnings while holding the individual’s attainment at the level that would be realized under the control condition. The direct effect is non-zero if Job Corps exerted an effect on earnings without changing an individual’s attainment. Robins and Greenland (1992) called this ‘the pure direct effect’ in contrast with ‘the total direct effect’, Yij{1, Mij(1)} − Yij{0, Mij(1)}. The latter is the Job Corps effect on earnings while holding attainment at the level that would be realized under Job Corps.

The individual-specific total treatment effect is the sum of the individual-specific NIE and NDE. Alternatively, one may decompose the individual-specific total treatment effect into the PIE and the total direct effect.

As Judd and Kenny (1981) pointed out, a treatment may produce its effect not only through changing the mediator value but also in part by altering the mediational process that produces the outcome. In other words, the treatment may alter the relationship between the mediator and the outcome. We have reasoned that obtaining an education or training credential under Job Corps might bring greater economic returns than obtaining such a credential under the control condition. Therefore, the individual-specific NIE and PIE may not be equal. The difference between the two is defined as the natural treatment-by-mediator interaction effect for each individual (Hong, 2015; Hong et al., 2015), which quantifies the treatment effect on the outcome transmitted through a change in the mediator–outcome relationship. A non-zero interaction effect will indicate that the programme-induced change in attainment influences earnings differently between the Job Corps condition and the control condition.
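The two decompositions of the individual-specific total effect, and the definition of the interaction effect as the difference NIE − PIE, can be verified with hypothetical potential outcome values (illustrative numbers only):

```python
# Hypothetical potential outcomes Y(t, m) for one individual, indexed by
# treatment assignment t and mediator value m (illustrative numbers only).
Y = {(0, 0): 100.0, (0, 1): 110.0,   # control condition
     (1, 0): 120.0, (1, 1): 145.0}   # Job Corps condition
M = {0: 0, 1: 1}                      # potential mediators M(0) = 0, M(1) = 1

total = Y[1, M[1]] - Y[0, M[0]]   # ITT (total) effect: 45
nie   = Y[1, M[1]] - Y[1, M[0]]   # natural indirect effect: 25
nde   = Y[1, M[0]] - Y[0, M[0]]   # natural direct effect: 20
pie   = Y[0, M[1]] - Y[0, M[0]]   # pure indirect effect: 10
tde   = Y[1, M[1]] - Y[0, M[1]]   # total direct effect: 35

assert total == nie + nde == pie + tde   # both decompositions hold
interaction = nie - pie                  # treatment-by-mediator interaction: 15
```

Here the interaction effect of 15 reflects that the change in attainment from M(0) = 0 to M(1) = 1 raises earnings by 25 under Job Corps but by only 10 under the control condition.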

3.3 Site-specific causal effects and population parameters

We define the site-specific causal effects by taking expectations of the individual-specific causal effects over the population of individuals at a given site. The site-specific effects, represented in general by βj, are listed in the second column of Table 1, in which Sij = j indicates the site membership of individual i.

Table 1

Causal parameters

Effect | Site-specific effect | Research question | Average effect over population of sites | Research question | Between-site variance
ITT effect on the mediator | βj(T.M) = E[Mij(1) − Mij(0)|Sij = j] | To what extent did Job Corps improve educational and vocational attainment? | γ(T.M) = E[βj(T.M)] | Were Job Corps centres equally effective in improving educational and vocational attainment? | σ²T.M = var{βj(T.M)}
ITT effect on the outcome | βj(T.Y) = E[Yij{1, Mij(1)} − Yij{0, Mij(0)}|Sij = j] | To what extent did Job Corps increase earnings? | γ(T.Y) = E[βj(T.Y)] | Were Job Corps centres equally effective in increasing earnings? | σ²T.Y = var{βj(T.Y)}
NIE | βj(I)(1) = E[Yij{1, Mij(1)} − Yij{1, Mij(0)}|Sij = j] | To what extent did Job Corps increase earnings through improving educational and vocational attainment under the Job Corps condition? | γ(I)(1) = E[βj(I)(1)] | Were Job Corps centres equally effective in increasing earnings through improving educational and vocational attainment under the Job Corps condition? | σ²I(1) = var{βj(I)(1)}
NDE | βj(D)(0) = E[Yij{1, Mij(0)} − Yij{0, Mij(0)}|Sij = j] | To what extent did Job Corps increase earnings through other pathways? | γ(D)(0) = E[βj(D)(0)] | Were Job Corps centres equally effective in increasing earnings through other pathways? | σ²D(0) = var{βj(D)(0)}
PIE | βj(I)(0) = E[Yij{0, Mij(1)} − Yij{0, Mij(0)}|Sij = j] | To what extent did Job Corps increase earnings through improving educational and vocational attainment under the control condition? | γ(I)(0) = E[βj(I)(0)] | Were Job Corps centres equally effective in increasing earnings through improving educational and vocational attainment under the control condition? | σ²I(0) = var{βj(I)(0)}
Interaction effect | βj(T×M) = βj(I)(1) − βj(I)(0) | Did the improvement in educational and vocational attainment produce a greater increase in earnings under Job Corps than under the control condition? | γ(T×M) = E[βj(T×M)] | Did Job Corps enhance the economic returns to education and training in some centres but not in others? | σ²T×M = var{βj(T×M)}

As we emphasized earlier, of particular theoretical interest is not only the overall average of each of these site-specific causal effects but also its possible variation across the sites. The NJCS was a census of all the Job Corps centres that existed at the time of the study, which enables us to generalize results to the population of sites. Hence, the population parameters include the population average and the between-site variance of each site-specific effect, represented in general by γ and σ² respectively. As shown in Table 1, the superscripts in βj and γ and the subscripts in σ², namely (T.M), (T.Y), (I), (D) and (T × M), serve as shorthand for the ITT effect on the mediator, the ITT effect on the outcome, the indirect effects, the direct effects and the interaction effect respectively. The third column of Table 1 lists the research questions with regard to the population-average causal effects over all the sites, and the fourth column lists the corresponding notation. The fifth column lists the research questions about the between-site variances of the site-specific effects, and the final column lists the corresponding notation. In addition, we are interested in the covariance between the site-specific NDE and NIE, σD(0),I(1) = cov{βj(D)(0), βj(I)(1)}, which indicates whether Job Corps centres that increased earnings through improving educational and vocational attainment also tended to be successful in increasing earnings through other pathways.
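Given the site-specific effects, the population parameters in Table 1 are simple moments over the population of sites. A minimal sketch with hypothetical site-specific NDEs and NIEs (not NJCS estimates; Section 5 develops the actual estimators, which must also account for estimation error in the site-specific effects):

```python
from statistics import mean, pvariance

# Hypothetical site-specific NDEs and NIEs for a small population of sites
# (illustrative numbers only, not NJCS estimates).
nde = [12.0, 8.0, 15.0, 5.0]   # site-specific natural direct effects
nie = [6.0, 4.0, 9.0, 1.0]     # site-specific natural indirect effects

gamma_D = mean(nde)        # population-average NDE
gamma_I = mean(nie)        # population-average NIE
sigma2_D = pvariance(nde)  # between-site variance of the NDE
sigma2_I = pvariance(nie)  # between-site variance of the NIE

# Between-site covariance of the NDE and the NIE.
cov_DI = mean((d - gamma_D) * (i - gamma_I) for d, i in zip(nde, nie))
```

A positive between-site covariance would indicate that centres with larger direct effects also tend to have larger indirect effects.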

4 Identification

The causal parameters that are listed in Table 1 could be easily computed if all the potential mediators and potential outcomes were observed for the population of eligible applicants at every site. However, Mij(t) and Yij{t, Mij(t)} are observed for t = 0, 1 only if individual i at site j was selected into the sample, was assigned to treatment t and responded to the interviews. In addition, we never directly observe one's potential outcome of assignment to treatment t while the mediator counterfactually takes the value associated with the alternative treatment t′, where t ≠ t′. Causal inference relies exclusively on inferring counterfactual information from the observed information. The inference inevitably invokes one or more assumptions. Here we clarify the assumptions under which each of the causal parameters can be identified from the observed information in the NJCS data. These assumptions should not be taken lightly. Rather, they require close scrutiny on scientific grounds.

4.1 Identification of the intention-to-treat effects

For the ITT effects of the treatment on the mediator and the outcome, identifying their averages over the population of sites along with their between-site variances is complicated by the differential sampling probabilities, treatment assignment probabilities and non-response probabilities, as discussed in Section 1. We adjust for these differential probabilities by applying a series of standard weighting strategies under strong ignorability assumptions about the sampling, treatment assignment and response mechanisms.

4.1.1 Sampling mechanism

NJCS researchers employed a stratified sampling procedure for individuals. Sampling probabilities varied across strata defined by date of random assignment, gender, residential status and whether one came from an area with a concentration of non-residential female students. The probabilities of being included in the follow-up surveys were further determined by a number of factors including population density in one’s living area and whether one provided immediate response to the baseline survey. Given this complex sample–survey design, individuals who were included in the 48-month interview sample and those who were not are expected to be comparable in composition only if they share the above-mentioned pretreatment characteristics, which we denote with vector XD. This conclusion also holds within each site. Because the sampling mechanism is known in this study, it is ‘ignorable’ in the sense that we can reasonably make the following assumption.

4.1.2 Assumption 1 (strongly ignorable sampling mechanism)

Within levels of the observed pretreatment covariates xD, sample selection is independent of all the potential mediators and potential outcomes at each site:

$$D_{ij} \perp \{M_{ij}(t), Y_{ij}(t, m)\} \mid X_{Dij} = x_D, S_{ij} = j$$

for t = 0, 1 and m ∈ M, where M is the support of all possible mediator values, and j = 1, …, J, where J denotes the total number of sites. Here Dij takes value 1 if individual i at site j was selected into the 48-month interview sample and 0 otherwise. We additionally assume that $0 < \Pr(D_{ij} = 1 \mid X_{Dij} = x_D, S_{ij} = j) < 1$, i.e. each eligible applicant at a site had a non-zero probability of being selected (or not being selected) into the sample: an assumption that was guaranteed to hold by the NJCS design. This is also known as the positivity assumption.

4.1.3 Treatment assignment mechanism

NJCS researchers specified an individual’s treatment assignment probability as a function of applicants’ date of random assignment and residential status among other factors, though not by site. Hence sampled individuals who were assigned to the programme group and those assigned to the control group are expected to be comparable in composition only within each of these predetermined strata, which we denote by XT. We find that XT and XD partially overlap.

4.1.4 Assumption 2 (strongly ignorable treatment assignment)

Within levels of the observed pretreatment covariates xT, the treatment assignment for the sampled individuals is independent of all the potential mediators and potential outcomes at each site:

$$T_{ij} \perp \{M_{ij}(t), Y_{ij}(t, m)\} \mid D_{ij} = 1, X_{Tij} = x_T, S_{ij} = j.$$

Under this assumption, there should be no unmeasured confounding of the treatment–mediator relationship or the treatment–outcome relationship at any site. It is also assumed that $0 < \Pr(T_{ij} = t \mid D_{ij} = 1, X_{Tij} = x_T, S_{ij} = j) < 1$, i.e. each sampled individual had a non-zero probability of being assigned to either treatment group at a given site. This assumption is similarly guaranteed by the NJCS design.

4.1.5 Response mechanism

NJCS researchers did not have control over an individual's probability of response. Hence, the respondents in the programme group and those in the control group are no longer comparable in composition even if they share the same pretreatment characteristics {XD, XT}. Response status is possibly a result of the treatment assignment; indeed, we find evidence that the response mechanism differs between the programme group and the control group. In theory, conditioning on all the pretreatment and post-treatment covariates predicting one's response status under a given treatment at a given site, the respondents and the non-respondents are expected to be comparable in composition. However, controlling for post-treatment covariates would inevitably introduce bias in identifying the ITT effects of the treatment (Rosenbaum, 1984). Hence, in practice, adjustment is made only for the observed pretreatment covariates. We invoke a strong assumption that, among individuals who share the same observed pretreatment characteristics denoted by XR, one's response status is as if randomized in each treatment group.

4.1.6 Assumption 3 (strongly ignorable non-response)

Within levels of the observed pretreatment covariates xR, the response status of a sampled individual in a given treatment group is independent of the potential mediators and potential outcomes associated with the same treatment at a site:

$$R_{ij} \perp \{M_{ij}(t), Y_{ij}(t, m)\} \mid T_{ij} = t, D_{ij} = 1, X_{Rij} = x_R, S_{ij} = j.$$

Here Rij is equal to 1 if individual i at site j responded and 0 otherwise. This assumption cannot be empirically verified because the potential attainment and the potential earnings were unobserved for the non-respondents. However, as will be introduced in Section 5, we can use balance checking and sensitivity analysis to assess the influence of possible violations of the assumption. We also assume that $0 < \Pr(R_{ij} = 1 \mid T_{ij} = t, D_{ij} = 1, X_{Rij} = x_R, S_{ij} = j) < 1$, i.e. each sampled individual had a non-zero probability of response (or non-response) under a given treatment at a given site. This assumption would be violated if certain individuals would always respond or would never do so.

Under these three assumptions, we may equalize the sampling probability and the treatment assignment probability for all the sampled individuals through weighting; by the same logic, the response probability for all the sampled individuals in each treatment group can be equated through weighting as well.

4.1.7 Weighting adjustment for sample selection

Because the sampling probability is predetermined as a function of individual characteristics, certain subpopulations are overrepresented whereas others are underrepresented in the sample. The sample representativeness can be restored by applying the stabilized sample weight defined as follows for sampled individual i at site j with pretreatment characteristics xD:

$$W_{Dij} = \frac{\Pr(D_{ij} = 1 \mid S_{ij} = j)}{\Pr(D_{ij} = 1 \mid X_{Dij} = x_D, S_{ij} = j)}. \quad (1)$$

The numerator of the sample weight represents the average sampling probability at site j, and the denominator is the individual’s sampling probability as a function of the individual’s pretreatment characteristics and his or her site membership.
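As an illustration of equation (1), the stabilized sample weight can be formed directly from the design-based sampling probabilities. The probabilities below are hypothetical; in the NJCS they are known by design.

```python
import numpy as np

# Hypothetical site with two sampling strata (equation (1)); the sampling
# probabilities Pr(D=1 | X_D, S=j) are known by design.
p_sample = np.array([0.8, 0.8, 0.2, 0.2, 0.2])  # per-individual sampling probability
site_avg = p_sample.mean()                       # numerator: Pr(D=1 | S=j)
w_D = site_avg / p_sample                        # stabilized sample weight

# Individuals from the under-sampled stratum (p = 0.2) receive weights > 1,
# restoring the site's pretreatment composition.
print(np.round(w_D, 3))
```

Note how the stabilization (using the site-average probability in the numerator) keeps the weights centred near 1 rather than inflating them.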

4.1.8 Weighting adjustment for treatment assignment

Similarly, in the presence of treatment selection, certain subpopulations will become overrepresented whereas others are underrepresented in a given treatment group. Extending the logic of sample weighting to causal inference, the analyst may apply a stabilized IPTW (Robins et al., 2000) to sampled individual i at site j in treatment group t with pretreatment characteristics xT:

$$W_{Tij} = \frac{\Pr(T_{ij} = t \mid D_{ij} = 1, S_{ij} = j)}{\Pr(T_{ij} = t \mid D_{ij} = 1, X_{Tij} = x_T, S_{ij} = j)}. \quad (2)$$

The numerator is the average probability of assigning a sampled individual at site j to treatment t; the denominator is the individual’s conditional probability of being assigned to treatment t given his or her pretreatment characteristics and site membership, and this probability is pre-determined by design in the NJCS.

4.1.9 Weighting adjustment for non-response

To remove the observed pretreatment differences between the respondents and the non-respondents in each treatment group, the analyst may apply a non-response weight (see Little and Vartivarian (2005)), which is also stabilized, to sampled individual i at site j in treatment group t with pretreatment characteristics xR:

$$W_{Rij} = \frac{\Pr(R_{ij} = r \mid T_{ij} = t, D_{ij} = 1, S_{ij} = j)}{\Pr(R_{ij} = r \mid X_{Rij} = x_R, T_{ij} = t, D_{ij} = 1, S_{ij} = j)}. \quad (3)$$

The numerator is the average probability of response status r among sampled individuals at site j who have been assigned to treatment group t; the denominator is the individual's probability of response status r given his or her pretreatment characteristics, treatment assignment and site membership. This conditional probability is unknown and must be estimated from the observed data: an issue that we shall discuss in Section 5.

Applying the product of WD, WT and WR to the respondents, we expect that the distributions of the observed pretreatment covariates {XDXTXR} will be balanced between the sampled and the non-sampled, between the programme group and the control group, and between the respondents and the non-respondents in each treatment group. Hence, we obtain the following identification results.

Theorem 1
Under assumptions 1–3, the site-specific average potential mediator and potential outcome under treatment t for t = 0, 1 can be respectively identified by the sample average of the observed mediator and the sample average of the observed outcome among the respondents assigned to treatment group t at site j, weighted by the product of the sample weight, IPTW and non-response weight:

$$E[M_{ij}(t) \mid S_{ij} = j] = E[W_{ITTij} M_{ij} \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j],$$
$$E[Y_{ij}\{t, M_{ij}(t)\} \mid S_{ij} = j] = E[W_{ITTij} Y_{ij} \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j].$$

Here $W_{ITTij} = W_{Dij} W_{Tij} W_{Rij}$ removes selection bias in identifying the ITT effects. The proof of theorem 1 is presented in appendix A in the supporting web materials.

The weighted mean difference in attainment between the programme group and the control group at each site identifies the site-specific ITT effect of the treatment on the mediator; similarly, their weighted mean difference in earnings identifies the site-specific ITT effect of the treatment on the outcome. The population average and the between-site variance of each of these ITT effects can be identified by following standard results without invoking further assumptions.
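A minimal sketch of this weighted mean contrast, with hypothetical outcomes and weights (w_itt stands for the product of the sample, treatment and non-response weights):

```python
import numpy as np

# Site-specific ITT effect on the outcome as a weighted mean difference
# among respondents at one site (theorem 1). All numbers are hypothetical.
y     = np.array([10., 14., 12., 8., 9., 7.])    # observed outcome
t     = np.array([1, 1, 1, 0, 0, 0])             # treatment assignment
w_itt = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 2.0]) # combined ITT weight W_D * W_T * W_R

def weighted_mean(v, w):
    # weighted average: sum(w * v) / sum(w)
    return np.sum(w * v) / np.sum(w)

beta_TY = (weighted_mean(y[t == 1], w_itt[t == 1])
           - weighted_mean(y[t == 0], w_itt[t == 0]))
print(round(beta_TY, 3))  # site-specific ITT effect estimate
```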

4.2 Identification of the mediation-related effects

Identifying the population average and the between-site variance of the NDE, NIE, PIE and the natural treatment-by-mediator interaction effect is considerably more challenging. This is because the mediation-related causal effects involve the counterfactual outcomes Yij{1, Mij(0)} and Yij{0, Mij(1)} that cannot be directly observed, and because the mediator value assignment under each treatment was not experimentally manipulated. We invoke the following assumption about the strong ignorability of mediator values.

4.2.1 Assumption 4 (strongly ignorable mediator value assignment)

Within levels of the observed pretreatment covariates denoted by xM, the mediator value assignment under either treatment condition for respondents is independent of the potential outcomes at each site:

$$M_{ij}(t') \perp Y_{ij}(t, m) \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, X_{Mij} = x_M, S_{ij} = j$$

for all possible values of t, t′ and m, including t ≠ t′. Under assumption 4, Mij(1) and Mij(0) are both independent of Yij(1, m) for respondents in the programme group at site j who share the same covariate values; in parallel, they are also independent of Yij(0, m) for respondents in the control group at the site who share the same covariate values.

Assumption 4 implies that, among individuals who share the same observed pretreatment characteristics denoted by XM, the assignment of mediator values is as if randomized within each treatment condition or across treatment conditions at any site. This is a particularly strong assumption because it requires not only that there is no remaining pretreatment confounding of the mediator–outcome relationship but also that no post-treatment confounding of the mediator–outcome relationship exists. However, this is not entirely implausible. For any Job Corps applicant at a given site, the probability of educational and vocational attainment may be influenced not only by the treatment assignment but also by theoretically important individual characteristics. However, these predictors do not need to determine with certainty whether an individual would obtain a credential under Job Corps or under the control condition. For example, a Job Corps student might successfully complete the programme if he or she happened to encounter a highly effective counsellor; a student who is assigned to the control condition might succeed if an alternative training programme was launched at about the same time. These possible random events would make the random assignment of mediator values conceivable under each treatment condition. Hence, we additionally assume that 0<Pr{Mij(t)=m|Rij=1,Tij=t,Dij=1,XMij=xM,Sij=j}<1, i.e. each respondent has a non-zero probability of displaying a given mediator value under the actual treatment condition at a given site. Given the Job Corps screening procedure, arguably all eligible applicants are expected to have a chance of attainment in the programme; their chance of attainment under the control condition would depend on the availability of alternative education and training opportunities in the local community.

4.2.2 Weighting adjustment for mediator value selection in treatment effect decomposition

In the NJCS, only the treatment was experimentally randomized. Yet, under assumption 4, the mediator value assignment can be viewed as if it were randomized for individuals sharing the same covariate values xM. Putting aside the issues of sampling or survey design and non-random non-response, the average of Y{t, M(t′)} at a site would be identified by a weighted mean of the observed outcome in treatment group t. For individuals who share the same pretreatment characteristics, the weight transforms the mediator distribution in treatment group t to resemble that in treatment group t′. Hong (2010, 2015) and others proved the identification result for causal mediation analysis in a single site; Qin and Hong (2017) extended this result to multisite causal mediation analysis. Here we extend the result to multisite studies involving complex sample or survey designs and non-random non-response by combining the assumptions and the weighting strategies that are associated with sampling selection, treatment selection, non-response selection and mediator value selection.

Theorem 2
Under assumptions 1–4, the site-specific average counterfactual outcome $E[Y_{ij}\{t, M_{ij}(t')\} \mid S_{ij} = j]$ can be identified by the weighted average of the observed outcome among the sample respondents who are assigned to treatment group t at site j, the weight being the product of the ITT weight and the RMPW weight:

$$E[Y_{ij}\{t, M_{ij}(t')\} \mid S_{ij} = j] = E[W_{ITTij} W_{Mij} Y_{ij} \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j]$$

for t ≠ t′, where the RMPW weight is

$$W_{Mij} = \frac{\Pr\{M_{ij}(t') = m \mid R_{ij} = 1, T_{ij} = t', D_{ij} = 1, X_{Mij} = x_M, S_{ij} = j\}}{\Pr\{M_{ij}(t) = m \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, X_{Mij} = x_M, S_{ij} = j\}}. \quad (4)$$

For respondent i at site j who was assigned to treatment group t and displayed mediator value m, WMij is a ratio of two propensity scores, each as a function of the individual’s pretreatment characteristics xM. The numerator is the individual’s propensity of displaying mediator value m under the counterfactual treatment t′, whereas the denominator is the individual’s propensity of displaying the same mediator value under the assigned treatment t. Applying the product of WITTij and WMij to the sample respondents in each treatment group at each site, we identify the site-specific average potential outcomes E[Yij{1,Mij(0)}|Sij=j] and E[Yij{0,Mij(1)}|Sij=j]. Appendix A in the web supporting materials presents a proof of theorem 2.
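A hypothetical sketch of the RMPW computation in theorem 2 for the programme group: given propensity scores under each treatment (placeholders for the fitted values), the weighted mean of the observed outcomes estimates the counterfactual mean E[Y{1, M(0)} | S = j].

```python
import numpy as np

# Programme-group respondents at one site; all numbers are hypothetical.
y     = np.array([12., 15., 9., 11.])   # observed outcomes
m     = np.array([1, 1, 0, 0])          # observed mediator value
p_m1  = np.array([0.6, 0.8, 0.5, 0.4])  # Pr{M(1)=1 | x_M}, assigned treatment
p_m0  = np.array([0.3, 0.5, 0.2, 0.1])  # Pr{M(0)=1 | x_M}, counterfactual treatment
w_itt = np.ones(4)                      # assume ITT weights of 1 for simplicity

# Equation (4): ratio of counterfactual to actual mediator probability,
# evaluated at the observed mediator value.
w_m = np.where(m == 1, p_m0 / p_m1, (1 - p_m0) / (1 - p_m1))
w   = w_itt * w_m
mu_1j_star = np.sum(w * y) / np.sum(w)  # identifies E[Y{1, M(0)} | S = j]
```

Respondents whose mediator value was more likely under the counterfactual treatment are up-weighted, so the programme group's mediator distribution is transformed to resemble the control condition's.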

To simplify the notation, let

$$\mu_{tj}^M = E[W_{ITTij} M_{ij} \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j],$$
$$\mu_{tj}^Y = E[W_{ITTij} Y_{ij} \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j],$$
$$\mu_{tj}^{Y*} = E[W_{ITTij} W_{Mij} Y_{ij} \mid R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j]. \quad (5)$$

Here $\mu_{tj}^M$ is the weighted average of the observed mediator in treatment group t at site j that identifies $E[M_{ij}(t) \mid S_{ij} = j]$; $\mu_{tj}^Y$ is the weighted average of the observed outcome in treatment group t at site j that identifies $E[Y_{ij}\{t, M_{ij}(t)\} \mid S_{ij} = j]$; and $\mu_{tj}^{Y*}$ is the weighted average of the observed outcome in treatment group t at site j, with the additional RMPW weight, that identifies $E[Y_{ij}\{t, M_{ij}(t')\} \mid S_{ij} = j]$. With the site-specific mean of each potential mediator and potential outcome identified, we can identify the site-specific causal effects through the weighted mean outcome differences at each site. Table 2 summarizes these identification results. The first column lists the site-specific causal effects defined in terms of the counterfactual quantities as explicated in Section 3; the second column lists the corresponding observable quantities. These identification results enable us to equate the average counterfactual quantities with the observable quantities at each site under the assumptions that are listed in the third column. We then identify correspondingly the population average and the between-site variance of each causal effect as defined in Section 3.
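The decompositions in Table 2 amount to simple contrasts of the weighted means. A hypothetical numeric check of their internal consistency:

```python
# Hypothetical site-level weighted means (the mu's of equation (5));
# all numbers are made up for illustration.
mu_1Y, mu_0Y = 12.0, 9.0   # identify E[Y{1,M(1)}] and E[Y{0,M(0)}]
mu_1Y_star   = 10.5        # identifies E[Y{1,M(0)}]
mu_0Y_star   = 10.0        # identifies E[Y{0,M(1)}]

beta_TY  = mu_1Y - mu_0Y            # total ITT effect on the outcome
beta_NIE = mu_1Y - mu_1Y_star       # NIE: beta_j^(I)(1)
beta_NDE = mu_1Y_star - mu_0Y       # NDE: beta_j^(D)(0)
beta_PIE = mu_0Y_star - mu_0Y       # PIE: beta_j^(I)(0)
beta_int = beta_NIE - beta_PIE      # treatment-by-mediator interaction

# The decomposition is internally consistent: NIE + NDE equals the total effect.
assert abs(beta_TY - (beta_NIE + beta_NDE)) < 1e-12
```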

Table 2

Identification of the site-specific effects

Site-specific effect                              Identification result                                              Assumptions
ITT effect on the mediator, $\beta_j^{(T.M)}$     $\mu_{1j}^M - \mu_{0j}^M$                                          1–3
ITT effect on the outcome, $\beta_j^{(T.Y)}$      $\mu_{1j}^Y - \mu_{0j}^Y$                                          1–3
NIE, $\beta_j^{(I)}(1)$                           $\mu_{1j}^Y - \mu_{1j}^{Y*}$                                       1–4
NDE, $\beta_j^{(D)}(0)$                           $\mu_{1j}^{Y*} - \mu_{0j}^Y$                                       1–4
PIE, $\beta_j^{(I)}(0)$                           $\mu_{0j}^{Y*} - \mu_{0j}^Y$                                       1–4
Interaction effect, $\beta_j^{(T \times M)}$      $(\mu_{1j}^Y - \mu_{1j}^{Y*}) - (\mu_{0j}^{Y*} - \mu_{0j}^Y)$      1–4

5 General analytic procedure

On the basis of the above identification results, we develop an analytic procedure and apply it to the NJCS data. As the identification results indicate, the estimation relies on four weights: the sample weight WDij, the IPTW WTij, the non-response weight WRij and the RMPW weight WMij. In the NJCS, the product of the first two weights was given by design (Schochet et al., 2001), whereas the non-response weight and the RMPW weight must be estimated. Conceptually, the estimation involves two major steps:

  • (a)

    estimation of the non-response weight and the RMPW weight by fitting mixed effects logistic regressions and

  • (b)

estimation of the site-specific causal effects and, subsequently, of the population average and the between-site variance of the causal effects over the population of sites.

To produce valid statistical inferences that incorporate the sampling uncertainty of the weights in the estimation of the causal parameters, we adopt a solution that extends an M-estimation procedure for single-site and multisite RMPW analysis (Bein et al., 2018; Qin and Hong, 2017). This approach estimates the weights and the site-specific causal effects jointly under a generalized method-of-moments (MOM) framework.

However, the analytic results cannot be given causal interpretations if the identification assumptions are violated. We therefore use balance checking to assess whether the estimated weights effectively reduce selection bias associated with the observed covariates. To examine whether possible violations of the identification assumptions due to omitting confounders or due to overlooking between-site heterogeneity in the selection mechanisms would easily alter the analytic conclusions, we further conduct a sensitivity analysis.

5.1 Weight estimation

As clarified above, the estimation of the causal parameters depends on the estimates of the non-response weight and the RMPW weight. We selected the pretreatment covariates on theoretical grounds (see appendix B in the supporting web materials for a list of the 51 covariates). We categorized all the continuous covariates to reduce the potential risk of misspecifying the functional form of a model. To preserve the probability sampling and the randomized experimental design, we created a missingness indicator for each covariate with missing values. Incorporating the missingness indicators, as suggested by Rosenbaum and Rubin (1984), tends to balance not only the observed pretreatment covariates but also the missingness patterns.

One alternative approach to dealing with missing data is complete-case analysis, which deletes all the observations with missing values. This approach is suboptimal because, besides reducing statistical power, it generally introduces bias except when the missingness is completely at random, a particularly strong assumption that rarely holds in reality. Another approach is multiple imputation, which requires the assumption of missingness at random, i.e. that the probability that a variable is observed can depend only on the values of those other variables that have been observed (Little and Rubin, 1989). The missingness indicator approach that we have chosen requires a different assumption: given the other observed covariates, the missing values in a covariate are independent of the key variable of interest; in other words, within levels of the other observed covariates, the unobserved values in a covariate do not differ in distribution across categories of the key variable (Groenwold et al., 2012; Jones, 1996). In estimating the non-response weight, response status R is the key variable of interest; in estimating the RMPW weight, the key variable is an individual's mediator value assignment. In these two cases, the missingness indicator approach assumes strongly ignorable non-response or strongly ignorable mediator value assignment among those whose covariate values are missing, conditionally on all the observed information.
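A minimal sketch of the missingness indicator construction, with a hypothetical covariate; the recoding constant is arbitrary because the companion indicator absorbs the difference.

```python
import numpy as np

# Hypothetical covariate with missing values.
x = np.array([21., np.nan, 19., np.nan, 25.])

x_missing = np.isnan(x).astype(int)        # companion indicator: 1 where missing
x_filled  = np.where(np.isnan(x), 0.0, x)  # recode missing entries to a constant

# Both x_filled and x_missing then enter the propensity score models, so the
# estimated weights can balance observed values and missingness patterns alike.
```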

5.1.1 Non-response weight estimation

Following equation (3), let $p_{Rtj} = \Pr(R_{ij} = 1 \mid T_{ij} = t, D_{ij} = 1, S_{ij} = j)$ denote the average response rate among sampled individuals in treatment group t at site j. To reflect the differences in response mechanisms between the programme group and the control group, we fit a logistic regression to each treatment group. The between-site difference in the response rate in each treatment group is captured by a site-specific random intercept in a mixed effects model. The following model estimates the numerator of the weight:

$$\mathrm{logit}(p_{Rtj}) = \pi_{Rt}^* + r_{Rtj}^*,$$

in which $\pi_{Rt}^*$ indicates the average log-odds of response among the sampled individuals who are assigned to treatment group t across all the sites; the random intercept $r_{Rtj}^*$, which is assumed to be normally distributed, indicates the deviation of the log-odds of response in treatment group t at site j from its overall mean; the variance of $r_{Rtj}^*$ is $\sigma_{Rt}^{*2}$. To estimate the denominator of the non-response weight, we further control for the observed pretreatment covariates $X_{Rij}$ in the mixed effects logistic regressions:

$$\mathrm{logit}(p_{Rtij}) = X_{Rij} \pi_{Rt} + r_{Rtj},$$

in which $p_{Rtij} = \Pr(R_{ij} = 1 \mid X_{Rij} = x_R, T_{ij} = t, D_{ij} = 1, S_{ij} = j)$; $X_{Rij}$ includes the intercept, $\pi_{Rt}$ is the corresponding vector of coefficients and $r_{Rtj}$ is the random intercept with variance $\sigma_{Rt}^2$. By fitting each response model through maximum likelihood estimation (MLE) (e.g. Goldstein (2011)), as shown in appendix C in the supporting web materials, we estimate the coefficients in the response models and obtain the empirical Bayes estimates of the random intercepts. On the basis of these estimates, we obtain $\hat p_{Rtj}$ and $\hat p_{Rtij}$ and the non-response weights $\hat W_{Rij} = \hat p_{Rtj} / \hat p_{Rtij}$ for the respondents and $\hat W_{Rij} = (1 - \hat p_{Rtj}) / (1 - \hat p_{Rtij})$ for the non-respondents.
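Given fitted response probabilities, the final weight construction can be sketched as follows; the fitted values are hypothetical placeholders for the output of the mixed effects logistic regressions.

```python
import numpy as np

# Hypothetical fitted values for one treatment group at one site.
r       = np.array([1, 1, 0, 1])                  # observed response status
p_site  = np.array([0.75, 0.75, 0.75, 0.75])      # p_hat_Rtj: site-average response rate
p_indiv = np.array([0.90, 0.60, 0.50, 0.75])      # p_hat_Rtij: conditional response probability

# Stabilized non-response weight: one formula for respondents (r = 1),
# the complementary one for non-respondents (r = 0).
w_R = np.where(r == 1, p_site / p_indiv, (1 - p_site) / (1 - p_indiv))
```

Respondents whose characteristics made them unusually likely to respond are down-weighted, and hard-to-reach respondents are up-weighted.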

5.1.2 Ratio of mediator probability weighting weight estimation

To obtain the RMPW weight as defined in equation (4), we need to estimate each respondent's probability of attaining an education or training credential under Job Corps and the probability of obtaining such a credential under the control condition. Let $p_{Mtij} = \Pr(M_{ij} = 1 \mid X_{Mij} = x_M, R_{ij} = 1, T_{ij} = t, D_{ij} = 1, S_{ij} = j)$ and $p_{Mt'ij} = \Pr(M_{ij} = 1 \mid X_{Mij} = x_M, R_{ij} = 1, T_{ij} = t', D_{ij} = 1, S_{ij} = j)$ denote respondent i's probabilities of attaining a credential at site j if assigned to treatment t and treatment t′ respectively, for t ≠ t′. We fit the following mediator model to each treatment group, allowing the mediator value selection mechanisms to differ between Job Corps and the control condition:

$$\mathrm{logit}(p_{Mtij}) = X_{Mij} \pi_{Mt} + r_{Mtj},$$

in which $X_{Mij}$ includes the intercept; $\pi_{Mt}$ is the corresponding vector of coefficients and $r_{Mtj}$ is the random intercept with variance $\sigma_{Mt}^2$. Importantly, the denominator of the RMPW weight is one's mediator probability under the treatment that he or she was actually assigned to and can be obtained directly by fitting the mediator model to the corresponding treatment group. The numerator of the weight, however, is one's counterfactual probability of having the same mediator value under the alternative treatment. This is obtained by fitting the mediator model to the alternative treatment group and then applying the coefficient estimates and the empirical Bayes estimate of the random intercept to the focal individual. The estimated RMPW weight is $\hat W_{Mij} = \hat p_{Mt'ij} / \hat p_{Mtij}$ for respondents in treatment group t at site j who attained a credential and is $\hat W_{Mij} = (1 - \hat p_{Mt'ij}) / (1 - \hat p_{Mtij})$ for respondents in the same group at the same site who did not.
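The cross-fitting step can be sketched as follows, with hypothetical coefficient and random-intercept estimates standing in for the fitted mixed effects models (the model fitting itself is omitted).

```python
import numpy as np

def sigmoid(z):
    # inverse logit
    return 1.0 / (1.0 + np.exp(-z))

# One programme-group respondent; covariate vector includes the intercept.
x_i = np.array([1.0, 0.5])

# Hypothetical estimates from the two mediator models:
pi_M1, r_M1j = np.array([0.2, 1.0]), 0.1    # programme-group model at site j
pi_M0, r_M0j = np.array([-0.8, 0.6]), -0.2  # control-group model at site j

p_own     = sigmoid(x_i @ pi_M1 + r_M1j)    # Pr{M(1)=1 | x_M}: assigned treatment
p_counter = sigmoid(x_i @ pi_M0 + r_M0j)    # Pr{M(0)=1 | x_M}: alternative treatment

m_i = 1  # this respondent attained a credential
w_M = p_counter / p_own if m_i == 1 else (1 - p_counter) / (1 - p_own)
```

The denominator comes from the respondent's own treatment group; the numerator applies the other group's estimates to the same covariate values.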

5.2 Causal parameter estimation and inference

In accordance with the identification results as shown in equation (5), the sample estimators for the site-specific average potential mediators and potential outcomes are

$$\hat\mu_{tj}^M = \frac{\sum_i \hat W_{ITTij}\, I(S_{ij} = j)\, I(T_{ij} = t)\, R_{ij} M_{ij}}{\sum_i \hat W_{ITTij}\, I(S_{ij} = j)\, I(T_{ij} = t)\, R_{ij}}, \qquad
\hat\mu_{tj}^Y = \frac{\sum_i \hat W_{ITTij}\, I(S_{ij} = j)\, I(T_{ij} = t)\, R_{ij} Y_{ij}}{\sum_i \hat W_{ITTij}\, I(S_{ij} = j)\, I(T_{ij} = t)\, R_{ij}},$$
$$\hat\mu_{tj}^{Y*} = \frac{\sum_i \hat W_{ITTij} \hat W_{Mij}\, I(S_{ij} = j)\, I(T_{ij} = t)\, R_{ij} Y_{ij}}{\sum_i \hat W_{ITTij} \hat W_{Mij}\, I(S_{ij} = j)\, I(T_{ij} = t)\, R_{ij}}.$$

Here $\hat W_{ITTij} = W_{Dij} W_{Tij} \hat W_{Rij}$, where $W_{Dij}$ and $W_{Tij}$ are given by design and $\hat W_{Rij}$ is estimated from the sample data; $\hat W_{Mij}$ also needs to be estimated. $I(S_{ij} = j)$ is an indicator for whether individual i was a member of site j; $I(T_{ij} = t)$ is an indicator for whether the individual was assigned to treatment t for t = 0, 1. Under identification assumptions 1–4, mean contrasts between the estimated average potential mediators and potential outcomes at each site consistently estimate the site-specific causal effects that are listed in Table 2.

We then obtain MOM estimates of the causal parameters that characterize the distribution of the site-specific effects in a theoretical population of sites (e.g. Cameron and Trivedi (2005)). For simplicity, we use γ as a general form of each population-average causal effect, standing for $\gamma^{(T.M)}$, $\gamma^{(T.Y)}$, $\gamma^{(D)}(0)$, $\gamma^{(I)}(1)$, $\gamma^{(I)}(0)$ and $\gamma^{(T \times M)}$, and use $\beta_j$ as a general form of each site-specific causal effect, standing for $\beta_j^{(T.M)}$, $\beta_j^{(T.Y)}$, $\beta_j^{(D)}(0)$, $\beta_j^{(I)}(1)$, $\beta_j^{(I)}(0)$ and $\beta_j^{(T \times M)}$. By definition, the average of each causal effect over the population of sites, γ, is a simple average of the corresponding site-specific effect $\beta_j$. Hence, the estimate of γ is

$$\hat\gamma = \frac{1}{J} \sum_{j=1}^J \hat\beta_j,$$

where $\hat\beta_j$, a mean contrast as described above, is a consistent estimate of $\beta_j$.

Although a simple average of β^j is consistent for γ, a simple average of the squared deviation of β^j from γ^ is biased for var(βj) because this variance estimator contains the sampling variance of β^j as well as the sampling variance of γ^. The estimation of var(βj) is further complicated because β^j is obtained on the basis of the estimated non-response weight and the estimated RMPW weight. This is known as the two-step estimation problem in which nuisance parameters must be estimated in the first step and are then used to obtain estimates of the parameters of interest in the second step. The nuisance parameters in this case are the coefficients in the propensity score models for the response and for the mediator. Moreover, although the site-specific effects are to be estimated with only the observed data at a given site, the nuisance estimators are estimated with the data pooled from all the sites, which leads to a non-zero correlation of the sampling errors in the site-specific effect estimates.

Earlier research has extended a two-step estimation procedure (Newey, 1984) to single-site (Bein et al., 2018) and multisite (Qin and Hong, 2017) RMPW analysis in which the RMPW weights are estimated. The rationale is to stack the estimating equations from both steps and to solve them simultaneously in the spirit of one-step generalized MOM estimation (Hansen, 1982). The current study makes a further extension to incorporate the estimated non-response weights. Under the standard regularity conditions, we derive the asymptotic sampling variance matrix for the site-specific causal effect estimates var(β^jβj) and then obtain a consistent estimate of the standard error for each estimated population-average causal effect γ^. The details can be found in appendix C in the supporting web material. Even though the standard errors can be alternatively estimated through a bootstrap procedure, the closed form method is favoured because it requires much less computation.

The between-site variance of each site-specific effect var(βj) is a population parameter of key interest because it quantifies between-site heterogeneity in the causal mechanism. Its estimation involves subtracting the estimated average within-site sampling variance of the site-specific effect estimates (the second component of the following equation) from the estimated between-site variance of the site-specific effect estimates (the first component), with an adjustment for the between-site sampling covariance of the site-specific effect estimates (the third component):

$$\widehat{\mathrm{var}}(\beta_j) = \frac{1}{J - 1} \sum_{j=1}^J (\hat\beta_j - \hat\gamma)^2 - \frac{1}{J} \sum_{j=1}^J \widehat{\mathrm{var}}(\hat\beta_j - \beta_j) + \frac{1}{J(J - 1)} \sum_{j=1}^J \sum_{j' \neq j} \widehat{\mathrm{cov}}(\hat\beta_j - \beta_j,\, \hat\beta_{j'} - \beta_{j'}).$$

The estimate of the variance–covariance matrix of all the site-specific effects can be found in the web appendix C. When a variance estimate is negative, which is known as a Heywood case, we set the variance estimate as well as the related covariance estimate to 0.
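A simplified numeric sketch of the bias correction, omitting the covariance adjustment term for brevity; all numbers are hypothetical.

```python
import numpy as np

# Hypothetical site-specific effect estimates and their estimated
# within-site sampling variances.
beta_hat = np.array([2.0, 3.5, 1.0, 2.5, 3.0])
v_within = np.array([0.8, 0.9, 1.1, 0.7, 1.0])

naive  = np.var(beta_hat, ddof=1)   # between-site variance of the estimates
sigma2 = naive - v_within.mean()    # subtract average within-site sampling variance
sigma2 = max(sigma2, 0.0)           # truncate Heywood cases (negative estimates) at 0
```

The naive variance of the estimates overstates the true between-site variance because each estimate carries its own sampling error; the correction removes that component.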

We adopt a permutation procedure (Fitzmaurice et al., 2007) for hypothesis testing. All possible permutations of the site memberships are equally likely under the null hypothesis that the between-site variance is 0. By randomly permuting the site indices while holding the site sizes fixed, we can approximate the null distribution of the test statistic and empirically determine the probability of obtaining values that are greater than the sample test statistic. The web appendix C provides technical details about the estimation and inference.
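A toy version of the permutation procedure, using a simplified between-site variance statistic as a stand-in for the estimator in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical individual-level effect contributions and fixed site labels.
y    = np.array([5., 6., 4., 9., 10., 11., 1., 2., 3.])
site = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

def between_site_var(values, labels):
    # simplified test statistic: variance of site means
    means = np.array([values[labels == j].mean() for j in np.unique(labels)])
    return means.var(ddof=1)

observed = between_site_var(y, site)

# Under H0 (between-site variance = 0), site membership is exchangeable:
# permute the values while holding the site sizes fixed.
perm = np.array([between_site_var(rng.permutation(y), site) for _ in range(2000)])
p_value = np.mean(perm >= observed)  # share of permuted statistics >= observed
```

A small p value indicates that the observed between-site dispersion is unlikely under the null hypothesis of no between-site variance.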

5.3 Balance checking

The non-response weight and the RMPW weight are estimated and are therefore subject to potential model misspecification errors. Major misspecification errors can be detected if, within a treatment group, the estimated non-response weight fails to balance the distribution of the observed covariates between the respondents and the non-respondents, or if the estimated RMPW weight fails to balance the distribution of the observed covariates between those who succeeded in attaining a credential and those who did not. A substantial reduction in imbalance in each case indicates that the estimated weight is effective in reducing selection bias associated with the observed covariates. If some observed covariates remain imbalanced after weighting, the balance checking results indicate how much bias might remain and in which direction it might affect the analytic results.

5.3.1 Balance after non-response weighting adjustment

Having estimated the non-response weight as defined in equation (3), we use the standardized bias to quantify the balance in an observed covariate between the respondents and the non-respondents in each treatment group after weighting. The standardized bias is calculated by dividing the weighted mean difference in each covariate by the standard deviation of the covariate (Harder et al., 2010). By convention, a covariate is considered to be balanced if the standardized bias is less than 0.25 and preferably less than 0.10 in magnitude. To evaluate whether the balance is achieved across most or all of the sites, it is essential further to estimate the between-site standard deviation of the standardized bias. Under the assumption that the site-specific standardized bias is normally distributed, we compute the 95% plausible value range of the site-specific standardized bias, which is expected to be within the range [−0.25, 0.25] if the covariate has acceptable balance at each site. Because all the observed pretreatment covariates in this study are categorical, we obtain the results for each treatment group by fitting a weighted mixed effects logistic model regressing a binary indicator for each covariate category on the response indicator R. The model includes a site-specific random intercept and a random slope that are assumed to be bivariate normal.
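The standardized bias computation can be sketched as follows with hypothetical data; the mixed effects modelling of the site-specific bias is omitted.

```python
import numpy as np

# Hypothetical binary covariate-category indicator, response status and
# estimated non-response weights within one treatment group.
x = np.array([1., 0., 1., 1., 0., 0.])
r = np.array([1, 1, 1, 0, 0, 0])
w = np.array([1.0, 1.0, 2.0, 1.0, 1.0, 1.0])

def wmean(v, wt):
    return np.sum(wt * v) / np.sum(wt)

# Weighted mean difference between respondents and non-respondents,
# divided by the covariate's standard deviation (Harder et al., 2010).
diff = wmean(x[r == 1], w[r == 1]) - wmean(x[r == 0], w[r == 0])
sb   = diff / np.std(x, ddof=1)   # standardized bias
balanced = abs(sb) < 0.25         # conventional threshold (preferably < 0.10)
```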

5.3.2 Balance after ratio of mediator probability weighting adjustment

We further assess the extent to which the estimated RMPW weights balance the distribution of the observed covariates between mediator categories in each treatment group at each site. Regressing a binary indicator for each covariate category on the mediator M in a weighted mixed effects logistic model for each treatment group, we obtain estimates that enable us to calculate the population average and the between-site standard deviation of the standardized bias.

5.4 Sensitivity analysis

The analytic procedure that was described above would generate causally valid results only when the identification assumptions hold. In the current study, although the sampling mechanism and the treatment assignment mechanism are ignorable, the assumptions of strongly ignorable non-response and strongly ignorable mediator value assignment are probably untenable. A sensitivity analysis is necessary for determining whether potential violations of these assumptions due to omitted confounders would easily alter the causal conclusions. A conclusion is considered to be sensitive if the inference can be easily reversed by additional adjustment for an omitted confounder.

We apply a weighting-based approach to sensitivity analysis that has been extended from single-site to multisite causal mediation studies (Hong et al., 2018a, b). This approach reduces the reliance on functional form assumptions that is characteristic of most other existing sensitivity analysis methods. The hidden bias that is associated with one or more omitted confounders is summarized by a function of a small number of weighting-based sensitivity parameters. In a single-site mediation study in which the treatment is randomized, there are two sensitivity parameters: one is the standard deviation of the discrepancy between a new weight that adjusts for a confounder and an initial weight that omits the confounder, and the other is the correlation between the weight discrepancy and the outcome within a treatment group. Intuitively, the former is associated with the degree to which the omitted confounder predicts the mediator and the latter is associated with the degree to which it predicts the outcome.
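For intuition, the two parameters can be sketched as follows (our own illustration; the formal definitions are in Hong et al. (2018a, b), and the variable names here are hypothetical):

```python
import numpy as np

def sensitivity_parameters(w_new, w_init, y):
    """Within a treatment group, compute (i) the standard deviation of the
    discrepancy between a new weight that adjusts for a confounder and the
    initial weight that omits it, and (ii) the correlation between that
    weight discrepancy and the outcome y."""
    d = np.asarray(w_new, float) - np.asarray(w_init, float)
    return d.std(ddof=1), np.corrcoef(d, np.asarray(y, float))[0, 1]
```

Larger values of either parameter correspond to a larger hidden bias; computing them for an observed confounder that is deliberately omitted gives a plausible benchmark for an unmeasured one.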

In the current study, we consider potential violations of assumption 3 (strongly ignorable non-response) and assumption 4 (strongly ignorable mediator value assignment). Violations of assumption 3 are posed by omitted pretreatment or post-treatment confounders of the response–mediator or response–outcome relationships; such omissions may bias all the causal parameters of interest. Violations of assumption 4 are posed by omitted pretreatment or post-treatment confounders of the mediator–outcome relationships; these omissions threaten to bias the population-average NDE, NIE, PIE and interaction effect, and their between-site variances. Moreover, both assumptions need to hold within each site; yet, to keep the models parsimonious, the response models and the mediator models have assumed the same response mechanism and mediation mechanism across all the sites. If the response mechanism or the mediation mechanism that is associated with an observed pretreatment confounder in fact varied across the sites for a given treatment group, omitting the site-specific increment to the coefficient for the confounder in the response model or the mediator model would also introduce bias. In addition, the original analysis adjusted only for pretreatment covariates because, in the presence of treatment-by-mediator interactions, post-treatment confounders of the mediator–outcome relationship cannot be directly adjusted for in the mediator model (Avin et al., 2005). Similarly, in the presence of treatment-by-response interactions, post-treatment confounders of the response–mediator or response–outcome relationship cannot be directly adjusted for in the response model. We adopt a weighting-based strategy that offers a solution to sensitivity analysis concerning post-treatment confounders (Hong et al., 2018a, b). Appendix D in the supporting web materials provides a list of weighting-based sensitivity parameters that are relevant to multisite causal mediation research.
For each type of omission, we assess its potential effect on the causal conclusion with regard to each of the population parameters of interest.

6 Analytic results

6.1 Estimated non-response and ratio of mediator probability weighting weights

Appendix B in the supporting web materials compares the distribution of the outcome and of the 51 covariates between the programme group and the control group, between the respondents and the non-respondents in each treatment group, and between the two mediator categories among the respondents in each treatment group. Notable average pretreatment differences are apparent in each of these comparisons. All these covariates are included in the propensity score models for response status and those for the mediator. The estimated non-response weight ranges from 0.57 to 3.95 among the respondents and from 0.33 to 4.48 among the non-respondents, both with a mean equal to 1 in each treatment group. The estimated RMPW weight ranges from 0.13 to 2.53 among the respondents in the programme group and from 0.40 to 6.62 among those in the control group, again with a mean equal to 1 in each case. The stabilized ITT weight ranges from 0.40 to 6.30 among the respondents in the programme group and from 0.54 to 5.13 among those in the control group. The stabilized product of the ITT weight and RMPW weight ranges from 0.11 to 6.74 among the respondents in the programme group and from 0.27 to 8.17 among those in the control group. An overly large weight may indicate possible violations of the positivity assumption or suggest computational error and may pose a threat to the stability of the estimation results. Our results do not flag such a concern.
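A simple diagnostic of this kind can be sketched as follows (our own code, not the paper's procedure; the cut-off is an arbitrary rule of thumb, not a value from the study):

```python
import numpy as np

def weight_diagnostics(w, max_ok=10.0):
    """Summarize an estimated weight: a normalized weight should average
    close to 1, and an extremely large maximum may signal a positivity
    violation or a computational error (max_ok is a rule of thumb)."""
    w = np.asarray(w, dtype=float)
    return {"min": w.min(), "max": w.max(),
            "mean": w.mean(), "flagged": bool(w.max() > max_ok)}
```

All the weights reported above have means equal to 1 and maxima well below 10, so none would be flagged by such a check.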

6.2 Results of causal parameter estimation and inference

Table 3 presents the results of estimation and inference for the population average and the between-site standard deviation of the causal effects. These results are generalizable to a theoretical population of Job Corps centres serving disadvantaged youths, most of whom had not acquired a labour-market-worthy qualification in education and training at the time of application.

6.2.1 Population-average intention-to-treat effects of Job Corps

The population-average ITT effects of Job Corps on educational and vocational attainment and on earnings are both positive and statistically significant. Job Corps increased the rate of educational and vocational attainment from 22% to 40% within 30 months after randomization and increased weekly earnings by about $21 (in 1994 dollars) in the fourth year after randomization. The original study estimated an ITT effect of close to $16 (Schochet et al., 2006). That effect was estimated in the population of individuals, whereas we estimate the average ITT effect in the population of sites; as discussed in the second paragraph of Section 1, we do not expect these two parameters to have the same values. There were also other differences between the analyses. We conducted an analysis to determine which differences between our analysis and the original analysis account for the discrepancy in estimated effects (the results are available on request). We found that most of the discrepancy was due to how we classified non-respondents (those missing a site identification number, the mediator or the outcome) compared with the original study's classification (those missing the outcome).

Table 3

Estimated causal parameters using the Job Corps data

                                 Population-average effect                Between-site SD      95% plausible value range
Effect                           Estimate (SE)    Effect size  p-value   Estimate   p-value   of site-specific effects
ITT effect on the mediator       0.186 (0.014)    0.445        <0.001    0.087      0.035     [0.015, 0.357]
  (difference in probability)
ITT effect on the outcome ($)    21.030 (5.684)   0.114        <0.001    29.603     0.035     [−36.992, 79.052]
NDE ($)                          12.561 (5.730)   0.068        0.028     28.985     0.070     [−44.250, 69.372]
NIE ($)                          8.469 (1.612)    0.046        <0.001    5.407      0.135     [−2.129, 19.067]
PIE ($)                          6.198 (1.781)    0.034        0.001     4.351      0.215     [−2.330, 14.726]
Interaction effect ($)           2.270 (2.503)    0.012        0.364     11.083     0.220     [−19.453, 23.993]

For the point estimate of each population-average effect, the corresponding standard error estimate is provided in parentheses. The effect size of each population-average effect estimate is calculated by dividing the point estimate by the standard deviation of the outcome in the control group. The bounds for the 95% plausible value range of the site-specific effects are 1.96 times the between-site standard deviation estimate away from the population-average effect estimate, under the assumption that the site-specific effects are normally distributed.
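This arithmetic can be verified directly from the tabulated values; for instance, for the ITT effect on earnings:

```python
# 95% plausible value range: population-average estimate +/- 1.96 times
# the estimated between-site standard deviation (values from Table 3).
est, sd_between = 21.030, 29.603
lo, hi = est - 1.96 * sd_between, est + 1.96 * sd_between
print(round(lo, 3), round(hi, 3))  # -36.992 79.052, as reported in Table 3
```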


6.2.2 Population-average mediation mechanism

The ITT effect of Job Corps on earnings is partly transmitted through educational and vocational attainment. The estimated average NIE is $8.47 (standard error SE = 1.61; t = 5.26; p < 0.001). This result suggests that human capital formation is not the only pathway through which Job Corps generated its effect on earnings. The estimated average NDE is $12.56 (SE = 5.73; t = 2.19; p = 0.028), accounting for nearly 60% of the ITT effect. According to our earlier reasoning, the NDE transmits the Job Corps effect primarily through a wide array of support services. The estimated difference between the NDE and NIE is not statistically significant, indicating that the support services played a role that is at least as important as general education and vocational training in promoting economic wellbeing among disadvantaged youths. The estimated natural treatment-by-mediator interaction effect of $2.27 (SE = 2.50; t = 0.91; p = 0.36) is simply the difference between the estimated NIE and the estimated PIE, the latter being $6.20 (SE = 1.78; t = 3.48; p = 0.001). The interaction effect is not statistically significant. Therefore, the economic returns to the programme-induced increase in educational and vocational attainment are indistinguishable between Job Corps and the control condition.
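These point estimates obey the effect decomposition identities up to rounding, which provides a quick arithmetic check (values from Table 3):

```python
# ITT = NDE + NIE, and NIE = PIE + interaction effect (up to rounding).
itt, nde, nie, pie, inter = 21.030, 12.561, 8.469, 6.198, 2.270
assert abs(itt - (nde + nie)) < 1e-9       # 12.561 + 8.469 = 21.030
assert abs(nie - (pie + inter)) < 0.002    # 6.198 + 2.270 = 8.468
# Each quoted t statistic is simply estimate / standard error:
print(round(nde / 5.730, 2))  # 2.19, matching the quoted t for the NDE
```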

6.2.3 Between-site variance of the intention-to-treat effects

The ITT effect of Job Corps on educational and vocational attainment did not vary significantly across sites. However, there is considerable between-site variation in the ITT effect on earnings. Its between-site standard deviation is estimated to be $29.60 (p < 0.05). Under the assumption that the site-specific ITT effects are normally distributed, these effects range from −$37 to $79 in 95% of the sites. This result indicates that, even though Job Corps significantly improved earnings on average, not all the centres generated positive effects. An estimated negative correlation (−0.18) between the site-specific control group mean and the ITT effect (the results are not tabulated) suggests that Job Corps tended to have a greater positive effect on earnings in the sites where economic prospects were particularly dire under the control condition.

6.2.4 Between-site variance of the mediation mechanism

To explain the between-site heterogeneity in the ITT effect on earnings, we further investigate how the causal mediation mechanism varied across sites. The estimated between-site standard deviation of the NDE is as large as $29 (p = 0.07), which is nearly equal to the estimated between-site standard deviation of the ITT effect on earnings; the estimated site-specific NDE ranges from −$44 to $69 in 95% of the sites. In contrast, the estimated between-site standard deviation of the NIE is only about $5 (p = 0.14). The estimated between-site standard deviation of the PIE and that of the interaction effect are similarly negligible. According to these results, not only did Job Corps improve the rate of attaining a credential across the sites, but the economic benefit of this improvement was also comparable across the sites. However, the programme impact transmitted through support services appeared to be uneven across the sites. The site-specific NDE seems to coincide largely with the site-specific ITT effect on earnings, their correlation being greater than 0.9 (the results are not tabulated). Therefore, the between-site variation in the ITT effect on earnings is primarily explained by the heterogeneity in support services. This result is consistent with a qualitative process analysis (Johnson et al., 1999) showing that, unlike the provision of education and vocational training that was strictly regulated by the national and regional Job Corps offices, the quantity and quality of support services were left largely to the discretion of agents at each local centre.

6.2.5 Summary

Our empirical evidence supports the Job Corps programme theory and suggests necessary modifications in programme practice. Job Corps distinguishes itself from other training programmes by emphasizing both human capital formation and risk reduction as complementary pathways for improving the economic wellbeing of disadvantaged youths. Our results indicate that the risk reduction mechanism is no less important, and possibly more important, than human capital formation. Although all Job Corps centres succeeded in increasing educational and vocational attainment, which subsequently led to an increase in average earnings, they were not equally successful in promoting economic wellbeing through countering a wide range of risk factors. One implication seems clear: regularizing the quantity and ensuring the quality of support services are probably the key to achieving universal effectiveness of Job Corps.

6.3 Results of balance checking

As expected, the non-response weighting adjustment substantially improved the balance between respondents and non-respondents on average in both treatment groups. Before weighting, the magnitude of the standardized bias averaged over all the sites was greater than 0.25 for one variable and greater than 0.1 for six other variables in the programme group and was greater than 0.25 for two variables and greater than 0.1 for seven other variables in the control group. After weighting, the average standardized bias becomes less than 0.1 in magnitude for all the variables in both groups. The 95% plausible value range of the site-specific standardized bias, initially exceeding the −0.25 and 0.25 thresholds for five variables in the programme group and for four variables in the control group, is kept between these thresholds for all except two variables in the programme group and for all except one variable in the control group. We note that the non-response weighting increased the plausible value range for some variables because of the increase in the uncertainty of estimation. These balance checking results are illustrated in Figs E.1–E.4 in appendix E in the supporting web materials.

Figs E.5–E.8 in the same appendix summarize the balance between mediator categories among the respondents in each treatment group after RMPW. The weighting reduced the number of variables with an average standardized bias exceeding 0.1 in magnitude from 9 to 0 in the programme group and from 10 to 3 in the control group. The number of variables with the plausible value range falling beyond the thresholds of −0.25 and 0.25 is reduced from 6 to 3 in the programme group but is increased from 8 to 10 in the control group. This is because, in some of the sites, relatively few respondents in the control group successfully attained a credential. Such noise may reduce the precision in estimating the between-site variance of the standardized bias.

6.4 Results of sensitivity analysis

It is straightforward to assess the sensitivity of the original conclusions to the omission of an observed pretreatment covariate, because we can directly calculate its sensitivity parameters on the basis of the observed data. However, to determine whether the initial results are sensitive to the existence of an unmeasured pretreatment confounder, it is important to reason further whether the confounding effect of the unmeasured covariate would be comparable with that of an observed pretreatment confounder. For example, characteristics of a peer network might influence a Job Corps applicant's response status, educational attainment and job prospects. Even though peer network was unmeasured in the NJCS, we may reason that its confounding effect is comparable with that of one of the most important observed pretreatment confounders, such as baseline earnings, and thereby obtain a plausible reference value of the bias caused by the omission of peer network.

Take the population-average NIE as an example. An omission of the indicator for upper-middle-level baseline earnings would result in a negative bias of −$3.39. Our original estimate of the NIE is $8.47, with a 95% confidence interval (CI) of [$5.31, $11.63]. With an additional adjustment for an unmeasured pretreatment confounder that is assumed to be comparable with upper-middle-level baseline earnings, the new estimate of the NIE would become $11.86, with a 95% CI of [$8.70, $15.02]. Here we consider the plausible reference value of the bias that is associated with the omission to be given rather than estimated, and thus the additional adjustment does not change the width of the 95% CI. This hypothetical adjustment would increase the magnitude of the NIE estimate without changing the initial conclusion about the significant positive NIE. For the population-average NDE, the omission would contribute a positive bias of $1.85. With an additional adjustment for this hypothetical bias, the estimate of the NDE would change from $12.56 (95% CI [$1.33, $23.79]) to $10.71 (95% CI [−$0.52, $21.94]). The adjusted CI now contains zero. Hence the original conclusion about the significant positive NDE is potentially sensitive to an unmeasured confounder that is comparable with baseline earnings. Among the 51 observed pretreatment covariates, 10 each provide a plausible reference value of bias that would lead to a statistically insignificant NDE once the hypothetical bias has additionally been removed. The same is true of five observed pretreatment covariates when we assess the sensitivity of the population-average PIE. In addition, nine covariates would overturn the statistical significance of the population-average natural treatment-by-mediator interaction effect. Nevertheless, none of the between-site variance estimates is sensitive to the omission of pretreatment confounders.
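The underlying arithmetic is a simple shift: because the reference bias is treated as known, it moves the point estimate and both CI bounds by the same amount. A sketch of the adjustment, using the NDE numbers from the text:

```python
def bias_adjust(est, ci, bias):
    """Remove a known hypothetical bias from a point estimate and its 95%
    CI; because the bias is taken as given, the CI width is unchanged."""
    lo, hi = ci
    return est - bias, (lo - bias, hi - bias)

# NDE example: original estimate $12.56, 95% CI [$1.33, $23.79], bias +$1.85.
adj_est, (adj_lo, adj_hi) = bias_adjust(12.56, (1.33, 23.79), 1.85)
print(round(adj_est, 2), round(adj_lo, 2), round(adj_hi, 2))  # 10.71 -0.52 21.94
```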

In many cases, the analyst might not have enough scientific knowledge to equate the potential bias of an omitted confounder with that of an observed covariate. Yet other sources of data might supply values of its sensitivity parameters. Applying the bias formula as represented in appendix D of the web supplementary materials, the analyst can compute the approximate amount of bias that is associated with the omission and then assess the sensitivity of the original conclusion to the omission. In addition to assessing the amount of bias that each single omitted covariate might contribute, we could also assess how much bias a set of omitted covariates might introduce jointly.

We further assess the sensitivity of the original conclusions to the omission of the site-specific increment to the coefficient for each pretreatment confounder that has been adjusted for in the response model or the mediator model. The population-average ITT effect estimate is insensitive to such an omission. In contrast, with an additional adjustment for the site-specific increment to the coefficient for some of the covariates, the estimated population-average NIE, NDE or PIE would become insignificant, whereas the population-average natural treatment-by-mediator interaction effect, which was originally tested to be insignificant, would become either significantly negative or significantly positive. Nevertheless, none of the between-site variance estimates is sensitive to the omission of the site-specific increment.

The above discussion is focused on the omission of pretreatment confounders. As explicated in Section 5.4., the omission of a post-treatment confounder would also pose threats to the identification assumptions. Because the NJCS data do not have any measurement of a potential post-treatment confounder of the response–mediator, response–outcome or mediator–outcome relationship, we cannot assess the potential influence of omitted post-treatment confounders in this study.

7 Conclusion

This paper presents a comprehensive template for multisite causal mediation analysis that integrates a series of weighting-based strategies. These include using a sample weight to adjust for complex sample and survey designs, using an IPTW to adjust for differential treatment assignment probabilities, using a non-response weight to adjust for non-random non-response, and using RMPW weights to adjust for mediator value selection while unpacking the causal mechanisms. Under the identification assumptions that are clarified in the paper, these weighting strategies promise to enhance the external validity and internal validity of the conclusions with regard to the population average and the between-site variance of the causal effects. In addition to decomposing the average ITT effect of the treatment on the outcome into direct and indirect effects, the template further investigates the heterogeneity in causal mechanisms across sites that explains the between-site variation in the ITT effect. Weighting-based balance checking assesses the amount of overt bias that is associated with the observed covariates. And, finally, a weighting-based sensitivity analysis enables a flexible assessment of the causal conclusions in light of possible violations of the identification assumptions due to hidden bias. The paper is accompanied by the MultisiteMediation R package for implementing the entire analytic procedure.

Developed under the framework of potential outcomes, the template that is presented in this paper provides an important alternative to existing methods. Multilevel path analysis and structural equation modelling with MLE (Bauer et al., 2006) have difficulties in estimating and testing the between-site variance of the NIE and the natural treatment-by-mediator interaction effect. Consistent estimation further requires that both the mediator model and the outcome model be correctly specified and that the mediator, the outcome and the site-specific effects be normally distributed. In contrast, with each causal effect identified through a mean contrast between the weighted outcomes, the proposed MOM strategy invokes no assumptions about the functional form of the outcome model or about the distributions of the mediator, the outcome and the site-specific effects. Possible misspecifications of the functional forms of the response models or of the mediator models can be evaluated through balance checking and weighting-based sensitivity analysis. We opt for the MOM estimators also because the MLE of a population-average effect is essentially a precision-weighted average of the site-specific effect estimates. As discussed in Raudenbush and Schwartz (2018), MLE will be biased if the precision is correlated with the site-specific effect, which is likely in a multisite trial in which the sites that are more effective in implementing the programme may attract more applicants. In contrast, the MOM estimator ensures consistency at some cost of efficiency. In general, efficiency becomes less of a concern in studies with a larger number of sites. We have also derived an asymptotic standard error for each population-average effect estimate that fully accounts for the sampling variability in the two-step estimation. The permutation test for variance testing also fills a gap in the literature on multilevel mediation analysis.
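The bias of precision weighting when precision correlates with the site-specific effect can be seen in a toy example (numbers invented purely for illustration):

```python
import numpy as np

# Three sites whose true effects grow with site size; larger sites are
# estimated more precisely, so a precision-weighted (MLE-like) average is
# pulled towards the large sites, whereas the unweighted MOM average is not.
effects = np.array([5.0, 10.0, 15.0])   # site-specific effects
sizes = np.array([10, 100, 1000])       # crude proxy for precision
mom = effects.mean()
mle_like = np.average(effects, weights=sizes)
print(mom, round(mle_like, 2))  # 10.0 14.46
```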

Important topics remain. First, we have conceptualized the site-specific causal effects under the SUTVA. This assumption will need to be relaxed if an individual’s potential mediators and potential outcomes could be affected by other individuals’ treatment assignments, or if an individual’s potential outcomes could additionally be affected by other individuals’ mediator values (Hong, 2015; VanderWeele et al., 2013). For example, about halfway into the NJCS sample intake period, the Job Corps centres nationwide implemented a ‘zero tolerance’ policy expelling students who were involved in drug abuse or violence. The removal of such ‘problem’ students would presumably improve the institutional environment and would increase allocation of resources to other students who were not directly targeted by the policy. Hence expelling problem peers from the programme might contribute positively to one’s potential earnings. To test this hypothesis will require a major revision of the conceptual framework and creative extensions of the current template, which we shall explore in future work.

Second, the template proposed is directly applicable to multisite trial data that are similar to the Job Corps data, in which all the sites at the time of study were included and at least a moderate number of individuals were selected into the sample at each site. Unlike the NJCS, some multisite studies may sample sites first and then sample individuals within the sampled sites. One may further incorporate a site level sample weight to adjust for the sample selection of sites. The sample design has important implications for causal inference. For example, researchers would not be able to obtain results that are generalizable to the population of sites if individuals were sampled from the overall population with a relatively small probability whereas the number of sites was relatively large and the site sizes were uneven. This is because sampled observations might become too sparse or even non-existent in some of the relatively small sites, in which case the sample of sites would not be representative of the population of sites.

Third, we have adopted the missingness indicator strategy to handle missingness in the pretreatment covariates while using the inverse probability weighting strategy to account for non-response in the mediator and the outcome. If the true values of the missing cases were highly variable, the missingness indicator strategy would underestimate the variance and covariance of the covariates. When the missingness at random assumption holds, an alternative is to use multiple imputation to impute the missing values in the covariates as well as in the mediator and the outcome. A product of the sample weight, IPTW and RMPW weight, i.e. W_D × W_T × W_M, will then be applied to each imputed data set. The final estimation results can be obtained by combining the estimates from the multiply imputed data sets.
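The final combining step would follow the standard multiple-imputation rules (Rubin's rules; our own sketch, not part of the accompanying package):

```python
import numpy as np

def combine_mi(estimates, variances):
    """Pool point estimates and squared standard errors from M weighted
    analyses, one per imputed data set, using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)   # per-imputation estimates
    u = np.asarray(variances, dtype=float)   # per-imputation squared SEs
    m = len(q)
    qbar = q.mean()                          # pooled point estimate
    total = u.mean() + (1 + 1 / m) * q.var(ddof=1)
    return qbar, np.sqrt(total)              # pooled estimate and its SE
```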

Fourth, as acknowledged by Qin and Hong (2017), the MOM estimation procedure may not be optimal if there are fewer than 20 individuals at each site. Moreover, when site sizes are relatively small, propensity score models may be overfitted if selection mechanisms vary across sites. In such cases, there might be a lack of statistical power for detecting between-site heterogeneity in the causal mediation mechanism.

Fifth, our mediator is a combination of two central elements of the Job Corps programme. It takes value 1 if an individual obtained either an education or a training credential within 30 months after randomization. However, the selection mechanism that led to an education credential might be different from the mechanism that led to a training credential. Combining these two distinct types of credentials into one mediator may result in misspecified propensity score models for the mediator and correspondingly biased estimates of the causal parameters. This problem can be addressed by viewing vocational training attainment and general education attainment as two concurrent mediators, which is a topic that we investigate in a separate study.

Supporting information

Additional ‘supporting information’ may be found in the on-line version of this article:

‘Multisite causal mediation analysis in the presence of complex sample and survey designs and non-random nonresponse’.

Acknowledgements

The authors thank Donald Hedeker, Luke Miratrix, Stephen Raudenbush, James Rosenbaum, Peter Schochet, Yongyun Shin, Margaret Beale Spencer, Kazuo Yamaguchi and Fan Yang for their contribution of ideas and their comments on earlier versions of this paper. Alma Vigil provided invaluable research assistance to the analysis of the NJCS data. This study was supported by a grant from the National Science Foundation (SES 1659935), a US Department of Education Institute of Education Sciences statistical and research methodology grant (R305D120020) and a subcontract from MDRC funded by the Spencer Foundation. In addition, the first author received a Quantitative Methods in Education and Human Development Research Predoctoral Fellowship from the University of Chicago and a National Academy of Education–Spencer Foundation Dissertation Fellowship. This paper reflects the views of the authors and should not be construed to represent the Food and Drug Administration’s views or policies.

References

Alwin, D. F. and Hauser, R. M. (1975) The decomposition of effects in path analysis. Am. Sociol. Rev., 40, 37–47.

Avin, C., Shpitser, I. and Pearl, J. (2005) Identifiability of path-specific effects. Department of Statistics, University of California at Los Angeles, Los Angeles.

Baron, R. M. and Kenny, D. A. (1986) The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Personlty Socl Psychol., 51, 1173–1182.

Bauer, D. J., Preacher, K. J. and Gil, K. M. (2006) Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: new procedures and recommendations. Psychol. Meth., 11, 142–163.

Becker, G. S. (1964) Human Capital: a Theoretical and Empirical Analysis, with Special Reference to Education. New York: Columbia University Press.

Bein, E., Deutsch, J., Hong, G., Porter, K., Qin, X. and Yang, C. (2018) Two-step estimation in ratio-of-mediator-probability weighted causal mediation analysis. Statist. Med., 37, 1304–1324.

Cameron, A. C. and Trivedi, P. K. (2005) Microeconometrics: Methods and Applications. New York: Cambridge University Press.

Card, D. (1999) The causal effect of education on earnings. In Handbook of Labor Economics (eds O. Ashenfelter and D. Card), vol. 3A, pp. 1801–1863. Amsterdam: Elsevier.

Duncan, O. D. (1966) Path analysis: sociological examples. Am. J. Sociol., 72, 1–16.

Fitzmaurice, G. M., Lipsitz, S. R. and Ibrahim, J. G. (2007) A note on permutation tests for variance components in multilevel generalized linear mixed models. Biometrics, 63, 942–946.

Flores, C. A. and Flores-Lagunes, A. (2013) Partial identification of local average treatment effects with an invalid instrument. J. Bus. Econ. Statist., 31, 534–545.

Frumento, P., Mealli, F., Pacini, B. and Rubin, D. B. (2012) Evaluating the effect of training on wages in the presence of noncompliance, nonemployment, and missing outcome data. J. Am. Statist. Ass., 107, 450–466.

Goldstein, H. (2011) Multilevel Statistical Models. Chichester: Wiley.

Groenwold, R. H., White, I. R., Donders, A. R. T., Carpenter, J. R., Altman, D. G. and Moons, K. G. (2012) Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. Can. Med. Ass. J., 184, 1265–1269.

Hansen, L. P. (1982) Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.

Harder, V. S., Stuart, E. A. and Anthony, J. C. (2010) Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychol. Meth., 15, 234–249.

Holland, P. W. (1986) Statistics and causal inference (with discussion). J. Am. Statist. Ass., 81, 945–960.

Holland, P. W. (1988) Causal inference, path analysis, and recursive structural equation models. In Sociological Methodology (ed. C. C. Clogg), pp. 449–484. Washington DC: American Sociological Association.

Hong, G. (2010) Ratio of mediator probability weighting for estimating natural direct and indirect effects. Proc. Biometr. Sect. Am. Statist. Ass., 2401–2415.

Hong, G. (2015) Causality in a Social World: Moderation, Mediation and Spill-over. New York: Wiley.

Hong, G. (2017) A review of “Explanation in causal inference: Methods of mediation and interaction”. J. Educ. Behav. Statist., 42, 491–495.

Hong, G., Deutsch, J. and Hill, H. D. (2011) Parametric and non-parametric weighting methods for estimating mediation effects: an application to the National Evaluation of Welfare-to-Work Strategies. Proc. Socl Statist. Sect. Am. Statist. Ass., 3215–3229.

Hong, G., Deutsch, J. and Hill, H. D. (2015) Ratio-of-mediator-probability weighting for causal mediation analysis in the presence of treatment-by-mediator interaction. J. Educ. Behav. Statist., 40, 307–340.

Hong, G. and Nomi, T. (2012) Weighting methods for assessing policy effects mediated by peer change. J. Res. Educ. Effect., 5, 261–289.

Hong, G., Qin, X. and Yang, F. (2018a) Sensitivity analysis for multisite causal mediation studies. Technical Report. University of Chicago, Chicago.

Hong, G., Qin, X. and Yang, F. (2018b) Weighting-based sensitivity analysis in causal mediation studies. J. Educ. Behav. Statist., 43, 32–56.

Hong, G. and Raudenbush, S. W. (2006) Evaluating kindergarten retention policy: a case study of causal inference for multilevel observational data. J. Am. Statist. Ass., 101, 901–910.

Huber, M. (2014) Identifying causal mechanisms (primarily) based on inverse probability weighting. J. Appl. Econmetr., 29, 920–943.

Hudgens, M. G. and Halloran, M. E. (2008) Toward causal inference with interference. J. Am. Statist. Ass., 103, 832–842.

Imai, K., Keele, L. and Tingley, D. (2010a) A general approach to causal mediation analysis. Psychol. Meth., 15, 309–334.

Imai, K., Keele, L. and Yamamoto, T. (2010b) Identification, inference and sensitivity analysis for causal mediation effects. Statist. Sci., 25, 51–71.

Imai, K. and Yamamoto, T. (2013) Identification and sensitivity analysis for multiple causal mechanisms: revisiting evidence from framing experiments. Polit. Anal., 21, 141–171.

Johnson, T., Gritz, M., Jackson, R., Burghardt, J., Boussy, C., Leonard, J. and Orians, C. (1999) National Job Corps Study: report on the process analysis. Research and Evaluation Report. Mathematica Policy Research, Princeton.

Jones, M. P. (1996) Indicator and stratification methods for missing explanatory variables in multiple linear regression. J. Am. Statist. Ass., 91, 222–230.

Judd, C. M. and Kenny, D. A. (1981) Process analysis: estimating mediation in treatment evaluations. Evaln Rev., 5, 602–619.

Kenny, D. A., Korchmaros, J. D. and Bolger, N. (2003) Lower level mediation in multilevel models. Psychol. Meth., 8, 115–128.

Krull, J. L. and MacKinnon, D. P. (2001) Multilevel modeling of individual and group level mediated effects. Multiv. Behav. Res., 36, 249–277.

Lange, T., Vansteelandt, S. and Bekaert, M. (2012) A simple unified approach for estimating natural direct and indirect effects. Am. J. Epidem., 176, 190–195.

Lee, D. S. (2009) Training, wages, and sample selection: estimating sharp bounds on treatment effects. Rev. Econ. Stud., 76, 1071–1102.

Little, R. J. and Rubin, D. B. (1989) The analysis of social science data with missing values. Sociol. Meth. Res., 18, 292–326.

Little, R. J. and Vartivarian, S. (2005) Does weighting for nonresponse increase the variance of survey means? Surv. Methodol., 31, 161–168.

Newey, W. K. (1984) A method of moments interpretation of sequential estimators. Econ. Lett., 14, 201–206.

Neyman, J., Iwaszkiewicz, K. and Kolodziejczyk, St. (1935) Statistical problems in agricultural experimentation (with discussion). J. R. Statist. Soc., suppl., 2, no. 2, 107–180.

Pearl, J. (2001) Direct and indirect effects. In Proc. Conf. Uncertainty in Artificial Intelligence (eds J. S. Breese and D. Koller), pp. 411–420. San Francisco: Morgan Kaufmann.

Petersen, M. L., Sinisi, S. E. and van der Laan, M. J. (2006) Estimation of direct causal effects. Epidemiology, 17, 276–284.

Qin, X. and Hong, G. (2017) A weighting method for assessing between-site heterogeneity in causal mediation mechanism. J. Educ. Behav. Statist., 42, 308–340.

Raudenbush, S. W. and Bloom, H. S. (2015) Using multi-site randomized trials to learn about and from a distribution of program impacts. Am. J. Evaln, 36, 475–499.

Raudenbush, S. W. and Schwartz, D. (2018) Estimation in multisite randomized trials with heterogeneous treatment effects. University of Chicago, Chicago.

Robins, J. M. and Greenland, S. (1992) Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155.

Robins, J. M., Hernán, M. A. and Brumback, B. (2000) Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550–560.

Rosenbaum, P. R. (1984) The consequences of adjustment for a concomitant variable that has been affected by the treatment. J. R. Statist. Soc. A, 147, 656–666.

Rosenbaum, P. R. and Rubin, D. B. (1984) Reducing bias in observational studies using subclassification on the propensity score. J. Am. Statist. Ass., 79, 516–524.

Rubin, D. B. (1978) Bayesian inference for causal effects: the role of randomization. Ann. Statist., 6, 34–58.

Rubin, D. B. (1980) Randomization analysis of experimental data: the Fisher randomization test: comment. J. Am. Statist. Ass., 75, 591–593.

Rubin, D. B. (1986) Statistics and causal inference: comment: which ifs have causal answers. J. Am. Statist. Ass., 81, 961–962.

Rubin, D. B. (1990) Formal mode of statistical inference for causal effects. J. Statist. Planng Inf., 25, 279–292.

Schochet, P., Burghardt, J. and Glazerman, S. (2001) National Job Corps Study: the impacts of Job Corps on participants’ employment and related outcomes. Mathematica Policy Research, Princeton.

Schochet, P. Z., Burghardt, J. and McConnell, S. (2006) National Job Corps study and longer-term follow-up study: impact and benefit-cost findings using survey and summary earnings records data. Mathematica Policy Research, Princeton.

Schochet, P. Z., Burghardt, J. and McConnell, S. (2008) Does Job Corps work?: Impact findings from the National Job Corps Study. Am. Econ. Rev., 98, 1864–1886.

Sobel, M. E. (1982) Asymptotic confidence intervals for indirect effects in structural models. In Sociological Methodology (ed. S. Leinhardt), pp. 290–312. San Francisco: Jossey-Bass.

Tchetgen Tchetgen, E. J. and Shpitser, I. (2012) Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness and sensitivity analysis. Ann. Statist., 40, 1816–1845.

Valeri, L. and VanderWeele, T. J. (2013) Mediation analysis allowing for exposure–mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol. Meth., 18, 137–150.

VanderWeele, T. J., Hong, G., Jones, S. M. and Brown, J. L. (2013) Mediation and spillover effects in group-randomized trials: a case study of the 4Rs educational intervention. J. Am. Statist. Ass., 108, 469–482.

VanderWeele, T. J. and Vansteelandt, S. (2009) Conceptual issues concerning mediation, interventions and composition. Statist. Interfc., 2, 457–468.

VanderWeele, T. J. and Vansteelandt, S. (2010) Odds ratios for mediation analysis for a dichotomous outcome. Am. J. Epidem., 171, 1339–1348.

Weiss, M. J., Bloom, H. S. and Brock, T. (2014) A conceptual framework for studying the sources of variation in program effects. J. Poly Anal. Mangmnt, 33, 778–808.

Weiss, M. J., Bloom, H. S., Verbitsky-Savitz, N., Gupta, H., Vigil, A. E. and Cullinan, D. N. (2017) How much do the effects of education and training programs vary across sites?: Evidence from past multisite randomized trials. J. Res. Educ. Effect., 10, 843–876.

Wright, S. (1934) The method of path coefficients. Ann. Math. Statist., 5, 161–215.

Zhang, J. L., Rubin, D. B. and Mealli, F. (2009) Likelihood-based analysis of causal effects of job-training programs using principal stratification. J. Am. Statist. Ass., 104, 166–176.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)