A Bayesian Approach for Estimating the Survivor Average Causal Effect When Outcomes Are Truncated by Death in Cluster-Randomized Trials

Abstract Many studies encounter clustering due to multicenter enrollment and nonmortality outcomes, such as quality of life, that are truncated due to death—that is, missing not at random and nonignorable. Traditional missing-data methods and target causal estimands are suboptimal for statistical inference in the presence of these combined issues, which are especially common in multicenter studies and cluster-randomized trials (CRTs) carried out among the elderly or seriously ill. Using principal stratification, we developed a Bayesian estimator that jointly identifies the always-survivor principal stratum in a clustered/hierarchical data setting and estimates the average treatment effect among them (i.e., the survivor average causal effect (SACE)). In simulations, we observed low bias and good coverage with our method. In a motivating CRT, the SACE and the estimate from complete-case analysis differed in magnitude, but both were small, and neither was incompatible with a null effect. However, the SACE estimate has a clear causal interpretation. The option to assess the rigorously defined SACE estimand in studies with informative truncation and clustering can provide additional insight into an important subset of study participants. Based on the simulation study and CRT reanalysis, we provide practical recommendations for using the SACE in CRTs and software code to support future research.

Outcomes such as quality of life are frequently measured nonmortality outcomes used to assess general health, recovery, and the impact of medical interventions (1)(2)(3). In studies with nontrivial mortality, such as those conducted among the elderly or persons with critical and serious illnesses, patientcentered outcome measures often cannot be captured for a sizeable proportion of participants who die during the study period (4,5). Nonmortality outcomes with missing data due to death raise conceptual and empirical issues. First, death itself is an outcome of interest. Second, because the nonmortality outcome is empirically unmeasured (i.e., undefined) among persons who die, imputation approaches may not appeal to certain stakeholders (6)(7)(8). Relatedly, composite outcomes require that some subjective valuation be used concerning what value should be used for those who die (9,10). Further, from a causal inference perspective, many strategies may not provide estimates with a clear causal interpretation under the counterfactual outcomes framework (11)(12)(13) because surviving study participants under treatment and control conditions can be systematically different, which obscures the target estimand.
Adapted from Suzuki (14), Figure 1 portrays what is often termed the "truncation-by-death" problem. In this setting, the survivor status (S) and quality-of-life outcome (Y) of an individual can be influenced by the treatment (D), as well as by an unobserved variable (U). The unobserved variable (U) can also be potentially associated with the treatment (D), but the association would be less likely if the treatment were assigned in a randomized trial setting. Under the counterfactual outcomes framework, there is considerable literature on estimating causal effects using principal stratification (15)(16)(17)(18)(19). This framework has been applied to study nonmortality outcomes in individually randomized trials (4,18), as well as to address protocol compliance, patient encouragement, and other problems in public health and social science settings (e.g., the effect of job training programs and behavioral health interventions) (16,17,20,21). Principal stratification seeks to identify strata of patients by their preexposure characteristics. The stratum of primary interest includes patients who would always survive through the end of the trial period regardless of treatment assignment. The effect of an intervention in this stratum of "always-survivors" is termed the survivor average causal effect (SACE). The SACE is a meaningful estimand because, without additional assumptions, only the always-survivors have both counterfactual outcomes well-defined. Thus, the SACE avoids survivor bias (i.e., observed and unobserved characteristics of survivors in treatment and control are likely different) and summarizes the treatment effect without needing additional assumptions about the counterfactual outcomes for the nonsurvivors (5,10,22,23). However, the always-survivors target population is not directly observed; that is, not all participants have a definitive stratum membership, but principal strata must be identified to estimate the SACE. Bayesian inference is particularly attractive for this purpose, since the posterior strata membership can be updated through a Markov chain Monte Carlo (MCMC) algorithm, while posterior predictive distributions for the SACE can be conditional on the strata membership (15,16). Herein, we extend and use principal stratification to estimate the SACE using Bayesian inference in clusterrandomized trials (CRTs) with death truncation. In contrast to individual-level randomization, CRTs randomize groups of individuals to treatments (24,25). Cluster-level randomization is used when the intervention is designed for a system-level improvement or when randomization is not feasible at the individual level (see Turner et al. (25) for a review). As a result, the outcome observations are often more similar within clusters than between clusters, causing a positive intracluster correlation that must be accounted for at the analysis stage to avoid inflated type I error. To estimate the SACE in CRTs, we developed a Bayesian approach that leverages the baseline covariates to predict the counterfactual survivor status and the counterfactual outcomes. As we explicate in the Methods section, our approach includes both a principal stratification model and an outcome model, for which we developed an iterative sampling algorithm to estimate the model parameters jointly, and hence the SACE in CRTs.

Causal framework and assumptions
We consider the counterfactual outcome framework, where the causal effect is defined as the difference between the 2 counterfactual outcomes averaged across a common population (12,13). We assume 1) the stable unit treatment value assumption by which patients receive no different forms or versions of treatment and that no interference exists and 2) that the treatment is randomized at the cluster level and is independent of both the potential survivor status and the counterfactual outcome of all individuals in each cluster. In a CRT with I clusters and n i individuals in each cluster, let us denote the cluster-level treatment and control assignment as D i = 1, 0, respectively, where i = 1, . . . , .I For the j th individual (j = 1, . . . , n j ) in the i th cluster, we define {Y ij (1), Y ij (0)} as the counterfactual outcomes for each individual under treatment and control conditions. We are typically interested in the average causal effect, (0)), if the counterfactual outcomes are well-defined for the entire trial population. However, when outcomes are truncated, the average causal effect can only be defined for a subset of patients. Estimating the average causal effect in a meaningful subgroup in this setting involves 1) using the covariates and survival status to identify the potentially unobserved always-survivor stratum and 2) comparing counterfactual outcomes within the always-survivor stratum under treatment and control conditions. Below we describe models for each of these components.

Principal strata model for survivor status
We define S ij as the observed survival status of a patient, with S ij = 1 indicating survival and S ij = 0 indicating death. Under the potential outcomes framework, the joint values of potential survival status produce 4 strata. These include: 1. Always-survivors (S ij (1) = S ij (0) = 1): patients who will survive until the end of the study regardless of their treatment status. We make an additional assumption of monotonicity such that the treatment does not lead to worse survival, and thus the harmed stratum is assumed away (17,26). Monotonicity is a plausible assumption in trials when interventions are carefully piloted for safety considerations. However, we Protected individual or never-survivor a An asterisk ( * ) indicates truncation by death.
acknowledge that an intervention can lead to worse survival in some patients, and in such instances, the monotonicity assumption will be questionable. In scenarios where the proportion of a harmed stratum is close to 0, modeling the harmed stratum can also lead to computational challenges, in which case the monotonicity assumption represents a practical consideration for model-fitting. In our motivating trial, we considered the harmed patients to be rare (if not nonexistent) and undertook the analysis assuming monotonicity. Potential approaches to relaxing this assumption are discussed below. Under monotonicity, the principal strata membership is observed for survivors in the control group and the deceased in the treatment group, but it is unknown for survivors in the treatment group or the deceased in the control group. Table 1 presents the strata membership attribution for each observed data group under monotonicity. Baseline covariates play a critical role in identifying the principal strata. Suppose G ij = {00, 10, 11} indicates principal strata membership, where G ij = 00 for neversurvivors, G ij = 10 indicates protected individuals, and G ij = 11 indicates always-survivors. Assuming X ij as a covariate vector with both cluster-level and individual-level covariates and further including an intercept, the principal strata can be modeled using multinomial logistic regression with G ij = 11 as the reference group: Here, β and γ are p-dimensional regression coefficient vectors for never-survivors versus always-survivors and for the protected versus always-survivors, respectively; each component of β and γ is interpreted as the log odds ratio.
To ensure numerical stability, we recommend choosing the (likely) largest stratum as the reference category for the multinomial logistic model. In addition, cluster-level random intercepts can be added when principal strata membership is believed to be correlated due to cluster randomization. Alternatively, a nested probit model for the strata membership can be used (16); however, the regression coefficients are more challenging to interpret. Thus, we did not pursue the nested probit model further but derived its posteriors with a latent variable specification to support its use by interested readers (see Web Appendix 1, available at https://doi.org/10.1093/aje/ kwad038).

Models for potential outcomes
After defining the principal strata model, we specify counterfactual outcome models within each principal stratum. Specifically, only the always-survivors have well-defined counterfactual outcomes under both treatment and control conditions (Y ij (0), Y ij (1)), since they are not subject to truncation. Patients in protected strata have only 1 welldefined counterfactual outcome under treatment, but their counterfactual outcome under the control condition is truncated by death (Y ij (0) = * , Y ij (1)). Among the alwayssurvivors, each counterfactual outcome can be modeled as a function of covariates adjusting for clustering, whereas only 1 counterfactual outcome for each protected patient can be similarly modeled. Now, let α 11 1 , α 11 0 , and α 10 1 be p-dimensional vectors of regression coefficients for covariates in these 3 groups: always-survivors in the treatment group (G ij = 11, D i = 1), always-survivors in the control group (G ij = 11, D i = 0), and protected individuals in the treatment group (G ij = 10, Let η i be the cluster-level random effect following a normal distribution with mean 0 and variance τ 2 ; let the residual error follow a normal distribution with a mean equal to 0 and a variance of σ 2 . The assumed linear mixed models (LMMs) for the counterfactual outcomes can then be summarized as in Table 2. The random-effects term η i is required here to account for the intracluster correlation coefficient (ICC), ρ = τ 2 τ 2 +σ 2 , a quantity that is central to the design and analysis of CRTs, since ignoring the ICC in the outcome model leads to an inflated type I error rate (27). With the outcome model specified for always-survivors, the SACE can be defined as We leverage the mixture modeling assumptions and the monotonicity to jointly estimate the strata membership probability for each individual and the counterfactual outcome model parameters. This approach produces the treatment effect for each always-survivor, which can be averaged over the always-survivor subpopulation to identify the SACE. Beyond the mixture modeling approach, we acknowledge that other structural assumptions can be used to identify the SACE even in the absence of monotonicity (see, for example, Hayden et al. (28) and Shepherd et al. (29)).

Joint inference of the outcome model and the principal strata model
Posterior inference of the parameters in the principal strata model and the outcome model can be achieved through an MCMC algorithm. The algorithm is summarized in the Appendix, and detailed derivations for each step are presented in Web Appendix 1. The algorithm uses Gibbs sampling steps to update outcome regression model parameters, where conjugate priors of normal and inverse gamma distributions are specified (details are given in Web Appendix 1). The algorithm further implements a Metropolis-Hastings step for the principal stratification model. To increase the convergence speed for β and γ, we use a random-walk Metropolis algorithm (30) that draws proposals from multivariate t distributions, t(s β T β ) and t(s γ T γ ), that center at the values of the previous iteration. The parameters s β and s γ scale the covariance to achieve optimal acceptance rates, and both T β and T γ are p × p-dimensional component-specific scale matrices. We use the adaptive proposal approach by Haario et al. (31) to tune T β and T γ by utilizing empirical covariance from an extended burn-in. As indicated by the asterisks ( * ) in the Appendix, when the principal strata model also accounts for clustering (denoted as χ i for the random intercept, where χ i ∼ N(0, φ 2 )), it can be updated using the same approach as for β and γ.

Analysis of the motivating trial example
We applied our methodology to analyze data from the Whole Systems Demonstrator (WSD) Telecare Questionnaire Study (35,36). The WSD Telecare Study was a CRT randomized at the general practice level that evaluated the effect of telecare on the health-related quality of life and psychological well-being of 1,189 elderly recipients of social care in the United Kingdom over 12 months from 2008 to 2009. A total of 639 participants were randomized at the cluster level into the telecare arm, and 550 were randomized to usual care. Recipients were additionally clustered within general practices across 3 English local authorities. The rationale for the telecare intervention was not only its potential health benefit but also its advantage in terms of cost-effectiveness (37). There were 204 general practices in the study, and the cluster size varied from 1 to 26. The telecare arm installed electronic sensors in the homes of recipients that provided safety monitoring (e.g., recipient falls, fires in the home). Persons in the usual-care arm did not receive telecare. Our illustration focuses on health-related quality of life as measured via the Euro-Qol Group's EQ-5D-VAS index, a self-rated scale (score range, 0-100) with 5 domains (mobility, self-care, usual activities, pain and discomfort, and anxiety and depression) measured at 12 months postrandomization (38,39). Higher scores represent a better overall quality of life. The results reported herein vary slightly from the original published analysis due to different analytical methods and outcome data.
We considered trial participants to be nonsurvivors if they were deceased or had seriously deteriorated health, such that a self-reported health outcome for them could not be measured or collected and was thus undefined. Recipients with seriously deteriorated health included those who were too ill; were unable to continue due to dementia or deteriorated mental capacity; had moved to long-term nursing care, residential care, or sheltered housing; or had a family caregiver. For missing data on the baseline covariates and outcomes not due to death or seriously deteriorated health, we used multiple imputation (6,13) to impute a single data set to fill those missing entries. Note that a fully Bayesian approach that incorporates the imputation in the proposed algorithm can also be implemented. Since the goal of our illustration was not focused on the missingcovariate problem, we did not pursue this direction. The resulting data set had 127 (10.7%) cases with truncated outcomes.
For both the principal stratification model and the potential outcome models, our baseline covariates included sex, age, ethnicity, participants' highest level of education, an indicator for living in an only-adult household, number of comorbid conditions, impairment score, physical health score, mental health score, and EQ-5D-VAS index score (35). We used LMMs for the outcome regression model and multinomial logistic regression for the principal strata model, as previously specified. We did not pursue the more complex model that accounted for the random effects in the principal strata model because the average cluster size was too small and a handful of clusters had a size of 1, which would cause convergence issues (we discuss this further in the Discussion). Two MCMC chains of 100,000 iterations were implemented where the first 25,000 iterations were set as a burn-in. We started each chain using random initials. Model convergence and chain mixing were checked by means of trace plots. All analyses were performed using R 4.0.1. In addition, as a comparison model that might be frequently used in practice in the absence of our method, we fitted an LMM based on complete outcomes adjusting for the same baseline covariates.  Table 3 presents the simulation results for the key model parameters in the scenario of (m, n) = (15, 100). (Full results are shown in Web Appendix 2.) Relative bias and coverage are presented for α 11 1 , α 11 0 , the SACE, and the ICC. Our results show that the posterior mean values for most parameters were accurate, with less than 10% bias and more than 90% coverage. In particular, the SACE estimate was close to the truth (% bias = −5.2%), and the coverage probability was 0.93. The results for the other two scenarios of (m, n) = {(50, 30) , (25, 60)} were similar, where SACE estimates both had less than 5.0% bias and at least 0.95 coverage (see Web Tables 1-3 Tables 4-6), principal strata model specification with random effects (Web Tables 7-10), a small number of clusters and/or cluster sizes (Web Tables 11 and 12), and variable outcome ICCs (Web Tables 13 and 14) showed similar performance.

Illustrative analysis
As noted above, our approach allows for adjustment for covariates and clustering. As is common in CRTs, several prognostic variables were preselected for adjustment in the primary analysis; the variables we used in our analysis are listed in Table 4. Our model identified 88.8% of recipients as always-survivors, 2.2% as protected individuals, and 8.9% as never-survivors. The posterior mean of the ICC was 0.003, with a 95% credible interval (i.e., 2.5% and 97.5% of the posterior sample) of (0.000, 0.020), suggesting small intracluster correlation in EQ-5D-VAS scores. Table 5 shows the analytical results of our analysis. The SACE point estimate for the effect of telecare on EQ-5D-VAS score was −0.70, with a 95% credible interval spanning  potential effects that ranged from a decrease of −3.11 to an increase of 0.83. The estimated effect suggests that telecare did not markedly improve EQ-5D-VAS score in comparison with usual care in the principal stratum of always-survivors. Web Figure 1 depicts the average number of alwayssurvivors, protected individuals, and never-survivors by cluster based on the posterior sample of principal strata membership after burn-in. Many clusters had recipients that were possibly from all 3 strata. A notable advantage of our Bayesian approach is that the baseline characteristics of always-survivors can be obtained by averaging over the baseline covariates among always-survivors in the posterior sample of principal strata membership (summary provided in Web Table 15). In our illustration, demographic characteristics, socioeconomic status, and baseline health status among the always-survivors were similar to those in the overall population. This is unsurprising in this specific illustration, since 88.8% of recipients were identified as always-survivors.
Finally, in Table 5, the LMM point estimate based on the recipients with observed outcomes was −0.93 (95% confidence interval: −3.24, 1.38). This result was in line (again due to overlap in this specific illustration, but this is not guaranteed) with the SACE. However, it is important to note that while similar in our illustration, the LMM estimate does not have a causal interpretation because the analytical sample for the LMM estimate included recipients who were deemed to belong to the protected stratum. Intuitively, it suggests that the protected individuals in the telecare arm who were likely to die tended to be those with worse health outcomes in the treatment group.

Conclusion
The SACE is a well-defined causal estimand that describes the effect of an intervention among participants who would survive regardless of their randomized assignment in a trial. We used Bayesian principal stratification to estimate the SACE in a CRT where the hierarchical data structure due to clustered randomization was accounted for in the modeling and data analysis. Since stratum membership is not fully identifiable for some participants, Bayesian estimation is a particularly attractive strategy with which to address uncertain stratum membership of the counterfactual survivor status. Notably, our approach considers the model-based credible interval estimation under the Bayesian framework, and therefore differs from the usual cluster-robust variance approach under the frequentist framework (40). The extent to which a cluster-robust variance approach applies to our Bayesian modeling framework merits additional research.
In our simulations, we observed low bias and good coverage of the true SACE parameter with our methods. In our analysis of the WSD telecare trial, 88.8% of the participants were identified as always-survivors, and the SACE suggested no significant change in the health-related quality of life measure (EQ-5D-VAS score) at 12 months (SACE = −0.70, 95% credible interval: −3.11, 0.83). This point estimate was slightly smaller than that from a complete-case analysis using a naive LMM model (point estimate = −0.93, 95% confidence interval: −3.24, 1.38). We note that our result is based on only a one-time measurement taken at 12 months, which differs from the original result published by Hirani et al. (35), where they utilized repeated outcome measures at different time points and considered different covariates. Adapting the principal stratification framework for CRTs with repeatedly measured outcomes requires additional methodological development. While our analysis of the WSD Telecare Study showed a limited effect on the EQ-5D-VAS outcome, telecare may affect other physical health outcomes or have cost-effectiveness properties due to prevention or earlier intervention on health needs.

Practical recommendations
During the development and implementation of our methodology, we identified some analytical considerations that users may need to consider. First, the monotonicity assumption may be plausible for practice-level interventions like the one used in the WSD study setting, where installing electronic sensors in the telecare arm was unlikely to harm participants. Before the implementation of many medical trials, interventions are evaluated in pilot studies with safety monitoring; thus, this assumption may often be reasonable. Relaxing the monotonicity assumption by adding the "harmed stratum" (i.e., participants who die in the treatment group but survive in the control group) is possible. However, fitting the mixture model with an additional harmed stratum is an added layer of computational considerations, and additional simulations are needed to fully understand the benefit of including this stratum when the harmed population is relatively rare. Beyond the mixture modeling framework, other types of structural assumptions or sensitivity parameters are necessary to relax the monotonicity assumption (28,29,41,42), and they represent a fruitful direction for future investigations in the context of CRTs.
Second, our simulation studies showed that the use of noninformative priors can achieve adequate performance without sufficient knowledge of key model parameters from existing studies. However, Bayesian approaches have an inherent advantage of leveraging existing knowledge through informative priors on key parameters to sharpen the model performance. For example, Turner et al. (43,44) have demonstrated that compared with the noninformative priors, incorporating informative half-normal and β priors on the outcome ICC parameter (based on published ICC estimates) can produce narrower credible intervals for the outcome ICC and variance components parameters. We anticipate this finding to be applicable for estimating the SACE.
Third, while we have provided methods to account for clustering in the principal strata membership model, the model-fitting can be substantially more challenging than its counterpart without clustering. In the analysis of the WSD trial, the principal strata membership model failed to converge due to several extremely small clusters. Thus, we did not consider clustering in the principal strata membership model. Therefore, in practice, specifying more complex principal strata membership models often requires the absence of extremely small clusters. On the other hand, our additional simulation results (Web Tables 7-10) showed that when the ICC in the principal strata model is not exceeding 0.10, specifying the principal strata model without a random intercept can still achieve adequate performance characteristics and can be sufficient. However, we acknowledge that our assessment was limited to the data-generating process we considered, and a more systematic comparison between clustered and unclustered principal strata models in CRTs is warranted.
Lastly, our additional simulation (Web Tables 11 and 12) showed that our method can be employed to estimate the SACE in relatively small CRTs (e.g., 20 clusters with a cluster size of 25). We caution against using our model in even smaller CRTs, as additional research to address smallsample challenges in these settings is needed.
The use of the SACE in CRTs (and individually randomized trials) also requires some practical considerations from the design perspective, since the always-survivors stratum is a subset of trial participants that cannot be identified until after randomization and trial completion. Though it is a more attractive target estimand and thus alternative to composite outcomes, imputation, or other approaches to dealing with truncated outcomes, there is always a threat of loss of statistical power with the SACE because of the inherently smaller sample size of this stratum. However, this concern may be partially offset when the treatment effect is likely to be larger in the always-survivors stratum than in the overall population (i.e., average treatment effect) (45,46). Relatedly, without knowing the size or characteristics of the always-survivors stratum prior to a study, power calculations may be a challenge, particularly when sample sizes are constrained and cannot be increased. Thus, we believe we can offer 3 practical recommendations for using the SACE in a trial. First, the SACE may be most ideal as a preplanned secondary analysis in trials with smaller available sample sizes, and only considered for the primary analysis in larger pragmatic trials or in trials where effect sizes and alwayssurvivor rates can be anticipated with reasonable certainty to be in some range, and thus available sample sizes are adequate (45). Second, as is recommended when working with other uncertain trial design elements, we recommend that Monte Carlo simulation studies be undertaken to assess statistical power (47,48). Jo (45) provides statistical power guidance for trials with treatment noncompliance that is potentially relevant to those using the SACE. Closed-form sample size solutions for noncompliance in CRTs could be extended to the SACE in future work. Third, when there is interest in conducting primary analysis to estimate the SACE, approaches for sample size reestimation with preplanned interim analysis (49) may be considered, but they may require future development.