A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes

Summary Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and intractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.


INTRODUCTION
This article considers the problem of estimating from observational data the causal effect of an intervention (or treatment/policy) on an outcome of interest in applications where: (i) there are multiple sample units, some of which receive the intervention; (ii) for each unit, the outcome of interest is measured at multiple time points; and (iii) the units receiving the intervention and the times at which the intervention is implemented are not randomized. This problem arises often in epidemiology (Barnard et al., 2022), public health (Bruhn et al., 2017), economics (Hsiao et al., 2012), marketing (Brodersen et al., 2015), political science (Xu, 2017), and social studies (Ben-Michael et al., 2023). Often the sample units in such applications represent groups of individuals; for example, the units might be hospitals or administrative regions. See Ferman et al. (2020) and Samartsidis et al. (2019) for a list of real-world examples.
Existing approaches have limitations. First, for some of them, it is hard to obtain uncertainty intervals for the causal effects of interest. Second, most of them are designed to deal with continuous outcomes. An exception is Nethery et al. (2023), who develop a factor model for a single count outcome. However, these authors do not consider binomial outcomes, which arise frequently in applications; nor do they account for uncertainty in the number of latent factors. Third, few of these methods allow joint modeling of multiple outcomes affected by the intervention. Such joint modeling may improve statistical efficiency, which is particularly valuable when evaluating policies in contexts where data are sparse, i.e., only available annually or quarterly over a limited number of years. In recent work (Samartsidis et al., 2020), we used the multivariate factor analysis model of De Vito et al. (2021) to jointly model multiple normally distributed outcomes. However, this model cannot be applied to binomial/count data. It also assumes that any variability shared between some of the outcomes is shared across all of them.
Here, we develop a general approach to causal inference from time-series observational data that tackles these limitations. This method uses a multivariate factor analysis model that can be applied to multiple outcomes of mixed type (continuous/binomial/count) and allows for sharing of the loadings parameters across these outcomes. The model is Bayesian and thus automatically provides uncertainty quantification for all quantities of interest. Our model accounts for uncertainty in the number of latent factors and allows sharing of loadings across any subsets of the multiple outcomes. The Markov chain Monte Carlo (MCMC) algorithm we propose for fitting the model combines modern data augmentation and sampling techniques to overcome slow mixing issues stemming from the well-known identifiability problems of factor analysis models. The potential benefits of joint outcome modeling are demonstrated via a simulation study. We apply the proposed method to real data on a new policy for COVID-19 contact tracing in the UK, providing valuable insights on the effectiveness of this policy. Finally, although our model is developed for intervention evaluation problems, it can be of use in other fields where multivariate factor analysis models are of interest, such as cancer biology (Avalos-Pacheco et al., 2022).
The remainder of the article is structured as follows. In Section 2, we introduce the motivating COVID-19 contact tracing policy evaluation problem. Section 3 presents our Bayesian model. In Section 4, we discuss model fitting. Section 5 presents a simulation study that demonstrates the benefit of jointly modeling multiple outcomes. Section 6 contains results from our motivating data set. Finally, Section 7 summarizes the article's findings and lists some directions for future research.

COVID-19 CONTACT TRACING POLICY EVALUATION PROBLEM
The NHS Test & Trace (TT) programme was launched in May 2020 as part of the containment strategy for the COVID-19 pandemic in the UK. One of its main functions was to ensure that individuals who tested positive for COVID-19 (the cases) were aware of their infection, and to inform them of their legal duty to self-isolate to prevent onward transmission. When TT communicated with a case (henceforth this communication will be referred to as tracing), the case was asked to list the individuals (contacts) with whom he/she had been in close physical proximity within 48 h of his/her symptom onset (for asymptomatic cases, the date of the positive test was used). TT would then attempt to reach these contacts to provide advice and tell them to self-isolate. Multiple studies have provided evidence in favor of TT (Fetzer and Graeber, 2021).
Starting in July 2020, local authorities gradually introduced local tracing partnerships (LTPs). These LTPs involved the local authority working with the national TT to improve tracing of cases and contacts. An LTP team was composed of staff based locally, whose job was to trace cases resident in that local authority who had not been able to be traced by TT. This was done mainly by telephone, often using the team's local knowledge and, in several authorities, by visiting a case's home if telephone communication had not been successful. The question we address here is whether LTPs (the intervention) have had an impact on the effectiveness of TT. Which local authorities introduced LTPs, and when, depended mainly on the interest expressed by local authorities, rather than being randomized, which complicates estimation of the causal effects of LTPs.
We judge the effectiveness of TT in terms of four outcomes measured daily: (i) the proportion of cases on that day that were ultimately completed, where a case is considered complete if TT managed to contact him/her; (ii) the proportion of completed cases whose completion was timely, i.e., occurred within 48 h of the case being registered with TT; (iii) the total number of contacts elicited from cases completed on that day; and (iv) the proportion of these elicited contacts who were subsequently completed. We shall refer to these as case completion, timely case completion, number of contacts, and contact completion, respectively. We have daily measurements of these four outcomes on 181 spatial units during the study period July 01, 2020 to November 15, 2020 (i.e., 138 days). Each of these units is either an upper-tier local authority (UTLA) or part of a UTLA.
Figure 1 provides some graphical summaries of the data. The roll-out of LTPs is summarized in Fig. 1(a). The first LTP was formed on July 15, 2020 and by November 15, 2020, 118 (66%) of the units were covered by an LTP. Figure 1(b) shows the number of cases for each day and unit; during July-September the number of cases is low for most of the units, but it increases thereafter. The data on the four outcomes (Fig. D.1 of the supplementary material available at Biostatistics online) show considerable variability across units for the same day and across time within the same unit. The existing methods for policy evaluation listed in Section 1 cannot be used to investigate whether LTPs improved some of these outcomes because they are not applicable to binomial data.

Let i = 1, …, N index the sample unit and t = 1, …, T denote time (e.g., in days) relative to a fixed calendar time (e.g., July 1st 2020 in our application). Let N_1 and N_2 = N − N_1 be the numbers of control units and treated units, respectively (i.e., units that do not and units that do experience the intervention). Without loss of generality, assume units are ordered so that 1, …, N_1 are control units. Let T_i denote the last time at which unit i has not experienced the intervention (for control units, T_i = T). Let T_min = min{T_1, …, T_N} and Ñ_t = Σ_{i=N_1+1}^{N} I(T_i < t). Suppose there are D_1 continuous, D_2 binomial and D_3 count outcomes. Let y_itd be the observed value of the dth continuous outcome (d = 1, …, D_1). For the dth binomial outcome (d = 1, …, D_2), let k_itd be the observed number of successes in a known number, n_itd, of independent Bernoulli trials. Let z_itd be the observed value of the dth count outcome (d = 1, …, D_3).
We shall define the causal effect of the intervention on any one of the D = D_1 + D_2 + D_3 outcomes in unit i at time t in terms of the difference between its observed value and the outcome that unit i would have if it were not treated at time t. The latter value is called a potential untreated outcome (Holland, 1986). We denote the potential untreated continuous, binomial, and count outcomes as y^(0)_itd, k^(0)_itd, and z^(0)_itd, respectively. We also define the potential treated outcomes y^(s)_itd, k^(s)_itd, and z^(s)_itd (s = 1, …, T_i) as the outcomes that unit i would have at time t if time s had been the last time at which unit i had not experienced the intervention. We make the stable unit treatment value assumption (SUTVA), which implies that the potential outcomes of one unit do not depend on whether, or at what time, the intervention is applied to other units. We further make the no-treatment-anticipation assumption (Athey and Imbens, 2021), i.e., that y^(s)_itd = y^(0)_itd, k^(s)_itd = k^(0)_itd, and z^(s)_itd = z^(0)_itd for all t ≤ s. With these assumptions, each observed outcome is related to the potential outcomes as follows: it equals the corresponding potential untreated outcome for t ≤ T_i, and the potential treated outcome with s = T_i for t > T_i.

There is growing interest in using factor analysis (FA) to model the potential untreated outcomes (see Section 1 for references), or as the data generating mechanism to study the properties of synthetic control-type approaches (Abadie et al., 2010; Hsiao et al., 2012; Ben-Michael et al., 2021; Ferman, 2021). This is due to the FA model's ability to adjust for observed covariates and to allow for the presence of unmeasured confounding with a particular structure. Here, we propose a multivariate, mixed-outcome FA model for the potential untreated outcomes. Specifically, for each i = 1, …, N and t = 1, …, T, we assume the model

y^(0)_itd ~ N(λ_i'f_td + η_1,d'x_it, σ²_id), d = 1, …, D_1;
k^(0)_itd ~ Bin(n_itd, p_itd), logit(p_itd) = λ_i'g_td + η_2,d'x_it, d = 1, …, D_2;
z^(0)_itd ~ NB(w_itd q_itd, ξ_d), q_itd = exp(λ_i'h_td + η_3,d'x_it), d = 1, …, D_3,   (3.1)

where NB(m, ξ) denotes a negative binomial distribution with mean m and dispersion ξ. Here, f_td, g_td, and h_td are J-vectors of unobserved factors at time t; λ_i is an unobserved J-vector of loadings for unit i; x_it is a P-vector of (possibly time-dependent) exogenous (not affected by the intervention) covariates; η_ℓ,d are the corresponding regression coefficients (ℓ = 1, 2, 3); and ξ_d controls the degree of overdispersion relative to the Poisson distribution. Here, w_itd is a known offset; if there is no offset, w_itd = 1. In some applications, there might be enough information to estimate a separate dispersion for each unit; in such cases, we can replace ξ_d by ξ_id. We assume that, for all i = 1, …, N, the times at which the intervention is introduced are conditionally independent of the potential untreated outcomes given x_i1, …, x_iT and λ_i. This implies that after controlling for the observed potential confounders x_i1, …, x_iT and the unobserved potential confounders λ_i, there is no confounding of the causal effects of the intervention on the dth continuous, dth binomial and dth count outcome of unit i > N_1 at time t > T_i, defined as

α_itd = y_itd − y^(0)_itd,   γ_itd = k_itd − k^(0)_itd,   δ_itd = z_itd − z^(0)_itd,   (3.2)

respectively. An alternative to γ_itd would be to define the causal effect on the dth binomial outcome of unit i at time t in terms of the success probability of the Bernoulli trials. To do this, we would assume k_itd ~ Bin(n_itd, p^(T_i)_itd) and define the causal effect as

β_itd = p^(T_i)_itd − p_itd.   (3.3)

For the dth continuous outcome, we define (with slight abuse of notation) the average (over time) causal effect in unit i as α_id = Σ_{t=T_i+1}^{T} α_itd/(T − T_i) and the overall (over both time and units) causal effect as α_d = Σ_{i=N_1+1}^{N} α_id/N_2. Corresponding average causal effects, β_id, β_d, γ_id, γ_d, δ_id, and δ_d, for binomial and count outcomes are defined analogously.
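To make the averaging concrete, the per-unit and overall averages can be computed from (draws of) the time-specific effects as in the following sketch; the helper name is hypothetical, and the overall effect is taken here as the mean over treated units of the per-unit averages:

```python
import numpy as np

def average_causal_effects(alpha, T_i):
    """Per-unit averages over t > T_i and their mean over treated units.

    alpha : (N2, T) array; alpha[i, t] is the effect for treated unit i
            at (0-indexed) calendar time t
    T_i   : length-N2 sequence of last untreated times (1-indexed)
    """
    N2, T = alpha.shape
    # Average each unit's effects over its post-intervention period only
    alpha_id = np.array([alpha[i, T_i[i]:].mean() for i in range(N2)])
    return alpha_id, float(alpha_id.mean())
```

For example, with two treated units observed over four periods and last untreated times 2 and 3, the per-unit averages use only the final two and final one periods, respectively.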
Policy makers are also interested in identifying treated units whose response to the intervention (i.e., causal effect) is extreme compared to other treated units. This could be useful, for example, to identify units to which alternative interventions should be applied. For the dth continuous outcome, unit i and time t > T_i, define r^(α)_itd as the (scaled) rank of α_itd among the Ñ_t units treated by time t, i.e.,

r^(α)_itd = Σ_{j:T_j<t} I(α_jtd ≤ α_itd) / (Ñ_t + 1).   (3.4)

In (3.4), we have scaled by (Ñ_t + 1)^{-1} to ensure that ranks are comparable between any times t and s (both > T_min) for which Ñ_t ≠ Ñ_s. Further, let r^(α)_id = Σ_{t=T_i+1}^{T} r^(α)_itd/(T − T_i) be the average (over time) rank of unit i. Ranks for β, γ, and δ are defined analogously.
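The scaled rank in (3.4) is easy to compute from the effect values of the units treated by time t; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def scaled_rank(effects_t, i):
    """Scaled rank of unit i's effect among the Ntilde_t units treated
    by time t, as in (3.4): #{j : effect_j <= effect_i} / (Ntilde_t + 1)."""
    effects_t = np.asarray(effects_t)
    rank = int(np.sum(effects_t <= effects_t[i]))
    return rank / (len(effects_t) + 1)
```

With three treated units and effects (0.1, 0.5, 0.3), the largest effect receives scaled rank 3/4 and the smallest 1/4, so the values always lie strictly inside (0, 1).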

LTP application
In our motivating problem, where the intervention is the introduction of an LTP, N = 181, T = 138, D_1 = 0 (no continuous outcomes), D_2 = 3 (case completion, timely case completion and contact completion), D_3 = 1 (number of contacts), and N_1 = 63 units have no LTP during the study period.
For the first binary outcome (case completion), n_it1 is the number of cases registered on day t for unit i, and k_it1 is the number of these cases that were ultimately completed. For the second binary outcome (timely case completion), n_it2 is the number of cases completed on day t for unit i, and k_it2 is the number of these cases that were completed within 48 h. For the count outcome (number of contacts), w_it1 is the number of cases that were completed on day t for unit i, and z_it1 is the number of contacts elicited from these w_it1 cases. For the third binary outcome (contact completion), k_it3 is the number of contacts that were ultimately completed out of the n_it3 = z_it1 contacts. No covariates x_it are considered.
In Section 3.1, we assumed n_itd and w_itd are not affected by the intervention. In the LTP application, however, n_it2 = k_it1 and n_it3 = z_it1, which may both be affected. Also, w_it1 is related to k_ij1 and k_ij2 for j ≤ t, and so may be affected. It is also possible that n_it1 is affected (successful tracing can prevent new cases). This changes the interpretation of the causal effects γ_it1, γ_it2, γ_it3, and δ_it1. As shown in Section A of the supplementary material available at Biostatistics online, they can be interpreted (under some additional assumptions) as separable direct effects on k_it1, k_it2, k_it3, and z_it1, respectively. These are different from the corresponding total effects, defined as γ̃_itd = k_itd − k̃^(0)_itd and δ̃_itd = z_itd − z̃^(0)_itd, respectively, where k̃^(0)_itd and z̃^(0)_itd are obtained by replacing n_itd and w_itd with their potential untreated outcomes in (3.1). We do not consider the total effects (even though they can be estimated), because we believe they can be misleading. For example, suppose that for some i > N_1 and t > T_i, p^(T_i)_it3 = p_it3 and n_it3 > ñ_it3 (where ñ_it3 denotes the potential untreated value of n_it3), that is, the LTP has no impact on the contact completion probability but increases the number of contacts elicited. Then k_it3 > k̃^(0)_it3 (because of the larger number of trials), and hence γ̃_it3 > 0, suggesting that the LTP increased the number of contacts completed. In contrast, γ_it3, interpreted as the total number of additional contacts completed thanks to the LTP, would be zero.

Outline of inference
In order to estimate the causal effects defined in Section 3.1, we need to estimate the potential untreated outcomes of the treated units i > N 1 in their postintervention periods t > T i .We do this under the Bayesian paradigm as follows.
First, we set prior distributions for the parameters of the multivariate FA model (3.1). These are detailed in Section 3.4. We then fit, using the MCMC algorithm in Section 4, the FA model to preintervention data only (i.e., for each i, we only use data up to time T_i). The posterior samples that we obtain allow us to draw samples from the posterior predictive distribution of the potential untreated outcomes conditional on x_it, n_itd and w_itd. We transform them using (3.2) to obtain samples from the posterior distribution of the causal effects α_itd, γ_itd, and δ_itd. For β_itd, we further need to account for uncertainty in p^(T_i)_itd. For each i > N_1, t > T_i, and d = 1, …, D_2, we a priori assume that p^(T_i)_itd ~ Beta(a_p, b_p), so that its posterior given k_itd is Beta(a_p + k_itd, b_p + n_itd − k_itd). We obtain samples from the posterior of β_itd using (3.3), i.e., by subtracting the draws from the posterior of p_itd from the samples obtained from the beta posterior of p^(T_i)_itd. Let α^(ℓ)_itd denote the ℓth sample (ℓ = 1, …, L) from the posterior distribution of α_itd. We estimate α_itd as Σ_{ℓ=1}^{L} α^(ℓ)_itd/L (the posterior mean) and calculate 95% posterior credible intervals (CIs) using the 2.5% and 97.5% percentiles of the α^(ℓ)_itd. Samples from the posterior (and hence point estimates and 95% CIs) of α_id, α_td, α_d, and r^(α)_itd can be obtained from the α^(ℓ)_itd using the expressions provided in Section 3.1. For binomial and count outcomes, point estimates and 95% CIs are obtained analogously.
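The posterior summaries above are simple functions of the MCMC draws. The sketch below (hypothetical helper names) illustrates the posterior-mean/CI computation and the subtraction step used for β_itd, assuming a conjugate Beta(a_p, b_p) prior for the treated success probability:

```python
import numpy as np

def summarize(draws):
    """Posterior mean and central 95% credible interval from MCMC draws."""
    draws = np.asarray(draws, dtype=float)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return draws.mean(), (lo, hi)

def beta_effect_draws(p0_draws, k, n, a_p=1.0, b_p=1.0, rng=None):
    """Draws of the effect on the success probability: draws from the
    Beta(a_p + k, b_p + n - k) posterior of the treated probability,
    minus posterior draws of the untreated probability."""
    rng = rng or np.random.default_rng(0)
    p_treated = rng.beta(a_p + k, b_p + n - k, size=len(p0_draws))
    return p_treated - np.asarray(p0_draws)
```

For instance, if the untreated-probability draws sit near 0.4 while 80 of 100 trials succeeded post-intervention, the effect draws concentrate near 0.4.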

Prior distributions
One of the main challenges in fitting the FA model (3.1) is that the number of latent loadings J is not known. To account for uncertainty in J, we start by setting it to J*, an upper limit for the number of factors we expect. Following Gao et al. (2016), we assign a three-level three-parameter beta prior to the loadings; in particular, for j = 1, …, J* we use the hierarchical prior (3.5). Following Zhao et al. (2016), we set a_λ = ⋯ = f_λ = 0.5 and ν = 0.1.
Let Λ be the N × J* matrix with rows λ_i. The prior (3.5) induces three levels of regularization in Λ (Zhao et al., 2016). The overall matrix shrinkage is controlled by ρ. The ζ_j induce loading-specific shrinkage. When ζ_s is estimated as ≈ 1 for some 1 ≤ s ≤ J*, we will also have that λ_is ≈ 0 for all i, thus effectively removing the sth column of Λ. This allows us to recover the effective number of factors supported by the data as the number of nonzero columns in Λ. Finally, the φ_ij allow the scale of the jth loading to differ across the N units. This is useful because, given the interpretation of the λ_ij as potential unobserved confounders, one may expect them not to be similar across units, especially between control and treated units.
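Counting the effectively nonzero columns of Λ from, say, its posterior mean might look as follows; the threshold `tol` is an arbitrary choice for illustration:

```python
import numpy as np

def effective_num_factors(Lambda, tol=1e-2):
    """Number of columns of the N x J* loadings matrix whose largest
    absolute entry exceeds tol; columns shrunk to (near) zero by the
    prior are treated as removed."""
    return int(np.sum(np.abs(Lambda).max(axis=0) > tol))
```

A matrix with one dense column and one column containing a single nonzero entry would yield two effective factors under this rule.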
It is possible that each element λ_ij of λ_i affects only some of the outcomes. To address this issue, one potential solution would be to assume a different loadings vector for each outcome. However, this is inefficient, especially when the T_i are small (see the simulation study of Section 5). Another solution would be to assume that there are loadings that are specific to each outcome and loadings that are shared by all outcomes (De Vito et al., 2021). However, such an approach does not allow for loadings that are shared by subsets of the outcomes. Grabski et al. (2023) propose a model for normal outcomes that allows for loadings shared among a subset of outcomes. However, in their approach the indicators analogous to our v_jℓ below are effectively either zero or one, thus not allowing loadings to affect the outcomes to different extents (which is the case when 0 < v_jℓ < 1). Here, we allow for this via the prior on the factor parameters.
Let f_tdj, g_tdj, and h_tdj denote the elements of f_td, g_td, and h_td, respectively (j = 1, …, J*). We assume that f_tdj ~ N(0, s_1,dj), g_tdj ~ N(0, s_2,dj) and h_tdj ~ N(0, s_3,dj). For each j, let v_j be a vector of size D with elements v_jℓ. We introduce variables M_j ∈ {1, …, D} which, for each j, indicate the outcome "most affected" by loading j. Conditional on M_j = l, we set v_jl = 1 and assume v_jℓ ~ Uni[0, 1] for each ℓ ≠ l. When v_jℓ is close to zero for some ℓ, the values of the corresponding factor will be close to zero, and hence the effect of loading j on outcome ℓ will be small. An alternative (which we test on our real data) is to let v_jℓ ~ Uni[0, 1] for all j and ℓ. This introduces more free parameters than needed but has worked well in previous studies (Roy et al., 2021).
For fixed d and j, we have chosen not to impose any temporal structure on {f_tdj}_{t=1}^{T} (or {g_tdj}_{t=1}^{T}, {h_tdj}_{t=1}^{T}) to simplify posterior computations. We expect the loss in efficiency to be small. The reason is that in policy evaluation problems, the number of control units at each time point is typically sufficiently high to allow accurate estimation of the factor parameters without the need to impose further structure; see, e.g., the simulation study in Samartsidis et al. (2020).
For the remaining model parameters, we specify weakly informative prior distributions. For all i and d = 1, …, D_1, we assume σ²_id ~ Uni(0, 10²). We prefer this prior to the conjugate IG(ε, ε) with small ε, because the latter encourages values of σ²_id near zero. This can lead to over-fitting, as non-systematic fluctuations are encouraged to be absorbed by the loadings/factors term. For all ℓ = 1, 2, 3 and d, we assume that the regression coefficients η_ℓ,d ~ N(0, 10² I). For d = 1, …, D_3, we let ξ_d ~ Uni(0, 20), i.e., we assume that the variance of the counts is at most 20 times higher compared to a Poisson model. For each loading j, we set P(M_j = ℓ) = 1/D for ℓ = 1, …, D.
Finally, we set a_p = b_p = 1, i.e., we let p^(T_i)_itd ~ Uni(0, 1). The choice of prior on p^(T_i)_itd is not crucial for the LTP data because the n_itd (t > T_i) are large, allowing us to estimate these parameters accurately. However, when the n_itd are low, the posterior uncertainty in p^(T_i)_itd under independent uniform priors will be high, leading to high uncertainty in β_itd (see (3.3)). In such cases, efficiency can be gained by imposing structure on the p^(T_i)_itd. For example, one can assume that for fixed i and d, the logit(p^(T_i)_itd) arise from an AR(1) process.

POSTERIOR COMPUTATIONS
We are interested in drawing samples from the high-dimensional posterior density (4.6), in which some of the model parameters are grouped into a vector θ. The reason for grouping these parameters in θ is that they are easy to sample. Posterior (4.6) is analytically intractable; we thus use MCMC to sample from it. We propose a Metropolis-within-Gibbs sampler in which subsets of parameters are drawn (updated) in blocks from their full conditionals. Here, we provide an outline of the algorithm; for further details, see Section B of the supplementary material available at Biostatistics online.
The most challenging step in the proposed MCMC algorithm is to update λ_i, g_td, and h_td. Their full conditionals are not available in closed form and hence Gibbs updates are not possible. Moreover, standard algorithms which employ a constant variance-covariance matrix in their proposal (e.g., MALA, preconditioned MALA, HMC) can be inefficient in FA models. This is due to the label-switching, rotation, and scale ambiguity problems (Zhao et al., 2016, among others). Consider, for example, the choice of a proposal variance for an element h_tdj of h_td. Tuning this value is nontrivial as h_tdj may represent a different factor (due to label-switching) or change its scale (due to scale ambiguity) at each iteration of MCMC. To overcome these problems, we make use of data augmentation techniques. In particular, we employ methods developed in the case of univariate models (Polson et al., 2013; Zhou and Carin, 2015). As explained below, introducing auxiliary variables enables efficient sampling of λ_i, g_td, and h_td. Finally, we note that under the proposed data augmentation scheme, drawing the remaining parameters η_2,d, η_3,d, and ξ_d is also conducted efficiently; see the remainder of this section.
Let PG(a, b) denote the Pólya-Gamma distribution with parameters a and b. Following Polson et al. (2013), for each i = 1, …, N, t ≤ T_i and d = 1, …, D_2, we introduce latent variables ω_itd which are a priori PG(n_itd, 0) distributed. Using results from Polson et al. (2013), we can show that the likelihood contribution of each binomial data point conditional on ω_itd is proportional to

exp(κ_itd ψ_itd − ω_itd ψ²_itd/2),   (4.7)

where ψ_itd = λ_i'g_td + η_2,d'x_it and κ_itd = k_itd − n_itd/2. The likelihood (4.7) is a quadratic form in both g_td and η_2,d, and these parameters have multivariate normal priors. Therefore, introducing ω_itd allows us to draw each g_td (d = 1, …, D_2 and t = 1, …, T) and η_2,d (d = 1, …, D_2) from their multivariate normal full conditionals (see Section B of the supplementary material). Further, the full conditional of ω_itd is PG(n_itd, λ_i'g_td + η_2,d'x_it) (Polson et al., 2013), a result which allows us to update these parameters easily.
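For illustration, the normal full conditional of any coefficient block with linear predictor ψ = Xβ and a zero-mean normal prior can be assembled as follows; this is a sketch that assumes the Pólya-Gamma draws ω are already available, and the helper name is hypothetical:

```python
import numpy as np

def pg_full_conditional(X, k, n, omega, prior_prec):
    """Mean and covariance of the multivariate normal full conditional
    of a coefficient vector beta (psi = X @ beta) with N(0, prior_prec^{-1})
    prior, given Polya-Gamma variables omega:
    precision = X' diag(omega) X + prior_prec, mean = cov @ X' kappa,
    where kappa = k - n/2."""
    kappa = k - n / 2.0
    prec = X.T @ (omega[:, None] * X) + prior_prec
    cov = np.linalg.inv(prec)
    return cov @ (X.T @ kappa), cov
```

In an actual sampler one would draw β from N(mean, cov) at each iteration after refreshing ω from its PG full conditional.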
Let CRT(a, b) denote the Chinese restaurant table distribution with parameters a and b. Following Zhou and Carin (2015), for each i = 1, …, N, t ≤ T_i and d = 1, …, D_3, we introduce latent variables L_itd ~ CRT(z_itd, w_itd q_itd/ξ_d). These can be drawn as L_itd = Σ_{l=1}^{z_itd} b_l, where b_l ~ Bernoulli((w_itd q_itd/ξ_d)/(w_itd q_itd/ξ_d + l − 1)). Recall that q_itd = exp(λ_i'h_td + η_3,d'x_it). From Section 3.1 of Zhou and Carin (2015), we obtain the augmented likelihood (4.8), in which S(a, b) are Stirling numbers of the first kind. In (4.8), q_itd only appears in the second term (a Poisson probability mass function). As shown in Section A of the supplementary material available at Biostatistics online, this enables us to update h_td (d = 1, …, D_3 and t = 1, …, T), η_3,d (d = 1, …, D_3) and λ_i (i = 1, …, N) using the simplified manifold Metropolis-adjusted Langevin algorithm (SMMALA; Girolami and Calderhead, 2011), which would be very hard without introducing the latent variables L_itd and ω_itd. The SMMALA algorithm exploits the geometry of the parameter space to adapt the variance-covariance matrix of its normal proposal at each iteration of MCMC, thus mitigating the problems caused by label-switching and scale ambiguity explained above. In our simulation studies, we found that SMMALA massively outperformed MALA and HMC, for which convergence was extremely slow. Finally, we update each negative binomial dispersion parameter ξ_d (d = 1, …, D_3) using the Barker method (Livingstone and Zanella, 2022). To reduce dependence on the L_itd (t = 1, …, T_i), we integrate these latent variables out when updating ξ_d. This is straightforward using (4.8); see Section A of the supplementary material available at Biostatistics online. The Barker method requires the specification of a stepsize parameter; following Livingstone and Zanella (2022), we tune this for each d during the burn-in phase of the MCMC to achieve an acceptance rate near 40%.
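The CRT draw itself is just a sum of independent Bernoulli variables; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def sample_crt(z, r, rng):
    """Draw L ~ CRT(z, r) as a sum of independent Bernoulli variables
    with success probabilities r/(r + l - 1), l = 1, ..., z
    (Zhou and Carin, 2015)."""
    if z == 0:
        return 0
    l = np.arange(1, z + 1)
    return int(np.sum(rng.random(z) < r / (r + l - 1)))
```

Note that the first Bernoulli variable has success probability r/r = 1, so L is at least 1 whenever z ≥ 1, and at most z.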

SIMUL ATION STUDIES
We perform a simulation study to demonstrate the benefits, in terms of efficiency of the causal effect estimates, that can be obtained by modeling multiple outcomes jointly (i.e., loadings are shared across outcomes) rather than individually (i.e., each outcome has its own loadings matrix). See Section C of the supplementary material available at Biostatistics online for additional simulations in settings that differ from the one considered here.

Data generating mechanism
We simulate B = 2,500 synthetic data sets. For each one, we generate the untreated outcomes from the FA model (3.1) with D_1 = D_2 = D_3 = 1, T = 24 (2 years of monthly data), N = 80 and λ_i, f_t, g_t, h_t ∈ R^7. Since we consider only one outcome of each type, we omit the outcome index d for the remainder of this section to ease notation.
For all i, we set λ_i1 = 1 (to control the mean), and draw λ_ij ~ N(0, 1) for j = 2, …, 7. Let f_j = (f_1j, …, f_Tj)', g_j = (g_1j, …, g_Tj)', and h_j = (h_1j, …, h_Tj)'. For each j, we generate f_j, g_j and h_j from N(m_1j 1, s²_1j R), N(m_2j 1, s²_2j R) and N(m_3j 1, s²_3j R) distributions, respectively, where 1 is a T-vector of ones and R is a T × T correlation matrix. The values of m_ℓj and s_ℓj (ℓ = 1, 2, 3) are shown in Table 1. These specifications imply that E(μ_it) = 5, E(p_it) = 0.6, and E(q_it) = 4. Some of the s_ℓj are set to zero so that each loading j affects only a subset of the outcomes. For a fixed outcome, the nonzero s_ℓj are all equal for simplicity. The values of s_ℓj are chosen such that the 97.5% quantiles of μ_it, p_it, and z_it (over units, times, and simulations) are roughly 7.5, 0.85, and 10, respectively. R has elements R_ts = exp(log(0.8)|t − s|). This nondiagonal R introduces temporal correlations within the data on each unit. The value of σ is chosen such that λ_i'f_t accounts for 80% of the variability in the y_it. We draw the n_it from a Pois(ñ_it) distribution. For each i, ñ_i1 = 5, ñ_iT is drawn from a Uni(50, 200) distribution, and ñ_it = ñ_i1 + (t−1)/(T−1) (ñ_iT − ñ_i1) for 1 < t < T. Similarly, we draw the w_it from a Pois(w̃_it) distribution where, for each i, w̃_i1 = 5, w̃_iT is drawn from a Uni(25, 75) distribution, and w̃_it = w̃_i1 + (t−1)/(T−1) (w̃_iT − w̃_i1) for 1 < t < T. We have made ñ_it and w̃_it increasing over time to mimic the LTP data of Section 2. Finally, we set ξ = 2.
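The linearly increasing expected offsets ñ_it (and, analogously, w̃_it) and the corresponding Poisson draws can be generated as in the following sketch (hypothetical helper name):

```python
import numpy as np

def offset_trajectory(T, start, end_low, end_high, rng):
    """Expected offsets rising linearly from `start` at t=1 to a
    Uni(end_low, end_high) draw at t=T; observed offsets are Poisson
    draws around the expected trajectory."""
    end = rng.uniform(end_low, end_high)
    t = np.arange(T)
    expected = start + t / (T - 1) * (end - start)
    return rng.poisson(expected)
```

For the n_it of the simulation one would call `offset_trajectory(24, 5.0, 50.0, 200.0, rng)` once per unit, and similarly with endpoints (25, 75) for the w_it.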
In each data set, the T_i are chosen as follows. For every i and t, we draw u_it ~ Uni(0, 1) and let

π_it = expit(κ_0 + κ_1 p_it + κ_2 q_it) if t > t_min, and π_it = 0 if t ≤ t_min,   (5.9)

where expit(x) = exp(x)/(1 + exp(x)). Then, for each i we set T_i = min{t : u_it < π_it}. We set the minimum number of pre-intervention time points to t_min = 8. The value of κ_0 is chosen such that the average of N_1 over simulated data sets is 40. The values of κ_1 and κ_2 control the degree of unobserved confounding of the effects β_it and δ_it, respectively. We set κ_1 such that the average (over simulated data sets) value of the empirical completion probability k_it/n_it is roughly 10% higher in control units for all t > t_min. The value of κ_2 is chosen such that, on average (over simulated data sets), z_it/w_it is roughly 10% higher in control units for every t > t_min.

For each simulated data set b, we perform a multivariate (MV) and a univariate (UV) analysis. The MV analysis is carried out by fitting the FA model of Section 3 to all three outcomes jointly. The UV analysis is carried out by fitting the FA model of Section 3 to each one of the outcomes individually. In both cases, the models that we fit are correctly specified. For both MV and UV analyses, we run MCMC for 100,000 iterations, saving posterior draws every 50 iterations to obtain 2,000 posterior draws. Of these, 500 are discarded as burn-in. The maximum number of factors J* in the MV and UV analyses is set to 25 and 15, respectively.
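The treatment-time mechanism in (5.9) can be sketched as follows; the helper name is hypothetical, the probability expit(κ_0 + κ_1 p_it + κ_2 q_it) is set to zero for t ≤ t_min, and a unit never selected is a control with T_i = T:

```python
import numpy as np

def draw_treatment_time(p, q, kappa, t_min, T, rng):
    """Draw T_i = min{t : u_it < pi_it}, with pi_it as in (5.9).
    p, q : length-T arrays of the unit's mean-function values
    kappa: (kappa0, kappa1, kappa2)
    Returns T if the unit is never selected (a control unit)."""
    k0, k1, k2 = kappa
    for t in range(t_min + 1, T + 1):
        eta = k0 + k1 * p[t - 1] + k2 * q[t - 1]
        pi_t = 1.0 / (1.0 + np.exp(-eta))  # expit
        if rng.random() < pi_t:
            return t
    return T
```

Positive κ_1 or κ_2 makes selection more likely for units with high p_it or q_it, which is what induces the confounding described above.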
For each b, we generate the data of treated units in their post-intervention period under L = 50 different scenarios, corresponding to causal effects of increasing magnitude. Thus, we obtain L different estimates (and CIs) for all causal effects defined in Section 3. For each ℓ = 1, …, L, let α̃^(ℓ)_it, β̃^(ℓ)_it, and δ̃^(ℓ)_it be the causal effect of the intervention on μ_it, logit(p_it) and log q_it, respectively. We use the tilde notation to indicate that these are effects on the mean function rather than on the outcome as in (3.2). For the binomial and count outcomes, the effects are applied on the logit scale and log scale, respectively, to avoid negative values. For each ℓ, these effects are applied to all treated units, i.e., all i with T_i < T. Figure 2 in Appendix B of the supplementary material available at Biostatistics online shows the power for different values of T_i. For reasons explained above, we see that the improvements in power achieved by the MV approach compared to the UV are greater when T_i is either 8 or 16.
For binomial outcomes, the uncertainty in the estimates of a treated unit's potential untreated outcomes p_it and k^(0)_it, and thus the accuracy of the estimates of β_it and γ_it, depends on both T_i and the values of n_it in the pre-intervention period. More specifically, the lower the counts {n_it}_{t=1}^{T_i} are, the less information there is to estimate the loadings. Similarly, for count outcomes, the uncertainty in the estimates of a treated unit's z^(0)_it (t > T_i) depends on {w_it}_{t=1}^{T_i}. Figure 3(a) presents a heatmap of the CI width for β_i obtained by the MV approach for different combinations of T_i and n̄_i = Σ_{t=1}^{T_i} n_it/T_i. For any fixed T_i, the width of CIs decreases with n̄_i. Figure 3(b) shows the percentage decrease in CI width achieved by the MV approach compared to the UV approach for different combinations of T_i and n̄_i. We see that for fixed T_i, the gains in efficiency due to joint outcome modeling are similar for the different values of n̄_i. For the count outcome, we present the power achieved for moderate δ_i (≈ 1.25) as a measure of efficiency; see Fig. 3(c). We find that power increases with w̄_i for fixed T_i. Figure 3(d) shows the percentage increase in power for moderate δ achieved by the MV approach compared to the UV. Again, we see that for fixed T_i, the gains do not differ much across the w̄_i.

EVALUATION OF LTPS
In this section, we apply our multivariate FA model to the LTP data of Section 2. We set J* = 25 and ran three MCMC chains starting from different initial values. Each chain was run for 250,000 iterations, discarding the first 50,000 as burn-in and keeping every 100th of the remaining draws to obtain 2,000 draws from the posterior. Convergence was assessed visually by comparing the posterior densities and trace plots obtained from the three chains for multiple randomly selected causal estimands; these showed that all three chains had converged to the same stationary distribution and that mixing was good. Results for the case completion outcome are shown in Fig. 4. Figures D.2, D.3, and D.4 of the supplementary material available at Biostatistics online show results for the remaining outcomes.
Potential outcomes. Figure 4(a) refers to a unit, say ι, chosen for illustration. The plot shows 95% CIs for p_ιt1 and p^(T_ι)_ιt1 in yellow and blue, respectively. For most days, the bands for p^(T_ι)_ιt1 are above the bands for p_ιt1, indicating that the LTP had a positive effect on completion probability on these days. We also see that the 95% CIs for p_ιt1 are substantially wider than the CIs for p^(T_ι)_ιt1; this is due to the large number of cases in the post-LTP period, allowing p^(T_ι)_ιt1 to be estimated with high precision. Hence, uncertainty in β_ιt1 is mainly due to the uncertainty in p_ιt1.

Figure 3 .
Figure 3. Effect of n_it and w_it on the efficiency of the causal estimates. (a) Average (over simulated data sets and treated units) CI width of β_i achieved by the MV model, as a function of n̄_i = Σ_{t=1}^{T_i} n_it/T_i and T_i. (b) Average (over data sets and units) % reduction in CI width of β_i as a function of n̄_i and T_i achieved by the MV model compared to the UV. (c) Power of detecting a moderate intervention effect on δ_i achieved by the MV model, for different values of w̄_i = Σ_{t=1}^{T_i} w_it/T_i. (d) % increase in power of detecting a moderate effect on δ_i achieved by the MV model compared to the UV. In all panels, entries that were obtained as the average of fewer than 50 simulated data sets were discarded.

Figure 4 .
Figure 4. Results for the case completion outcome. (a) 95% CIs for p_ιt1 and p^(T_ι)_ιt1, where ι is a unit chosen for illustration. (b) Scatterplot of the posterior means of β_it1. Filled yellow dots represent β_ιt1. Panels (c) and (d) show posterior summaries of β_i1 and γ_i1, respectively, where units have been ordered by increasing posterior mean of β_i1 in both plots. The vertical yellow lines indicate unit ι. Boxes and whiskers represent the 75% and 95% CIs, respectively.

Table 1 .
Mean and standard deviation used to simulate the seven factors of Section 5.