Target Trial Emulation and Bias Through Missing Eligibility Data: An Application to a Study of Palivizumab for the Prevention of Hospitalization Due to Infant Respiratory Illness

Abstract Target trial emulation (TTE) applies the principles of randomized controlled trials to the causal analysis of observational data sets. One challenge that is rarely considered in TTE is the sources of bias that may arise if the variables involved in the definition of eligibility for the trial are missing. We highlight patterns of bias that might arise when estimating the causal effect of a point exposure when restricting the target trial to individuals with complete eligibility data. Simulations consider realistic scenarios where the variables affecting eligibility modify the causal effect of the exposure and are missing at random or missing not at random. We discuss means to address these patterns of bias, namely: 1) controlling for the collider bias induced by the missing data on eligibility, and 2) imputing the missing values of the eligibility variables prior to selection into the target trial. Results are compared with the results when TTE is performed ignoring the impact of missing eligibility. A study of palivizumab, a monoclonal antibody recommended for the prevention of respiratory hospital admissions due to respiratory syncytial virus in high-risk infants, is used for illustration.

Initially submitted August 27, 2021; accepted for publication November 9, 2022.

Keywords: average causal effect; eligibility; missing data; multiple imputation; target trial emulation

Abbreviations: ACE, average causal effect; CI, confidence interval; HTI, Hospital Treatment Insights database; MAR, missing at random; MCAR, missing completely at random; MI, multiple imputation; MNAR, missing not at random; RCT, randomized controlled trial; RSV, respiratory syncytial virus; TT, target trial.

Randomized controlled trials (RCTs) are commonly used for estimating causal effects of point interventions. However, in many epidemiologic settings, an RCT may be infeasible or ethically nonviable. Hence, observational data are also used to compare effectiveness, with various strategies adopted to address the lack of randomization and indication bias, for example, by controlling for measured confounders. Analysis of observational data suffers from various additional sources of bias, such as selection bias, indication bias, and immortal time bias (1).
Target trial emulation aims to avoid some of these biases by adopting the design principles of RCTs. Individuals in an observational database, such as administrative health records, are selected according to a set of eligibility criteria that mirror those that would be used in an RCT (2). However, data on variables that determine eligibility are often incomplete, and as such not all participants of the target trial (TT) are identifiable from the observational database. It is typically advised to consider a different target trial with more complete eligibility criteria (1) or to exclude or censor individuals with missing data (3,4). Missing data are often a source of bias when those excluded are systematically different from those observed (i.e., if data are missing at random (MAR) or missing not at random (MNAR)) (5,6). Although this has been identified as a potential limitation, there is little work investigating the extent to which missing eligibility data can affect the analysis of a target trial.
One solution is to impute missing eligibility data prior to selection into a target trial. However, we could find only one precedent for imputation of eligibility criteria prior to the creation of a target trial (7). More generally, multiple imputation (MI) of exclusion criteria in observational studies has been considered in recent work (8) for validating error-prone confounders, but it remains an infrequently studied topic. We intend to bring attention to work of this kind in the context of target trial emulation.

Figure 1. Directed acyclic graph of the assumed relationships between exposure, outcome, confounders, and data missingness indicator. A and Y are the exposure and outcome, respectively. $L_1$ are confounders of the association between A and Y, with E the variable that determines eligibility for the target trial. $L_2$ and $L_3$ are drivers of missing data in E.

In this paper we investigate biases in the average causal effect (ACE) of a point exposure in a target trial with missing eligibility data. Our simulations consider realistic scenarios where the eligibility variables modify the true causal effect. We consider two strategies of analysis: 1) conditioning on variables that drive missingness in eligibility, and 2) recovering the missing eligibility data via MI. A study of palivizumab, a monoclonal antibody for prevention of symptoms of severe respiratory syncytial virus (RSV) infection in high-risk infants, based on administrative hospital and pharmacy dispensing data, is used to illustrate these alternative approaches.

Setup
Consider the setting with a binary treatment A, end-of-study outcome Y, and confounding variables $L_1$ and E, where the latter determines eligibility. Suppose E has informative missingness, with $R_E$ being an indicator of completeness (1 = complete, 0 = missing). Missingness in E may be MAR, driven by variables that are not necessarily confounders, which we denote $L_2$ and $L_3$, or MNAR, if also driven by E itself (9) (Figure 1). This is a typical setting, whereby $L_2$ and $L_3$ are separate causes of A and Y, respectively.
We emulate a TT where eligibility is defined by E being greater than or equal to some value e. In practice E may represent a set of variables, which determine an eligibility indicator variable $I_E$. The mechanism for inclusion is shown in Figure 2, represented by the box (indicating conditioning) surrounding $I_E = 1$. It also shows the selection mechanism induced by restricting the TT to individuals with complete E, indicated by the box around $R_E = 1$. We distinguish between:
• The source population, from which the TTs are derived
• The full eligibility TT ($\mathrm{TT}_{\mathrm{true}}$), containing all those who are eligible ($I_E = 1$)
• The complete eligibility TT ($\mathrm{TT}_{\mathrm{obs}}$), containing those who are complete and eligible ($I_E = 1$ and $R_E = 1$)

Our target estimand is the ACE of A on Y in $\mathrm{TT}_{\mathrm{true}}$, defined as

$$\mathrm{ACE}_{I_E=1} = E(Y(1) \mid I_E = 1) - E(Y(0) \mid I_E = 1),$$

where E(Y(a)) is the average value of Y if the exposure A were set to take the value a, for a = 0, 1, in the whole population. In reality, $\mathrm{TT}_{\mathrm{true}}$ is not known, and thus $\mathrm{ACE}_{I_E=1}$ is approximated by the equivalent estimand from $\mathrm{TT}_{\mathrm{obs}}$,

$$\mathrm{ACE}_{I_E=1,R_E=1} = E(Y(1) \mid I_E = 1, R_E = 1) - E(Y(0) \mid I_E = 1, R_E = 1).$$

The ACE of a point exposure can be identified by invoking assumptions of no interference, counterfactual consistency, and conditional exchangeability (i.e., no unmeasured confounding) (2).
selection bias when conducting an analysis on $\mathrm{TT}_{\mathrm{obs}}$ if, for any reason, the causal effect of A on Y is different in the missing eligible, compared with the complete eligible. By controlling for $L_1$, $L_2$, and E we identify the conditional causal effect,

$$\mathrm{ACE}_{I_E=1,R_E=1}(L_1, L_2, E) = E(Y \mid A = 1, L_1, L_2, E, I_E = 1, R_E = 1) - E(Y \mid A = 0, L_1, L_2, E, I_E = 1, R_E = 1).$$

To find $\mathrm{ACE}_{I_E=1,R_E=1}$ we marginalize (average) $\mathrm{ACE}_{I_E=1,R_E=1}(L_1, L_2, E)$ over the distribution of $L_1$, $L_2$, and E in $\mathrm{TT}_{\mathrm{obs}}$. If the effect of A on Y is modified by any of these confounders, then the value of $\mathrm{ACE}_{I_E=1,R_E=1}$ depends on the distribution of that confounder in $\mathrm{TT}_{\mathrm{obs}}$.
Hence, since we cannot recover the distribution of the confounders in $\mathrm{TT}_{\mathrm{true}}$, $\mathrm{ACE}_{I_E=1,R_E=1}$, obtained from $\mathrm{TT}_{\mathrm{obs}}$, is a biased approximation of $\mathrm{ACE}_{I_E=1}$. In other words, the distribution of the confounders in $\mathrm{TT}_{\mathrm{obs}}$ does not match that in $\mathrm{TT}_{\mathrm{true}}$.
Suppose E was a score capturing standards of hospital care. We might expect a treatment A to be more effective on the outcome at higher standards of care. If hospitals of a low standard are more likely to have a missing score, then eligible hospitals of higher standards would be overrepresented in $\mathrm{TT}_{\mathrm{obs}}$, leading to a biased ACE.
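The hospital example can be illustrated numerically. The following is a minimal sketch under assumed values that do not come from the study: a standard normal score, eligibility E ≥ 0, an individual causal effect of 1 + E, and a logistic MNAR mechanism in which higher scores are more likely to be recorded. Because the potential outcomes are simulated, the ACE in each trial is simply the average individual effect.

```python
import math
import random

random.seed(1)

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

true_effects, obs_effects = [], []
for _ in range(200_000):
    e = random.gauss(0.0, 1.0)            # eligibility score E (hypothetical scale)
    if e < 0.0:                           # ineligible record: I_E = 0
        continue
    effect = 1.0 + 1.0 * e                # individual effect Y(1) - Y(0), modified by E
    true_effects.append(effect)           # every eligible record belongs to TT_true
    if random.random() < expit(1.0 * e):  # assumed MNAR: higher E more often recorded
        obs_effects.append(effect)        # complete and eligible record: TT_obs

ace_true = sum(true_effects) / len(true_effects)  # analytically 1 + sqrt(2 / pi)
ace_obs = sum(obs_effects) / len(obs_effects)
print(f"ACE in TT_true: {ace_true:.3f}; ACE in TT_obs: {ace_obs:.3f}")
```

Under these assumptions the complete-eligible average exceeds the full-eligible average, because records with higher scores (and hence larger effects) are overrepresented among those with observed eligibility.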
Dealing with this bias requires recreating the joint distribution of exposure, outcome, and confounders of TT true , for example, using multiple imputation (12)(13)(14).
This bias has been discussed in the wider setting of "data fusion" of multiple data sources (15), with identification of targeted causal effects involving knowledge of the distribution of the confounders in the "fused" population, as we discuss above. This type of bias has been referred to as an issue of "transportability" (15) or external validity to a different population (16). Our setting is created by missing information that precludes the identification of the target population. This could be viewed as an issue of internal validity of $\mathrm{TT}_{\mathrm{obs}}$ itself, or of its external validity to $\mathrm{TT}_{\mathrm{true}}$. The issue also has an impact on the generalizability of results to other populations.

Strategies
We indicate possible strategies to address the above biases in the estimation of $\mathrm{ACE}_{I_E=1}$.

Strategy 1: ignoring missing eligibility.
In the setting of Figure 2, we fit an outcome regression model for Y on A, controlling for $L_1$ and E in the model, and then estimate $\mathrm{ACE}_{I_E=1,R_E=1}$ by marginalizing over their distribution in $\mathrm{TT}_{\mathrm{obs}}$, as described in Aolin et al. (17).

Strategy 2: dealing with collider bias.
With this approach we fit an outcome regression model for Y on A, controlling for $L_1$, E, and either $L_2$ or $L_3$ in order to block the path opened by conditioning on $R_E$, and then estimate $\mathrm{ACE}_{I_E=1,R_E=1}$ as in strategy 1.
If the estimand of interest is $\mathrm{ACE}_{I_E=1,R_E=1}$, then this strategy is sufficient to remove bias induced by missing eligibility data.
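The outcome-regression-and-marginalization step shared by strategies 1 and 2 can be sketched as follows. This is an illustration only, not the authors' code: the data-generating values, the linear outcome model with an A × E interaction, and the small `ols` helper are all assumptions made for the example. Strategy 2 would simply add $L_2$ or $L_3$ as a further column of the design matrix.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [u - f * v for u, v in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical eligible sample (E >= 0) with confounder L1 and effect modifier E.
random.seed(2)
rows = []
for _ in range(5000):
    l1 = random.gauss(0, 1)
    e = 0.5 * l1 + random.gauss(0, 1)
    if e < 0:                                  # keep eligible records only
        continue
    p_a = 1 / (1 + math.exp(-0.5 * l1))        # exposure depends on L1
    a = 1 if random.random() < p_a else 0
    y = a * (1 + e) + 0.5 * l1 + 0.5 * e + random.gauss(0, 1)
    rows.append((a, l1, e, y))

def design(a, l1, e):
    """Covariate vector: intercept, A, L1, E, and the A x E interaction."""
    return [1.0, a, l1, e, a * e]

beta = ols([design(a, l1, e) for a, l1, e, _ in rows], [r[3] for r in rows])

def predict(a, l1, e):
    return sum(b * x for b, x in zip(beta, design(a, l1, e)))

# Marginalize: predict every record under A = 1 and A = 0, then average the contrast.
ace_hat = sum(predict(1, l1, e) - predict(0, l1, e) for _, l1, e, _ in rows) / len(rows)
ace_sample = sum(1 + e for _, _, e, _ in rows) / len(rows)  # simulated truth in this sample
print(f"estimated ACE: {ace_hat:.3f}; simulated truth: {ace_sample:.3f}")
```

Because the contrast is averaged over the records actually in the analysis sample, the estimate inherits whatever distribution of E that sample has, which is the mechanism behind the selection bias described above.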

Strategy 3: dealing with collider and selection bias.
We specify an imputation model to predict the missing eligibility data in the source population. We impute E in multiple copies of the source population and, from each, construct an imputed copy of $\mathrm{TT}_{\mathrm{true}}$ using imputed eligibility criteria. We then control for $L_1$ and E as in strategy 1 to estimate $\mathrm{ACE}_{I_E=1}$ in each of the copies, which are pooled using Rubin's rules (9).

Implementation.
The imputation step is as follows:
1. Specify an imputation model for the missing mechanism of E.
To capture any suspected treatment effect heterogeneity, imputations are carried out separately for each value of A. Note that this technique requires that A be fully observed (18). Giganti and Shepherd (8) highlight that excluding data relevant to inclusion in a study after MI leads to bias in Rubin's pooled estimate of the variance because of incongeniality between the imputation and outcome models. We hence consider confidence intervals using a percentile-based bootstrap.

Combining bootstrap and imputations.
We combine bootstrapping with MI using the "Boot-MI" methodology (19). This consists of the following steps:
1. Obtain b bootstrap samples of the source population.
2. Apply steps 1-5 of the MI procedure above for each of the b data sets, and obtain b estimates of $\mathrm{ACE}_{I_E=1}$.

A percentile-based bootstrapped confidence interval (CI) is then derived as the α × 100th and (1 − α) × 100th percentiles of the ordered bootstrapped estimates (e.g., α = 0.025 for a 95% CI).
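A minimal sketch of single imputation nested within a bootstrap, with a percentile interval, is given below. It is a toy under loud assumptions and not the procedure used in the paper: the exposure is randomized (so a difference in means among the imputed eligible stands in for the outcome regression), E is missing completely at random, the eligibility threshold is 0, and E is imputed within each arm by a stochastic linear regression on Y. All function names are hypothetical.

```python
import random
import statistics

def impute_by_arm(data, rng):
    """Single stochastic regression imputation of E on Y, separately for each arm."""
    out = []
    for arm in (0, 1):
        arm_rows = [r for r in data if r["A"] == arm]
        obs = [r for r in arm_rows if r["E"] is not None]
        ys = [r["Y"] for r in obs]
        es = [r["E"] for r in obs]
        my, me = statistics.mean(ys), statistics.mean(es)
        slope = (sum((y - my) * (e - me) for y, e in zip(ys, es))
                 / sum((y - my) ** 2 for y in ys))
        icpt = me - slope * my
        resid_sd = statistics.pstdev([e - icpt - slope * y for y, e in zip(ys, es)])
        for r in arm_rows:
            e = r["E"] if r["E"] is not None else rng.gauss(icpt + slope * r["Y"], resid_sd)
            out.append({**r, "E": e})
    return out

def ace_in_eligible(data, threshold=0.0):
    """Difference in mean outcome among records with (imputed) E >= threshold."""
    elig = [r for r in data if r["E"] >= threshold]
    y1 = [r["Y"] for r in elig if r["A"] == 1]
    y0 = [r["Y"] for r in elig if r["A"] == 0]
    return statistics.mean(y1) - statistics.mean(y0)

def boot_mi_ci(source, b=200, alpha=0.025, seed=0):
    """Resample the source population, impute once (m = 1), select, estimate."""
    rng = random.Random(seed)
    est = sorted(ace_in_eligible(impute_by_arm(rng.choices(source, k=len(source)), rng))
                 for _ in range(b))
    return est[round(alpha * b)], est[round((1 - alpha) * b) - 1]

# Hypothetical source population with about 20% of E missing.
gen = random.Random(3)
source = []
for _ in range(1000):
    a = gen.randint(0, 1)
    e = gen.gauss(0, 1)
    y = a * (1 + e) + e + gen.gauss(0, 1)
    source.append({"A": a, "Y": y, "E": e if gen.random() > 0.2 else None})

lo, hi = boot_mi_ci(source)
print(f"percentile interval for the ACE among the eligible: ({lo:.2f}, {hi:.2f})")
```

Note that imputation happens before eligibility is applied, so records with missing E can enter the emulated trial, and that the outcome is part of the imputation model. The sketch uses b = 200 only to keep the run time short.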
We used single imputation (m = 1), which has been shown to have good statistical properties (20) and to reduce computational burden (19,20), nested within b = 1,000 bootstraps, which is at or above the typically recommended number (21).

Sensitivity analyses.
Imputation models for E that allow for different mean values depending on A could be used, for a = 0, 1. We use fully conditional specification (or multiple imputation by chained equations) using the "mice" package in R (R Foundation for Statistical Computing, Vienna, Austria) (13,22) to impute the data. The parameters $\delta_a$ are MNAR sensitivity parameters. If MNAR is suspected, setting $\delta_a \neq 0$ shifts the imputed values of E (separately for each a) by an amount that accounts for the effect of E on its own missingness (23,24). In practice, sensible ranges for $\delta_a$ are chosen, with the data imputed over these ranges.
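The role of the sensitivity parameters can be sketched as a shift applied to values imputed under MAR. The additive form below is an assumption made for illustration (the exact imputation model is not reproduced here), and the function name and the numbers are hypothetical.

```python
def delta_adjust(imputed, was_missing, arm, delta):
    """Shift imputed values of E by delta[a] in arm a; observed values are untouched."""
    return [x + delta[a] if miss else x
            for x, miss, a in zip(imputed, was_missing, arm)]

# Hypothetical values of E after MAR imputation, with flags for originally missing records.
e_mar = [38.0, 36.5, 39.0, 37.0]
was_missing = [False, True, True, False]
arm = [0, 0, 1, 1]

# delta_0 = -4 shifts imputed values in the unexposed; delta_1 = 0 leaves the exposed as MAR.
e_mnar = delta_adjust(e_mar, was_missing, arm, {0: -4.0, 1: 0.0})
print(e_mnar)
```

Repeating the analysis over a grid of $(\delta_0, \delta_1)$ values then shows how sensitive the estimated ACE is to departures from MAR.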

SIMULATIONS
We investigate strategies 1-3 by simulating data according to the structure of Figure 2. Specifically:
• $L_1$, $L_2$, and $L_3$ are independent N(0, 1).
• E is a normal variable dependent on $L_1$ and $L_2$.
• Eligibility is defined by a threshold on E, such that around 50% of the population is eligible.
• The missing mechanism of E is expressed as a linear function of $L_2$, $L_3$, and E.
• The exposure A is a binary variable, generated in terms of the log-odds of exposure, expressed as a linear function of $L_1$, $L_3$, and E. Around 54% of individuals in the source population are exposed.
• The outcome Y is a normal variable that depends on exposure A, eligibility E, their interaction, and also on $L_1$ and $L_2$, with $L_2$ exercising a stronger impact than $L_1$.

The source population is of size n = 1,000. We investigated strategies 1-3 at different values of μ, α, and γ, the parameters affecting $R_E$. Specifically, μ drives the percentage of missing completely at random (MCAR) missingness, α drives the strength of the MAR mechanism and the spurious association between $L_2$ and $L_3$, and γ drives the strength of the MNAR mechanism, with positive values leading to a higher probability of larger values of E being observed. The parameter μ was set at 0 and 1.5, leading to severe (50%) and moderate (18%) MCAR missingness. α and γ were set to range from 0 (no association) up to ±0.4. For each combination we carried out l = 1,000 simulations, each using b = 1,000 bootstraps, reporting the average bias in estimation, $\mathrm{ACE}_{I_E=1,R_E=1} - \mathrm{ACE}_{I_E=1}$, its Monte Carlo error (MCE), root mean squared error (RMSE), and 95% coverage (25).

Table 1 describes the characteristics of a set of single large simulations of $\mathrm{TT}_{\mathrm{obs}}$ for different values of α, γ, and μ. We set n = 1,000,000 to minimize random variation. The 3 missingness scenarios are MCAR (α = γ = 0), MAR (α ≠ 0 and γ = 0), and MNAR (α ≠ 0 and γ ≠ 0). The scenario when E is not missing ($\mathrm{TT}_{\mathrm{true}}$) is included for comparison.
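The stated MCAR percentages can be checked with a short calculation, assuming (as the reported figures suggest, but as an assumption here) that the probability of E being observed is the inverse logit of the linear predictor, which reduces to the intercept μ when α = γ = 0.

```python
import math

def missing_fraction(mu):
    """Share of records with E missing when P(R_E = 1) = expit(mu)."""
    return 1.0 - 1.0 / (1.0 + math.exp(-mu))

frac_severe = missing_fraction(0.0)    # mu = 0
frac_moderate = missing_fraction(1.5)  # mu = 1.5
print(f"mu = 0: {frac_severe:.0%} missing; mu = 1.5: {frac_moderate:.0%} missing")
```

Under that assumption, μ = 0 gives 50% missing and μ = 1.5 gives about 18% missing, in line with the severe and moderate settings described above.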

Observed and true target trial comparisons
When the mechanism is MCAR, the means and correlations of relevant variables are not affected. When the mechanism is MAR, they depart from those found in $\mathrm{TT}_{\mathrm{true}}$: when α > 0, individuals in $\mathrm{TT}_{\mathrm{obs}}$ have larger mean values for E, $L_2$, and $L_3$ than in $\mathrm{TT}_{\mathrm{true}}$. This is because a positive α makes individuals with larger values of $L_2$ and $L_3$ more likely to be observed, shifting their distributions upward and, by extension, the distribution of E. When α is negative, the opposite is true. These shifts are more noticeable at μ = 0 due to the greater proportion of missing individuals.
Under MNAR, setting γ > 0 makes higher values of E more likely to be observed in $\mathrm{TT}_{\mathrm{obs}}$, with the opposite occurring when γ < 0, leading to shifts in the distributions of E, $L_2$, and $L_3$ similar to those that occur with α.
The combined impact of α and γ varies. When both are of the same sign, their impacts compound and strengthen the corresponding shifts in distribution. When they are of opposite sign, their impacts partially offset one another.
The shifts in the distribution of $L_1$ are more complicated: it is shifted downward when α > 0 but upward when γ > 0. This reflects the interplay between the spurious negative $L_1$–$L_2$ association (caused by conditioning on $I_E$), which drives a downward shift in $L_1$ with higher values of $L_2$, and the positive $L_1$–E association, which drives an upward shift with higher values of E.

Strategies
For strategies 1 and 2, bias in estimation of the ACE increased with higher values of α and γ, and was worse when μ = 0 (Tables 2 and 3). This is due to having to average over the distribution of the confounders to estimate $\mathrm{ACE}_{I_E=1}$. The size and direction of this bias are nearly identical to the shift in the distribution of E observed in Table 1. This is because effect modification by E has effect size equal to 1.
The impact of collider bias induced by α is negligible, as shown by the small differences in bias for strategies 1 and 2. Strategy 2 has a smaller root mean squared error but more undercoverage, possibly because it involves averaging over $L_2$, which also has a shifted distribution. Table 1 implies that had $L_2$ been the effect modifier rather than E, strategies 1 and 2 would have shown more bias under the MAR assumption. This is investigated in Web Appendix 1 and Web

Selection bias appears to increase under the following conditions:
• Larger numbers of missing eligible individuals
• Larger values of α and γ, the drivers of missingness
• A stronger effect modification of the causal effect of A on Y by E (or any variables related to E)

With fewer eligible participants lost to missingness, there is less missing data to drive a differentiation in the distributions of E in $\mathrm{TT}_{\mathrm{obs}}$ and $\mathrm{TT}_{\mathrm{true}}$, which is why bias decreased when μ was larger and the number of missing eligible participants decreased. None of these features are likely to be known in advance.
When E was MNAR, imputation was carried out with the correct values of the sensitivity parameters $\delta_0$, $\delta_1$. This was to demonstrate that, all other biases (including a misspecified imputation model) accounted for, strategy 3 can eliminate the biases described above in Sources of Bias when E is MNAR. This is unlikely to be possible in reality; hence, in Web Appendix 2 we repeat specific MNAR simulations of Table 3 assuming a MAR imputation model, $(\delta_0, \delta_1) = (0, 0)$, which shows notable bias. This highlights that, in practice, MNAR imputation is an exploratory technique, and careful consideration must be given to choosing informative values or ranges for $\delta_0$ and $\delta_1$ to investigate (23,24). A realistic application of strategy 3 is shown in the case study.

Abbreviations: MAR, missing at random; MCAR, missing completely at random; MCE, Monte Carlo error; MNAR, missing not at random; RMSE, root mean square error; $\mathrm{TT}_{\mathrm{obs}}$, target trial emulated from observed data; $\mathrm{TT}_{\mathrm{true}}$, the full eligibility target trial, containing all those who are eligible.
a Average sizes of $\mathrm{TT}_{\mathrm{obs}}$ for the 7 settings of α and γ are n = 410, 387, 415, 442, 430, 374, and 332, respectively. Average size of $\mathrm{TT}_{\mathrm{true}}$ is 500. Note that the average causal effect $\mathrm{ACE}_{I_E=1}$ was calculated from a single simulation with n = 1,000,000 and was estimated at 2.386.
In summary, strategy 3 is necessary in the case that missing data are noticeably MAR or MNAR. If that is not the case, a user may prefer the simpler strategies 1 and 2. Strategy 2 is the most precise, if this is preferred by the user, but one must account for the possibility of undercoverage if a CI is sought.

CASE STUDY: EFFECT OF PALIVIZUMAB ON INFANT HOSPITAL ADMISSION
RSV is a major cause of acute lower respiratory tract infection in infants, with RSV bronchiolitis responsible for 40,000 hospital admissions annually in England (26).
Palivizumab is licensed for passive immunization to prevent RSV in premature infants with congenital heart disease or chronic lung disease. Due to its high cost, palivizumab is typically recommended to more select groups of high-risk infants than those in clinical trials, with limited data on real-world effectiveness (27). Hence, analysis by a selective emulated trial is of interest.
An observational cohort of infants potentially eligible for palivizumab treatment in England has been developed (27), using the Hospital Treatment Insights database (HTI), which links pharmacy dispensing records from 43 acute hospitals in England and hospital records from Hospital Episode Statistics (HES). This cohort details infants born between January 1, 2010, and December 31, 2016, with follow-up data on palivizumab prescriptions and hospital admission up to their first year of life. HTI is maintained by IQVIA (https://www.iqvia.com/). This cohort identifies a source population of 8,294 high-risk infants, defined as having congenital heart disease or chronic lung disease, under care of an HTI-reporting hospital, alive at the start of their first RSV season (October 1 to March 31), with a full linked hospital admission history. This is shown in the cohort flow charts of Figures 3 and 4.
Infants in the source population were considered eligible for the TT if they had a diagnosis of congenital heart disease or chronic lung disease and met additional eligibility criteria based on gestational age and chronological age at start of RSV season and, specifically, those who met 1a or 2a criteria for recommendation of treatment by palivizumab in Chapter 27a in the Green Book (28) (Web Table 3). Gestational age, however, is missing for 2,814 (34%) infants in the source population. As a result, the eligibility of many children cannot be identified.

Target trial emulation
The emulated target trial protocol is detailed in Web Appendix 3 and Web Table 2. We define TT obs to include all eligible individuals with complete eligibility data on gestational age, birth weight, Index of Multiple Deprivation score, and ethnicity. This led to a trial of 1,560 infants. We also aimed to recover TT true by imputing missing gestational age in the high-risk cohort. This corresponds to using strategies 2 and 3, respectively.
We are interested in the effect of any palivizumab prescription on RSV-related hospital admission in infants during their first RSV season of life. A full course of treatment by palivizumab requires up to 5 monthly doses during RSV season. As we could not determine adherence to treatment from the HTI data, we define a simplified exposure as a binary indicator of having been prescribed at least 1 dose of palivizumab in their first RSV season of life. Infants are identified in the first month of life for treatment, and it is typically administered in outpatient clinics, not when hospitalized for RSV. Our outcome is a binary indicator of having been hospitalized for an RSV-related condition during their first RSV season of life. Our target estimand is the ACE of palivizumab prescription on RSV-related hospital admission in $\mathrm{TT}_{\mathrm{true}}$, expressed as the average difference in absolute risk of hospital admission (the intent to treat (ITT) effect).

Cohort flow chart note: Note that the palivizumab prescriptions database contains a separate but overlapping population from those in the Hospital Treatment Insights database (HTI). Thus, this population is denoted by t until linked to individuals in the HTI population (n). CHD, congenital heart disease; CLD, chronic lung disease; RSV, respiratory syncytial virus; SCID, severe combined immunodeficiency.

To balance the confounders in the treated and untreated, we fitted a model for the propensity of receiving palivizumab, including gestational age, age at start of RSV season, Index of Multiple Deprivation quintiles, sex, ethnicity, year of birth, diagnosis of congenital heart disease or chronic lung disease (or both), and other comorbidities. The resultant propensity scores showed reasonable overlap in the treated and untreated (Web Figure 1). Mean differences between treated and untreated, after inverse probability weighting by the propensity score, were within 0.1, indicating good confounder balance.
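The balance check described above can be sketched as a weighted difference in covariate means. The example uses a tiny hypothetical data set with known propensity scores, rather than scores from a fitted model, so that the arithmetic can be followed by hand; the function names are illustrative only.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights: 1/p if treated, 1/(1 - p) otherwise."""
    return [1.0 / p if a == 1 else 1.0 / (1.0 - p)
            for a, p in zip(treated, propensity)]

def weighted_mean_difference(x, treated, weights):
    """Weighted mean of covariate x in the treated minus that in the untreated."""
    def wmean(arm):
        num = sum(w * xi for xi, a, w in zip(x, treated, weights) if a == arm)
        den = sum(w for a, w in zip(treated, weights) if a == arm)
        return num / den
    return wmean(1) - wmean(0)

# Hypothetical binary covariate that raises the chance of treatment from 0.25 to 0.75.
x = [0, 0, 0, 0, 1, 1, 1, 1]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
propensity = [0.25] * 4 + [0.75] * 4

raw = weighted_mean_difference(x, treated, [1.0] * len(x))
balanced = weighted_mean_difference(x, treated, iptw_weights(treated, propensity))
print(f"unweighted difference: {raw:.2f}; weighted difference: {balanced:.2f}")
```

In this toy the unweighted difference in means is 0.5, while weighting by the inverse of the propensity score removes the imbalance.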
We fitted 2 different outcome models: a logistic regression model of hospital admission against treatment with inverse probability of treatment weighting (IPTW), corresponding to a marginal structural model (MSM) (29), and a second where we controlled for the propensity score and all confounders directly in the outcome model, similar to those in 2-stage g-estimation of structural nested mean models (SNMMs) (30). The ACE is calculated by estimating potential outcomes via the "data stacking" method of (17).

Continuous gestational age is imputed in the treated and untreated arms separately (to account for any interaction between gestational age and palivizumab) using an MNAR imputation model that includes all the variables of the propensity score model, plus the outcome and birth weight. Birth weight is not included in the outcome model due to collinearity with gestational age. There are thus 2 sensitivity parameters: $\delta_1$ for exposed and $\delta_0$ for unexposed. We assert that infants with missing gestational age may have higher mortality, implying a shorter gestation (31). Hence we performed the analysis setting $\delta_1$ and $\delta_0$ to either 0 (MAR) or −4 (MNAR).
Based on recommendations in Tompsett et al. (23), rather than compare the ACE directly with δ 0 and δ 1 , which are difficult to interpret physically, we estimated from the imputed data the mean gestational age in treated and untreated infants to contrast against the results.
Missing birth weight, Index of Multiple Deprivation score, and ethnicity were imputed alongside gestational age using MICE. We report the results in Tables 4 and 5 below.

Results
Analysis of $\mathrm{TT}_{\mathrm{obs}}$ suggests that treatment by at least 1 dose of palivizumab has little effect on the risk of being hospitalized, indicated by an ACE of −0.003 using a propensity score-conditioned outcome model, and −0.01 under inverse probability weighting (a 0.3% or 1.0% lower risk of hospital admission). When imputing the TT under MAR we observed a 0.1% and 0.2% lower risk of hospital admission, respectively. Under MNAR the estimated effect of palivizumab varies more, ranging from −1.0% to 1.3% using an outcome model controlled for the propensity score, and from −3.1% to 2.3% using inverse probability weighting.
The imputation model implies a high number of missing eligible participants, with over 1,000 more individuals under the MAR imputed trial, and up to nearly 2,500 more under MNAR.
When δ 0 was set to −4, this led to a reduction in average gestational age in the untreated by 2.2 weeks. In this case there was a stronger reduction in risk of hospital admission when treated. When δ 1 was set to −4, the average gestational age in the treated was reduced by 2.7 weeks and there was an increased risk of hospital admission under treatment.
No estimate was found to be significant based on a 95% CI. Despite there being a clear change in the distribution of gestational age under MNAR conditions, and a large number of missing eligible, there is only weak evidence of selection bias in this study. This implies that gestational age only weakly modifies the effect of palivizumab on hospital admission.

Abbreviations: ACE, average causal effect; CI, confidence interval; N/A, not applicable; TT imp , target trial emulated from observed and imputed data; TT obs , target trial emulated from observed data.
a The ACE is expressed as a risk difference both in absolute value and in percentage risk difference. The sensitivity parameters are listed in order (δ 0 , δ 1 ). With sensitivity parameter TT imp (0,0), the data are assumed missing at random. In all other cases for TT imp it is assumed missing not at random. This is not applicable for TT obs , which has complete data.

The implication is that receiving at least 1 dose of palivizumab appears to have little effect on hospital admission, and the results are robust to changes in the missing data assumption.

DISCUSSION
In this paper we bring to light notable sources of bias in target trial emulation, emanating from ignoring missing eligible data. We explored one means to analyze a TT combined with multiple imputation of eligibility criteria prior to selection. We demonstrated via simulation that an imputed TT can eliminate sources of selection and collider bias, improve the sample size of a TT, and allow users to investigate sensitivity to changes in the assumptions of the missing eligible data on effect size.
An imputed TT of the effect of receiving at least 1 dose of palivizumab on RSV-related hospital admission indicated that a significant number of infants with missing gestational age were eligible, although any selection bias in this case was small.
We identified characteristics of the data that determine the size of selection bias, namely the strength of the MAR or MNAR mechanism, the number of missing eligible individuals, and the size of the effect modification. None of these characteristics can be calculated from the source population, but they could be inferred using external linked data sets. This selection bias can occur if any variable related to eligibility is an effect modifier. We showed in Web Appendix 1 that when L 2 was the effect modifier, strong selection bias was identified when E was MAR.
A limitation of this method is the tendency of CIs to overcover. The Boot-MI method (19) is computationally intensive, and thus one should expect an analysis to take several hours even with cluster computing methods. Hence we constructed CIs using a percentile bootstrap with just single imputation. However, single imputation lends itself to overcoverage (19). In Web Appendix 2, we applied strategy 3 using MI with m = 5, which demonstrates improved coverage. One alternative would be to investigate the corrected Rubin's pooled variance of ACE I E =1 suggested in Giganti and Shepherd (8). However, obtaining accurate confidence interval estimates in this way for the ACE using MI requires complex methods (32)(33)(34).
Instead of MI, we could consider using inverse probability weighting to address the bias caused by missingness in E (35). We investigated this method in Web Appendix 2 and found that it did not correct the bias. Another possible alternative is to utilize the work in Bareinboim and Pearl (15), by inferring or presuming the distribution of the confounders in TT true and standardizing the conditional ACE estimated in TT obs , but it would be a considerable challenge.
It is also worth noting that using strategy 2, and targeting the causal effect in those with complete records, may be a pragmatic choice if the expected selection bias is limited and the source population is cumbersome.
Data on palivizumab prescriptions and adherence were limited, and this had an impact on the quality of conclusions that could be made. Clinical colleagues reassure us that children hospitalized with RSV would not be issued palivizumab, protecting from reverse causation. However, other issues, such as confounding by indication, cannot be discounted. Limitations of the diagnostic data also meant a slight inflation of our definition of the eligible population because some of the diagnoses may include less severe diseases than those listed in the Green Book (28).