Benchmarking Observational Analyses Before Using Them to Address Questions Trials Do Not Answer: An Application to Coronary Thrombus Aspiration

Abstract To increase confidence in the use of observational analyses when addressing effectiveness questions beyond those addressed by randomized trials, one can first benchmark the observational analyses against existing trial results. We used Swedish registry data to emulate a target trial similar to the Thrombus Aspiration in ST-Elevation Myocardial Infarction in Scandinavia (TASTE) randomized trial, which found no difference in the risk of death or myocardial infarction by 1 year with or without thrombus aspiration among individuals with ST-elevation myocardial infarction. We benchmarked the emulation against the trial at 1 year and then extended the emulation’s follow-up to 3 years and estimated effects in subpopulations underrepresented in the trial. As in the TASTE trial, the observational analysis found no differences in risk of outcomes by 1 year between groups (risk difference = 0.7 (confidence interval, −0.7, 2.0) and −0.2 (confidence interval, −1.3, 1.0) for death and myocardial infarction, respectively), so benchmarking was considered successful. We additionally showed no difference in risk of death or myocardial infarction by 3 years, or within subpopulations by 1 year. Benchmarking against an index trial before using observational analyses to answer questions beyond those the trial could address allowed us to explore whether the observational data can be trusted to deliver valid estimates of treatment effects.


Initially submitted September 27, 2021; accepted for publication May 24, 2022.
To increase confidence in the use of observational analyses when addressing effectiveness questions beyond those addressed by randomized trials, one can first benchmark the observational analyses against existing trial results. We used Swedish registry data to emulate a target trial similar to the Thrombus Aspiration in ST-Elevation Myocardial Infarction in Scandinavia (TASTE) randomized trial, which found no difference in the risk of death or myocardial infarction by 1 year with or without thrombus aspiration among individuals with ST-elevation myocardial infarction. We benchmarked the emulation against the trial at 1 year and then extended the emulation's followup to 3 years and estimated effects in subpopulations underrepresented in the trial. As in the TASTE trial, the observational analysis found no differences in risk of outcomes by 1 year between groups (risk difference = 0.7 (confidence interval, −0.7, 2.0) and −0.2 (confidence interval, −1.3, 1.0) for death and myocardial infarction, respectively), so benchmarking was considered successful. We additionally showed no difference in risk of death or myocardial infarction by 3 years, or within subpopulations by 1 year. Benchmarking against an index trial before using observational analyses to answer questions beyond those the trial could address allowed us to explore whether the observational data can be trusted to deliver valid estimates of treatment effects. benchmarking; causal inference; observational analyses; randomized trial; target trial emulation Abbreviations: CI, confidence interval; PCI, percutaneous coronary intervention; STEMI, ST-elevation myocardial infarction; SWEDEHEART, Swedish Web-Based System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies; TASTE, Thrombus Aspiration in ST-Elevation Myocardial Infarction in Scandinavia.
Randomized trials are the preferred approach to estimate causal effects of clinical interventions. Randomized trials, however, cannot answer all clinically relevant causal questions, including those about long-term treatment effects or effects in individuals who do not enroll in trials. Analyses of observational databases are often used to complement the estimates of randomized trials, but observational analysis estimates may be confounded because differences in risk between treatment groups may be explained by differences between the individuals in each group rather than by the effect of treatment (1,2). Therefore, causal analyses of observational data adjust for known and measured confounders, although there is no guarantee that such adjustment suffices to eliminate confounding bias (3).
One possible approach to increase confidence in observational effect estimates is benchmarking, that is, to demonstrate that the observational analysis is able to replicate an index trial's findings (e.g., effect on death by 1 year) before using the observational data to estimate what the index trial could not estimate (e.g., effect on death by 3 years if the index trial had a follow-up of 1 year, or the effect within subpopulations that were not well represented in the index trial) (4). As an example, consider the Thrombus Aspiration in ST-Elevation Myocardial Infarction in Scandinavia (TASTE) randomized trial as our index trial. TASTE found no difference in the risk of death or myocardial infarction by 30 days or 1 year when comparing percutaneous coronary intervention (PCI) with and without thrombus aspiration among individuals with ST-elevation myocardial infarction (STEMI) in the Nordic countries (5,6). TASTE was designed to study the effects of thrombus aspiration by 1 year after baseline; analyses of observational data may be able to complement these results and make further inferences beyond those made by the TASTE trial. Trust in such observational analyses designed to ask a similar question as TASTE would be increased if they agreed with the 1-year trial results.
Successful agreement, however, requires sufficient adjustment for confounding, and it is possible that the structure of confounding varies before and after the publication of the trial, especially if that trial contributed to changes in the reasons for receiving treatment. Before TASTE there was evidence that thrombus aspiration improved coronary artery flow after PCI, but it was unknown whether it improved clinical endpoints such as mortality (7)(8)(9). In Sweden, before TASTE, this uncertainty resulted in some centers implementing the routine use of thrombus aspiration, while others left it to the discretion of the operator. After TASTE found no beneficial effect of thrombus aspiration, in Sweden, routine thrombus aspiration was largely reserved for patients with large thrombi in a coronary artery (10).
Here, we used observational data from the national Swedish Web-Based System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies (SWEDEHEART) Registry, which is the same registry in which TASTE was nested, to emulate a target trial similar to TASTE. By carrying out the observational analysis in the same registry as the trial, we ensure that the causal question was asked in the same health-care setting. After evaluating whether observational data before and after TASTE were comparable, we informally benchmarked the observational analysis results against the trial results at 1 year, then extended follow-up to 3 years and explored effects in subpopulations by 1 year.

Trial design and analysis
TASTE was a multicenter, prospective, randomized, controlled, open-label clinical trial carried out between June 2010 and March 2013 (5,6). In total, 31 PCI centers recruited participants: 29 in Sweden, 1 in Iceland, and 1 in Denmark. SWEDEHEART was used to collect information for Swedish participants. Individuals were eligible for TASTE if PCI was planned for the treatment of acute STEMI (see Table 1 for additional criteria). Individuals who accepted the invitation to participate were randomly assigned to receive PCI either with or without thrombus aspiration. The primary end point was death by any cause within 30 days of PCI, and additional analyses explored death by any cause, rehospitalization for myocardial infarction, and stent thrombosis with 1 year of PCI. Data on clinical end points were obtained from the Cause of Death and SWEDEHEART registries. The intention-to-treat analyses compared 1-year risk curves from Kaplan-Meier analyses and estimated the corresponding average hazard ratios from Cox proportional-hazards models.

Trial results
As reported in the original TASTE publication, during the enrollment period there were 11,956 individuals with STEMI, approximately 9,420 individuals potentially eligible for enrollment in Sweden and Iceland (eligible individuals unknown in Denmark), and 7,244 individuals randomized to a treatment arm in Sweden, Iceland, and Denmark; 3,621 were assigned to thrombus aspiration and 3,623 to no thrombus aspiration (6). Web Table 1 (available at https://doi. org/10.1093/aje/kwac098) shows the baseline characteristics, and Table 2 shows the 1-year risks and average hazard ratios. The risk of each individual outcome did not differ between the treatment groups. The 1-year risk of death was 5.3% in individuals in the thrombus aspiration group and 5.6% in the no thrombus aspiration group, with a hazard ratio of 0.94 (95% confidence interval (CI): 0.78, 1.15). The 1-year risk of myocardial infarction was 2.7% in both groups, with a hazard ratio of 0.97 (95% CI: 0.73, 1.28). Stent thrombosis was rare; the 1-year risk was 0.7% in the thrombus aspiration group and 0.9% in the no thrombus aspiration group, with a hazard ratio of 0.84 (95% CI: 0.50, 1.40).

THE OBSERVATIONAL ANALYSIS
Causal inference from observational data can be seen as an attempt to emulate a pragmatic randomized trial-the target trial-that would answer the question of interest. The approach for emulating a target trial has 2 steps: 1) specify the protocol of the target trial, and 2) emulate the target trial using the available observational data and appropriate methodology (11). To compare TASTE to an observational analysis that aims to ask the same questions, using data from the SWEDEHEART Registry, we first specified a protocol of a target trial similar to the protocol of TASTE, with deviations only when the observational data did not correspond to the information collected in the trial. We then emulated the target trial using the SWEDEHEART registry data. Table 1 summarizes all protocol elements from the target trial and its emulation, which we describe herein.

The target trial protocol
Eligibility criteria. The eligibility criteria of the target trial would be the same as TASTE with 5 exceptions. First, the enrollment period would be September 2007 to January 2016, excluding June 2010 to March 2013, which is the period of participation in TASTE. Second, only Swedish coronary intervention centers would be included (no data are available from the Icelandic and Danish centers in the observational data). Third, the possibility of performing thrombus aspiration would not be assessed. Fourth, correspondence between electrocardiogram findings and culprit artery pathoanatomy would not be assessed. Fifth, individuals who died on the day of PCI would be excluded and identification of outcomes would start from the day after PCI as it is not possible to distinguish whether outcome events other than death (i.e., myocardial infarction) occurred before 2) Thrombus aspiration followed by percutaneous coronary intervention: thrombus aspiration with an Export aspiration catheter (Medtronic Inc., Santa Rosa, California). Continuous manual suction is performed using a proximal-to-distal approach, which is defined as active aspiration during initial passage of the lesion. In lesions that cannot initially be passed with the thrombus aspiration catheter, it is permitted to dilate the lesion with an angioplasty balloon up to a maximal nominal diameter size of 2.0 mm and attempt to advance the thrombus aspiration catheter for a second time. After thrombus aspiration, percutaneous coronary intervention is done as described above.  or after PCI when the events occurred on the same day as the procedure.
Treatment strategies. The treatment strategies in the target trial would be the same as those in TASTE: PCI, either 1) with thrombus aspiration or 2) without thrombus aspiration.
Treatment assignment. The target trial would randomly assign eligible individuals to one of the treatment strategies, and the physicians would be aware of the strategy to which the patient had been assigned.
Outcomes. The outcomes in the target trial would be death from any cause, myocardial infarction, or stent thrombosis.
Follow-up. The target trial would follow each individual from the day after treatment assignment until the outcome of interest (separate analysis for each outcome), or either 1 year for benchmarking or 3 years for analyses with extended follow-up, whichever occurred first. It is not possible to identify migration date, so outcome data on those who migrated out of Sweden are unavailable. However, only about 0.5% of the Swedish population emigrates each year (12). We expect this proportion to be even lower among individuals eligible for our study, who recently had a myocardial infarction and are receiving regular health care.
Causal contrasts. The target trial would estimate the intention-to-treat effect, which is the effect of being assigned to thrombus aspiration or no thrombus aspiration, and the per-protocol effect, which is the effect of receiving the assigned thrombus aspiration or no thrombus aspiration.
Statistical analysis. For the intention-to-treat analysis, we estimated the survival curves in each group defined by assigned treatment strategy via a parametric pooled logistic model with an indicator for treatment group, a flexible time-varying intercept, and product terms between treatment group and time. We compared the estimated risks (1 − survival) via differences and ratios. To estimate the total effect on myocardial infarction, individuals who die are treated as not experiencing the outcome after death rather than as censored at death (13). For the per-protocol analysis, we used the same technique as above, except that the analysis was restricted to individuals who received their assigned treatment, baseline covariates are included in the outcome models, and the estimated probabilities are standardized to a Risk estimates from Kaplan-Meier analyses, and no confidence intervals were provided in the published trial results; risk ratios are hazard ratios from a Cox proportional hazards model with treatment as the only regressor.
b Adjusted at baseline for: age, sex, hospital, diabetes, body mass index, smoking, hyperlipidemia, hypertension, previous infarction, previous percutaneous coronary intervention, previous coronary artery bypass graft, stenosis class, proportion stenosis, angiography finding, heart rate, systolic blood pressure, diastolic blood pressure, thrombolysis, and use of warfarin, aspirin, clopidogrel, prasugrel, heparin, low molecular weight heparin, bivalirudin, or glycoprotein 2b/3a inhibitors. the distribution of the baseline covariates (14). Nonparametric bootstrapping with 200 samples was used to calculate 95% confidence intervals.

Emulating the target trial in the SWEDEHEART registry
Data sources. SWEDEHEART collects data from all patients hospitalized for acute coronary syndrome or undergoing coronary or valvular intervention for any indication in all relevant hospitals across Sweden (15). The registry was created by merging 4 existing cardiovascular health-care quality registries in 2009: the Register of Information and Knowledge About Swedish Heart Intensive Care Admissions (RIKSHIA), the Swedish Coronary Angiography and Angioplasty Registry (SCAAR), the Swedish Heart Surgery Registry and the National Registry of Secondary Prevention (SEPHIA), and the Swedish Heart Surgery Registry. SWEDEHEART was used to collect information for patients when they were randomized in the TASTE trial, hence the data collection process was broadly similar between the 2 studies. SWEDEHEART is also linked to the Swedish National Patient Register, which records all primary and secondary diagnoses and procedures from inpatient hospitalizations and outpatient specialist care visits across Sweden; the Swedish Cause of Death Register, which records all deaths and causes of death; and the Prescribed Drug Register, which collects information on all dispensed medications (16)(17)(18).
Eligibility criteria. We identified individuals in the SWEDE-HEART registry who met the eligibility criteria. As in all observational emulations, no informed consent was asked, and hence we could not exclude individuals who would not have been asked or who would have declined participation if asked. Treatment strategies and assignment. As treatment had already been given under routine clinical practice, we assigned eligible individuals in SWEDEHEART to the strategy their data were compatible with at baseline, and proceeded as if treatment was randomly assigned within levels of the following baseline covariates (full detail on covariates and their definitions in Web Table 2): age, sex, hospital, diabetes, body mass index, smoking, hyperlipidemia, hypertension, previous infarction, previous PCI, previous coronary artery bypass graft, stenosis class, proportion stenosis, angiography finding, heart rate, systolic blood pressure, diastolic blood pressure, thrombolysis, and use of warfarin, aspirin, clopidogrel, prasugrel, heparin, low molecular weight heparin, bivalirudin, and glycoprotein 2b3a inhibitors.
Outcomes. We did not use stent thrombosis as an outcome because few events had been reported. We identified deaths from the Cause of Death Register and myocardial infarctions from the SWEDEHEART Registry. See Web Table 3 for further details on outcomes and their definitions. Follow-up. Follow-up was the same as the target trial. Causal contrasts. It was possible to estimate only the observational analog of the per-protocol effect as SWEDE-HEART collects information on the treatment an individual actually received, not what they were assigned.
Statistical analysis. The per-protocol analysis was the same as described above. Details of our modeling approach are presented in Web Appendix 1. We additionally stratified data by time period (before the TASTE trial began enrollment and after the TASTE trial completed enrollment), repeated the analyses, and compared estimates using data from each period to assess whether results at 1 year of follow-up were comparable, regardless of changing reasons for receiving thrombus aspiration following publication of TASTE trial results.
We carried out 9 sensitivity analyses: 1) We did not apply the eligibility criterion of 50% minimum stenosis (there was a high degree of missingness for the proportionstenosis variable used to identify this eligibility in the period before TASTE); 2) to understand the impact of measured covariates on effect estimates, we conducted a separate analysis in which we adjusted for age and sex only, and we computed the difference between the fully adjusted risk difference and the age-and sex-adjusted risk difference; 3) we dropped all individuals with any missing data for baseline covariates (complete-case analysis); 4) we censored individuals at death in the myocardial infarction analysis; 5) we defined myocardial infarction using a 2-day gap between discharge following the initial period in hospital and the new myocardial infarction event to account for individuals that who transferred between different hospitals without a new event; 6) we additionally included a Killip class variable in the models when data were stratified into time after TASTE (Killip class was collected from June 2009, so there was a high degree of missing data before TASTE); 7) we additionally included an indicator for time period (before or after the TASTE trial) in the models; 8) we estimated the standardized risk of each outcome separately in each treatment arm to allow for all possible interactions between treatment and covariates; and 9) we adjusted for baseline covariates using inverse probability weighting.
We informally benchmarked 1-year results from the emulation against the results of the TASTE trial. If risk contrasts when comparing those with and without thrombus aspiration were similar to those from TASTE, and the same clinical decision would be made regardless of the study used to inform the decision, benchmarking was deemed successful; analyses were then replicated to estimate the 3-year risks, and data were stratified to estimate treatment effects by 1 year in subpopulations of individuals within strata of sex (female/male), age (<65 years/≥65 years), diabetes (no/yes), previous PCI (no/yes), and previous myocardial infarction (no/yes). Figure 1 shows a flowchart of selection for the target trial emulation. There were 18,222 eligible individuals, of whom 3,462 were given thrombus aspiration and 14,760 were not given thrombus aspiration. Table 3 shows the baseline characteristics of all eligible individuals. Before standardization, there were differences between groups for several variables including age, hospital, stenosis in culprit artery, and angiography finding (standardized mean differences in Web Table 4). Table 2 shows the estimated 1-year risks, risk differences, and risk ratios for death and myocardial infarction. The estimated risk of death was 8.0% (95% CI: 6.7, 9.3) in the thrombus aspiration group and 7.3% (95% CI: 6.8, 7.9) in the group without thrombus aspiration; which results in a risk difference of 0.7% (95% CI: −0.7, 2.0) and a risk ratio of 1.09 (95% CI: 0.96, 1.24). The estimated risk of myocardial infarction was 3.9% (95% CI: 2.9, 4.9) in the thrombus aspiration group and 4.1% (95% CI: 3.6, 4.5) in the group without thrombus aspiration; which results in a risk difference of −0.2% (95% CI: −1.3, 1.0) and a risk ratio of 0.96 (95% CI: 0.79, 1.17).

RESULTS
Web Table 5 shows the baseline characteristics stratified by period (before and after TASTE enrollment), and Web Table 6 and Web Figure 1 show results when using these stratified data for analysis. The 1-year risk of death and myocardial infarction did not differ between the treatment groups in both time periods, so use of data from both enrollment periods for benchmarking appears justified. Web Tables 7-16 show results from sensitivity analyses; all results were broadly similar those from the primary analyses. Treatment groups were generally balanced in terms of the observed covariates after inverse probability weighting (Web Table 4).

Benchmarking
Results of the target trial emulation at 1 year were informally benchmarked against results from the intentionto-treat analyses in TASTE (Table 2). Results appeared compatible within sampling variability: Both the estimates from TASTE and the emulated target trial were very compatible with a similar range of hazard and risk ratio values for death (TASTE 95% CI: 0.78, 1.15; emulated target trial 95% CI: 0.96, 1.24) or myocardial infarction (TASTE 95%: 0.73, 1.28; emulated target trial 95% CI: 0.79, 1.17) by 1 year in the groups with or without thrombus aspiration. Figure 2 shows the 3-year survival curves and Table 2 also shows the estimated 3-year risks, risk differences, and risk ratios. The estimated risk of death was 13.3% (95% CI: 11.8, 14.7) in the thrombus aspiration group and 12.4% (95% CI: 11.7, 13.1) in the group without thrombus aspiration, which results in a risk difference of 0.9% (95% CI: −0.7, 2.4) and a risk ratio of 1.07 (95% CI: 0.98, 1.17). The estimated risk of myocardial infarction was 6.7% (95% CI: 5.6, 7.9) in the thrombus aspiration group and 6.9% (95% CI: 6.4, 7.5) in the group without thrombus aspiration; which results in a risk difference of −0.2% (95% CI: −1.5, 1.1) and a risk ratio of 0.97 (95% CI: 0.85, 1.11).   Table 4 shows the 1-year risks, risk differences, and risk ratios stratified by age, sex, diabetes, previous PCI, and previous myocardial infarction. Results were generally consistent with those from our main analyses.

DISCUSSION
We used observational data from the SWEDEHEART registry to address questions beyond those the TASTE trial could answer. The process had 3 steps. First, we used the observational data to emulate a target trial similar to TASTE, which estimated the effect of thrombus aspiration on risk of death and myocardial infarction by 1 year. Second, we informally benchmarked the observational analysis against TASTE by concluding that the same clinical decision would be made using either study because both studies estimated no benefit of thrombus aspiration. Third, in the observational analysis we extended follow-up to 3 years to also estimate no benefit, and estimated no effects by 1 year in subpopulations defined by age, sex, diabetes, previous PCI, and previous myocardial infarction.
Unmeasured confounding is always a possibility in observational analyses. We were concerned that we could not adjust for thrombus burden, a predictor of myocardial infarction and death that affects the decision whether to administer thrombus aspiration, especially after TASTE when it was only used as a bail-out for those with high thrombus burden in Sweden (10,19). Lack of adjustment for this variable might explain the increased 1-year risk of death in the subpopulations of women and those without diabetes. However, a sensitivity analysis that additionally adjusted for Killip class, which is correlated with thrombus burden, did not considerably change the estimates (Web Table 13; analysis restricted to the time period after TASTE as there was a high degree of missingness of Killip class earlier) (20). This implies either that Killip class does not adequately capture thrombus burden or, more likely, that there is little residual confounding due to thrombus burden.
Even in the absence of unmeasured confounding, there may be differences between the randomized trial and the observational analysis with respect to: 1) study populations, 2) definition or measurement of interventions or outcomes, and 3) causal estimands. Because these differences may affect the estimates in different directions, it is logically possible that a partial canceling out of these impacts leads to an erroneous conclusion that benchmarking was successful. We now discuss each of these differences and consider their impact on our results.
Between-study differences in effect estimates will occur if the treatment effect varies across groups that are unequally represented in each study. Eligible individuals who would not agree to enroll in a randomized trial is one such group, because those who do not enroll in trials (about 39% of individuals with STEMI in TASTE) are generally sicker, with poorer prognosis. In TASTE, the 1-year risk of death was 5.3% in those who enrolled and were randomized to thrombus aspiration, and it was 16.4% in those who did not enroll and were given thrombus aspiration under routine practice. The inclusion of these individuals in the target trial emulation meant higher absolute risks than among those who enrolled in TASTE, but this did not seem to affect the risk ratio estimates because, among those not enrolled in TASTE, the risks were similar when comparing groups with and without thrombus aspiration (16.4% and 15.7% for death; 3.8% and 3.7% for myocardial infarction) (6).
Another reason that study populations may differ is that observational data may not be detailed enough to match the eligibility criteria of the index trial. In our application, fewer individuals were eligible for the observational analysis in the period before TASTE compared with after TASTE, possibly because data on the proportion of stenosis in the culprit artery were less complete in SWEDEHEART in the earlier period (meaning that fewer people could be evaluated for the minimum of 50% stenosis criterion). However, in a sensitivity analysis in which we did not use the minimum stenosis criterion to determine eligibility, effect estimates were broadly similar to the main results (Web Table 7).
Between-study differences in effect estimates will also occur if the measurement of interventions or outcomes varies between studies. However, this is unlikely to occur in our application because the randomized trial, TASTE, and the observational analysis were both embedded within the SWEDEHEART Registry, and the definition and measurement of the intervention, thrombus aspiration, and the outcomes, death and myocardial infarction, were captured using the same mechanism. Use of the SWEDEHEART Registry also means the health-care system was the same in both studies.
Differences in causal estimands may also lead to betweenstudy differences in effect estimates. In randomized trials, the main estimand is often the intention-to-treat effect (i.e., the effect of assignment to treatment). However, when using observational data, information may be available only on treatment an individual actually received, not what they were assigned or prescribed. Then, for point interventions like thrombus aspiration, the observational analysis can only estimate the per-protocol effect-that is, the effect of receiving treatment (21). Appropriate benchmarking then necessitates reanalyzing the randomized trial data to estimate the per-protocol effect, which requires adjustment for beforerandomization factors to account for confounding (22,23). In our application, it is unlikely that differences in estimands affected the comparability of the estimates because adherence to the assigned treatment was very high (94%) in TASTE, and in fact, an unadjusted comparison restricted to the adherers resulted in a hazard ratio (0.95) very similar to that of the intention-to-treat analysis (0.94) (6).
Informal benchmarking at 1 year increases confidence in the reliability of observational inferences at 3 years and within subpopulations. Because increasing followup increases the possibility of selection bias due to loss to follow-up, observational analyses generally require longitudinal data on joint predictors of loss to followup and the outcome interest. In our study, however, loss to follow-up is a minor concern because <0.5% of individuals emigrate each year (12). We also cannot think of baseline confounders that introduce bias only after 1 year. Additionally, our main analysis relies on the assumption that the measured covariates are approximately sufficient to adjust for confounding when estimating the effect in the entire study population. That is, we assume the magnitude of unmeasured confounding is, on average, small across all subgroups defined by the measured covariates. However, the magnitude of unmeasured confounding might be greater (or smaller) in certain subgroups and thus some subgroupspecific effect estimates may be more (or less) biased than the effect estimates in the entire study population.
In summary, we carried out an observational analysis that emulates a target trial, informally benchmarked its results with those from an index randomized trial, and used the observational analyses to draw causal inferences over a longer follow-up duration and within subpopulations. The agreement between estimates from TASTE and our emulated target trial suggests that the observational data can deliver approximately valid estimates of treatment effects. This example shows how high-quality observational data can complement results from randomized trials and provide additional evidence to support clinical decision making.