Abstract

Aims

Several conventional and novel methods provide new insights into clinical trial composite endpoints. The TRILOGY ACS trial is used as a contemporary example to prospectively compare these methods side by side.

Methods and results

The traditional time-to-first-event method, the Andersen–Gill recurrent events method, the win ratio, and a weighted composite endpoint (WCE) are compared using the randomized, active-control TRILOGY ACS trial. This trial, which had a neutral result, randomized 9326 patients managed without coronary revascularization within 10 days of their acute coronary syndrome to prasugrel or clopidogrel and followed them for up to 30 months. The traditional composite, win ratio, and WCE demonstrated no significant survival advantage for prasugrel, whereas the Andersen–Gill method demonstrated a statistical advantage for prasugrel [hazard ratio (HR), 0.86 (95% CI, 0.72–0.97)]. The traditional composite used 73% of total patient events, 40% of which were deaths. The win ratio used 66% of total events, of which deaths comprised 57%. Both the Andersen–Gill and WCE methods used all events in all participants; however, deaths comprised 41% of the events with the Andersen–Gill method, compared with 64% with the WCE method.

Conclusion

This study addresses the relative efficiency of various methods for assessing clinical trial events comprising the composite endpoint. The methods accounting for all events, in particular those incorporating their clinical relevance, appear most advantageous, and may be useful in interpreting future trials. This clinical and statistical advantage is especially evident with long-term follow-up where multiple non-fatal events are more common.

Clinical Trial Registration

NCT00699998.

See page 340 for the editorial comment on this article (doi:10.1093/eurheartj/ehu311)

Introduction

Phase III randomized clinical trials (RCTs) are intended to provide reliable and objective estimates of the effects of treatment interventions. The costs and complexity of RCTs, including those done in the setting of acute coronary syndromes (ACS), have increased steeply over the past decade, thereby setting a higher standard for the development of novel therapeutic strategies. The ability to demonstrate additional gains in treatment efficacy has also been attenuated by major improvements in treatment, with corresponding reductions in mortality and morbidity rates in ACS over the past two decades.

These developments have encouraged the adoption of an increasingly diverse set of related clinical outcomes combined into a single composite endpoint. The components selected for the composite should be clinically relevant. A composite endpoint assumes that each component is meaningful, that the components are of similar severity, and that the treatment effect is similar across components; however, these assumptions are seldom met in practice. The time to the first event among the components often serves as the defining unit of measurement in a given trial. A composite endpoint is typically used to increase the number of events, and hence the signal, thereby reducing the required sample size. In response to further calls to improve the value of the information from RCTs,1,2 statisticians and trialists have developed novel methodologies designed to better reflect the overall patient experience and afford more meaningful trial interpretation.

The Targeted Platelet Inhibition to Clarify the Optimal Strategy to Medically Manage Acute Coronary Syndromes (TRILOGY ACS) trial (NCT00699998) evaluated 9326 participants with unstable angina/non-ST-segment elevation myocardial infarction (UA/NSTEMI) who were managed medically without revascularization, comparing the thienopyridine platelet inhibitors prasugrel and clopidogrel. In TRILOGY ACS, the primary analysis was based on a composite endpoint of cardiovascular death, myocardial infarction (MI), and stroke, and the result was neutral. In contrast with many prior trials of dual antiplatelet therapy, longer-term follow-up was employed; in addition, a lower-dose regimen (i.e. a 5 mg prasugrel maintenance dose in patients <60 kg and/or ≥75 years) was explored concomitantly. Our intention herein was to use the neutral results of the TRILOGY ACS trial as a template to explore methodologies beyond the traditional time-to-first-event approach.

Given the complexity of composite endpoints and the simplicity of the traditional composite analysis, there is an opportunity to improve upon the use of trial outcomes information to more efficiently capture potential outcomes of interest. Hence the objectives of this study were to explore the attributes of different analytical approaches using the TRILOGY ACS trial as an example.3–5

Methods

Study design and participants

The design and primary results of the TRILOGY ACS trial have been described previously.5,6 In the present study, we utilize all-cause mortality as an endpoint rather than cardiovascular death in each of the analyses. Patients with UA/NSTEMI were eligible for enrolment if their treating physician decided on a final treatment strategy of medical management without revascularization within 10 days of their index event; patients had to have at least one of four risk criteria (age >60 years, diabetes, prior myocardial infarction, prior revascularization). Coronary angiography was not required for enrolment, but if such a procedure was planned, it had to be performed before randomization. Major exclusion criteria included a history of transient ischaemic attack or stroke, coronary revascularization within the previous 30 days, renal failure requiring dialysis, and concomitant treatment with an oral anticoagulant.

Patients were randomly assigned to receive a loading dose of either 30 mg of prasugrel (n = 4663) or 300 mg of clopidogrel (n = 4663), followed by a blinded daily maintenance dose of study drug: 10 mg of prasugrel (5 mg for patients <60 kg body weight or ≥75 years) or 75 mg of clopidogrel. Study treatment was administered for at least 6 months and up to 30 months. Concomitant low-dose aspirin treatment was strongly encouraged.

Statistical analysis

Participant characteristics were summarized according to randomly allocated treatment arm (i.e. intention-to-treat). The efficacy outcomes of interest in this study were time to all-cause death (used rather than cardiovascular death to align with the prior weighted composite method and to minimize censoring), MI, and stroke.4 Multiple imputation was used for missing baseline covariates. All analyses were done using SAS (version 9.3) and R (version 2.16).

Models

In addition to the primary time-to-first-event analysis (all-cause death, MI, stroke), we undertook a secondary analysis of repeated ischaemic events across all components of the primary endpoint using the Andersen–Gill model,7 both for the overall period and with a time-dependent model giving separate hazard ratios (HRs) before and after 30 days, 6 months, and 12 months.6 We also analysed the data using the win ratio3 and the WCE.4
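The Andersen–Gill model is usually fitted to data in counting-process form, with each patient contributing one (start, stop] row per at-risk interval. The sketch below shows one way such rows could be built for a single patient, splitting follow-up at each event and at the 30-day, 6-month, and 12-month cut points so that period-specific hazard ratios can be estimated. It is an illustration only: the function and field names are hypothetical, 6 and 12 months are approximated as 182 and 365 days, and no two events are assumed to fall on the same day. Rows in this form could then be passed to a Cox routine that accepts start/stop data (for example, R's survival::coxph with Surv(start, stop, event), a cluster() term for robust variance, and a treatment-by-period interaction).

```python
from dataclasses import dataclass

# Cut points (days) for period-specific hazard ratios: 30 days, ~6 months, ~12 months.
# ASSUMPTION: 6 and 12 months approximated as 182 and 365 days.
CUTPOINTS = (30, 182, 365)


@dataclass
class Interval:
    patient_id: int
    start: float  # interval start (exclusive), days since randomization
    stop: float   # interval end (inclusive)
    event: int    # 1 if a death, MI, or stroke ends this interval
    period: int   # index of the time window the interval falls in


def counting_process_rows(patient_id, event_days, last_day, cutpoints=CUTPOINTS):
    """Split one patient's follow-up into Andersen-Gill (start, stop] rows.

    Each event closes an interval and the patient remains at risk afterwards
    (a death should be the final event, on last_day). Assumes no two events
    share the same day.
    """
    events = {d for d in event_days if d <= last_day}
    breakpoints = sorted(events | {c for c in cutpoints if c < last_day} | {last_day})
    rows, start = [], 0.0
    for stop in breakpoints:
        if stop <= start:
            continue  # skip zero-length intervals
        period = sum(stop > c for c in cutpoints)
        rows.append(Interval(patient_id, start, stop, int(stop in events), period))
        start = stop
    return rows


# A patient with MIs on days 10 and 200, followed to day 400.
for row in counting_process_rows(1, [10, 200], 400):
    print(row)
```

Because the patient stays in the risk set after each non-fatal event, a second or third MI adds another row with event = 1, which is how the Andersen–Gill approach uses all events.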

Win ratio

The win ratio methodology3 provides a rank-based approach for assessing treatment superiority: patients are first ranked within each treatment group according to a risk score and then paired across groups by rank. In the current study, the Global Registry of Acute Coronary Events (GRACE) Risk Score for 6-month mortality was computed for all patients,8 each group was ranked by the score independently, and the patients were then paired by rank. Multiple imputation was used for patients missing data required to compute their GRACE Risk Score (n = 540, 6%). Each patient pair was then evaluated as to which member of the pair (if either) had a death event first. The remaining pairs (i.e. those with no death event) were then evaluated for stroke events and subsequently for MI events. If the treatment arms were unequal in size, members of the larger arm were left unpaired. If one member of a pair was censored before the time of the event in the other member, the event was considered 'unused'. To include these 'unused' events, new pairs were generated according to length of follow-up and then GRACE Risk Score.
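A minimal sketch of the rank-based pairing and hierarchical comparison described above follows. It assumes each patient is summarized by a GRACE score, a last follow-up day, and the day of first death, stroke, and MI (if any). Events are compared only within the pair's shared follow-up, so an event occurring after the other member was censored is treated as 'unused'. The class and function names are hypothetical, same-day events at one level are assumed to pass the comparison down to the next level, and this is not the code used in the analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Outcomes compared in order of severity, as described in the text.
HIERARCHY = ("death", "stroke", "mi")


@dataclass
class Patient:
    grace: float                    # GRACE risk score (after imputation)
    follow_up: float                # last day of follow-up
    death: Optional[float] = None   # day of death, if any
    stroke: Optional[float] = None  # day of first stroke, if any
    mi: Optional[float] = None      # day of first MI, if any


def pair_by_rank(prasugrel, clopidogrel):
    """Rank each arm by GRACE score independently and pair equal ranks.

    zip() stops at the shorter list, so surplus members of a larger arm stay unpaired.
    """
    p_ranked = sorted(prasugrel, key=lambda pt: pt.grace)
    c_ranked = sorted(clopidogrel, key=lambda pt: pt.grace)
    return list(zip(p_ranked, c_ranked))


def compare_pair(p, c):
    """Return 'win' (clopidogrel member had the event first), 'loss', or 'tie'."""
    shared = min(p.follow_up, c.follow_up)
    for outcome in HIERARCHY:
        tp, tc = getattr(p, outcome), getattr(c, outcome)
        # Events after the shared follow-up are 'unused' at this stage.
        tp = tp if tp is not None and tp <= shared else None
        tc = tc if tc is not None and tc <= shared else None
        if tc is not None and (tp is None or tc < tp):
            return "win"
        if tp is not None and (tc is None or tp < tc):
            return "loss"
        # Neither had a usable event (or both on the same day): go to the next level.
    return "tie"
```

Pairs that end in a 'tie' because of early censoring are the ones the time-stratified re-ranking tries to recover.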

Once the pairs were evaluated, the number of 'wins' for prasugrel (i.e. pairs in which the clopidogrel member had the event first) was divided by the number of 'losses' (i.e. pairs in which the prasugrel member had the event first) to give the win ratio and its 95% confidence interval (CI).

For pairs in which one member was censored early, and for unmatched patients, subjects were re-ranked and re-paired within follow-up time groups (rather than by GRACE score). This time-stratified re-ranking allows more pairs to contribute to the win-ratio calculation.

Weighted composite endpoints

The WCE methodology generalizes the standard time-to-event methodology by assigning a weight to each type of non-fatal event.4 In this analysis, each patient begins with a weight of 1.0. A non-fatal event reduces the patient's contribution to the cohort by that event's weight; each subsequent event removes its weight from the remaining (residual) weight, and a death removes all of the residual weight. For example, a patient with an MI on Day 2 and a stroke on Day 3 would have lost a cumulative weight of 0.67 = 1 − [(1 − 0.38) × (1 − 0.47)]. A modified life table can then be constructed using the weighted number of patients at risk. Two sets of weights were determined by a Delphi panel comprising 23 cardiologist/clinician-investigator members of the TRILOGY ACS steering committee6 during the planning stage of the trial. In the first round, results were received from the larger panel using a worksheet (see Supplementary material online, Appendix); the median scores were 10 (IQR 10–12) for death, 4 (IQR 3–5) for MI, and 5 (IQR 4–6) for stroke. In the subsequent review, the consensus values were 10.8 for death, 4.1 for MI, and 5.0 for stroke. From these, a single weight was derived for each non-fatal outcome using a previously described process.4,9,10 In this survey, the clinician-investigators determined weights of 0.38 for MI and 0.47 for stroke. We did not generate separate weights for cardiovascular and non-cardiovascular death; rather, death was included as a single endpoint with a weight of 1.0. Hence, we considered all-cause death in all methods rather than restricting our analysis to cardiovascular death.
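The residual-weight rule and the modified life table can be sketched as follows, using the base weights of 0.38 (MI), 0.47 (stroke), and 1.0 (death). The weighted Kaplan–Meier estimator here is one plausible reading of 'a modified life table with a weighted number of patients at risk', not necessarily the exact published implementation: at each event day, the weighted number at risk is the residual weight of every patient still under follow-up, and the weighted number of events is the weight lost that day. Function names and the input format are hypothetical, events are assumed to occur on or before each patient's last follow-up day, and the quadratic-time loop favours clarity over speed.

```python
# Base WCE weights from the Delphi panel.
WEIGHTS = {"mi": 0.38, "stroke": 0.47, "death": 1.0}


def cumulative_weight_lost(events):
    """Total weight lost after an ordered sequence of events.

    Each event removes its weight from the patient's *residual* weight, so a
    death (weight 1.0) removes whatever weight remains.
    """
    residual = 1.0
    for kind in events:
        residual *= 1.0 - WEIGHTS[kind]
    return 1.0 - residual


def weighted_km(patients):
    """Weighted Kaplan-Meier curve.

    patients: list of (last_follow_up_day, [(event_day, event_kind), ...]).
    Returns [(day, S_w(day)), ...] at each day on which weight is lost.
    """
    # Record how much weight each patient loses on each event day.
    losses = []  # (day, patient index, weight lost)
    for i, (_, events) in enumerate(patients):
        residual = 1.0
        for day, kind in sorted(events):
            lost = residual * WEIGHTS[kind]
            losses.append((day, i, lost))
            residual -= lost

    curve, surv = [], 1.0
    for day in sorted({d for d, _, _ in losses}):
        # Weighted number at risk: residual weight just before `day`
        # of every patient still under follow-up on that day.
        at_risk = 0.0
        for i, (last_day, _) in enumerate(patients):
            if last_day >= day:
                already_lost = sum(w for d, j, w in losses if j == i and d < day)
                at_risk += 1.0 - already_lost
        lost_today = sum(w for d, _, w in losses if d == day)
        surv *= 1.0 - lost_today / at_risk
        curve.append((day, surv))
    return curve


# The example from the text: MI on Day 2, stroke on Day 3.
print(round(cumulative_weight_lost(["mi", "stroke"]), 2))
```

With every weight set to 1.0, the same function reduces to the ordinary Kaplan–Meier estimator for time to first event, which is the sense in which the WCE generalizes the traditional analysis.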

To extend the WCE methodology, the non-fatal endpoints of MI and stroke were further stratified into three severity categories. The same respondents were asked to rate three levels of each non-fatal outcome relative to a safety endpoint (i.e. severe bleeding), and the resulting trade-offs were converted into weights for mild, moderate, and severe MIs and strokes. Moderate MIs and non-disabling strokes were anchored to the base weights assigned by the Delphi panel, and mild and severe MIs and mild and severe (disabling) strokes were then weighted according to their trade-offs relative to that anchor. For example, with a moderate MI weighted 0.38, if respondents were willing to trade roughly 1.6 times as many bleeding events to avoid a severe MI as to avoid a moderate MI, the severe MI received a correspondingly higher weight of 0.59. Table 1 provides the definition and the relative weight of each non-fatal event.
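Because the severity weights are anchored on the moderate category, the trade-offs they encode can be read back from Table 1 by simple division. The sketch below does this (names are illustrative): for MI, the severe weight is about 1.55 times the moderate anchor, close to the 1.6 used in the example above, and for both MI and stroke the severe weight is roughly 3.5 times the mild weight. It only back-calculates ratios from the published weights; it does not reproduce the survey conversion itself.

```python
# Severity weights (mild, moderate, severe) as listed in Table 1.
SEVERITY_WEIGHTS = {
    "MI": (0.17, 0.38, 0.59),
    "stroke": (0.23, 0.47, 0.82),
}


def implied_ratios(mild, moderate, severe):
    """Express each severity weight relative to the moderate anchor, plus severe vs. mild."""
    return {
        "mild/moderate": mild / moderate,
        "severe/moderate": severe / moderate,
        "severe/mild": severe / mild,
    }


for outcome, weights in SEVERITY_WEIGHTS.items():
    ratios = implied_ratios(*weights)
    print(outcome, {name: round(r, 2) for name, r in ratios.items()})
```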

Table 1

Classification of endpoint severity

Class | Definition | Rule | Weight for WCE
Myocardial infarction
 Mild | Small periprocedural | Periprocedural (tick box on CRF) OR <5× ULN troponin OR (if no troponin available) <2× ULN CK-MB OR (if no CK-MB available) <2× CK | 0.17
 Moderate | Medium spontaneous | 5–30× ULN troponin OR (if no troponin available) 2–10× ULN CK-MB OR (if no CK-MB available) 2–10× CK | 0.38
 Severe | Large with major ST change and substantial myocardial necrosis with accompanying left-ventricular dysfunction | >30× ULN troponin OR (if no troponin available) >10× ULN CK-MB OR (if no CK-MB available) >10× CK | 0.59
Stroke
 Mild | Transient ischaemic attack with brief quadrantic visual field loss | Derived from text comments | 0.23
 Moderate | Stroke with significant speech and right arm weakness that entirely recovers within 3 months (non-disabling) | Derived from text comments | 0.47
 Severe | Severe right hemiplegia that persists permanently (disabling) | Derived from text comments | 0.82

CK, creatine kinase; CK-MB, creatine kinase MB; CRF, case report form; ULN, upper limit of normal.

The survey questions on weights of the composite components are provided as Supplementary material online, Appendix.

Results

There was a total of 1913 efficacy events (death, MI, stroke) and 210 safety events (moderate and major bleeds) among the 9326 participants enrolled in TRILOGY ACS (Table 2). Using the traditional composite endpoint, there was no difference between prasugrel and clopidogrel at 30 months [HR, 0.96 (95% CI, 0.86–1.06)]. The traditional composite analysis incorporated 1389 of the 1913 events (i.e. 73%; first event per patient only). The Andersen–Gill method, which counts multiple events per patient with equal weight (n = 794), revealed a significant difference in favour of prasugrel [HR, 0.86 (95% CI, 0.72–0.97)], as in the recurrent-events analysis reported with the primary trial results.

Table 2

Distribution and weighting of events by analytical method

Method | Deathsᵃ, prasugrel (n = 4663) | Deathsᵃ, clopidogrel (n = 4663) | Stroke, prasugrel (n = 4663) | Stroke, clopidogrel (n = 4663) | MI, prasugrel (n = 4663) | MI, clopidogrel (n = 4663)
All events, n (max/pt) | 385 (1) | 409 (1) | 62 (1) | 74 (3) | 474 (6) | 509 (7)
Primary analysis, n (% of events used) | 272 (70.6) | 276 (67.4) | 55 (88.7) | 63 (85.1) | 354 (78.0) | 369 (72.5)
Andersen–Gill, n (%) | 385 (100) | 409 (100) | 62 (100) | 74 (100) | 474 (100) | 509 (100)
Win ratio, n (%) | 306 (79.5) | 338 (82.6) | 32 (51.6) | 32 (43.2) | 204 (43.0) | 201 (39.5)
Win ratio, time stratified, n (%) | 343 (89.1) | 362 (88.5) | 37 (59.7) | 43 (58.1) | 232 (48.9) | 244 (47.9)
WCE base, n (%) | 385 (100) | 409 (100) | 62 (100) | 74 (100) | 474 (100) | 509 (100)
WCE augmented, sm/med/lg (%) | 385 (100) | 409 (100) | 2/44/16 (100) | 6/56/18 (100) | 308/115/52 (100) | 303/142/65 (100)

CV, cardiovascular; max, maximum; MI, myocardial infarction; pt, patient; sm/med/lg, small/medium/large.

ᵃAll deaths rather than CV deaths were used.

Win ratio

Ninety-four percent of patients had complete information for the GRACE Risk Score; missing elements were imputed for the remaining patients, and their individual risk scores were then calculated. Because the two arms were equal in size, all 4663 patients in the prasugrel group were matched to patients in the clopidogrel group by GRACE Risk Score rank. After this initial ranking, 245 pairs with events (117 deaths, 26 strokes, and 102 MIs) were left unused because the other member of the pair had been censored before the event time. After time-stratified re-ranking of these 245 pairs, 149 of the events were re-entered into the analysis. There were 331 pairs in which a death occurred first in the prasugrel arm, compared with 348 in which it occurred first in the clopidogrel arm. The non-fatal outcomes followed a similar pattern, with 37 vs. 43 for stroke and 225 vs. 234 for MI in the prasugrel and clopidogrel arms, respectively. The win ratio reverses the usual interpretation of a treatment effect: conventionally we are concerned with the odds or hazard of an event, whereas in the win ratio a 'win' is scored when one member of the pair does not have an event before the other. In our analysis, there were numerically more 'wins' for prasugrel, with a win ratio of 1.05 (95% CI, 0.94–1.18), which was not statistically significant.
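The reported win ratio can be recovered from the pair counts above: 348 + 43 + 234 = 625 wins for prasugrel against 331 + 37 + 225 = 593 losses gives 625/593 ≈ 1.05. The sketch below also attaches a simple log-scale interval under a stated assumption (wins and losses treated as independent Poisson counts), which gives 0.94–1.18. The text does not state which variance estimator was used, so this interval should not be read as the authors' method; the names are illustrative.

```python
import math

# Pairs decided at each level after time-stratified re-ranking (counts from the text).
# A 'win' for prasugrel = the clopidogrel member had the event first.
WINS = {"death": 348, "stroke": 43, "mi": 234}
LOSSES = {"death": 331, "stroke": 37, "mi": 225}


def win_ratio(n_wins, n_losses, z=1.96):
    """Win ratio with an approximate confidence interval on the log scale.

    ASSUMPTION: SE(log ratio) = sqrt(1/wins + 1/losses), i.e. wins and losses are
    treated as independent Poisson counts. This is an illustrative approximation,
    not necessarily the variance estimator used in the analysis.
    """
    ratio = n_wins / n_losses
    se = math.sqrt(1 / n_wins + 1 / n_losses)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


ratio, lower, upper = win_ratio(sum(WINS.values()), sum(LOSSES.values()))
print(f"win ratio {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Note that ties (pairs with no usable event at any level) do not enter the ratio at all, which is one reason the win ratio uses fewer events than the other methods.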

Weighted composite endpoint

The Delphi panel determined weights of 0.379 for MI, 0.469 for stroke, and 1.0 for death. Unlike the win ratio, which considers only the relative timing of events within a pair, the WCE considers the time to each weighted event; the modified Kaplan–Meier curves are provided in Figure 1. Compared with the traditional time to first (equally weighted) event, the point at which the curves diverge (marked by the vertical line) shifted from Day 470 (Figure 1A) to Day 420 (Figure 1B). This shift reflects the impact of weighting the events, incorporating differences in both the timing and the type of event between the two treatment arms. There was no significant difference between treatment groups.

Figure 1

(A) Survival curves for the traditional composite. (B) Weighted composite endpoint modified survival curves. (C) Severity-modified weighted composite endpoint survival curves. Pts, patients.

Weighted composite with severity weights

In this analysis, the two curves also diverge at Day 420 (Figure 1C), indicating that many patients in both arms had early non-fatal events that were followed by more severe events later in follow-up. In Figure 2, these differences are shown by treatment arm in modified Kaplan–Meier curves for the three scenarios (traditional time to event, WCE, and severity-weighted composite). When severity weights are incorporated, most of the non-fatal events are seen to be of the less severe variety, as indicated by the flatter slopes of the curves (Figure 2).

Figure 2

Modified Kaplan–Meier curves for traditional time-to-first-event, weighted composite endpoint, and severity-modified weighted composite endpoint by treatment arm. Pts, patients.

Efficiency of event use

The distribution and use of events by each method are provided in Table 2. Of note, the Andersen–Gill and weighted composite approaches include all of the events, whereas the traditional analysis and the win ratio used the smallest shares of the collected event data.

Figure 3A depicts the percentage of total events used by each method and the relative importance of each event type within that total. For example, in the traditional composite method, we observed that ∼73% of the total events were used and that 40% of the relative importance was placed on death. In comparison, the Andersen–Gill method used 100% of events and shows that 42% of all events were deaths. Like the Andersen–Gill, the WCE used all events but instead death comprised 60% of the overall result. The win ratio used the smallest percentage (66%) of overall events and derived ∼50% of the relative information from the death events.
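The percentages quoted above can be recomputed from the pooled counts in Table 2. The sketch below is plain arithmetic (the dictionary layout and names are ours): it gives 72.6% of events used with a 39.5% death share for the traditional composite, 65.9% used with a 55.9% death share for the time-stratified win ratio, and a 41.5% death share when all events are used, as in the Andersen–Gill analysis.

```python
# Pooled event counts (prasugrel + clopidogrel) taken from Table 2.
TABLE2 = {
    "Andersen-Gill (all events)": {"death": 385 + 409, "stroke": 62 + 74, "mi": 474 + 509},
    "Traditional composite": {"death": 272 + 276, "stroke": 55 + 63, "mi": 354 + 369},
    "Win ratio, time stratified": {"death": 343 + 362, "stroke": 37 + 43, "mi": 232 + 244},
}
TOTAL_EVENTS = sum(TABLE2["Andersen-Gill (all events)"].values())  # 1913


def event_use(counts, total=TOTAL_EVENTS):
    """Return (events used, % of all events used, % of used events that are deaths)."""
    used = sum(counts.values())
    return used, 100 * used / total, 100 * counts["death"] / used


for method, counts in TABLE2.items():
    used, pct_used, pct_death = event_use(counts)
    print(f"{method}: {used} events ({pct_used:.1f}% of all), deaths {pct_death:.1f}%")
```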

Figure 3

(A) Percentage use of events and types and relative importance of each event type by treatment. (B) Effective number and types of events used per 1000 patients enrolled by treatment expressed as percentage use. (C) Weighted non-fatal event type (%) by treatment arm for severity-modified weighted composite endpoint. AG, Andersen–Gill; MI, myocardial infarction; mod, moderate; WCE, weighted composite endpoint.

Figure 3B compares the event rate in absolute terms by depicting the effective number of events per 1000 patients as used in each analysis. Although both the Andersen–Gill and WCE methods utilized all events, the relative importance of the death events differed. In the Andersen–Gill analysis, the relative contribution of the component outcomes for MI, stroke, and death was 51, 8, and 41%, respectively. In contrast, in both the win ratio and WCE approaches, death was the most heavily weighted; i.e. the relative distributions for the win ratio were 37, 6, and 57%, respectively, and for the WCE, 30, 6, and 64%. When the severity of each non-fatal event is incorporated into this analysis according to treatment (Figure 3C), there were nominally fewer moderate and severe MI and stroke events in the prasugrel vs. the clopidogrel group.
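Why death dominates under the WCE can be seen with a simplified calculation: multiply each pooled event count from Table 2 by its base weight and express each type as a share of the total. The sketch below does this. It is an approximation by assumption: every death is given its full weight of 1.0 and every non-fatal event its base weight, ignoring the residual-weight rule for events that follow an earlier event, so it only approximates the reported split. Under this simplification the death share rises from about 41.5% (equal weights, as in Andersen–Gill) to about 64.5%, with MI falling to about 30%.

```python
# Pooled event counts by type (both arms, Table 2).
COUNTS = {"death": 794, "stroke": 136, "mi": 983}

# Andersen-Gill: every event counts once. WCE: base weights from the Delphi panel.
EQUAL_WEIGHTS = {"death": 1.0, "stroke": 1.0, "mi": 1.0}
WCE_WEIGHTS = {"death": 1.0, "stroke": 0.469, "mi": 0.379}


def relative_contribution(counts, weights):
    """Percentage of the total (weighted) event mass contributed by each event type.

    Simplification: every event gets its full base weight, although in the WCE an
    event that follows an earlier one only removes part of the residual weight.
    """
    mass = {k: n * weights[k] for k, n in counts.items()}
    total = sum(mass.values())
    return {k: 100 * m / total for k, m in mass.items()}


for label, w in [("Andersen-Gill", EQUAL_WEIGHTS), ("WCE (approx.)", WCE_WEIGHTS)]:
    shares = relative_contribution(COUNTS, w)
    print(label, {k: round(v, 1) for k, v in shares.items()})
```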

Discussion

The current comparison of analytical approaches within the TRILOGY ACS trial provides insight into the relative efficiency and effectiveness of different methods for analysing clinical outcome data, and into their potential implications for the interpretation of results. The strengths and weaknesses of each are summarized in Table 3. The various methodologies differ in their use of the available information when the assumptions of a traditional composite outcome have been met. We consider these results in light of a previous observation from a simulation study4 which indicated that there was no additional loss in power resulting from the use of a weighting methodology.

Table 3

Comparison of analysis attributes

Attribute | Traditional time to first event | Andersen–Gill | Win ratio | WCE base | WCE augmented
Uses first event | ✓ | ✓ |  | ✓ | ✓
Uses all events |  | ✓ |  | ✓ | ✓
Death as most important |  |  | ✓ | ✓ | ✓
Differentiates within event types |  |  |  |  | ✓
Time to event used | ✓ | ✓ |  | ✓ | ✓
A priori attribution |  |  | Ranking outcome risk scoring | Weighting | Weighting and outcome definition

The traditional time-to-first-event analysis, Andersen–Gill, and WCE methods all identify patients having at least one event. In the application to the TRILOGY ACS data, the Andersen–Gill analysis was the only one to demonstrate statistical significance. We contend, however, that the more important difference among these methods relates to the manner in which events are counted: for Andersen–Gill, events are weighted equally, whereas in the WCE they are differentiated. Considering subsequent events appropriately is important because their occurrence has clear implications for both health care costs and patients' quality of life; this is especially true when long-term follow-up is planned, as was the case in TRILOGY ACS. Restricting the analysis to simple event-free survival (as in the traditional composite) misses potentially important time-dependent information related to subsequent events. Moreover, each of these methods remains limited by the duration of follow-up, since events occurring after that time frame are censored. Prior studies comparing the performance of the Andersen–Gill model with traditional models in the presence of heterogeneity across individuals have shown that, when the maximum number of events per subject is restricted and follow-up is censored at its end, the Andersen–Gill method may give biased results. Ultimately, these are key considerations in the design and execution of a clinical trial.11 The use of repeated events in general is also an important consideration, as the rates of individual events should be reported to show which events occurred and when (e.g. 10 individuals with one event each differ from one individual with 10 events). An important consideration in the present study is that neither the traditional nor the Andersen–Gill model differentiates between event types, irrespective of potential bias or discrimination ability.
Although modifications have been suggested that allow the baseline hazard to vary with each event, these methods still count events similarly.12 For example, for a patient with an MI followed by death, the Andersen–Gill method counts the two events as equivalent within a single patient, whereas the other methods treat them differently: the traditional method counts only the MI, the win ratio uses only the death, and the WCE counts the MI as a fraction and the death as the remaining weight. Hence, the Andersen–Gill was the only method to identify a statistically significant difference, but this came at the expense of not discriminating between event types. The Andersen–Gill analysis is intended to address repeated events of the same type, so all events are considered equally important; this approach inherently adds weight to non-fatal events, since patients can contribute multiple non-fatal events. Furthermore, in the TRILOGY ACS trial, some patients in both treatment arms experienced six or more MIs. The results from this study are also similar to those of the primary TRILOGY ACS analysis, which used essentially the same outcomes except that it assessed cardiovascular death rather than all-cause death. By using all-cause death as an outcome, we reduced the number of censoring events.
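The MI-then-death example above can be made concrete with a small sketch that lists what each method would count for one patient's ordered event history. It uses the base WCE weights (0.38 MI, 0.47 stroke, 1.0 death) and the death > stroke > MI hierarchy from the Methods. As a simplifying assumption, the win-ratio entry is reduced to the patient's single worst event, whereas in practice the result also depends on the paired patient's history; the function name is hypothetical.

```python
# Base WCE weights and the win-ratio severity order used in this study.
WCE_WEIGHTS = {"mi": 0.38, "stroke": 0.47, "death": 1.0}
SEVERITY_RANK = {"mi": 1, "stroke": 2, "death": 3}


def contributions(events):
    """How each method would count one patient's ordered event history.

    The win-ratio entry is simplified to the patient's single worst event; in
    practice the outcome also depends on the paired patient's history.
    """
    wce, residual = [], 1.0
    for kind in events:
        lost = residual * WCE_WEIGHTS[kind]  # weight removed by this event
        wce.append((kind, round(lost, 2)))
        residual -= lost
    return {
        "traditional": events[:1],                                   # first event only
        "andersen_gill": list(events),                               # every event, equal weight
        "win_ratio": [max(events, key=lambda e: SEVERITY_RANK[e])],  # worst event
        "wce": wce,                                                  # MI as a fraction, death as remainder
    }


print(contributions(["mi", "death"]))
```

For this history, the traditional analysis keeps only the MI, Andersen–Gill keeps both events with equal weight, the simplified win ratio keeps only the death, and the WCE splits the patient's unit weight into 0.38 for the MI and 0.62 for the death.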

An advantage of the design of the TRILOGY ACS trial was that there was no limit on how many non-fatal events could be reported, and follow-up was relatively long (30 months). Methods that do not account for the relative severity of non-fatal endpoints may overstate the difference between treatments.13 In the win ratio, if death occurs, only the death is counted. With the WCE method, multiple events are given importance according to their severity: multiple MIs with continued survival are considered an 'improved state' compared with the death of a patient. The analyses explored in the current study demonstrate similar results between treatment groups, due in large part to the homogeneity in the magnitude and direction of the individual components. As observed in Figure 3C, late events in the clopidogrel arm appeared more severe than those in the prasugrel arm. Notwithstanding these findings, our major interest was to pursue the optimal use of available data rather than to seek a difference in the interpretation of the results. We chose to use weights derived from clinician-investigators, who are familiar with the impact, severity, and treatment implications of each of these outcomes. Because these weights were used for comparative analysis in the current trial, we felt that, until well-validated weights for these health states become available, the outcome weights should be calculated from their impact relative to one another.

By design, the win ratio approaches the data differently, amounting to a 'worst-event' methodology that places greater relative importance on death than the traditional time-to-first-event analysis does. However, because it ranks the individual components hierarchically, if the worst event in a pair is a death, only the death is counted and any preceding non-fatal events experienced by that subject are excluded. Indeed, of the methods we evaluated, the win ratio left the largest percentage of events 'unused'. Further, if the worst event in a matched pair is an MI, it decides the pair just as a death would. An additional consideration when the win ratio is employed relates to the matching process itself: if it is not specified a priori, there is an opportunity to undermine the randomization by changing the way risks are matched. The win ratio also relies on the ability of an existing risk score to quantify the risk of each outcome accurately, and it risks residual confounding through variables not included in the score. Additionally, the proposed template aims to include as many patients as possible by using a rank-based matching procedure, which optimizes sample size, rather than a risk-based match, which can easily shift the comparison between groups if there is even a small group of lower-risk patients in one arm of the trial.

There have been recent suggestions that the number needed to treat and number needed to harm framework1 deserves emphasis. A key issue arising from this approach is that the definitions of 'treat' and 'harm' need to be extended beyond the conventional dichotomous (yes/no) analysis in order to better understand its application and to incorporate multiple events.

Because TRILOGY ACS was a randomized, blinded trial in which patients were evenly assigned to each arm, the chance of differences in observed or unobserved covariates was minimized. Notwithstanding this process, systematic differences may exist and could introduce bias into all of the methods we evaluated, including the win ratio, where the model is selected post hoc. Another key methodological difference evident in the current report is how the results are portrayed. In both the traditional composite and WCE approaches, time-to-event information is displayed graphically through a (modified) Kaplan–Meier curve together with a confidence region, whereas the win ratio and Andersen–Gill methods present only a single estimate and confidence interval for the trial result. Although a single value is sufficient to determine a potential treatment difference, understanding how it was derived is challenging (e.g. did the new treatment prevent more early events?).

In the present analysis, we also extended our prior use of the WCE approach by considering the differential severity of the non-fatal endpoints and identified two important novel features. First, in the survey we noted a 3.5-fold difference in the weight assigned to the mild vs. severe types of both MI and stroke; secondly, we noted that most of the adjudicated non-fatal events were of the mild type, highlighting the importance of differentiating types of non-fatal events. There were numerically fewer severe events within the prasugrel vs. the clopidogrel group.

Of further interest in the primary analysis of TRILOGY ACS was the differentiation of events in patients under 75 years of age vs. those 75 years of age and older (i.e. those who received the reduced dose of prasugrel). We examined these groups separately for all of the included analyses and found that the results were similar in magnitude and interpretation.

There are both strengths and limitations to the use of the WCE methodology. Clinically, there is a recognized benefit to the meaningful incorporation of all events and their severity in the analysis. Non-fatal event severity was incorporated after the trial protocol was finalized; as such, our classification of event severity was made with the best available information but deserves further assessment. Similarly, the associated weights should be validated before the method demonstrated here is employed as a primary analysis. We have previously demonstrated that the WCE method is fairly robust to sensitivity analyses around the weights used.9 The longer-term follow-up in the TRILOGY ACS trial also demonstrates the importance of including all events, given that many of the early events were of the least severe non-fatal types. Should the WCE be incorporated into future trial design? It does require some additional effort: (i) weights must be established a priori, and ultimately a standardized library of weights should be created; and (ii) additional understanding is required of how the treatment is expected to affect each candidate endpoint, as opposed to a single global effect estimate. However, based on our previous work,10 there is unlikely to be a penalty in the number of subjects required, and there may even be an advantage. In the simulation analysis presented previously, we demonstrated an improved ability to identify a difference in treatment effects in cases where the component endpoints do not all change in the same direction or by the same magnitude. Pre-specified weighting of the severity of differing non-fatal endpoints may also be worthy of prospective consideration by clinical events committees charged with adjudicating non-fatal outcomes in clinical trials.
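To make the weighting idea concrete, the Python sketch below scores every event each patient experiences by a weight relative to death, compares the mean weighted event burden between arms, and then repeats the comparison with the non-fatal weights scaled up and down, in the spirit of the sensitivity analyses mentioned above. It is a deliberately simplified, time-free summary, not the modified Kaplan–Meier WCE estimator. The weights, event labels, and function names are hypothetical (mild and severe weights are set 3.5-fold apart only to echo the survey finding), and the toy data are made up.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical weights relative to death (1.0); not the survey-derived TRILOGY ACS weights.
WEIGHTS: Dict[str, float] = {
    "death": 1.00,
    "severe_stroke": 0.70,
    "mild_stroke": 0.20,
    "severe_mi": 0.35,
    "mild_mi": 0.10,
}


def weighted_burden(patients: Sequence[List[str]], weights: Dict[str, float]) -> float:
    """Mean weighted score per patient; every event counts, not only the first."""
    total = sum(weights[event] for events in patients for event in events)
    return total / len(patients)


def burden_ratio(treated: Sequence[List[str]], control: Sequence[List[str]],
                 weights: Dict[str, float]) -> float:
    """Treated-to-control ratio of mean weighted burden (< 1 favors treatment)."""
    return weighted_burden(treated, weights) / weighted_burden(control, weights)


def sensitivity(treated: Sequence[List[str]], control: Sequence[List[str]],
                weights: Dict[str, float],
                factors: Iterable[float] = (0.8, 1.0, 1.2)) -> Dict[float, float]:
    """Recompute the ratio with every non-fatal weight scaled by each factor.

    Death stays fixed at 1.0, and scaled non-fatal weights are capped at 1.0
    so that no non-fatal event outweighs death.
    """
    results: Dict[float, float] = {}
    for f in factors:
        scaled = {k: v if k == "death" else min(1.0, v * f) for k, v in weights.items()}
        results[f] = burden_ratio(treated, control, scaled)
    return results


# Made-up event histories: one list of events per patient.
treated = [["mild_mi"], [], ["mild_mi", "death"], []]
control = [["severe_mi"], ["mild_stroke"], ["death"], []]
base = burden_ratio(treated, control, WEIGHTS)
by_factor = sensitivity(treated, control, WEIGHTS)
```

In this toy example the ratio stays below 1 across the scaled weights but moves toward 1 as the non-fatal weights shrink, because the deaths, equal in the two arms, then dominate the score.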

Each of the methods studied herein offers a different analysis of typical clinical trial data. To maximize the efficiency of these analyses, we suggest that a WCE be considered, since it not only incorporates all events but also addresses their clinical relevance. In this way, a more efficient and informed use of all events occurring in a clinical trial (i.e. one that gives appropriate priority to more important components such as death) could help to generate a better evidence-based approach to the interpretation of studies aimed at improving care.

Supplementary material

Supplementary material is available at European Heart Journal online.

Funding

The TRILOGY ACS study was funded by Eli Lilly and Daiichi Sankyo. S.G.G. is supported by the Heart & Stroke Foundation of Ontario Polo Chair, Department of Medicine at the University of Toronto.

Conflict of interest: E.M.O. reports grants from Eli Lilly and grants from Daiichi Sankyo, during the conduct of the study; grants from Datascope, grants from Gilead Sciences, personal fees from AstraZeneca, personal fees from Boehringer Ingelheim, personal fees from Bristol–Myers Squibb, personal fees from Gilead Sciences, personal fees from Janssen Pharmaceuticals, personal fees from Liposcience, personal fees from Merck, personal fees from Pozen, personal fees from Roche, personal fees from Sanofi Aventis, personal fees from The Medicines Company, personal fees from WebMD, outside the submitted work; M.T.R. reports grants and personal fees from Eli Lilly, personal fees from Daiichi Sankyo, during the conduct of the study; grants and personal fees from Bristol–Myers Squibb, grants from Roche, grants and personal fees from Kai Pharmaceuticals, grants from Novartis, personal fees from AstraZeneca, personal fees from GlaxoSmithKline, personal fees from Janssen Pharmaceuticals, personal fees from Kai Pharmaceuticals, personal fees from Merck, personal fees from Sanofi Aventis, personal fees from Helsinn Pharmaceuticals, personal fees from Regeneron, outside the submitted work; S.G.G. reports personal fees and other from Duke Clinical Research Institute, during the conduct of the study; grants and personal fees from AstraZeneca, grants and personal fees from Bristol–Myers Squibb, grants and personal fees from Eli Lilly, grants and personal fees from Sanofi Aventis, outside the submitted work; K.A.A.F. reports grants and personal fees from Eli Lilly, during the conduct of the study; grants and personal fees from Bayer, grants and personal fees from Johnson & Johnson, grants and personal fees from Janssen Pharmaceuticals, grants and personal fees from Sanofi Aventis, grants and personal fees from AstraZeneca, outside the submitted work; P.W.A. reports grants and personal fees from Eli Lilly, during the conduct of the study; grants and personal fees from Regado Biosciences, grants and personal fees from Eli Lilly, personal fees from F. Hoffmann La Roche, personal fees from Axio Research/Orexigen, grants from GlaxoSmithKline, grants and personal fees from Boehringer Ingelheim, grants from Sanofi Aventis, grants from Amylin Pharmaceuticals, grants from Merck, personal fees from Bayer, outside the submitted work; E.B.B. reports other from Eli Lilly (she was an employee) during the conduct of the study; J.S.H. reports personal fees from Eli Lilly, during the conduct of the study; personal fees from GlaxoSmithKline, outside the submitted work; J.A.B., Y.L., C.M.W., Y.Z. have nothing to disclose.

References

1. Claggett B, Wei LJ, Pfeffer MA. Moving beyond our comfort zone. Eur Heart J 2013;34:869–871.

2. Subherwal S, Anstrom KJ, Jones WS, Felker MG, Misra S, Conte MS, Hiatt WR, Patel MR. Use of alternative methodologies for evaluation of composite end points in trials of therapies for critical limb ischemia. Am Heart J 2012;164:277–284.

3. Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. Eur Heart J 2012;33:176–182.

4. Bakal JA, Westerhout CM, Cantor WJ, Fernández-Avilés F, Welsh RC, Fitchett D, Goodman SG, Armstrong PW. Evaluation of early percutaneous coronary intervention vs. standard therapy after fibrinolysis for ST-segment elevation myocardial infarction: contribution of weighting the composite endpoint. Eur Heart J 2013;34:903–908.

5. Chin CT, Roe MT, Fox KA, Prabhakaran D, Marshall DA, Petitjean H, Lokhnygina Y, Brown E, Armstrong PW, White HD, Ohman EM; on behalf of the TRILOGY ACS Steering Committee. Study design and rationale of a comparison of prasugrel and clopidogrel in medically managed patients with unstable angina/non-ST-segment elevation myocardial infarction: the TaRgeted platelet Inhibition to cLarify the Optimal strateGy to medicallY manage Acute Coronary Syndromes (TRILOGY ACS) trial. Am Heart J 2010;160:16–22.

6. Roe MT, Armstrong PW, Fox KA, White HD, Prabhakaran D, Goodman SG, Cornel JH, Bhatt DL, Clemmensen P, Martinez F, Ardissino D, Nicolau JC, Boden WE, Gurbel PA, Ruzyllo W, Dalby AJ, McGuire DK, Leiva-Pons JL, Parkhomenko A, Gottlieb S, Topacio GO, Hamm C, Pavlides G, Goudev AR, Oto A, Tseng CD, Merkely B, Gasparovic V, Corbalan R, Cinteză M, McLendon RC, Winters KJ, Brown EB, Lokhnygina Y, Aylward PE, Huber K, Hochman JS, Ohman EM; for the TRILOGY ACS Investigators. Prasugrel versus clopidogrel for acute coronary syndromes without revascularization. N Engl J Med 2012;367:1297–1309.

7. Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. Ann Statistics 1982;10:1100–1120.

8. Eagle KA, Lim MJ, Dabbous OH, Pieper KS, Goldberg RJ, Van de Werf F, Goodman SG, Granger CB, Steg G, Gore JM, Budaj A, Avezum A, Flather MD, Fox KAA; for the GRACE Investigators. A validated prediction model for all forms of acute coronary syndrome: Estimating the Risk of 6-Month Postdischarge Death in an International Registry. JAMA 2004;291:2727–2733.

9. Armstrong PW, Westerhout CM, Van de Werf F, Califf RM, Welsh RC, Wilcox RG, Bakal JA. Refining clinical trial composite outcomes: an application to the Assessment of the Safety and Efficacy of a New Thrombolytic-3 (ASSENT-3) trial. Am Heart J 2011;161:848–854.

10. Bakal JA, Westerhout CM, Armstrong PW. Impact of weighted composite compared to traditional composite endpoints for the design of randomized controlled trials. Stat Methods Med Res 2012.

11. Metcalfe C, Thompson SG. The importance of varying the event generation process in simulation studies of statistical methods for recurrent events. Statist Med 2006;25:165–179.

12. Wei LJ, Lin DY, Weissfeld L. Regression-analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Statist Assoc 1989;84:1065–1073.

13. Borgia F, Goodman SG, Halvorsen S, Cantor WJ, Piscione F, Le May MR, Fernandez-Aviles F, Sanchez PL, Dimopoulos K, Scheller B, Armstrong PW, Di Mario C. Early routine percutaneous coronary intervention after fibrinolysis vs. standard therapy in ST-segment elevation myocardial infarction: a meta-analysis. Eur Heart J 2010;31:2156–2169.
