Tom Treasure, Johanna J M Takkenberg, Randomized trials and big data analysis: we need the best of both worlds, European Journal of Cardio-Thoracic Surgery, Volume 53, Issue 5, May 2018, Pages 910–914, https://doi.org/10.1093/ejcts/ezy056
RANDOMIZED TRIALS OR SOPHISTICATED ANALYSES OF ‘BIG DATA’
Seventy years ago, establishing the worth of an operation was more straightforward. For example, there was little of any use to be done for structural heart disease. Cyanotic heart disease was particularly lethal: most of the ‘blue babies’ died. Some struggled through childhood, burdened by symptoms, only to die young [1]. This bleak outlook was transformed in the 1940s, first by an extracardiac systemic-to-pulmonary artery shunt devised by Alfred Blalock [2] and then by a direct intracardiac operation on the right ventricular outflow tract devised by Russell Brock [3]. As their mechanistic effects were clearly seen and could be consistently achieved, and the clinical course of the patient was observed to be substantially improved, successive operations entered practice.
With the refinement of surgical techniques, as the old and the new operations were compared by a simple observational comparison of outcomes, it was still easy to see whether the new treatment offered better survival and/or relief of symptoms and a better quality of life. We say ‘easy to see’ with some reservation because it should not be overlooked that many ineffective treatments also became accepted and continued in practice for generations. It was only after thousands of years that bloodletting was abandoned in the treatment of fever and sepsis. It took 90 years for surgeons to turn their backs on radical mastectomy for breast cancer, in favour of less mutilating operations which were proved, in randomized trials, to be no less effective in controlling the primary cancer and to be greatly superior in terms of complications [4]. There have been, and continue to be, numerous reversals in clinical practice brought about by controlled trials [5].
With further progress, it has become increasingly difficult to discern ‘signal from noise’ [6]. In current practice, adjunctive systemic and interventional treatments are often applied synchronously or sequentially. Comorbidity in elderly patients fogs the issue further. To which, if any, of the several components of combined treatments should benefit be ascribed? There has been a tremendous diversification of both treatment options and patient populations, and more marginal differences are being tested, making a straightforward comparison of observational findings incapable of determining the better treatment. At the same time, this diversification hampers the generalizability of randomized controlled trials (RCTs). Outcomes are no longer black or white but a full spectrum of colours. Also, we have arrived, not before time, in an era of patient-centred care. Evidence-based, patient-specific and often value-sensitive decisions have to be made for a wide diversity of patients.
The RCT came to be regarded as the gold standard in finding evidence for a treatment in clinical practice (Fig. 1). The essential feature is that the treatment is randomly assigned, so that all known and unknown factors, which might influence the outcome of the treatment under test, are similarly present in both groups. Any difference in outcome can then be attributed to the relative effectiveness of the treatments themselves in achieving the prespecified desirable outcome. A downside of the RCT is that data are acquired specifically to answer one research question and as further questions arise, new data must be acquired by starting all over again.
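The balancing effect of random assignment can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the function name `simulate_balance`, the 30% prevalence of the hidden factor and the patient count are all hypothetical and are not taken from any trial discussed here.

```python
import random

def simulate_balance(n_patients=3000, hidden_prevalence=0.3, seed=1):
    """Randomly assign patients to two arms and report how often an
    unmeasured risk factor occurs in each arm (all values hypothetical)."""
    rng = random.Random(seed)
    arms = {"A": [], "B": []}
    for _ in range(n_patients):
        # A risk factor the trialists never measured or even knew about.
        has_hidden_factor = rng.random() < hidden_prevalence
        # The coin toss ignores every patient characteristic.
        arm = rng.choice(["A", "B"])
        arms[arm].append(has_hidden_factor)
    # Proportion of patients carrying the hidden factor in each arm.
    return {arm: sum(flags) / len(flags) for arm, flags in arms.items()}

balance = simulate_balance()
print(balance)
```

With a few thousand patients the two proportions should come out close to each other, which is the sense in which known and unknown factors are "similarly present in both groups". The balance is only approximate and improves with sample size.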
Large databases and registers are now available. In theory, using sophisticated statistical analyses, the difference in outcome attributable to the treatment may be discernible by statistical adjustment for other factors influencing outcome or by matching patients to exclude effects other than those due to the treatment. The dataset continues to accrue patients and can be used repeatedly to answer further questions. As larger and better organized observational datasets are collected, and new meta-analytic techniques are developed, is the RCT’s place unassailable? If, as seems reasonable, RCTs and more complex analytical methods are to coexist, what are their relative merits? To explore the issues, we will use the example of bilateral internal mammary artery versus left-only internal mammary artery (BIMA versus LIMA) grafts in coronary artery surgery. This was the subject of a much lauded debate at the European Association for Cardio-Thoracic Surgery (EACTS) 2017 meeting [7]. We will then touch upon the choice of surgery versus ablative radiotherapy for lung cancer.
THE DOUBLE- OR SINGLE MAMMARY ARTERY DEBATE
The 31st Annual Meeting of EACTS in October 2017 hosted high-level discussions about the evidence that might guide practice. Prominently placed was a session on whether the goal of coronary artery bypass surgery should be BIMA grafts for all coronary operations or whether the standard of care should be an operation including a LIMA. Professor Nick Freemantle was quoted in EACTS Daily News applauding our association, saying ‘It is to the credit of EACTS that they are having a debate on this topic at the meeting’. The superiority of a LIMA graft to the left anterior descending coronary artery (LAD) was established on the basis of the observation that, as a pedicled graft, it had better patency rates at 10 years than aortocoronary saphenous vein grafts, with commensurately better clinical outcomes [8]. The unresolved question remained: does the addition of a right internal mammary artery graft provide a further useful incremental gain in long-term clinical outcomes?
Two major high-quality studies provide us with material on which to make comparisons of the relative merits of the 2 methods of seeking evidence: an RCT versus a sophisticated analysis of observational data.
The Arterial Revascularization Trial (ART), an RCT in 3102 patients, was published in 2016 in the New England Journal of Medicine [9].
A meta-analysis of 29 observational studies in 89 399 patients, among them 12 propensity-matched studies in 20 525 patients, was published in 2017 in Heart [10].
THE ARTERIAL REVASCULARIZATION TRIAL
The means of assigning patients
When ART was mooted in about 2004, it was proposed that expertise-based randomization would be used. This method was being promulgated for use where 2 operations were to be compared, so that the dyad of a surgeon and her preferred operation would not be disrupted [11]. Although this was proposed as a means of helping surgeons to engage in RCTs, in fact, the principle was already widely applied and is inherent in all trials of surgeon-delivered versus cardiologist-delivered therapies. It also applies in surgical resection versus radiotherapy or other ablative techniques for cancer. The practitioner and the technique are inextricably linked.
In ART, this would have meant that each patient was randomly assigned either to a surgeon who favours BIMA or to a surgeon who prefers 1 arterial graft—a LIMA to the LAD. This would make sense if one can assume that the surgeons are of comparable competence and when a device is under evaluation, such as a choice between heart valves. For a LIMA/BIMA comparison, the problem is obvious. Surgeons who prefer to use 2 mammary arteries may include the most deft and speedy operators, working with teams more practiced at mammary artery dissection, who perform the additional surgery more expeditiously. Any result from such a trial would have been confounded by differing expertise. Expertise-based randomization was, therefore, counselled against. Surgeons had to be competent to do either and then be willing to allow their preference to be overridden by randomization. Even if they had personal preferences, the existence of the 2 approaches indicated that ‘group equipoise’ existed. Once randomly assigned, the allocation to LIMA or BIMA must be adhered to; subsequent modification of the operative plan by the surgeon would undermine the trial design. Once the protocol is agreed, those carrying out the trial must put individualized judgement aside. This highlights a problem in encouraging surgeons to accept random allocation of their patients.
The outcomes in the Arterial Revascularization Trial
In the RCT, at 5 years there was no difference in the primary outcome of interest, which is survival; 10-year survival rates are not yet available. There was no difference in hospital mortality, bleeding, myocardial infarction or stroke. There were more sternal wound complications with BIMA, attributable to the added risk of bilateral interference with sternal blood supply [9].
THE META-ANALYSIS OF THE LEFT ONLY INTERNAL MAMMARY ARTERY VERSUS BILATERAL INTERNAL MAMMARY ARTERY
Acquiring the data
Systematic reviews are now greatly facilitated by electronic searching and retrieval of a large number of sources. In this case, 3678 articles were identified. Adhering to prespecified inclusion and exclusion criteria, these were narrowed down to 120 potentially relevant articles. Finally, 29 studies were pooled for analysis. The large majority (27/29) were retrospective observational studies, and in 12 studies, there was propensity matching [10].
The outcomes in the meta-analysis
Five-year survival was higher for BIMA than for LIMA, other than in a diabetic subset, and the advantage was observed throughout the 25 years of follow-up in the pooled analysis, with the difference widening at 10 and 20 years. The authors calculated an overall hazard ratio of 0.78, which translates to a pooled cumulative 5- and 10-year mortality of 7.7% and 17.9% for BIMA and 13% and 29.5% for LIMA, respectively. The need for subsequent revascularization after BIMA was half of that after LIMA. Stroke and revascularization were significantly more frequent after LIMA than after BIMA, whereas sternal wound infection was significantly more frequent after BIMA (Table 1).
Why the difference in conclusions?
For early and late mortality and for important in-hospital events, there seemed to be a clear answer in favour of BIMA in the meta-analysis and differences not seen in the RCT. Because the RCT was based on random assignment, current received opinion (which we share) is that the RCT provides the more trustworthy answer, with the caveat that the conclusion may only be applicable under the circumstances of the study. As an exercise in weighing the comparative worth of an RCT versus a big data matching study, let us consider how the differences may have come about and the implications for selecting and interpreting the 2 contrasting research methods.
Randomized controlled trials struggle to accrue sufficient patients
The collected observational data provided a pool of patients 30 times larger than the RCT. Big data are very attractive and suggest greater reliability and generalizability. The important point here, however, is that the RCT was big enough for us to be confident that no important difference has been missed, although the comparison does illustrate the attractiveness of accessing big datasets.
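One way to make "big enough" concrete is to look at how narrowly the trial pins down the difference in 5-year mortality. The sketch below applies a simple Wald (normal approximation) interval to the rounded ART figures in Table 1 (8.4% of 1554 LIMA patients and 8.7% of 1548 BIMA patients). This is an illustrative back-of-envelope calculation, not the analysis the trialists performed (they report a hazard ratio), and the function name `risk_difference_ci` is our own.

```python
import math

def risk_difference_ci(p1, n1, p2, n2, z=1.96):
    """Wald 95% confidence interval for the difference p2 - p1
    between two independent proportions."""
    diff = p2 - p1
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Rounded 5-year mortality from Table 1 (ART): LIMA 8.4% of 1554, BIMA 8.7% of 1548.
diff, low, high = risk_difference_ci(0.084, 1554, 0.087, 1548)
print(f"BIMA - LIMA: {diff * 100:.1f} points "
      f"(95% CI {low * 100:.1f} to {high * 100:.1f})")
```

On these rounded inputs the interval runs from roughly -1.7 to +2.3 percentage points, so under this approximation the trial data sit uneasily with an absolute 5-year mortality advantage for BIMA of more than about 2 percentage points, such as the 5.3-point gap (13% versus 7.7%) reported in the meta-analysis.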
Sex, age and the diabetic incidence of included patients
Registries are ‘real-world’ populations, but RCTs are a selected sample, so there are inherent limitations in the interpretation and application of evidence from RCTs. The inclusions and exclusions are there in the trial protocol to satisfy all the considerations of ethics and equipoise, but the resulting populations and the ways in which they are treated may have departed from the typical clinical scenario under evaluation. From Table 1, we can deduce that because of the constraints in selection and equipoise, women patients may have been under-represented in the RCT when compared with observational ‘Big Data’.
Table 1: Patient characteristics and outcomes for LIMA and BIMA in 2 studies: an RCT and a meta-analysis of observational studies, which reached fundamentally different conclusions

| | ART RCT [9]: LIMA | ART RCT [9]: BIMA | ART RCT [9]: comparison | Meta-analysis [10]: LIMA | Meta-analysis [10]: BIMA | Meta-analysis [10]: comparison |
|---|---|---|---|---|---|---|
| Number | 1554 | 1548 | | 66 958 | 19 644 | |
| Female gender (%) | 14 | 15 | | 26 | 15 | |
| Age (years), mean ± SD | 64 ± 9 | 64 ± 8 | | | | |
| Diabetes (%) | 23 | 24 | | 39 | 25 | |
| Hospital mortality (%) | 1.2 | 1.2 | | 2.1 | 1.2 | P = 0.04 |
| Major bleeding (%) | 2.6 | 3.1 | HR 1.18; P = 0.44 | 3.2 | 2.9 | P = 0.51 |
| Myocardial infarction (%) | 3.5 | 3.4 | HR 0.97; P = 0.86 | | | |
| Stroke (%) | 3.2 | 2.5 | HR 0.78; P = 0.24 | 2.9 | 1.3 | P = 0.0003 |
| Sternal wound complication/infection (%) | 1.9 | 3.5 | HR 1.87; P = 0.005 | 1.4 | 1.8 | P = 0.0008 |
| Revascularization (%) | 6.6 | 6.5 | HR 0.98; P = 0.91 | 10 | 4.8 | P = 0.005 |
| 5-year mortality (%) | 8.4 | 8.7 | HR 1.04; P = 0.77 | 13 | 7.7 | HRᵃ 0.78; P < 0.00001 |
| Composite of death, MI and stroke at 5 years (%) | 12.7 | 12.2 | HR 0.96; P = 0.69 | | | |
ᵃ The HR of 0.78 is not specifically for 5 years but an overall hazard for death throughout the study.
ART: arterial revascularization trial; BIMA: bilateral internal mammary artery; HR: hazard ratio; LIMA: left only internal mammary artery; RCT: randomized controlled trial; SD: standard deviation.
In ART, 25% of patients were older than 70 years. In fact, the average ages were very similar (64 in ART and 63 in the meta-analysis), but the point still merits consideration. Older patients are more vulnerable to perioperative hazards such as stroke, infarction and death, whereas, as a group, the elderly may gain less benefit from a difference in graft patency beyond 10 years. Factors other than the 2nd internal mammary artery graft will exert more weight in determining survival so BIMA versus LIMA advantages, even if confirmed in an RCT, may matter less to ‘real-world’ patients.
Diabetic patients were less likely to be randomized in the trial and are more frequent among non-randomly assigned LIMA patients in ‘real-world’ practice.
Post-randomization differences in treatments
The intended purity of the comparison may be eroded by well-intended adjustments in treatments to redress the perceived imbalance in benefit between LIMA and BIMA. It has been pointed out that in approximately 22% of patients assigned to LIMA, the surgeon used a radial artery graft to the right coronary artery. As explained at EACTS by Mario Gaudino [12, 13], a radial artery graft has superior characteristics to a vein graft and may be as good as a right internal mammary artery, thus reducing any separation in survival attributable to the 2nd mammary artery graft. However, there was a comparable (20%) radial artery use in the BIMA group. Although this would not generally have been to the right coronary artery, it nevertheless weakens the cogency of the argument to undermine the reliability of the RCT.
Analysis of existing data can answer a question more quickly than a randomized controlled trial
RCTs take a very long time from conception to publication. It is more than 12 years since the ART protocol was agreed upon and things have changed meanwhile. This makes RCTs irksomely inflexible to the individual surgeon wanting to constantly exercise updated clinical judgement. It also means that the clinical research question may have moved on.
Complication rates
Recognized complications should be just as reliably recorded for trial and prospectively documented non-trial patients, but recording bias might be less in registry data where no particular hypothesis is under test. Significantly higher rates of sternal wound problems with BIMA were seen in both studies with a comparable magnitude of difference, a finding that has face validity, that is, it ‘makes sense’ to the clinically well informed. However, in the non-RCT data analyses, BIMA was associated with a significantly lower in-hospital mortality and stroke rate. These early differences are not likely to be attributable to the addition of a right internal mammary artery graft to the heart. Therefore (in our opinion), they lack face validity. It suggests to us that better risk ‘real-world’ patients are given elective BIMA operations, and perhaps marginally more skilful surgeons are operating on them. The reason for counselling against expertise-based randomization in ART became evident in the meta-analysis.
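The suspicion that patient selection alone could produce an early mortality difference can be explored with a toy simulation. In this sketch the graft has no effect at all on in-hospital death; only the way patients are allocated differs. Every number (40% high-risk patients, death risks of 4% and 1%, BIMA offered to 10% of high-risk and 40% of low-risk patients) is invented for illustration and is not drawn from ART or the meta-analysis, and the function name `simulate_selection` is hypothetical.

```python
import random

def simulate_selection(n_patients=100_000, seed=7):
    """Compare in-hospital mortality by graft type when the graft has
    no true effect, under surgeon selection versus random assignment."""
    rng = random.Random(seed)
    # counts[scheme][arm] = [deaths, patients]
    counts = {scheme: {"BIMA": [0, 0], "LIMA": [0, 0]}
              for scheme in ("selected", "randomized")}
    for _ in range(n_patients):
        high_risk = rng.random() < 0.4
        # Death risk depends only on the patient, never on the graft.
        died = rng.random() < (0.04 if high_risk else 0.01)
        # Selection: BIMA is offered mostly to low-risk patients.
        p_bima = 0.1 if high_risk else 0.4
        selected_arm = "BIMA" if rng.random() < p_bima else "LIMA"
        # Randomization: allocation ignores the patient's risk.
        random_arm = rng.choice(["BIMA", "LIMA"])
        for scheme, arm in (("selected", selected_arm), ("randomized", random_arm)):
            counts[scheme][arm][0] += died
            counts[scheme][arm][1] += 1
    return {scheme: {arm: deaths / total for arm, (deaths, total) in arms.items()}
            for scheme, arms in counts.items()}

rates = simulate_selection()
for scheme, by_arm in rates.items():
    print(scheme, {arm: f"{rate:.2%}" for arm, rate in by_arm.items()})
```

Under these assumptions the selected cohort should show clearly lower mortality after BIMA, while the randomized cohort shows essentially none, even though the operation itself changes nothing. Statistical adjustment or matching can remove such a gap only to the extent that the risk factors driving selection are actually recorded.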
Mid- and long-term survival
The failure of BIMA to show the anticipated benefit in survival in ART may be explicable in a philosophical way. Both arms will also have had the opportunity of the best medical advice including optimizing their lifestyle with respect to smoking, diet, weight and exercise. Antiplatelet medication, cholesterol lowering and other pharmacological secondary prevention incrementally reduce the risk of coronary events, the need for further interventions and death. All patients would have received revascularization for all affected territories, delivered by trial quality teams. Any theoretical benefit to be gained by the marginal effect of the 2nd mammary artery graft may be just too small to show against the marginal disadvantages imposed by more complex surgery. That is not to say that the better biological characteristics of an arterial graft are negated. Individual patients may have benefitted from longer lasting myocardial perfusion, but as a policy, the clinical advantage was too small to show by 5 years [9].
A BIG QUESTION IN THE TREATMENT OF PRIMARY LUNG CANCER
We looked for a similar example in general thoracic surgery, and the treatment of lung cancer seemed an obvious candidate. As radiotherapy has become more efficacious with sophisticated stereotactic techniques, and on the other hand, older and frailer patients are more harmed by surgery, is it time for the less invasive radiotherapy treatment to begin to replace the more tried and tested surgical method?
What have randomized controlled trials told us?
In contrast to coronary artery surgery, there have been vanishingly few randomized trials of lung cancer surgery and none of any significant size [14]. Comparing surgery and radiotherapy, 2 incomplete and undersized trials were pooled. The analysis suggested that radiotherapy might not be inferior to surgery [15].
Analysis of observational data
To answer this question from available observational data, the very large Surveillance, Epidemiology, and End Results (SEER) database has been used. It shows a clear advantage for surgery [16]. All the flaws suggested above for the LIMA–BIMA comparison are of course present. Very few patients suitable to have either treatment would have had radiotherapy for primary lung cancer within current practice guidelines. The largest biasing factor is that in current practice, any patient suitable and fit for surgery is offered surgery as the ‘gold standard’; the frail, elderly and marginal patients are more likely to have radiotherapy.
Is a ‘fair test’ by random assignment feasible?
To do an RCT would require surgeons and radiation oncologists, respecting each other’s position and their own inherent beliefs, to seek neutral informed expert help in devising a robust trial [17]. Patients deemed suitable for either treatment would have to be introduced to uncertainty by trained trial staff who would present the pros and cons to the patient from a clearly stated standpoint of ‘not knowing’ which treatment is better under these circumstances. All questions such as ‘what would you have if it were you?’ must be deflected. It is difficult for a clinician, to whom a patient has come because of their expertise in this disease, to baldly reply ‘I don’t know’. It would work better if it is only after random assignment that the patient goes to the assigned practitioner according to expertise-based randomization. The surgeon is then free to boost the individual patient’s trust and confidence in a clinical consultation and need not appear to dissemble by saying she does not know which is the better treatment [18].
Where are we now?
The traditional pyramid of evidence may no longer be sustainable in the current era due to the diversification and the increased complexity of clinical decision making [19]. Perhaps it is time to move towards a more integrated approach to advancing knowledge where clinical trials are embedded in large registries and networks of large datasets, and outcomes are no longer only death and complications but more focused on well-being. Other study design options are cluster-randomized trials, adaptive trials and trials that are embedded within clinical care data or administrative platforms. For an outstanding analysis of the difficulty we face in obtaining evidence for practice, we recommend a Nature Review [20]. Cardiac surgery has followed cardiology in performing trials. Because the effects are more obviously mechanistic, in the early days, observational studies were deemed sufficient in cardiac surgery, but for the important question of choosing the best combination of vascular conduits, as we have seen, big, well-done studies obtained different answers. Getting reliable data is not easy by either route, so combining all available methods is the best way to get to trustworthy guidelines for practice. Thoracic oncology has proved to be a much harder field to evaluate in the modern era. Thoracic surgery was established as a specialty, and the repertoire of anatomical lung resection techniques was already well rehearsed, at a time when heart surgery was ruled out of bounds [1]. The place of surgical resection should rightly be evaluated alongside other ablative techniques, established systemic therapies and, if we are fortunate, as yet undreamt-of methods of treatment. Treatment of lung cancer, a very common disease, may be a fruitful testing ground for the new imaginative methods of seeking evidence for practice.
Conflict of interest: none declared.

Figure 1: The mountain of evidence is difficult to scale. The depiction illustrates that there is a higher volume of ‘mountain’ in the lower slopes and less at the pinnacle, which is the hardest to attain. RCTs: randomized controlled trials.