Association is not causation: treatment effects cannot be estimated from observational data in heart failure

Abstract Aims Treatment ‘effects’ are often inferred from non-randomized and observational studies. These studies have inherent biases and limitations, which may make therapeutic inferences based on their results unreliable. We compared the conflicting findings of these studies to those of prospective randomized controlled trials (RCTs) in relation to pharmacological treatments for heart failure (HF). Methods and results We searched Medline and Embase to identify studies of the association between non-randomized drug therapy and all-cause mortality in patients with HF until 31 December 2017. The treatments of interest were: angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, beta-blockers, mineralocorticoid receptor antagonists (MRAs), statins, and digoxin. We compared the findings of these observational studies with those of relevant RCTs. We identified 92 publications, reporting 94 non-randomized studies, describing 158 estimates of the ‘effect’ of the six treatments of interest on all-cause mortality, i.e. some studies examined more than one treatment and/or HF phenotype. These six treatments had been tested in 25 RCTs. For example, two pivotal RCTs showed that MRAs reduced mortality in patients with HF with reduced ejection fraction. However, only one of 12 non-randomized studies found that MRAs were of benefit, with 10 finding a neutral effect, and one a harmful effect. Conclusion This comprehensive comparison of studies of non-randomized data with the findings of RCTs in HF shows that it is not possible to make reliable therapeutic inferences from observational associations. While trials undoubtedly leave gaps in evidence and enrol selected participants, they clearly remain the best guide to the treatment of patients.


Introduction
Randomized controlled trials (RCTs) are widely acknowledged to be the gold standard test of whether or not a drug is beneficial. [1][2][3][4] Although the biases and limitations of non-randomized, observational studies have been recognized for decades (Figure 1), studies of this type purporting to describe the effects of treatment continue to be published, even in high-impact journals. [5][6][7][8][9][10] Indeed, the 'comparative effectiveness' and 'big data' movements have given non-randomized studies a new respectability in some people's eyes. [11][12][13] Advocates point to the use of more sophisticated analytical techniques than in the past and increasingly larger 'real-world' datasets. [14][15][16][17] If the findings of observational studies could validly determine the effect of treatments, such information would clearly be of considerable value. On the other hand, if such analyses are inherently flawed they serve only to cause confusion, e.g. the association between hormone replacement therapy studies (see Supplementary material online, Tables S1 and S2). 24,25 Studies judged as having a low risk of bias have been presented separately from those with a high or unclear risk of bias in the Supplementary material online, Figures S6-S19.

Results
We identified 92 publications reporting 94 non-randomized studies. Together, these described 158 estimates of the 'effect' of the six treatments of interest on all-cause mortality. These six treatments had been tested in 25 RCTs. The results of our analyses are summarized in Table 1 and described in detail in Tables 2-6. The forest plots in the Supplementary material online (Figures S6-S19) illustrate the treatment effects/association between treatment and outcomes in the trials and observational studies, respectively, reported in Tables 2-6 and include a quality assessment of these trials/studies.

Angiotensin-converting enzyme inhibitors and angiotensin receptor blockers
Heart failure with reduced ejection fraction
Two landmark randomized trials in heart failure with reduced ejection fraction (HFrEF) demonstrated a reduction in mortality with an ACEI [118][119][120] and one further trial showed a consistent benefit with an ARB. 121 We identified one non-randomized study showing lower mortality in patients with HFrEF treated with an ACEI. 26 Most studies, however, examined patients treated with either an ACEI or ARB. Of six such studies, four reported an association between ACEI/ARB use and lower mortality, [26][27][28][29] whereas two did not. 30 Overall, therefore, in HFrEF five non-randomized estimates of treatment 'effect' found that use of an ACEI or ARB was associated with lower mortality and two did not (Table 2).
Heart failure with preserved ejection fraction
One moderately large randomized trial showed no effect of perindopril on mortality, although the estimate of treatment effect was not robust because of limited power. 122 However, two large RCTs showed no effect of irbesartan 123 and candesartan (in Candesartan in Heart failure: Assessment of Reduction in Mortality and morbidity-CHARM) 124 on mortality. Of eight observational studies examining ACEI use and outcome in heart failure with preserved ejection fraction (HFpEF), four suggested that use of this treatment was associated with a lower mortality, [31][32][33][34] whilst four did not [35][36][37][38] (Table 2). We identified one observational study of ARB use in patients with HFpEF which suggested no mortality benefit. 39 A further three non-randomized studies reported estimates of a treatment 'effect' for use of either an ACEI or ARB in HFpEF. One study found an association between ACEI/ARB use and better survival 29 and two studies did not. 30 Overall, therefore, in HFpEF, five non-randomized studies found that use of an ACEI or ARB was associated with lower mortality and seven did not (Table 2).
Mixed/unspecified heart failure phenotype
The CHARM Programme showed a neutral effect of candesartan on mortality in patients with HFpEF and HFrEF combined. 127 Nine non-randomized studies were identified, which reported 10 estimates of a 'treatment-effect' for use of either an ACEI or ARB in patients with HFrEF or HFpEF (i.e. both major HF phenotypes). Of these analyses, eight suggested a benefit [40][41][42][43][44][45][46] and two reported a neutral effect 30 (Table 2).

Heart failure with preserved ejection fraction
The effect of beta-blockers on mortality was examined in one small randomized trial 136 and a pre-specified subgroup analysis of a randomized trial which included patients with both HFrEF and HFpEF. 132 Overall, we identified 13 non-randomized studies of beta-blockers in HFpEF, of which nine reported an association between beta-blocker use and better survival, 32,46,50,51,55,[60][61][62][63] whereas four did not 30,53,64 (Table 3).
Mixed/unspecified heart failure phenotype
One moderately large RCT evaluated the effects of nebivolol in patients with both HFrEF and HFpEF, demonstrating a neutral effect on mortality. 137 We identified 17 observational studies reporting 19 estimates of the 'effect' of treatment, with 17 suggesting benefit, 41,44-46,55,65-74 and two reporting no difference in outcome between those treated with and not treated with a beta-blocker 30 (Table 3).
Heart failure with preserved ejection fraction
One large RCT showed no effect of spironolactone on mortality in patients with HFpEF. 141 Two observational studies also found a neutral effect, 30,85 but a further non-randomized study reported an association between MRA use and lower mortality 84 (Table 4).
Mixed/unspecified heart failure phenotype
Of five studies of patients with a mixed HF phenotype, two suggested benefit, 84,86 and three reported a neutral effect 30,46 (Table 4).

Statins
Heart failure with reduced ejection fraction
Two large RCTs showed a neutral effect of rosuvastatin on mortality in HFrEF (one trial included a small number of patients with HFpEF). 142,144 Sixteen non-randomized studies reported 17 estimates of the 'effect' of statin treatment in HFrEF. Of these, 14 reported an association between statin use and better
Heart failure with preserved ejection fraction
The use of statins has not been evaluated in a randomized trial in patients with HFpEF; therefore, no relevant non-randomized studies were identified.
Mixed/unspecified heart failure phenotype
One large statin trial included patients with both HFrEF and HFpEF and showed no effect of treatment on mortality. 144 Eleven observational studies reported 12 estimates of the 'effect' of a statin in patients with a mixture of HFrEF and HFpEF phenotypes, or where EF was not specified. Of these, 11 reported an association between statin use and better outcome, 45,68,88,[98][99][100][101][102][103][104] with only one describing no relationship between treatment and mortality 30 (Table 5).

Digoxin
Heart failure with reduced ejection fraction
A single RCT, the Digitalis Investigators Group (DIG) trial, showed that, in sinus rhythm, digoxin had a neutral effect on death but reduced the risk of HF hospitalization. 145 Nine non-randomized studies reported 10 estimates of the 'effect' of digoxin treatment in HFrEF, with five concluding digoxin was harmful, [107][108][109][110] four reporting a neutral effect, 30,55,106 and one suggesting digoxin was beneficial 105 (Table 6).
Heart failure with preserved ejection fraction
A single randomized trial of modest size, the DIG ancillary trial in HFpEF (n = 988), showed no effect of digoxin on mortality in patients with HFpEF in sinus rhythm, although the estimate of the effect of treatment was not robust because of limited power. 146 Four observational studies were identified, one suggesting that non-randomized digoxin treatment was beneficial, 105 and three showing a neutral association between treatment and mortality 30,55 (Table 6).
Mixed/unspecified heart failure phenotype
The combined main and ancillary DIG trials showed a neutral effect of digoxin on mortality. 147 Fourteen observational studies reported effect estimates for digoxin in patients with HFrEF and HFpEF in combination, or where EF was not specified. These studies reported 16 estimates of 'treatment-effect'. Seven found an association between the use of digoxin and a higher mortality, 41,65,[113][114][115][116][117] seven were neutral, 30,42,55,112,113 and two suggested better outcomes associated with digoxin use 105,111 (Table 6).
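The counts reported in this section can be tallied into a simple concordance summary. The sketch below is illustrative only: the label 'concordant' (a non-randomized estimate pointing the same way as the corresponding RCT finding) and the names `tallies` and `concordance_table` are our own assumptions, and only those treatment/phenotype pairs for which both the RCT result and complete counts are stated in the text above are included.

```python
from fractions import Fraction

# (concordant estimates, total non-randomized estimates), transcribed from the
# Results text. "Concordant" is an illustrative label, not a term from the tables.
tallies = {
    ("ACEI/ARB", "HFrEF"): (5, 7),       # RCTs: benefit; 5 of 7 found lower mortality
    ("ACEI/ARB", "HFpEF"): (7, 12),      # RCTs: neutral; 7 of 12 found no association
    ("ACEI/ARB", "mixed"): (2, 10),      # RCT: neutral; 2 of 10 neutral
    ("Beta-blocker", "mixed"): (2, 19),  # RCT: neutral; 2 of 19 found no difference
    ("MRA", "HFpEF"): (2, 3),            # RCT: neutral; 2 of 3 neutral
    ("Statin", "mixed"): (1, 12),        # RCT: neutral; 1 of 12 neutral
    ("Digoxin", "HFrEF"): (4, 10),       # RCT: neutral on death; 4 of 10 neutral
    ("Digoxin", "HFpEF"): (3, 4),        # RCT: neutral; 3 of 4 neutral
    ("Digoxin", "mixed"): (7, 16),       # RCTs: neutral; 7 of 16 neutral
}


def concordance_table(counts):
    """Return the fraction of estimates concordant with the RCT for each key."""
    return {key: Fraction(agree, total) for key, (agree, total) in counts.items()}


concordance = concordance_table(tallies)

for (treatment, phenotype), share in concordance.items():
    agree, total = tallies[(treatment, phenotype)]
    print(f"{treatment:12s} {phenotype:6s} {agree:2d}/{total:<2d} = {float(share):.0%}")
```

A vote count of this kind ignores study size and quality, so it should be read as a descriptive summary of the text rather than as a pooled analysis.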

Discussion
There is a particularly strong evidence base for the treatment of HF, making it an appropriate condition in which to compare treatment effects established in RCTs with those estimated in non-randomized and observational studies.
Looking first at patients with HFrEF, six observational studies (reporting seven 'effect' estimates) fulfilled our inclusion criteria, and examined the association between treatment with an ACEI/ARB and mortality. Of these, five showed a lower mortality in patients treated with an ACEI/ARB.
There was relatively good concordance between these non-randomized studies and the pivotal RCTs. However, the same concordance was not found in studies in HFpEF (see below). The non-randomized analyses of beta-blockers in HFrEF also showed good agreement with the RCTs, with 16 of 18 analyses concordant. 28,30,[46][47][48][49][50][51][52][53][54][55][56][57][58][59] However, this was not the case in observational studies of patients with a mixed HF phenotype, where the Study of the Effects of Nebivolol Intervention on Outcomes and Rehospitalisation in Seniors with Heart Failure (SENIORS) trial had shown a neutral effect on mortality. 137 Of the 19 non-randomized analyses, 17 showed a lower mortality among patients of this type treated with a beta-blocker. 30,41,[44][45][46]55,[65][66][67][68][69][70][71][72][73][74] The picture was quite different for MRAs, which reduce mortality in HFrEF. Of 12 observational studies, one reported lower mortality in patients treated with an MRA, 75 10 did not find a better or worse outcome (i.e. were neutral), 30,54,[76][77][78][79][80][81][82] and one found a higher mortality (worse outcome) in the MRA-treated patients. 83 It is worth exploring this discordance in more detail. By far the largest study included 18 852 patients from Sweden. 79 The authors of this study used matching of spironolactone-treated (n = 6551) and untreated (n = 12 301) patients. The authors also attempted to adjust for residual confounding in several different ways. Despite these statistical approaches, the multivariate HR for all-cause mortality with spironolactone vs. no spironolactone was 1.05 [95% confidence interval (CI) 1.00-1.11; P = 0.054] in the model adjusted for propensity score and 1.10 (95% CI 1.02-1.19; P = 0.020) in a 1:1 matched model. These findings stand in stark contrast to two separate trials of MRAs in HFrEF.
The authors of the above observational study argued that the severity of HF symptoms and concomitant use of beta-blockers might explain the difference between their findings and the Randomized Aldactone Evaluation Study (RALES) trial, which used spironolactone in severely symptomatic patients, few of whom were treated with a beta-blocker. 139 However, patients with mild symptoms, the large majority of whom were treated with a beta-blocker, were enrolled in the Eplerenone in Mild Patients Hospitalization And Survival Heart Failure (EMPHASIS-HF) trial, which demonstrated a clear mortality benefit of the MRA eplerenone. 138 As an alternative explanation for their discrepant findings, the authors postulated that trial inclusion/exclusion criteria select patients more likely to benefit and less likely to experience harm, pointing out, for example, the younger average age of patients in RALES (65 years) compared with the Swedish registry (71 years); however, the average age in EMPHASIS-HF was 69 years. In any case (and counterintuitively), the authors' own analysis showed a significant treatment-by-age interaction whereby older (rather than younger) patients did better with MRA treatment. 79 Several of the authors' other subgroup analyses (e.g. significantly better outcome with an MRA in patients without diabetes compared to with diabetes) are directly contradicted by independent but consistent subgroup analyses from RALES and EMPHASIS-HF. The authors of the Swedish study also speculated that patients in the 'real-world' treated with an MRA may be at greater risk of harm because of less careful monitoring of renal function and potassium.
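The hazard ratios quoted above for the Swedish registry analysis can be checked for internal consistency by back-calculating an approximate two-sided P-value from each HR and its 95% CI. The following is a minimal sketch, assuming a normal approximation on the log-HR scale; the function name `p_from_hr_ci` is hypothetical, and because the published limits are rounded to two decimals the recovered values are expected to be close to, rather than identical with, the reported P-values.

```python
import math


def p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided P-value from an HR and its 95% CI.

    Assumes the CI is symmetric on the log scale (Wald-type interval).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Values as quoted in the text for spironolactone vs. no spironolactone.
z_adj, p_adj = p_from_hr_ci(1.05, 1.00, 1.11)      # propensity score-adjusted model
z_match, p_match = p_from_hr_ci(1.10, 1.02, 1.19)  # 1:1 matched model

print(f"adjusted model: z = {z_adj:.2f}, P ~ {p_adj:.3f}")
print(f"matched model:  z = {z_match:.2f}, P ~ {p_match:.3f}")
```

Both back-calculated values fall in the same range as the published figures, which supports reading these estimates as neutral to modestly unfavourable rather than as evidence of the benefit seen in the randomized trials.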
Another notable example of a discrepancy between observational data and randomized trials does address issues of safety and generalisability. All but three of a remarkable 17 observational 'effect' estimates suggested that statins have a mortality benefit in HFrEF, 28,30,59,[87][88][89][90][91][92][93][94][95][96][97][98] yet two large independent RCTs showed no effect of this type of treatment on death. 142,144 In patients with the mixed/unspecified HF phenotype, a further 11 of 12 analyses reported an association of statin use with mortality benefit. 30,45,68,88,[98][99][100][101][102][103][104] Again, it is instructive to examine one of the observational studies in detail. Go et al. 104 used a Kaiser Permanente dataset with almost 25 000 patients to conduct careful propensity score-adjusted analyses of outcome related to statin treatment; the authors also used time-varying covariate adjustment for statin initiation during follow-up.
The adjusted HR for all-cause mortality in patients treated with a statin (compared with those who were not) was 0.66 (95% CI 0.61-0.71) in individuals with CHD and 0.60 (95% CI 0.54-0.67) in those without CHD. Apart from the improbably large 'reduction' in mortality (34-40%), the similar 'effect' in patients with and without CHD seems unlikely given everything we know about the actions of statins. Moreover, the prior arguments made about generalisability and safety would need to be inverted here as the observational datasets included broad populations of patients with HF, presumably receiving less intense monitoring than in the clinical trials. Even in HFpEF, there are clearly discrepant findings between a large observational dataset and two randomized trials with an ARB 123,124 and one trial with an ACEI. 122 Once again, the most obvious example involves the Swedish HF registry. 29 As previously, the authors of this study used an age- and propensity score-matched cohort. The adjusted HR for all-cause mortality in patients treated with an ACEI or ARB, compared with those not treated with one of these agents, was 0.90 (95% CI 0.85-0.96; P = 0.001). The authors also described a 'dose-response' relationship whereby the HR for high-dose treatment compared with no treatment was 0.85 (95% CI 0.78-0.83) and compared with low-dose treatment was 0.94 (95% CI 0.87-1.02). For this study, the authors used the issue of generalisability to explain why they saw benefit compared with the prior trials, in contradistinction to the case for MRAs where the opposite argument was made. Specifically, in this case, with ACEIs and ARBs, they argued that the broader, older and higher-risk population in the registry responded favourably to treatment compared with the more selected participants enrolled in the trials.
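The 34-40% 'reduction' mentioned above follows directly from the two statin hazard ratios, using the usual approximation that the relative reduction is one minus the HR. A small sketch of that arithmetic is given below; the helper name `relative_reduction_pct` is our own and is not taken from any of the cited studies.

```python
def relative_reduction_pct(hr):
    """Relative 'reduction' in hazard implied by a hazard ratio, in percent."""
    return round((1.0 - hr) * 100.0, 1)


# Hazard ratios and 95% CIs as quoted in the text for statin use (Go et al.).
statin_estimates = {
    "with CHD": (0.66, 0.61, 0.71),
    "without CHD": (0.60, 0.54, 0.67),
}

for group, (hr, lower, upper) in statin_estimates.items():
    # A lower HR limit corresponds to a larger apparent reduction, so the
    # interval bounds swap when converted to percentage reductions.
    print(
        f"{group}: {relative_reduction_pct(hr)}% "
        f"({relative_reduction_pct(upper)}% to {relative_reduction_pct(lower)}%)"
    )
```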
Much has been written recently in relation to the safety of digoxin in atrial fibrillation. Indeed, in a very illustrative example of the unreliability of observational data, Bavendiek et al. 148 highlighted how in three separate and independent post hoc analyses of the same dataset, digoxin treatment was variably associated with increased all-cause mortality, was not associated with increased mortality and, in the third analysis, was associated with decreased mortality in patients with an EF less than 30%. In HF, there is the same type of discrepancy between observational data and the single large RCT in HFrEF, an ancillary trial in HFpEF, and the combined analysis of the effect of digoxin in both HF phenotypes. [145][146][147] In each of these analyses, digoxin had a neutral effect on all-cause mortality. A total of 30 observational analyses variously show better, worse, and neutral outcomes. 30,41,42,55,65,[105][106][107][108][109][110][111][112][113][114][115][116][117] Why the non-randomized analyses of outcomes related to use of ACEI/ARB and beta-blockers in HFrEF were generally (but not absolutely) concordant with the RCTs, in contrast to the other treatments examined, is an interesting question. There may be less confounding by indication, i.e. ACEIs/ARBs and beta-blockers are recommended in essentially all patients with HFrEF, whereas digoxin and, at least until recently, MRAs were reserved for patients with more advanced HF. There may also have been particularly strong publication bias making it difficult to report studies suggesting that use of ACEIs/ARBs or beta-blockers is not associated with better outcomes (or even associated with worse outcomes). Of course, with both treatments there is also a strong selection bias whereby the sickest patients are least likely to be prescribed (and to tolerate) these therapies.
The opposite consideration may apply to the non-randomized studies showing an association between treatments such as statins and lower mortality, with the possibility of other biases such as the 'healthy-user effect' not fully adjusted for.
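Confounding by indication, as described above, can be illustrated with a toy simulation. In the sketch below, which is entirely hypothetical and not based on any of the datasets reviewed, a drug with no true effect on mortality is given preferentially to sicker patients; the crude risk ratio then suggests harm, whereas assigning the same drug at random recovers the neutral effect. All probabilities and the function name `crude_risk_ratio` are assumptions chosen for illustration.

```python
import random


def crude_risk_ratio(n, by_indication, seed=0):
    """Crude mortality risk ratio (treated vs. untreated) for a drug with NO true effect.

    by_indication=True : sicker patients are more likely to be treated (observational).
    by_indication=False: treatment is assigned by a fair coin flip (randomized).
    """
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        severity = rng.random()  # unmeasured disease severity in [0, 1)
        p_treat = 0.2 + 0.6 * severity if by_indication else 0.5
        treated = rng.random() < p_treat
        # Death depends on severity only; treatment does not enter the model.
        died = rng.random() < 0.1 + 0.5 * severity
        counts[treated] += 1
        deaths[treated] += died
    risk_treated = deaths[True] / counts[True]
    risk_untreated = deaths[False] / counts[False]
    return risk_treated / risk_untreated


rr_observational = crude_risk_ratio(100_000, by_indication=True)
rr_randomized = crude_risk_ratio(100_000, by_indication=False)

print(f"non-randomized (treated by indication): RR = {rr_observational:.2f}")
print(f"randomized:                             RR = {rr_randomized:.2f}")
```

Reversing the direction of the treatment-severity relationship (healthier patients more likely to be treated) would produce a spurious apparent benefit instead, which is analogous to the 'healthy-user effect'.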
Although our analyses show that the findings of non-randomized studies of the association between treatment use and outcomes are frequently inconsistent, they do not mean observational studies/registries are of no value. Registry-based analyses may be all that is available where randomized trials are not possible, such as in rare diseases or for rare outcomes. The latter forms the basis of pharmacoepidemiological surveillance for rare adverse effects of drugs not identified in clinical trials. Non-randomized analyses may provide information on under-studied groups or subgroups excluded from clinical trials. However, the results of such analyses must be interpreted with caution, especially if the results of different analyses of this type conflict. Registries serve an important function in describing the use (or under-use) of evidence-based therapies in the 'real-world', often leading to initiatives to improve prescribing. Perhaps the greatest value of registries is the potential they offer to conduct more 'real-world' randomized trials, i.e. to randomize patients in a registry to treatment and follow their outcomes within the registry. This approach has been pioneered in a study of thrombus aspiration in ST-segment elevation myocardial infarction using the Swedish Coronary Angiography and Angioplasty Registry 149 and a similar approach is now being used to conduct the Spironolactone Initiation Registry Randomized Interventional Trial in Heart Failure with Preserved Ejection Fraction (SPIRRIT-HFpEF) 150 in the Swedish HF Registry [NCT02901184].
Our study has a number of strengths and limitations. The strengths include the robust evidence base in HF, with often more than one randomized trial supporting the use or avoidance of specific therapies. There is a specific limitation in relation to the effect of MRAs in HFpEF. In the single prospective RCT, ineligible patients were included, and study drug was not administered, at certain investigative sites. 141 As a result, the integrity of the trial has been questioned, as has the overall treatment effect observed. 151 Examination of the effect of therapy in regions where the trial is thought to have been conducted as intended suggested possible benefit of spironolactone, compared with placebo. 140 Consequently, the effect of spironolactone in this RCT and in the one observational analysis which suggested no benefit from MRA therapy may not be in agreement.

Conclusion
This comprehensive comparison of the robust evidence base in HF with an increasing volume of non-randomized data shows that it is not possible to make reliable therapeutic inferences from observational associations. While trials undoubtedly leave gaps in evidence