## Abstract

Meta-analyses of psychological interventions typically find a pooled effect of “psychological intervention” compared with usual care. This answers the research question, “Are psychological interventions in general effective?” In fact, psychological interventions are usually complex with several different components. The authors propose that mixed treatment comparison meta-analysis methods may be a valuable tool when exploring the efficacy of interventions with different components and combinations of components, as this allows one to answer the research question, “Are interventions with a particular component (or combination of components) effective?” The authors illustrate the methods using a meta-analysis of psychological interventions for patients with coronary heart disease for a variety of outcomes. The authors carried out systematic literature searches to update an earlier Cochrane review and classified components of interventions into 6 types: usual care, educational, behavioral, cognitive, relaxation, and support. Most interventions were a combination of these components. There was some evidence that psychological interventions were effective in reducing total cholesterol and standardized mean anxiety scores, that interventions with behavioral components were effective in reducing the odds of all-cause mortality and nonfatal myocardial infarction, and that interventions with behavioral and/or cognitive components were associated with reduced standardized mean depression scores.

Systematic review and meta-analysis are well-established methods for describing and summarizing a body of literature that compares interventions for a given patient population (1). Standard meta-analytical methods are typically restricted to comparisons of 2 interventions using direct, head-to-head evidence alone. So, for example, if we are interested in the A versus B comparison, we would include only studies that compare A versus B directly. Consequently, meta-analyses of psychological interventions typically group all such interventions together as the same “treatment” and make the pairwise comparison of “all psychological interventions” versus some comparator (often “usual care”) (2). This allows us to answer the research question, “Are psychological interventions in general effective?” However, psychological interventions are usually complex, consisting of several different components, and so in some sense no 2 interventions are exactly alike. If there is no systematic structure to the differences between interventions, then this heterogeneity can be incorporated by using random effects models (1). In fact, psychological interventions often have well-defined components that can readily be classified. For example, 1 intervention may be primarily cognitive, another primarily behavioral, while others may have both cognitive and behavioral components. Therefore, differences between interventions can be highly structured.

Meta-regression methods (3) make it possible to model systematic, structured differences between interventions. Mixed treatment comparison meta-analysis (4–7) is a special form of meta-regression that enables the simultaneous comparison of multiple interventions in a single analysis, allowing us to make indirect comparisons while fully respecting the randomized structure of the evidence. For example, if we have some studies comparing intervention C with intervention A and some studies comparing intervention C with intervention B, then we can indirectly compare interventions A and B via the common comparator treatment C using the relation $d_{AB} = d_{AC} - d_{BC}$, where $d$ is the treatment effect for the indicated pairwise comparison. All 3 treatments A, B, and C can then be simultaneously compared and ranked according to a given outcome. In addition, we can summarize the uncertainty regarding which is the best intervention type with the probability that it is the most effective. This means that we can put together whole structured networks of evidence and can then answer questions such as, “Which type of intervention has the greatest probability of being most effective?” or “Which combinations of components have the greatest probability of being most effective?”
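The indirect-comparison relation can be illustrated with a minimal sketch. It is a simple frequentist analogue that assumes the two direct estimates come from independent sets of trials, so that their variances add; it is not the Bayesian model fitted in this paper. The function name and the input values are hypothetical.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of B versus A through the common comparator C.

    d_ac and d_bc are the effects of C relative to A and of C relative to B
    (for example, log-odds ratios); se_ac and se_bc are their standard errors.
    """
    d_ab = d_ac - d_bc
    # Assumption: the two estimates are independent, so variances add.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return d_ab, se_ab

# Hypothetical inputs, not taken from the review.
d_ab, se_ab = indirect_comparison(d_ac=-0.50, se_ac=0.20, d_bc=-0.20, se_bc=0.15)
lower, upper = d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab
print(f"d_AB = {d_ab:.2f}, SE = {se_ab:.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

The indirect estimate is always less precise than either direct estimate, which is one reason why sparse networks give wide intervals.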

Applications of these methods have begun to appear in medical journals (8) and in medical decision-making (9, 10). However, there have been no applications, thus far, in the area of psychological interventions and, in particular, not for complex interventions. Our main aim was to develop a framework in which mixed treatment comparison methods could be used to explore the effects of different components of complex interventions. The methods are illustrated by using a Cochrane review of the effect of psychological interventions on coronary heart disease outcomes, that is, coronary heart disease-related morbidity and mortality (2), which is extended here from December 2001 until the end of December 2006.

We first describe the literature searches and the classification scheme for the interventions. We then describe the statistical models that we investigated. Results from the model are presented, followed by a discussion of the methods, some issues related to meta-analysis with continuous outcomes, in particular, when measured on different scales, and further modeling challenges in this area.

## MATERIALS AND METHODS

### Updated systematic literature search

Medline was searched from January 2002 to December 2006 for randomized, controlled trials of psychological interventions for adults with coronary heart disease. To ensure consistency with the earlier review, inclusion criteria stated that trials should use a parallel group design, with at least 6 months’ follow-up, and report at least 1 of the following outcomes: all-cause mortality, cardiac mortality, nonfatal myocardial infarction, total cholesterol, systolic or diastolic blood pressure, depression, or anxiety. Twenty-two studies were identified and added to the 34 studies included in the earlier review. From the earlier review, 1 study was omitted because it was not randomized, and another was omitted because it did not report the outcome of interest. Of the 22 new studies, we were able to extract useful information from only 19. This gave a total of 51 studies, 7 of which were 3-arm trials. Table 1 shows the number of studies reporting each outcome from the earlier review (2) and from our update; references to the included studies are listed separately as “References to Articles Used in Systematic Review” posted on the Journal’s website (http://aje.oxfordjournals.org/). Full details of the updated searches are available from the authors.

Table 1. Number of Included Studies Reporting Each Outcome in the Earlier Cochrane Review^a and Our Update

| Outcome | No. of Studies Reporting Outcome: Cochrane Review | No. of Studies Reporting Outcome: Update |
| --- | --- | --- |
| All-cause mortality | 22 | 14 |
| Cardiac mortality | 11 | 4 |
| Nonfatal myocardial infarction | 18 | 4 |
| Total cholesterol | 9 | 5 |
| Systolic blood pressure | 5 | 4 |
| Diastolic blood pressure | 5 | 4 |
| Depression | 10 | 9 |
| Anxiety | 9 | 5 |

^a Rees et al. Cochrane Database Syst Rev. 2004;(2):CD002902 (2).

### Classification of interventions

Psychological interventions, over and above “usual care,” consisted of components that were classified into 5 groups: educational, behavioral, cognitive, relaxation, and psychosocial support. Educational (EDU) interventions were defined as those explicitly educating patients about health risks, cardiovascular health risks, and basic anatomy of the cardiovascular system. Behavioral (BEH) interventions focused on achieving change in behavioral domains relevant to coronary heart disease (e.g., smoking cessation courses, physical exercise training, food preparation classes, and nutritional counseling sessions). The main purpose of the cognitive (COG) interventions was to change patients' beliefs and perceptions about the factors that lead to coronary heart disease to help them manage their stress and adjust to their medical condition. Usually, the cognitive interventions were implemented by highly specialized clinical psychologists, psychotherapists, or psychiatrists. Relaxation (REL) interventions were focused on training patients in different relaxation techniques, such as yoga and breathing courses. Finally, psychosocial support (SUP) interventions included attempts to bring patients together to encourage practical and/or emotional support. The studies included in our review covered 19 of the 32 possible combinations of these components (Table 2). Table 2 also shows the number of trial arms with each type of intervention, in total and also for each outcome. A full breakdown of the components of interventions on each arm, including a definition of “usual care” used in each study, is given in Web Appendix Table 1. (This information is described in the first of 4 supplementary tables; each is referred to as “Web Appendix Table” in the text and is posted on the Journal’s website (http://aje.oxfordjournals.org/).)

Table 2. Intervention Components by Study Arm^a

All columns after “Total No. of Arms” give the number of trial arms with the intervention, by outcome.

| Intervention | Total No. of Arms | All-Cause Mortality | Cardiac Mortality | Nonfatal Myocardial Infarction | Total Cholesterol | Systolic Blood Pressure | Diastolic Blood Pressure | Depression | Anxiety |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Usual care only | 51 (7) | 36 (5) | 15 (2) | 22 (4) | 14 | 9 | 9 | 19 (3) | 14 |
| Educational | 3 (1) | 3 (1) | 1 | 1 | 1 | 1 | 1 | 1 (1) | 0 |
| Behavioral | 6 (2) | 6 (1) | 4 (1) | 5 (2) | 2 | 0 | 0 | 1 | 0 |
| Cognitive | 9 (5) | 7 (2) | 5 (3) | 6 (4) | 2 | 2 | 2 | 5 (1) | 3 |
| Support | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| Educational + behavioral | 3 | 2 | 1 | 2 | 0 | 1 | 1 | 2 | 1 |
| Educational + cognitive | 5 (4) | 5 (4) | 1 | 2 (1) | 1 | 1 | 1 | 4 | 2 |
| Educational + relaxation | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| Behavioral + cognitive | 4 | 2 | 1 | 1 | 2 | 0 | 0 | 0 | 0 |
| Behavioral + relaxation | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Cognitive + relaxation | 2 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Cognitive + support | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Educational + behavioral + cognitive | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| Educational + behavioral + relaxation | 3 (1) | 3 (1) | 0 | 3 (1) | 1 | 0 | 0 | 2 | 2 |
| Educational + behavioral + support | 1 | 2 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| Educational + cognitive + relaxation | 2 (1) | 2 (1) | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Behavioral + cognitive + relaxation | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| Behavioral + cognitive + support | 1 | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Educational + behavioral + cognitive + relaxation | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |

^a Numbers in parentheses indicate the number of arms from 3-arm trials.

### Statistical methods

#### Binary outcome data.

Three binary outcome measures were included in our review: all-cause mortality, cardiac mortality, and nonfatal myocardial infarction. These can be summarized as binomial counts, $r_{j,k}$, out of total number at risk, $n_{j,k}$, on intervention $k$ in study $j$. These data provide information on the probability, $p_{j,k}$, of the outcome (risk of mortality, cardiac-specific mortality, and nonfatal myocardial infarction, respectively). We use a logistic regression model. Each study has a reference “baseline” intervention arm, $b_j$, with study-specific “baseline” log-odds of outcome, $\mu_j$, for intervention arm, $b_j$. The log-odds ratio, $\delta_{j,k}$, of outcome for intervention $k$, relative to baseline, $b_j$, is assumed to come from a random effects model with mean log-odds ratio, $(d_k - d_{b_j})$, and between-study standard deviation, $\tau$, where $d_k$ is the mean log-odds ratio of outcome for intervention $k$ relative to usual care (so that $d_1 = 0$). This model can be written as follows:

$$
r_{j,k} \sim \mathrm{Binomial}(p_{j,k}, n_{j,k}), \qquad
\mathrm{logit}(p_{j,k}) =
\begin{cases}
\mu_j & \text{if } k = b_j \\
\mu_j + \delta_{j,k} & \text{if } k \neq b_j
\end{cases}
\tag{1}
$$

where

$$
\delta_{j,k} \sim \mathrm{Normal}\left(d_k - d_{b_j}, \tau^2\right).
$$

Note that, although in this example all studies had a usual care arm, providing a common baseline comparator, $b$, so that $(d_k - d_{b_j})$ can be replaced by $(d_k - d_b)$ in equation 1, this is not a necessary requirement for the mixed treatment comparison methods (7). The model for the mean log-odds ratios for the interventions, $d_k$, is described below. It is through the intervention effect model that we can combine all of the studies comparing different types of intervention.
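To make the structure of the logistic model concrete, the following minimal sketch evaluates the binomial log-likelihood contribution of a single two-arm study for given values of the baseline log-odds and the study-specific log-odds ratio. It is an illustration only, not the WinBUGS program used for the analysis; the function names and all numerical inputs are hypothetical.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def arm_probability(mu_j, delta_jk, is_baseline_arm):
    """Outcome probability for one arm: baseline arms have log-odds mu_j,
    other arms have log-odds mu_j + delta_jk."""
    log_odds = mu_j if is_baseline_arm else mu_j + delta_jk
    return inv_logit(log_odds)

def binomial_log_likelihood(r, n, p):
    """Log-likelihood of r events out of n at risk with event probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return log_choose + r * math.log(p) + (n - r) * math.log(1.0 - p)

# Hypothetical two-arm study: (label, events, number at risk, is baseline arm).
mu_j = math.log(0.10 / 0.90)  # baseline risk of 10% on the log-odds scale
delta_jk = -0.35              # hypothetical study-specific log-odds ratio
arms = [("usual care", 12, 100, True), ("intervention", 8, 100, False)]

study_log_likelihood = 0.0
for label, r, n, is_baseline in arms:
    p = arm_probability(mu_j, delta_jk, is_baseline)
    study_log_likelihood += binomial_log_likelihood(r, n, p)
    print(f"{label}: p = {p:.3f}")
print(f"log-likelihood = {study_log_likelihood:.3f}")
```

In the full model, the study-specific log-odds ratios are themselves drawn from the random effects distribution, and all parameters are estimated jointly across studies.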

#### Continuous outcome data measured on the same scale.

Three continuous outcomes measured on the same scale across studies were included in our review: total cholesterol, systolic blood pressure, and diastolic blood pressure. These can be summarized as mean change from baseline, $y_{j,k}$, assumed to have a normal likelihood with corresponding standard error, $SE_{j,k}$, for intervention $k$ in study $j$. It was necessary, in some trials, to assume a correlation between pre- and posttrial measures in order to obtain these summaries (11). Following the earlier review, we assumed a correlation of 0.5 but also obtained results for a correlation of 0.7 as a sensitivity analysis. The data provide information on the mean outcome, $\theta_{j,k}$, that is, the mean change in total cholesterol, systolic blood pressure, and diastolic blood pressure, respectively. The model for the intervention effects is identical to that described above (equation 1), with the exception that it is on a natural rather than a logistic scale, giving a linear regression model:

$$
y_{j,k} \sim \mathrm{Normal}\left(\theta_{j,k}, SE_{j,k}^2\right), \qquad
\theta_{j,k} =
\begin{cases}
\mu_j & \text{if } k = b_j \\
\mu_j + \delta_{j,k} & \text{if } k \neq b_j
\end{cases}
\tag{2}
$$

where

$$
\delta_{j,k} \sim \mathrm{Normal}\left(d_k - d_{b_j}, \tau^2\right).
$$

The parameters, however, have a different interpretation. $\mu_j$ is the “baseline” mean change in outcome on the $b_j$ arm in study $j$, and $\delta_{j,k}$ is the mean difference in change in outcome for intervention $k$ relative to intervention $b_j$. $\delta_{j,k}$ is assumed to come from a random effects model with the mean of the mean differences equal to $(d_k - d_{b_j})$ and with between-study standard deviation, $\tau$. The model for the mean intervention effects, $d_k$, is described below.
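The role of the assumed pre/post correlation can be sketched as follows. The code uses the standard formula for the standard deviation of a difference between two correlated measurements; that this is exactly the calculation applied to each trial is an assumption, and the function names and input values are hypothetical.

```python
import math

def change_sd(sd_pre, sd_post, corr):
    """Standard deviation of (post - pre) given the pre/post correlation."""
    return math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2.0 * corr * sd_pre * sd_post)

def change_se(sd_pre, sd_post, n, corr):
    """Standard error of the mean change from baseline in an arm of size n."""
    return change_sd(sd_pre, sd_post, corr) / math.sqrt(n)

# Hypothetical arm: 100 patients, SD of 1.0 mmol/L before and after.
for corr in (0.5, 0.7):  # main analysis and sensitivity analysis
    se = change_se(sd_pre=1.0, sd_post=1.0, n=100, corr=corr)
    print(f"correlation {corr}: SE of mean change = {se:.4f}")
```

A higher assumed correlation gives a smaller standard error, so the sensitivity analysis with a correlation of 0.7 gives each affected study more weight than the main analysis.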

#### Continuous outcome data measured on different scales.

For the depression and anxiety outcomes, there were a variety of different scales of measurement across studies. Again, these can be summarized as the mean change from baseline, $y_{j,k}$, assumed to have a normal likelihood with corresponding standard error, $SE_{j,k}$, for intervention $k$ in study $j$. Again, the data provide information on the mean change in outcome, $\phi_{j,k}$ (mean depression or anxiety score, respectively). However, the $\phi_{j,k}$ will be measured on different scales in different studies. We standardize these using the pooled standard deviation across arms within each study, $\theta_{j,k} = \phi_{j,k}/SD_j$. The linear regression model is on these standardized means, $\theta_{j,k}$, and is identical to equation 2, although the interpretation of the parameters is different.

$\mu_j$ is the change from baseline in the standardized mean outcome for “baseline” intervention, $b_j$, and $\delta_{j,k}$ is the standardized mean difference (SMD) in change in outcome for intervention $k$, relative to intervention $b_j$. $\delta_{j,k}$ is assumed to come from a random effects model with the mean of the SMDs equal to $(d_k - d_{b_j})$ and between-study standard deviation, $\tau$. The model for the mean SMDs, $d_k$, is described below.
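The standardization step can be illustrated with a short sketch. The pooled standard deviation below uses the common degrees-of-freedom-weighted formula, which is an assumption here because the text does not state the exact formula; the function names and input values are hypothetical.

```python
import math

def pooled_sd(sds, ns):
    """Pooled standard deviation across the arms of one study
    (assumed formula: variances weighted by n - 1)."""
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

def standardize(mean_changes, sds, ns):
    """Divide each arm's mean change by the study's pooled SD."""
    sd_j = pooled_sd(sds, ns)
    return [m / sd_j for m in mean_changes]

# Hypothetical two-arm study measured on an arbitrary depression scale.
theta = standardize(mean_changes=[-1.0, -3.0], sds=[4.0, 4.0], ns=[50, 50])
smd = theta[1] - theta[0]  # intervention arm minus baseline arm
print(f"standardized mean changes = {theta}, SMD = {smd:.2f}")
```

Because every study is divided by its own pooled standard deviation, studies that use different instruments contribute effects in common standard deviation units.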

#### Models for intervention effects.

We explored 4 different models for the intervention effects.

Model 1 (single-effect model). In this model, all psychological interventions are grouped together as a single “treatment,” and all intervention effects are set equal, $d_k = d$. This is the model that was used in the original Cochrane review (2).

Model 2 (additive main effects model). In this model, there is a separate effect for each of the different components of an intervention. The total intervention effect, $d_k$, is a sum of the relevant component effects, $d_{EDU}, d_{BEH}, d_{COG}, d_{REL}, d_{SUP}$, for a particular intervention, $k$. So for an intervention with behavioral and cognitive components, we have $d_k = d_{BEH} + d_{COG}$; for an intervention with educational, cognitive, and support components, we have $d_k = d_{EDU} + d_{COG} + d_{SUP}$, and so on. In general, this model can be written as follows:

$$
d_k = d_{EDU} I_{k \supset EDU} + d_{BEH} I_{k \supset BEH} + d_{COG} I_{k \supset COG} + d_{REL} I_{k \supset REL} + d_{SUP} I_{k \supset SUP}
$$

where the notation $I_{k \supset EDU}$ is an indicator that intervention $k$ contains an educational component (equal to 1 if it does and 0 otherwise).
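The additive structure can be sketched in a few lines. The component values used below are the posterior means for all-cause mortality reported in Table 4 (model 2) and serve purely as an illustration; the function name is hypothetical. Because the model is linear, summing posterior means gives the posterior mean of the combined effect, but not its credible interval.

```python
COMPONENTS = ("EDU", "BEH", "COG", "REL", "SUP")

def additive_effect(components, d):
    """Total effect d_k of an intervention as the sum of its component effects.

    components: the set of components present in intervention k
    d: mapping from component label to component effect
    """
    unknown = set(components) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return sum(d[c] for c in components)

# Posterior mean log-odds ratios for all-cause mortality (Table 4, model 2).
d_all_cause = {"EDU": 0.29, "BEH": -0.58, "COG": -0.01, "REL": -0.38, "SUP": 0.21}

print(additive_effect({"BEH", "COG"}, d_all_cause))         # behavioral + cognitive
print(additive_effect({"EDU", "COG", "SUP"}, d_all_cause))  # educational + cognitive + support
print(additive_effect(set(), d_all_cause))                  # usual care: no components
```

An intervention with no components corresponds to usual care and has an effect of zero, matching the constraint that usual care is the reference.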

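Model 3 (2-way interaction model). This is an extension of the main effects model with additional terms for the combination of each pair of components. This model allows interventions with particular pairs of components to have either a bigger (synergistic) or smaller (antagonistic) effect than would be expected from the sum of their effects alone. As an example, an intervention with educational and cognitive components would have $d_k = d_{EDU} + d_{COG} + d_{EDU*COG}$, and an intervention with behavioral, cognitive, and support components would have $d_k = d_{BEH} + d_{COG} + d_{SUP} + d_{BEH*COG} + d_{BEH*SUP} + d_{COG*SUP}$. In general, this model can be written as follows:

$$
\begin{aligned}
d_k ={} & d_{EDU} I_{k \supset EDU} + d_{BEH} I_{k \supset BEH} + d_{COG} I_{k \supset COG} + d_{REL} I_{k \supset REL} + d_{SUP} I_{k \supset SUP} \\
& + d_{EDU*BEH} I_{k \supset \{EDU,BEH\}} + d_{EDU*COG} I_{k \supset \{EDU,COG\}} + \cdots + d_{REL*SUP} I_{k \supset \{REL,SUP\}}
\end{aligned}
$$

where the notation, $I_{k \supset \{EDU,BEH\}}$, indicates whether an intervention has both educational and behavioral components (equal to 1 if it does and 0 otherwise), and the second line contains one term for each pair of components.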
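The terms that enter the 2-way interaction model for a given intervention can be enumerated mechanically, as in this minimal sketch. The function name and the label format are hypothetical choices for illustration.

```python
from itertools import combinations

COMPONENTS = ("EDU", "BEH", "COG", "REL", "SUP")

def two_way_terms(components):
    """Labels of the main-effect and pairwise-interaction terms that
    contribute to d_k for an intervention with the given components."""
    ordered = [c for c in COMPONENTS if c in components]
    main_effects = list(ordered)
    interactions = [f"{a}*{b}" for a, b in combinations(ordered, 2)]
    return main_effects + interactions

print(two_way_terms({"EDU", "COG"}))
print(two_way_terms({"BEH", "COG", "SUP"}))

# Total number of effect parameters in the 2-way model with 5 components:
n_parameters = len(two_way_terms(set(COMPONENTS)))
print(n_parameters)
```

With 5 components there are 5 main effects and 10 pairwise interactions, which shows how quickly the number of parameters grows relative to the number of distinct combinations actually observed in the trials.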

Model 4 (full interaction model). In this model, each of the 32 possible different interventions has its own effect, $d_k$; that is, each different combination of components is considered a different intervention in its own right.

Models 1–4 were compared by using the deviance information criterion (DIC) (12), which is the sum of a measure of goodness of fit (posterior mean deviance) and a measure of model complexity (effective number of parameters). Models with a smaller DIC are preferred, although differences of less than 3 are not considered important (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/dicpage.shtml (12)). It is important to note, however, that the power to estimate and detect evidence of interactions will be limited by the data available on the various combinations of components.
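The comparison rule can be sketched as follows, using the DIC values for the anxiety outcome reported in Table 3 as an illustration. The helper names are hypothetical, and the threshold of 3 is the one stated above.

```python
def dic(posterior_mean_deviance, effective_parameters):
    """DIC as the sum of goodness of fit and model complexity."""
    return posterior_mean_deviance + effective_parameters

def differences_from_best(dic_by_model):
    """Difference between each model's DIC and the smallest DIC."""
    best = min(dic_by_model.values())
    return {model: round(value - best, 1) for model, value in dic_by_model.items()}

# DIC values for the anxiety outcome (Table 3).
anxiety_dic = {"model 1": 72.4, "model 2": 78.9, "model 3": 82.0, "model 4": 82.1}
diffs = differences_from_best(anxiety_dic)

THRESHOLD = 3.0  # differences below this are not considered important
for model, diff in diffs.items():
    verdict = "comparable to best" if diff < THRESHOLD else "worse than best"
    print(f"{model}: DIC difference = {diff} ({verdict})")
```

For anxiety, every more complex model exceeds the threshold relative to model 1, so the single-effect model is preferred for that outcome.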

#### Implementation.

All models were fitted by using Bayesian inference computed with Markov chain Monte Carlo simulation in WinBUGS version 1.4.1 (http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml (13)). All baseline and intervention effect parameters were given flat normal(0,1000) priors, and the between-study standard deviation was given a flat uniform distribution with an appropriately large range given the scale of measurement. Convergence was assessed by using the Brooks-Gelman-Rubin diagnostic (14) in WinBUGS, and in all cases a burn-in of at least 20,000 simulations was discarded. All results presented are based on a further sample of at least 40,000 simulations. WinBUGS programs can be downloaded from http://www.bris.ac.uk/cobm/research/mpes.

#### Other summary measures.

The Markov chain Monte Carlo simulation framework of WinBUGS allows us to present summaries that are of key interest, such as the probability that a particular intervention is the most effective. This is calculated by recording the proportion of iterations in which a given intervention gave the greatest relative effect. We can also obtain relative effect estimates between 2 active interventions. For example, under the main effects model, model 2, the estimated relative effect between an intervention with behavioral and cognitive components compared with an intervention with an educational component is obtained by calculating the function, $d_{BEH} + d_{COG} - d_{EDU}$, for each iteration of the simulation.
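Both summaries can be computed directly from simulation output, as in the sketch below. Real posterior draws would come from the fitted model; here they are replaced by independent normal draws with made-up means and standard deviations, which is an assumption for illustration only. Lower values are treated as more effective, as for a log-odds ratio of an adverse outcome.

```python
import random

random.seed(0)

# Hypothetical stand-ins for posterior draws of the component effects.
MEANS = {"EDU": 0.3, "BEH": -0.6, "COG": 0.0, "REL": -0.4, "SUP": 0.2}
SDS = {"EDU": 0.3, "BEH": 0.3, "COG": 0.25, "REL": 0.4, "SUP": 0.45}

n_iter = 20000
best_counts = {name: 0 for name in ["UC", *MEANS]}
relative_effect_sum = 0.0

for _ in range(n_iter):
    draw = {"UC": 0.0}  # usual care is the reference, with effect fixed at 0
    for name in MEANS:
        draw[name] = random.gauss(MEANS[name], SDS[name])
    # Most effective = lowest effect in this iteration.
    best_counts[min(draw, key=draw.get)] += 1
    # Relative effect of (behavioral + cognitive) versus educational.
    relative_effect_sum += draw["BEH"] + draw["COG"] - draw["EDU"]

p_best = {name: count / n_iter for name, count in best_counts.items()}
relative_effect_mean = relative_effect_sum / n_iter

print(p_best)
print(f"mean of d_BEH + d_COG - d_EDU = {relative_effect_mean:.2f}")
```

Evaluating the function within each iteration, rather than combining summary statistics afterwards, means that the resulting interval automatically reflects any posterior correlation between the component effects.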

## RESULTS

Table 3 shows that, for all outcomes except cardiac mortality and anxiety, there is little to choose among models 1–4. This suggests that there is little evidence of synergy/attenuation, although the evidence structure here provides low power to detect such interactions (Table 2). Further examination of the cardiac mortality outcome shows a single study (15) that gave results that were inconsistent with the other evidence reporting this outcome. Removing this study shows that there is little to choose among the models for cardiac mortality also (Table 3). For the anxiety outcome, the best fitting model is model 1, where there is a single effect for any psychological intervention, with a DIC at least 6.5 lower than for the more complex models. For all outcomes, we report the results from the simplest model (model 1) and also from the main effects model (model 2), as the most easily interpretable alternative and of most practical interest. This assumes that the effect of the different components is additive. When interpreting the results from model 2, the reader should bear in mind that this analysis is effectively performing 5 different significance tests, and so ideas of significance should be adjusted accordingly (e.g., Bonferroni correction).

Table 3. Deviance Information Criterion to Compare Models 1–4 for Each of the Outcome Measures, Assuming a Correlation of 0.5 Between Pre- and Postmeasures for the Continuous Outcomes^a

All model columns give the deviance information criterion^b.

| Outcome | Model 1 (Single Effect) | Model 2 (Additive Main Effects) | Model 3 (2-Way Interaction) | Model 4 (Full Interaction) |
| --- | --- | --- | --- | --- |
| All-cause mortality | 361.1 | 360.6 | 360.9 | 362.9 |
| Cardiac mortality | 160.8 (147.6) | 161.2 (150.4) | 157.2 (151.7) | 157.4 (151.5) |
| Nonfatal myocardial infarction | 243.7 | 241.0 | 247.2 | 244.4 |
| Total cholesterol | −21.0 | −20.0 | −18.7 | −18.5 |
| Systolic blood pressure | 86.3 | 87.0 | 87.7 | 87.6 |
| Diastolic blood pressure | 71.0 | 70.7 | 70.5 | 70.5 |
| Depression | 121.9 | 123.5 | 121.6 | 123.2 |
| Anxiety | 72.4 | 78.9 | 82.0 | 82.1 |

^a The numbers in parentheses for cardiac mortality are obtained after omitting study 5 (Cowan et al. Nurs Res. 2001;50(2):68–76 (15)). Note that, in this example, there is little power to detect interaction effects (models 3 and 4).

^b The sum of a measure of goodness of fit (posterior mean deviance) and a measure of model complexity (effective number of parameters).

### Primary outcomes

If we fit model 1, with a single effect for any psychological intervention, then we see an intervention effect only on the nonfatal myocardial infarction outcome (Table 4, model 1) with a posterior mean log-odds ratio of −0.35 (95% credible interval: −0.65, −0.10). The results from model 2, which has an additive effect for each component of an intervention, show that interventions with behavioral components have the strongest effects on all-cause mortality (Table 4, model 2) with a posterior mean log-odds ratio of −0.58 (95% credible interval: −1.13, −0.05) and on nonfatal myocardial infarction (Table 4, model 2) with a posterior mean log-odds ratio of −0.64 (95% credible interval: −1.13, −0.16). From model 2, interventions with a behavioral component were the most effective for all-cause mortality (with probability 0.61) (Table 5) and for cardiac mortality (with probability 0.40) (Table 5), whereas interventions with a psychosocial support component were most effective for nonfatal myocardial infarction (with probability 0.76) (Table 5).
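Log-odds ratios can be easier to interpret after conversion to odds ratios. The sketch below exponentiates the values quoted above; the function name is hypothetical. Exponentiating the interval limits is exact because quantiles are preserved under a monotone transformation, whereas the exponentiated posterior mean is only an approximate summary and is not the posterior mean of the odds ratio.

```python
import math

def to_odds_ratio(log_or, lower, upper):
    """Exponentiate a log-odds ratio and its credible interval limits."""
    return math.exp(log_or), math.exp(lower), math.exp(upper)

# Values quoted in the text (Table 4).
reported = {
    "any intervention, nonfatal MI (model 1)": (-0.35, -0.65, -0.10),
    "behavioral, all-cause mortality (model 2)": (-0.58, -1.13, -0.05),
    "behavioral, nonfatal MI (model 2)": (-0.64, -1.13, -0.16),
}

for label, (mean, lower, upper) in reported.items():
    odds_ratio, or_lower, or_upper = to_odds_ratio(mean, lower, upper)
    print(f"{label}: OR = {odds_ratio:.2f} ({or_lower:.2f}, {or_upper:.2f})")
```

An odds ratio below 1 with an interval that excludes 1 corresponds to a log-odds ratio below 0 with an interval that excludes 0.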

Table 4. Posterior Mean and 95% Credible Intervals for the Estimated Intervention Effect for Model 1 and Estimated Component Effects for Model 2^a

Each cell shows the posterior mean (95% credible interval). The column $d$ is from model 1 (single effect); the columns $d_{EDU}$ to $d_{SUP}$ are from model 2 (additive main effects).

| Outcome | Summary | $d$ | $d_{EDU}$ | $d_{BEH}$ | $d_{COG}$ | $d_{REL}$ | $d_{SUP}$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All-cause mortality | Log-odds ratio | −0.14 (−0.47, 0.15) | 0.29 (−0.27, 0.85) | −0.58 (−1.13, −0.05) | −0.01 (−0.52, 0.45) | −0.38 (−1.16, 0.37) | 0.21 (−0.66, 1.06) |
| Cardiac mortality^b | Log-odds ratio | −0.16 (−0.44, 0.07) | 0.27 (−0.46, 0.98) | −0.34 (−1.00, 0.30) | −0.33 (−0.83, 0.03) | 0.03 (−1.49, 1.53) | 0.10 (−1.12, 1.31) |
| Nonfatal myocardial infarction | Log-odds ratio | −0.35 (−0.65, −0.10) | −0.16 (−0.71, 0.34) | −0.64 (−1.13, −0.16) | −0.09 (−0.41, 0.28) | −0.005 (−0.61, 0.57) | −1.49 (−3.42, 0.19) |
| Total cholesterol, mmol/L | Mean difference | −0.32 (−0.50, −0.13) | −0.13 (−0.71, 0.42) | −0.14 (−0.60, 0.33) | −0.29 (−0.71, 0.13) | 0.49 (−0.23, 1.24) | −0.05 (−0.70, 0.61) |
| Systolic blood pressure, mm Hg | Mean difference | −1.21 (−4.24, 2.33) | −2.81 (−12.84, 7.18) | 5.53 (−8.61, 19.78) | −0.95 (−9.13, 7.80) | −0.07 (−17.54, 16.50) | −0.74 (−12.38, 11.63) |
| Diastolic blood pressure, mm Hg | Mean difference | −1.37 (−3.31, 0.62) | −3.77 (−10.42, 3.00) | 3.18 (−6.61, 12.48) | 0.89 (−4.87, 6.44) | −2.39 (−14.00, 9.43) | −0.85 (−8.74, 7.40) |
| Depression | SMD | −0.23 (−0.35, −0.11) | −0.01 (−0.24, 0.22) | −0.26 (−0.55, 0.02) | −0.24 (−0.42, −0.06) | 0.08 (−0.20, 0.34) | 0.57 (−0.07, 1.21) |
| Anxiety | SMD | −0.15 (−0.29, −0.04) | −0.19 (−0.49, 0.14) | −0.02 (−0.37, 0.34) | −0.12 (−0.37, 0.10) | 0.02 (−0.31, 0.34) | −0.04 (−0.39, 0.38) |

Abbreviations: SMD, standardized mean difference; subscript abbreviations: BEH, behavioral intervention; COG, cognitive intervention; EDU, educational intervention; REL, relaxation intervention; SUP, psychosocial support intervention.

^a Results are shown for the relevant summary measure for each of the outcome measures, assuming a correlation of 0.5 between pre- and postmeasures for the continuous outcomes.

^b Results presented for the cardiac mortality outcome omit study 5 (Cowan et al. Nurs Res. 2001;50(2):68–76 (15)).

Table 5. Proportion of Simulations From Model 2 (Additive Main Effects) in Which Each Component Was the Most Effective (Had the Lowest Log-Odds Ratio, Mean Difference, or Standardized Mean Difference as Appropriate) for Each of the Outcome Measures, Assuming a Correlation of 0.5 Between Pre- and Postmeasures for the Continuous Outcomes

All columns after “Outcome” give the probability of being most effective.

| Outcome | Usual Care | Educational | Behavioral | Cognitive | Relaxation | Psychosocial Support |
| --- | --- | --- | --- | --- | --- | --- |
| All-cause mortality | 0.000 | 0.009 | 0.614 | 0.024 | 0.316 | 0.038 |
| Cardiac mortality^a | 0.000 | 0.022 | 0.398 | 0.207 | 0.239 | 0.135 |
| Nonfatal myocardial infarction | 0.000 | 0.010 | 0.226 | 0.001 | 0.007 | 0.756 |
| Total cholesterol | 0.000 | 0.194 | 0.212 | 0.420 | 0.012 | 0.162 |
| Systolic blood pressure | 0.004 | 0.325 | 0.063 | 0.195 | 0.226 | 0.187 |
| Diastolic blood pressure | 0.001 | 0.465 | 0.056 | 0.043 | 0.317 | 0.117 |
| Depression | 0.000 | 0.039 | 0.515 | 0.423 | 0.015 | 0.008 |
| Anxiety | 0.000 | 0.483 | 0.142 | 0.176 | 0.075 | 0.125 |

^a Results presented for the cardiac mortality outcome omit study 5 (Cowan et al. Nurs Res. 2001;50(2):68–76 (15)).

### Intermediate outcomes

There was no evidence of an effect of either psychological interventions in general or particular components of interventions on systolic or diastolic blood pressure (Table 4). There was evidence that psychological interventions, in general, lead to a reduction in mean total cholesterol (Table 4, model 1), with a mean difference of −0.32 (95% credible interval: −0.50, −0.13) mmol/L, but this could not be attributed to any single intervention type; interventions with a cognitive, behavioral, educational, or psychosocial support component each had an appreciable probability of being most effective (P = 0.42, 0.21, 0.19, and 0.16, respectively) (Table 5). However, interventions with a relaxation component do not appear to be effective for total cholesterol (P = 0.01) (Table 5). Interventions with an educational component had the highest probability of being most effective for systolic and diastolic blood pressure (P = 0.33 and 0.47, respectively) (Table 5).

### Psychological outcomes

There was evidence that psychological interventions reduced standardized mean depression and anxiety scores (Table 4, model 1), with an SMD of −0.23 (95% credible interval: −0.35, −0.11) for depression and an SMD of −0.15 (95% credible interval: −0.29, −0.04) for anxiety. There was some evidence that an intervention with cognitive and/or behavioral components was associated with a reduction in standardized mean depression scores, with a posterior mean SMD of −0.26 (95% credible interval: −0.55, 0.02) for behavioral components and a posterior mean SMD of −0.24 (95% credible interval: −0.42, −0.06) for cognitive components. From model 2, interventions with a behavioral component were most effective for depression (with probability 0.52) (Table 5), whereas interventions with an educational component were most effective for anxiety (with probability 0.48) (Table 5).

## DISCUSSION

Standard methods for the meta-analysis of complex interventions typically “lump” together all intervention arms as the same “treatment” and can therefore only be used to answer the research question, “Are psychological interventions in general effective?” Although this may be of interest, it is hard to see how the results of such an analysis could inform decisions on implementation of interventions or inform the design of new interventions. In our review, there were 19 different types of intervention according to our classification scheme. If psychological interventions are, in general, effective, we are then left with the question, “Which of these 19 in particular should we consider implementing?” If there is a common comparator group across all trials, then standard methods supported by subgroup analyses may shed some light, albeit often with limited power. The mixed treatment comparison methods presented here allow us to investigate further whether interventions with particular components are more likely to be effective. In other words, we can pose the research questions, “Which type of intervention has the greatest probability of being most effective?” with our main effects model (model 2) or “With which types or combinations of component do interventions have the greatest probability of being most effective?” with the interaction models (models 3 and 4). For example, we found, as did the original Cochrane review (2), that psychological interventions were associated with a reduction in standardized mean depression scores. However, we additionally showed that there was some evidence that interventions with either behavioral or cognitive components were more likely to be effective than interventions without these components. For anxiety, we found, as did the original Cochrane review (2), that psychological interventions were associated with a reduction in the standardized mean anxiety score. However, we were also able to conclude that different intervention types did not have differential effects, which can only be gleaned from the mixed treatment comparison analyses that we have presented here.

The mixed treatment comparison approach allows us to make indirect comparisons. In this example, there were no trials of relaxation versus usual care, but there were trials of behavioral + relaxation interventions and trials of behavioral interventions; together, these provide an indirect estimate of the effect of relaxation compared with usual care, as long as we assume that there are no interactions between components. There is a limit to how many parameters can be identified for a given data structure. In particular, as in this example, there may be little power to detect interaction effects, and attempts to fit interaction models may result in overfitting of the data. The unified mixed treatment comparison approach does, however, provide greater power than a standard pairwise approach with interaction tests based on subgroup analyses.
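The arithmetic behind such an indirect comparison can be illustrated with a minimal sketch. It assumes an additive (no-interaction) model on a common effect scale, two independent estimates taken from different sets of trials, and a simple normal approximation; it is not the Bayesian model fitted in this paper. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def indirect_component_effect(d_combined, se_combined, d_single, se_single):
    """Indirect estimate of one component's effect versus usual care.

    Assumes additivity (no interaction between components), so that
        d_combined = d_single + d_component,
    and assumes the two input estimates are independent, so that their
    variances add. Effects must be on the same scale (e.g., SMD or log odds).
    """
    d_component = d_combined - d_single
    se_component = math.sqrt(se_combined ** 2 + se_single ** 2)
    return d_component, se_component


# Hypothetical inputs (NOT results from the review):
# behavioral + relaxation vs. usual care, and behavioral vs. usual care.
d_rel, se_rel = indirect_component_effect(
    d_combined=-0.30, se_combined=0.10,
    d_single=-0.20, se_single=0.08,
)
print(f"indirect relaxation effect: {d_rel:.2f} (SE {se_rel:.3f})")
```

Because the variances add, the indirect estimate is always less precise than either of the estimates it is built from, which is one reason why power to detect interaction effects is limited.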

Suppose that every trial had included an arm for every distinct possible intervention. The key assumption made by mixed treatment comparison is that, relative to this hypothetical complete design, the arms that trials did not include are missing at random. A further assumption for meta-analyses of complex interventions is that interventions have been defined in a similar way across trials. This is unlikely to be the case; in particular, we might expect heterogeneity in the definition of “usual care.” While standard procedures for the treatment of coronary heart disease patients exist in many countries (e.g., in the United Kingdom (16)), they may well vary between countries and possibly also vary in implementation between health-care regions/hospitals. The definition of “usual care” in this meta-analysis is heterogeneous across studies and even between institutions within studies (Web Appendix Table 1). However, there do not appear to be any systematic patterns in the definition of “usual care” across types of interventions, so although we expect such heterogeneity to weaken the efficacies observed, we do not expect it to induce any systematic biases. Any implementation of interventions such as these will likewise be subject to heterogeneity in usual care.

Trials of complex interventions very often report continuous outcome measures that bring with them specific technical challenges for meta-analysis (1). Continuous outcomes are usually measured pre- and postintervention, with interest being in the mean change from the “baseline” measurement. We have summarized the 2 measures using the mean and standard deviation of the change from baseline which, depending on how results are reported, requires an assumption on the correlation between pre- and postmeasures (11). We assumed in these cases that the correlation is 0.5; however, we found that our results were robust (Web Appendix Tables 2–4) to a higher assumed correlation of 0.7, a value that has been observed for blood pressure and cholesterol (17, 18). A more sophisticated approach would be to use trial arms that report both pre- and postmeasures to provide information on this correlation (19), an approach that could be applied here. Another issue with continuous outcome measures is that they may be measured on different scales. For example, depression and anxiety scores were measured by using 9 and 3 different instruments, respectively. We have worked with standardized mean scores to allow studies to be pooled regardless of scale of measurement. It is common to summarize each study with a single measure across arms, the SMD (20). However, subtracting the usual-care mean from the active intervention arms as though it were a fixed value ignores the uncertainty associated with the usual-care arm measurement. We have instead modeled the uncertainty in each arm and formed the SMD as part of the statistical model on which intervention effects act. While we do not expect the estimated mean effects to differ using this approach, the uncertainty in the intervention effect estimates will be characterized more accurately by use of arm-based data.
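To make the role of the assumed correlation concrete, the following minimal sketch applies the standard identity for the variance of a difference of two correlated measurements, Var(post − pre) = SD_pre² + SD_post² − 2ρ·SD_pre·SD_post. The function name and the example standard deviations are hypothetical; only the correlations of 0.5 and 0.7 are taken from the text.

```python
import math


def sd_change(sd_pre, sd_post, rho=0.5):
    """Standard deviation of the change from baseline.

    Uses Var(post - pre) = sd_pre^2 + sd_post^2 - 2 * rho * sd_pre * sd_post,
    where rho is the (assumed) correlation between pre- and postmeasures.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    variance = sd_pre ** 2 + sd_post ** 2 - 2.0 * rho * sd_pre * sd_post
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical arm-level standard deviations (not data from the review).
sd_at_half = sd_change(10.0, 10.0, rho=0.5)   # primary assumption
sd_at_0_7 = sd_change(10.0, 10.0, rho=0.7)    # sensitivity analysis
print(f"rho = 0.5: {sd_at_half:.2f}; rho = 0.7: {sd_at_0_7:.2f}")
```

A higher assumed correlation gives a smaller standard deviation for the change, and hence a more precise arm-level estimate, which is why a sensitivity analysis on this assumption is informative.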

We have performed meta-analyses for several different outcomes independently and, in fact, the original review included yet more outcomes. These outcomes, however, are clearly related. We would expect the intermediate outcomes (total cholesterol, systolic blood pressure, and diastolic blood pressure) to be highly correlated with one another and furthermore to be surrogate markers for the primary endpoint outcomes (all-cause mortality, cardiac mortality, and nonfatal myocardial infarction). Similarly, we would expect the psychological outcomes (depression and anxiety) to be correlated with each other and also associated with other morbidity. Methods that characterize and incorporate these relations need to be developed and could potentially strengthen the resulting analysis (21).

In health technology assessment, what is required for an intervention to be recommended is not just its effectiveness but also its cost-effectiveness (22). The simulation environment that we have used allows us to easily rank competing interventions in order of their effectiveness (Table 5). To extend this to cost-effectiveness would require cost and quality-of-life data (quality-adjusted life-years) (23). This presents a possible solution to the problem of multiple, correlated outcomes if they could jointly be mapped into a single quality-adjusted life-years outcome measure. Similarly, if all of the continuous outcomes reported on different scales could be mapped onto a common quality-adjusted life-years scale, then we would have a standard metric on which they may be combined.

Complex interventions are increasingly being developed and researched in trials in the context of health and disease, and methods of analysis are still developing. The methods presented here will be useful in providing pooled effect estimates that have the potential to answer key questions concerning efficacy, cost-effectiveness, and prioritizing areas for further research.

### Abbreviations

• DIC: deviance information criterion

• SMD: standardized mean difference

Author affiliations: Academic Unit of Health Primary Care, Department of Community Based Medicine, University of Bristol, Bristol, United Kingdom (N. J. Welton, D. M. Caldwell); and Department of Social Medicine, University of Bristol, Bristol, United Kingdom (E. Adamopoulos, K. Vedhara).

N. J. W., D. M. C., and K. V. were supported by the Medical Research Council's Health Services Research Collaboration, and E. A. received support from a Medical Research Council Health Services Research Collaboration Research Initiation Grant (06/IG2038).

The authors thank Margaret Burke and Karen Rees for providing data from the earlier Cochrane review (2).

Conflict of interest: none declared.

## References

1. Egger M, Davey-Smith G, Altman DG. Systematic Reviews in Health Care: Meta-Analysis in Context. London, United Kingdom: BMJ Publishing Group; 2001.
2. Rees K, Bennett P, West R, et al. Psychological interventions for coronary heart disease. Cochrane Database Syst Rev. 2004;(2):CD002902.
3. Sutton AJ, Abrams KR, Jones DR, et al. Methods for Meta-Analysis in Medical Research. London, United Kingdom: Wiley; 2000.
4. AE, Sculpher M, Sutton A, et al. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics. 2006;24(1):1-19.
5. AE, Welton N, Lu G. Introduction to Mixed Treatment Comparisons. Bristol, United Kingdom: University of Bristol; 2007.
6. Caldwell D, A, Higgins J. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005;331(7521):897-900.
7. Lu G, AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105-3124.
8. Psaty B, Lumley T, Furberg CD, et al. Health outcomes associated with various antihypertensive therapies used as first-line agents: a network meta-analysis. JAMA. 2003;289(19):2534-2544.
9. Bridle C, Bagnall A, Duffy S, et al. A rapid and systematic review of the clinical and cost-effectiveness of newer drugs for treatment of mania associated with bipolar affective disorder. York, United Kingdom: University of York; 2003.
10. Wilby J, Kainth A, Hawkins N, et al. A rapid and systematic review of the clinical effectiveness, tolerability and cost effectiveness of newer drugs for epilepsy in adults. Health Technol Assess. 2005;9(15):1-172.
11. Follmann D, Elliott P, Suh I, et al. Variance imputation for overviews of clinical trials with continuous response. J Clin Epidemiol. 1992;45(7):769-773.
12. Spiegelhalter DJ, Best NG, Carlin BP, et al. Bayesian measures of model complexity and fit. J R Stat Soc (B). 2002;64:583-616.
13. Lunn DJ, Thomas A, Best N, et al. WinBUGS—a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput. 2000;10:325-337.
14. Brooks SP, Gelman A. Alternative methods for monitoring convergence of iterative simulations. J Comput Graph Stat. 1998;7:434-455.
15. Cowan MJ, Pike KC, Kogan Budzynski H. Psychosocial nursing therapy following sudden cardiac arrest: impact on two year survival. Nurs Res. 2001;50(2):68-76.
16. Department of Health. National Service Framework for Coronary Heart Disease—Modern Standards and Service Models. Norwich, United Kingdom: The Stationery Office; 2000.
17. Wilsgaard T, Jacobsen BK, Schirmer H, et al. Tracking of cardiovascular risk factors. Am J Epidemiol. 2001;154(4):418-426.
18. Rosner B, Hennekens CH, Kass EH, et al. Age-specific correlation analysis of longitudinal blood pressure data. Am J Epidemiol. 1977;106(4):306-313.
19. Abrams KR, Gillies CL, Lambert PC. Meta-analysis of heterogeneously reported trials assessing change from baseline. Stat Med. 2005;24(24):3823-3844.
20. Higgins J, Green S. Cochrane Handbook for Systematic Reviews of Interventions 4.2.4. Chichester, United Kingdom: John Wiley & Sons, Ltd; 2005.
21. Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med. 1997;16(17):1965-1982.
22. Guide to Methods of Technology Appraisal. London, United Kingdom: National Institute for Clinical Excellence; 2004.
23. Drummond MF, Sculpher MJ, Torrance GW, et al. Methods for the Economic Evaluation of Health Care Programmes. 2nd ed. Oxford, United Kingdom: Oxford University Press; 1997.