Gerta Rücker, Guido Schwarzer, James R. Carpenter, Harald Binder, Martin Schumacher, Treatment-effect estimates adjusted for small-study effects via a limit meta-analysis, Biostatistics, Volume 12, Issue 1, January 2011, Pages 122–142, https://doi.org/10.1093/biostatistics/kxq046
Abstract
Statistical heterogeneity and small-study effects are 2 major issues affecting the validity of meta-analysis. In this article, we introduce the concept of a limit meta-analysis, which leads to shrunken, empirical Bayes estimates of study effects after allowing for small-study effects. This in turn leads to 3 model-based adjusted pooled treatment-effect estimators and associated confidence intervals. We show how our estimators can be visualized using the radial plot and calculated using existing software. The concept of limit meta-analysis also gives rise to a new measure of heterogeneity, termed G², for heterogeneity that remains after small-study effects are accounted for. In a simulation study with binary data and small-study effects, we compared our proposed estimators with those currently used, together with a recent proposal by Moreno and others. Our criteria were bias, mean squared error (MSE), variance, and coverage of 95% confidence intervals. Only the estimators arising from the limit meta-analysis produced approximately unbiased treatment-effect estimates in the presence of small-study effects, while the MSE was acceptably small, provided that the number of studies in the meta-analysis was not less than 10. These limit meta-analysis estimators were also relatively robust against heterogeneity, and one of them had a relatively small coverage error.
1. INTRODUCTION
Systematic reviews and meta-analysis are invaluable tools for collating and synthesizing evidence in the life sciences. However, 2 main threats to the validity of meta-analysis exist: heterogeneity and small-study effects. Heterogeneity may have different sources. One is "clinical heterogeneity" between patients from different studies, measured, for example, in patient baseline characteristics and not necessarily reflected in the outcome measure. There may also be "heterogeneity related to study design" or other study-level characteristics. In this article, we are interested in "statistical heterogeneity," quantified on the effect measurement scale. That is, we look at the extent of treatment-by-study interaction (Senn, 2000). Heterogeneity on this scale essentially measures remaining between-study variation, the clinical implications of which are often context specific.
There is a substantial literature on statistical heterogeneity in meta-analysis, see, for example, DerSimonian and Laird (1986), Hardy and Thompson (1998), Thompson and Sharp (1999), Senn (2000), Engels and others (2000), Higgins and Thompson (2002), Sidik and Jonkman (2005), Knapp and others (2006), Mittlböck and Heinzl (2006), Jackson (2006), Viechtbauer (2007) and Rücker and others (2008b).
"Small-study effects" is a generic term for a phenomenon sometimes observed in meta-analysis that small studies have systematically different (often stronger) treatment effects compared to large ones (Sterne and others, 2000). Reasons for this may be publication bias, heterogeneity, selective outcome reporting bias, a mathematical artifact (Schwarzer and others, 2002), or genuine random variation (Rothstein and others, 2005). There is a vast range of tests for small-study effects, most of them based on funnel plots (Begg and Mazumdar, 1994; Egger and others, 1997; Harbord and others, 2006; Peters and others, 2006; Schwarzer and others, 2007; Rücker and others, 2008a). Copas and Malley (2008), building on radial plots, developed robust P-values adjusting for small-study effects. Stanley (2008) was probably the first who proposed a regression-based treatment-effect estimate adjusting for small-study effects. Moreno and others (2009a) systematically evaluated adjusted estimates of a similar type in a comprehensive simulation study, including estimates derived from various linear regression tests and the trim-and-fill method introduced by Duval and Tweedie (2000).
The starting point of the present work is that small-study effects cannot easily be separated from heterogeneity. Rather, they can be seen as a particular case of heterogeneity. Consequently, this article has 2 objectives. First, we develop a new model-based method of calculating adjusted treatment-effect estimates for a meta-analysis potentially affected by heterogeneity, including small-study effects. This is done via a so-called limit meta-analysis, a concept developed in Section 2. Second, we use this concept of limit meta-analysis to introduce a new measure of heterogeneity, called G², which measures only systematic heterogeneity that is not accounted for by small-study effects.
The article is organized as follows. In Section 2, we introduce the limit meta-analysis model. In Section 3, we derive the limit meta-analysis using an empirical Bayes argument. In Section 4, maximum likelihood (ML) estimates of the model parameters are derived. In Section 5, we describe how the model can be interpreted using radial plots. In Section 6, we report results of a simulation study, comparing 3 estimates based on the limit meta-analysis with established methods. In Section 7, we build on the limit meta-analysis to derive the new measure, G². Its properties are explored with real data examples. We conclude with a discussion in Section 8.
2. LIMIT META-ANALYSIS
The concept of limit meta-analysis is based on increasing the precision of a given meta-analysis using a random-effects model that allows for small-study effects. Let k be the number of studies in a meta-analysis, and let x_i be the within-study treatment-effect estimate (e.g. a log-odds ratio), σ_i² the (true) within-study variance of x_i (estimated by s_i²), and w_i = 1/s_i² the estimated inverse variance (also called precision) used as the weight of study i (i = 1,…,k) in the usual fixed-effect model.

The usual random-effects model is

x_i = μ_R + √(σ_i² + τ²) ε_i,  ε_i ~ N(0,1) independently,  (2.1)

where μ_R is the overall treatment effect and τ² the between-study (heterogeneity) variance. To allow for small-study effects, we extend this model by an additional parameter α_R, which shifts the standardized treatment effect of every study:

x_i = β_R + √(σ_i² + τ²)(α_R + ε_i).  (2.2)

Here β_R is the treatment effect adjusted for small-study effects, and α_R = 0 corresponds to their absence. Taking expectations in (2.2) and letting σ_i → 0 gives the expected treatment effect of a hypothetical study of infinite precision,

β_0 = β_R + τα_R.  (2.4)

Now let the precision of each study grow without bound, replacing σ_i² by σ_i²/M and letting M → ∞. Using the estimates α̂_R, β̂_R, and τ̂, obtained from fitting (2.2), we define the "limit meta-analysis" with study-specific treatment-effect estimates

y_i = β̂_R + (τ̂/√(s_i² + τ̂²)) (x_i − β̂_R).  (2.6)

3. RELATIONSHIP BETWEEN LIMIT META-ANALYSIS AND STANDARD EMPIRICAL BAYES ESTIMATES
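Before relating the shrunken estimates to empirical Bayes, it may help to see the shrinkage of (2.6) concretely. The following minimal Python sketch (our own illustration, not the authors' software; `limit_estimates` is a hypothetical helper name) computes the y_i from the raw data, assuming the fitted values β̂_R and τ̂ are already available:

```python
import math

def limit_estimates(x, s, beta_r, tau):
    """Shrunken study effects y_i = beta_R + tau/sqrt(s_i^2 + tau^2) * (x_i - beta_R), cf. (2.6)."""
    return [beta_r + (tau / math.sqrt(si ** 2 + tau ** 2)) * (xi - beta_r)
            for xi, si in zip(x, s)]
```

Each x_i is pulled toward β̂_R by the factor τ̂/√(s_i² + τ̂²): imprecise (large s_i) studies are shrunk strongly, while very precise studies are left almost unchanged; with τ̂ = 0, all studies collapse onto β̂_R.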

In the extended model (2.2), write x_i = β_R + u_i with u_i = √(σ_i² + τ²)(α_R + ε_i), so that E(u_i) = α_R√(σ_i² + τ²) and Var(u_i) = σ_i² + τ². Thus, we can split up x_i into

x_i = β_R + β_i + δ_i,

where β_i is a trial-specific heterogeneity residual and δ_i a within-trial error with variance Var(δ_i) = σ_i², with E(δ_i) = 0 only when α_R = 0. The empirical Bayes method estimates the trial-specific heterogeneity residual β_i by its posterior mean E(β_i), given the observed x_i and the prior distribution β_i ~ N(τα_R, τ²), with estimates substituted for τ², σ², β_R, and α_R. Using Bayes' formula, we obtain for β_i a normal posterior distribution with expectation

E(β_i | x_i) = τ̂α̂_R + (τ̂²/(s_i² + τ̂²)) (x_i − β̂_R − α̂_R√(s_i² + τ̂²))  (3.1)

and variance τ̂²s_i²/(s_i² + τ̂²), interpreting the square root of the variance as its standard error.

Turning to the limit meta-analysis, write ε̂_i = (x_i − β̂_R)/√(s_i² + τ̂²) − α̂_R for the fitted standardized residual of model (2.2). Using (2.4), we see that the limit meta-analysis (2.6) can be written as

y_i = β̂_0 + τ̂ε̂_i.

This justifies our choice of s_i as standard errors of y_i. We can view ε̂_i as the "specific z-score" or "specific standardized residual" for trial i. One can think of setting up a new fixed-effects model, where the population mean is β_0 and the (usually unknown) measurement error is now a shrunken estimate of its typical magnitude, scaled by the trial-specific standard error s_i. This gives us y_i, which can be viewed as an estimate of the effect from trial i that is more robust with respect to random as well as systematic error. We can then use the (y_i, s_i), i ∈ {1,…,k}, to estimate the treatment effect (fixed or random-effects model), adjusted for small-study effects, and assess and investigate heterogeneity. Compared to standard empirical Bayes estimation, the limit meta-analysis (2.6) has a shrinkage factor of τ̂/√(s_i² + τ̂²), the square root of the empirical Bayes factor τ̂²/(s_i² + τ̂²) in (3.1), and hence shrinks less than empirical Bayes.

4. MODEL FITTING USING ML
The extended model (2.2) has three parameters: the adjusted treatment effect β_R, the bias parameter α_R, and the between-study variance τ², representing heterogeneity. As usual in meta-analysis, the observed data are the within-study means x_i and their within-study variance estimates s_i². Using the expectation E(x_i) = β_R + α_R√(σ_i² + τ²) and variance Var(x_i) = σ_i² + τ² of x_i and inserting method-of-moments estimates s_i² and τ̂² for σ_i² and τ², we obtain the log-likelihood contribution of study i (omitting summands not depending on α_R or β_R)

ℓ_i(α_R, β_R) = −(x_i − β_R − α_R√(s_i² + τ̂²))² / (2(s_i² + τ̂²)).

Writing z_i = x_i/√(s_i² + τ̂²) and v_i = 1/√(s_i² + τ̂²), summing up over all studies and setting the partial derivatives to zero yields the estimates

β̂_R = Σ(v_i − v̄)(z_i − z̄) / Σ(v_i − v̄)²,  (4.2)
α̂_R = z̄ − β̂_R v̄,  (4.3)

where bars denote arithmetic means over the k studies. Because the standardized residuals in (2.2) have unit variance, least-squares theory gives the variances

Var(β̂_R) = 1/Σ(v_i − v̄)²,  (4.4)
Var(α̂_R) = 1/k + v̄²/Σ(v_i − v̄)².  (4.5)

Thus, β̂_R (the treatment-effect estimate) and α̂_R (the small-study effect estimate) can be interpreted as slope and intercept in linear regression on so-called generalized radial plots.
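The weighted least-squares solution just described amounts to an ordinary regression on transformed coordinates. A short Python sketch (an illustrative translation of ours, with hypothetical function names; τ̂² is assumed to be already estimated):

```python
import math

def fit_extended_model(x, s, tau2):
    """Fit model (2.2) by least squares on the generalized radial plot:
    regress z_i = x_i/sqrt(s_i^2 + tau2) on v_i = 1/sqrt(s_i^2 + tau2).
    Returns (alpha_R, beta_R, var_alpha, var_beta)."""
    k = len(x)
    v = [1.0 / math.sqrt(si ** 2 + tau2) for si in s]
    z = [xi * vi for xi, vi in zip(x, v)]
    vbar, zbar = sum(v) / k, sum(z) / k
    sxx = sum((vi - vbar) ** 2 for vi in v)
    beta_r = sum((vi - vbar) * (zi - zbar) for vi, zi in zip(v, z)) / sxx
    alpha_r = zbar - beta_r * vbar
    # standardized residuals have unit variance, so sigma^2 = 1 in the OLS formulas
    var_beta = 1.0 / sxx
    var_alpha = 1.0 / k + vbar ** 2 / sxx
    return alpha_r, beta_r, var_alpha, var_beta
```

Because the residuals of the standardized model have unit variance by construction, the residual variance is not estimated from the data but fixed at 1 in the variance formulas.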
5. MODEL FITTING AND INTERPRETATION USING RADIAL PLOTS
In this section, we show how to fit and interpret the extended random-effects model (2.2) in practice. The idea underlying the extended model (2.2) is best motivated by looking at radial and generalized radial plots, see Galbraith (1988), Copas and Malley (2008), and Copas and Lozada-Can (2009).
5.1 Radial plot
The radial plot (Figure 1, bottom left panel) takes the precisions 1/s_i as x-coordinates and the standardized treatment effects x_i/s_i as y-coordinates. As easily seen (and well known), (i) the slope of the regression line through the origin is the treatment-effect estimate of the fixed-effect model, denoted μ̂_F, and (ii) the sum of squared residuals with respect to this line is Q, which measures the weighted squared deviation of study treatment effects from the overall fixed-effect estimate:

Q = Σ w_i (x_i − μ̂_F)².
Fig. 1. Thrombolytic therapy data example (Olkin, 1995): radial plot (bottom left panel), generalized radial plot (bottom right panel), and limit radial plot (top panel). Dashed: line through the origin. Solid: best fitting line. Details in text.
When a regression line allowing for an intercept is fitted instead (solid line), the intercept, denoted α̂_F, represents small-study effects. In general, the intercept differs from zero and thus the slope of the line, denoted β̂_F, differs from the fixed treatment-effect estimate. Following Copas and Malley (2008), we can interpret this slope as a fixed treatment-effect estimate when allowing for small-study effects. This adjustment often results in smaller estimates of the overall treatment effect (Stanley, 2008; Moreno and others, 2009a). Estimates of slope and intercept are identical to the ML estimates (4.2) and (4.3). In analogy to Q, we can define a measure Q′ of heterogeneity with respect to the best fitting line:

Q′ = Σ (x_i/s_i − α̂_F − β̂_F/s_i)².

5.2 Generalized radial plot
Following Copas and Malley (2008), the radial plot can be generalized to incorporate between-study heterogeneity, taking now the values 1/√(s_i² + τ̂²) as x-coordinates and x_i/√(s_i² + τ̂²) as y-coordinates (Figure 1, bottom right panel). We do not use different notation for fixed- and random-effects weights. This generalized radial plot represents the random-effects model, since the slope μ̂_R of the line through the origin in the generalized radial plot is the random-effects model (2.1) estimate (dashed line). We propose to complement the plot by a regression line that allows for an intercept (bottom right panel of Figure 1, solid line). Its slope β̂_R is the treatment-effect estimate of the extended model (2.2), allowing for small-study effects. The intercept corresponds to α̂_R, the bias introduced by small-study effects, interpreted as the expected shift in the standardized treatment-effect estimate for a hypothetical "small" study with zero precision.
The generalized radial plot provides an interpretation for β_0 = β_R + τα_R of (2.4). This can be interpreted as the expected adjusted treatment effect of a hypothetical study with infinite precision (s = 0) and coordinates 1/τ and β_0/τ, since inserting 1/τ into the regression equation gives β_0/τ = β_R/τ + α_R. This regression problem is essentially equivalent to model (1c) of Moreno and others (2009a), see also Stanley (2008).
5.3 Limit radial plot
For M → ∞, the radial plots (representing the fixed-effect model) approach a stable limit with x-coordinates 1/s_i and y-coordinates y_i/s_i (Figure 1, top panel). This is equivalent to shrinking the points of the funnel plot toward a "new" mean, see Section 3. In our data example, both lines become very similar. One may add a generalized limit radial plot, representing the random-effects model for the limit meta-analysis (not shown). As heterogeneity is much reduced by the shrinkage process, this is mostly very similar to the limit radial plot.
5.4 Adjusted treatment-effect estimates
The basis of estimation is either a meta-analysis with raw data (x_i, s_i) or the corresponding limit meta-analysis (y_i, s_i), calculated using (2.6). We use the convention that treatment-effect parameters denoted by the letter β indicate models including an intercept, while parameters denoted by μ indicate models without an intercept. We consider the treatment-effect estimates μ̂_F (fixed-effect model), μ̂_R (random-effects model), and μ̂_lim (limit meta-analysis, fixed-effect model) from a line through the origin, and β̂_R, β̂_F, β̂_lim, and β̂_0 from a best fitting line for the respective models. Our consideration in terms of radial plots shows that all parameters can be estimated using standard meta-analysis software as follows:

1. Given x_i and s_i for each trial, calculate μ̂_F, μ̂_R, and an estimate of τ², for example, the method-of-moments estimate (DerSimonian and Laird, 1986).
2. Using the radial plot, determine the slope β̂_F.
3. Using τ̂², construct the generalized radial plot and determine the slope β̂_R and the intercept α̂_R.
4. Compute β̂_0 = β̂_R + τ̂α̂_R.
5. Use (2.6) to calculate the limit meta-analysis treatment-effect estimates y_i; construct the limit radial plot and compute the slopes μ̂_lim of the line through the origin and β̂_lim, allowing for an intercept.

All regressions can equivalently be run with x-coordinates √w_i and y-coordinates x_i√w_i, where the weights w_i correspond to the chosen model. The slope of the line through the origin is then Σw_i x_i/Σw_i; the same holds for μ̂_lim and β̂_lim, with x_i replaced with y_i. Standard errors and confidence intervals are calculated in 2 different ways, both based on the study weights w_i. For models without an intercept, the usual meta-analytic approach is used, which takes (Σw_i)^(−1/2) as standard error of the pooled estimate. For models including an intercept, the standard errors are derived from the variances given in (4.4) and (4.5). The same standard errors are used for μ̂_lim (version without an intercept) and for β̂_0 and β̂_lim (including an intercept). Notice that this approach, in line with the usual random-effects model, treats τ² as fixed.

6. SIMULATION STUDY
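Before turning to the simulation results, the estimation pipeline of Section 5.4 can be sketched end to end. The following Python sketch is our own illustration (function names are hypothetical, not from the paper's software); it uses the DerSimonian–Laird moment estimator for τ² and ordinary least squares on the radial coordinates:

```python
import math

def dl_tau2(x, s):
    """DerSimonian-Laird method-of-moments estimate of tau^2."""
    w = [1.0 / si ** 2 for si in s]
    mu_f = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    q = sum(wi * (xi - mu_f) ** 2 for wi, xi in zip(w, x))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(x) - 1)) / c)

def ols(u, z):
    """Slope and intercept of the least-squares line of z on u."""
    k = len(u)
    ub, zb = sum(u) / k, sum(z) / k
    b = (sum((ui - ub) * (zi - zb) for ui, zi in zip(u, z))
         / sum((ui - ub) ** 2 for ui in u))
    return b, zb - b * ub

def adjusted_estimates(x, s):
    """Adjusted pooled effects based on the limit meta-analysis."""
    tau2 = dl_tau2(x, s)
    tau = math.sqrt(tau2)
    # generalized radial plot: slope beta_R, intercept alpha_R
    v = [1.0 / math.sqrt(si ** 2 + tau2) for si in s]
    z = [xi * vi for xi, vi in zip(x, v)]
    beta_r, alpha_r = ols(v, z)
    beta_0 = beta_r + tau * alpha_r                       # (2.4)
    # shrunken limit meta-analysis values, cf. (2.6)
    y = [beta_r + tau * vi * (xi - beta_r) for xi, vi in zip(x, v)]
    # mu_lim: fixed-effect pooling of y_i; beta_lim: radial regression on y_i
    w = [1.0 / si ** 2 for si in s]
    mu_lim = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    beta_lim, _ = ols([1.0 / si for si in s], [yi / si for yi, si in zip(y, s)])
    return {"beta_R": beta_r, "alpha_R": alpha_r,
            "beta_0": beta_0, "mu_lim": mu_lim, "beta_lim": beta_lim}
```

In a perfectly homogeneous meta-analysis, τ̂² = 0 and all five quantities coincide with the fixed-effect estimate, as expected from Section 7.1.1.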
In this section, we report results of a simulation study. We computed estimates of μ_F (usual fixed-effect model), μ_R (random-effects model), β_F (fixed-effect model, allowing for an intercept), β_R (random-effects model, allowing for an intercept), μ_lim (fixed-effect model for the limit meta-analysis, no intercept), β_lim (fixed-effect model for the limit meta-analysis, allowing for an intercept), and β_0 = β_R + τα_R. These estimates were compared to the Mantel–Haenszel and the Peto estimate (Greenland and Robins, 1985; Yusuf and others, 1985). Moreover, we included one of the most successful adjusting methods identified by the recent study of Moreno and others (2009a), the so-called Peters method (Peters and others, 2006). Criteria were (i) the absolute bias of the treatment-effect estimate, (ii) the MSE, (iii) the observed variance between treatment-effect estimates of the same scenario, and (iv) the coverage of 95% confidence intervals. In particular, we were interested in the extent to which the methods were able to reduce bias in the treatment-effect estimates in the presence of small-study effects.
6.1 Design
We evaluated a number of scenarios, all based on binary response data, with varying number of trials in the meta-analysis (5, 10, 20), control group event probability (0.05, 0.10, 0.20, 0.30), true odds ratio (0.5, 0.667, 0.75, 1), and heterogeneity variance (τ² = 0, 0.05, 0.10, 0.20). Small-study effects were simulated on the basis of the Copas selection model, see Copas and Shi (2000a, 2001); the procedure is described in detail in Rücker and others (2008a). The selection parameter of the model, ρ², was varied over 0 (no selection), 0.36 (low selection), 0.64 (moderate selection), and 1 (strong selection). Trial sizes were drawn from a log-normal distribution fitted to the sample sizes of the rosiglitazone meta-analysis (Nissen and Wolski, 2007); its parameters were 6.056 (mean) and 0.69 (variance) on the log scale, with quartiles 244 (25%), 427 (50%), and 747 (75%). Each of the 768 scenarios was repeated 1000 times to provide Monte Carlo estimates of bias and coverage probabilities.
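The Copas selection model used in the study is more elaborate (see Rücker and others, 2008a, for the actual procedure). As a crude, purely illustrative stand-in, the following Python sketch (function names and the selection rule are our own simplifications, not the authors' design) generates heterogeneous binary trials with the stated size distribution and suppresses small trials without a nominally significant benefit:

```python
import math
import random

def simulate_trial(n_per_arm, p_ctrl, odds_ratio, tau2, rng):
    """One two-arm binary trial; the true log-odds ratio is drawn with
    between-trial heterogeneity variance tau2."""
    theta = math.log(odds_ratio) + rng.gauss(0.0, math.sqrt(tau2))
    odds_trt = p_ctrl / (1 - p_ctrl) * math.exp(theta)
    p_trt = odds_trt / (1 + odds_trt)
    e_trt = sum(rng.random() < p_trt for _ in range(n_per_arm))
    e_ctrl = sum(rng.random() < p_ctrl for _ in range(n_per_arm))
    # log-odds ratio with 0.5 continuity correction
    a, b = e_trt + 0.5, n_per_arm - e_trt + 0.5
    c, d = e_ctrl + 0.5, n_per_arm - e_ctrl + 0.5
    x = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return x, se

def simulate_meta(k, rng, select=True):
    """k published trials; crude small-study selection: small trials without a
    nominally 'significant' benefit (z > -1.64) are suppressed with prob. 0.8."""
    published = []
    while len(published) < k:
        total_n = rng.lognormvariate(6.056, math.sqrt(0.69))  # total trial size
        n_per_arm = max(10, int(total_n / 2))
        x, se = simulate_trial(n_per_arm, 0.10, 0.667, 0.05, rng)
        if select and n_per_arm < 300 and x / se > -1.64 and rng.random() < 0.8:
            continue
        published.append((x, se))
    return published
```

This significance-driven suppression is only a surrogate for the Copas model, but it reproduces the qualitative phenomenon studied here: among the published trials, small ones show systematically stronger (more negative) log-odds ratios.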
6.2 Results
Results for bias and MSE are shown in Figures 2 and 3 for a moderate number of studies (k = 10, fixed) and substantial heterogeneity (τ² = 0.1), which we believe are representative for many real meta-analyses. Results were similar for the "classical" methods (fixed/random-effects model, Mantel–Haenszel method, and Peto method) with respect to all criteria. Of these, the Peto method had both the smallest absolute bias and the smallest MSE. Therefore, only the Peto method was chosen to represent the classical methods on the plots. Not unexpectedly, in general, the absolute bias was larger for a smaller number of trials, larger heterogeneity, stronger selection (small-study effects), and smaller event rates in the control group. The influence of the true odds ratio was weak. All (unadjusted) classical methods produced estimates biased downward (that is, odds ratios more distant from one) if there were small-study effects, and the MSE increased. The fixed-effect model allowing for an intercept (β_F) tended to have a small positive bias (that is, an odds ratio nearer to one) and large MSE and variance. For the random-effects model allowing for an intercept (β_R), we often found markedly biased estimates, for which reason it is not shown.
Fig. 2. Bias of treatment-effect estimate on log-odds ratio scale for τ² = 0.1 and 10 studies per meta-analysis, 6 models. Various scenarios.
Fig. 3. MSE of treatment-effect estimate on log-odds ratio scale for τ² = 0.1 and 10 studies per meta-analysis, 6 models. Various scenarios.
By contrast, the estimates based on the limit meta-analysis, μ̂_lim, β̂_lim, and β̂_0, were nearly unbiased if there were small-study effects. β̂_lim had the smallest bias, but a larger variance of estimates between runs of the same scenario than μ̂_lim and β̂_0. Thus, μ̂_lim and β̂_0 had the smallest MSE, together with the Peto method. The Peto method had the smallest variance, but tended to be biased if there was selection, particularly for small event proportions. For meta-analyses with a very small number of trials (k = 5) or without any selection, however, the new methods were markedly inferior to the classical methods (results not shown). This is no surprise, as it is well known that for small meta-analyses, all funnel plot methods work poorly with respect to both size and power. We thus recommend not using adjusting methods if the number of trials is less than 10. Moreover, the extended model has an additional parameter that must be estimated. Bias and MSE of the Peters method were small, often ranging between those of the unadjusted methods and those based on the limit meta-analysis.
Coverage was generally poor if there was heterogeneity, see Figure 4. This is particularly true for large heterogeneity (τ² = 0.2, results not shown) or strong selection. Naturally, the random-effects model was best among the classical models. The larger the small-study effect, the greater was the superiority of the new methods over the random-effects model. The limit meta-analysis expectation β̂_0 and the Peters method had the best coverage among all adjusted models. There was almost no dependence on the true odds ratio but a strong dependence on the control event rate, with an interaction between this and the model used: if there was selection, coverage increased with increasing control event rate for the classical models, while it always decreased for both the new methods and the Peters method. The poor coverage of μ̂_lim seems to be due to underestimation of its standard error.
Fig. 4. Coverage of 95% confidence intervals of treatment-effect estimate on log-odds ratio scale for τ² = 0.1 and 10 studies per meta-analysis, 6 models. Various scenarios.
7. A NEW MEASURE OF HETEROGENEITY
The promising results for adjusted treatment-effect estimates based on the limit meta-analysis motivated us to derive a new measure of heterogeneity, called G². It is expressed on a percentage scale, following the established measure I² introduced by Higgins and Thompson (2002). G² assesses the proportion of the variance that remains unexplained after we have allowed for possible small-study effects in the limit meta-analysis.



In practice, we suggest the following sequential testing procedure:
1. For a test of heterogeneity in the usual sense (that is, not distinguishing between heterogeneity caused by small-study effects and heterogeneity from other causes), take Q, which under the null hypothesis of no heterogeneity follows a χ² distribution with k − 1 degrees of freedom. Stop if the test is not significant.
2. If the null hypothesis of no heterogeneity is rejected, carry out an appropriate test of small-study effects, for example, by taking Q − Q′, which under the null hypothesis of no small-study effects follows a χ² distribution with 1 degree of freedom.
3. If the null hypothesis of no heterogeneity is rejected, test for residual heterogeneity beyond small-study effects, using Q′, which under the null hypothesis of no residual heterogeneity follows a χ² distribution with k − 2 degrees of freedom. Residual heterogeneity can be quantified by G².
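The sequential procedure can be sketched in Python. For the χ² tail probabilities we use the Wilson–Hilferty normal approximation (our choice, to keep the sketch dependency-free), so the p-values below are approximate rather than exact:

```python
import math

def chi2_sf(q, df):
    """Approximate upper-tail probability P(X > q) for X ~ chi-square(df),
    via the Wilson-Hilferty normal approximation."""
    if df <= 0:
        raise ValueError("df must be positive")
    z = ((q / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def heterogeneity_tests(q, q_prime, k, alpha=0.05):
    """Sequential procedure: overall heterogeneity (Q, df = k-1); if rejected,
    small-study effects (Q - Q', df = 1) and residual heterogeneity (Q', df = k-2)."""
    result = {"p_heterogeneity": chi2_sf(q, k - 1)}
    if result["p_heterogeneity"] < alpha:
        result["p_small_study"] = chi2_sf(q - q_prime, 1)
        result["p_residual"] = chi2_sf(q_prime, k - 2)
    return result
```

Note that the procedure deliberately stops after step 1 when the overall heterogeneity test is not significant, so the later tests are only reported conditionally.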
7.1 Examples
In this subsection, we look at 4 examples, representing typical settings: no heterogeneity; small-study effects without additional heterogeneity; and heterogeneity including minor or major small-study effects.
7.1.1 No statistical heterogeneity.
If no statistical heterogeneity is found in the given meta-analysis (that is, τ̂² = 0), the limit meta-analysis yields equal y_i = β̂_F for all trials, and the limit radial plot is perfectly fitted by the regression line, that is, G² = 0. Though a test of potential small-study effects may be significant, notice that, as an immediate consequence of the fixed-effect model, all deviations from the mean are random by definition, which means that any apparent small-study effect must be spurious as well. In this case, funnel plot asymmetry disappears with increasing precision. Note, in passing, that this reduces ad absurdum the requirement of homogeneity when testing for funnel plot asymmetry (see, for instance, Ioannidis and Trikalinos, 2007).
7.1.2 Small-study effects without additional heterogeneity.
G² is based on a notion of heterogeneity different from the usual one, in the sense that it does not incorporate potential small-study effects. Thus, it is possible that G² = 0 while τ̂² > 0, that is, there is no heterogeneity apart from that due to small-study effects (to which τ² is sensitive but G² is not). This can be illustrated by a fictional, somewhat pathological example (Figure 5). Here, we see a striking small-study effect, such that all dots in the radial plot lie more or less exactly on one line not going through the origin. Therefore, all variation is explained by a fixed-effect model that allows for small-study effects, and residual heterogeneity measured by G² is almost zero, though τ² > 0 (Q = 15.46, p = 0.031; I² = 54.7% [0%; 79.5%]; G² = 0.01%). Adjusting for small-study effects yields residual heterogeneity Q′ = 0.052 (p = 1.000), that is, there is no heterogeneity left beyond the small-study effect, which itself is significant: Q − Q′ = 15.41 (p ≤ 0.001).
Fig. 5. Forest plot (left), funnel plot (middle), and radial plot (right) for original (top) and limit (bottom) meta-analysis (fictional example, see text).
7.1.3 Heterogeneity: thrombolytic therapy data.
We use the example introduced above, see Figure 1 and also Figure 6. The various estimates are given in Table 1. This is an example showing some heterogeneity (τ̂² = 0.018). The limit radial plot looks similar to the original one and shows some residual variation. We find I² = 18.6%, whereas G² = 15.4%. For this meta-analysis, most tests indicate a minor small-study effect (p = 0.075, 0.088, 0.091, and 0.063 for Egger's test, Harbord's test, Peters' test, and the arcsine test, respectively; see Egger and others, 1997; Harbord and others, 2006; Peters and others, 2006; Rücker and others, 2008a). The value of G² indicates residual heterogeneity beyond this, not explained solely by a fixed-effect model allowing for small-study effects. However, it is not significant if tested via Q′ (Q′ = 80.84, p = 0.137). Further, we find Q = 84.73 (p = 0.096) and Q − Q′ = 3.884, again consistent with a small-study effect (p = 0.049).
Table 1. Estimated odds ratios from different models for the thrombolytic therapy data example (Olkin, 1995)

| | Original meta-analysis | | Limit meta-analysis |
| Model | Fixed-effect model | Random-effects model | Fixed-effect model |
| Plot | Radial plot | Generalized radial plot | Limit radial plot |
| Model without intercept | exp(μ_F) = 0.753 [0.710; 0.798] | exp(μ_R) = 0.732 [0.664; 0.808] | exp(μ_lim) = 0.779 [0.735; 0.826] |
| Model with intercept | exp(β_F) = 0.787 [0.731; 0.847] | exp(β_R) = 0.840 [0.710; 0.993] | exp(β_lim) = 0.772 [0.717; 0.831] |
| Expectation | exp(β_0) = 0.796 [0.739; 0.857] | | |
Fig. 6. Thrombolytic therapy data (Olkin, 1995). Forest plot (left), funnel plot (middle), and radial plot (right) for original (top) and limit (bottom) meta-analysis.
7.1.4 Heterogeneity and small-study effects: passive smoking data.
This example by Hackshaw and others (1997) was intensively discussed in the literature for several reasons (for details, see Copas and Shi, 2000b; Senn, 2009). It was also used by Copas and Malley (2008) when deriving a robust P-value for the treatment effect in meta-analysis. We find Q = 47.52 (p = 0.095), Q′ = 40.96 (p = 0.225), and therefrom Q − Q′ = 6.55 (p = 0.010), indicating a small-study effect. The plots are shown in Figure 7, the estimates given in Table 2. Both the fixed-effect model (μ_F) and the random-effects model (μ_R) find an odds ratio of about 1.2 and thus a significant excess risk of lung cancer for persons exposed to passive smoking. The effect is reduced when using the limit meta-analysis but still significant (μ_lim). By contrast, it vanishes completely when adjusting for small-study effects, with nearly concordant estimates for β_F, β_R, β_lim, and β_0. For this meta-analysis, heterogeneity seems moderate, measured by τ̂² = 0.0168 and also I² = 24.2%. However, at first glance surprisingly, G² is very large (G² = 94.6%), indicating much residual heterogeneity. This is caused by the fact that 2 of the 3 large dominating studies show reciprocal effects with mutually exclusive confidence intervals. Since these studies are large, they are relatively insensitive to the shrinkage process, see the bottom left and bottom right panels of Figure 7. The passive smoking data example therefore shows that G² is particularly sensitive to heterogeneity in large trials.
Table 2. Estimated odds ratios from different models for the passive smoking data (Hackshaw and others, 1997)

| | Original meta-analysis | | Limit meta-analysis |
| Model | Fixed-effect model | Random-effects model | Fixed-effect model |
| Plot | Radial plot | Generalized radial plot | Limit radial plot |
| Model without intercept | exp(μ_F) = 1.204 [1.120; 1.295] | exp(μ_R) = 1.238 [1.129; 1.357] | exp(μ_lim) = 1.086 [1.010; 1.168] |
| Model with intercept | exp(β_F) = 1.009 [0.865; 1.176] | exp(β_R) = 0.973 [0.757; 1.250] | exp(β_lim) = 1.065 [0.913; 1.242] |
| Expectation | exp(β_0) = 1.094 [0.939; 1.276] | | |
Fig. 7. Passive smoking data (Hackshaw and others, 1997). Forest plot (left), funnel plot (middle), and radial plot (right) for original (top) and limit (bottom) meta-analysis.
8. DISCUSSION
We have introduced the concept of limit meta-analysis, derived from an extended random-effects model for meta-analysis that includes a parameter for the bias introduced by potential small-study effects. The limit meta-analysis takes small-study bias into account to yield shrunken estimates of individual study effects. We showed that these shrunken estimates can also be justified from an empirical Bayes viewpoint. Our approach is thus consistent with the philosophy of random-effects modeling that "inference for each particular study is performed by 'borrowing strength' from the other studies" (Higgins and others, 2009). We are essentially correcting for possible small-study effects before we "borrow strength" from other studies. The limit meta-analysis gives rise to 3 possible estimators for the overall intervention effect, adjusting for small-study effects, as well as a measure of heterogeneity after accounting for small-study effects.
We derived ML estimates of the quantities in the limit meta-analysis in Section 4. However, we show—as a direct consequence of interpreting the limit meta-analysis in terms of the radial plot in Section 5—that parameter estimation in limit meta-analyses can be carried out using existing software. This removes one hurdle to its use in practice.
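As an illustration of how standard regression software can produce estimates of this kind, the sketch below fits an Egger-type regression on the radial plot: the standardized effect y_i/s_i against the precision 1/s_i. The slope then plays the role of a bias-adjusted effect and the intercept captures small-study effects. This is a generic Python sketch (the paper's analyses used R), not the paper's exact limit radial plot fit.

```python
import numpy as np

def radial_regression(y, s):
    """Egger-type regression on the radial plot: standardized effect
    z_i = y_i / s_i against precision x_i = 1 / s_i, fitted by ordinary
    least squares with an intercept. Illustrative sketch only."""
    z, x = y / s, 1.0 / s
    X = np.column_stack([np.ones_like(x), x])          # intercept + precision
    (alpha, beta), *_ = np.linalg.lstsq(X, z, rcond=None)
    return alpha, beta   # alpha: small-study effect term; beta: adjusted effect
```

Any weighted or ordinary least-squares routine reproduces this fit, which is the sense in which parameter estimation "can be carried out using existing software."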
The ultimate aim of quantitative meta-analysis is to arrive at a pooled estimate of the intervention effect. We therefore performed a comprehensive simulation study with binary response data to compare the 3 proposed estimators of the pooled effect that emerged from our limit meta-analysis with currently used estimators. The simulation study explored a range of effect sizes, underlying event probabilities, heterogeneity (unrelated to small-study effects), and selection (as a surrogate for small-study effects). All 3 proposed estimators had small bias and comparable mean squared error in the presence of small-study effects. However, the "β0" estimator (2.4) gave the best confidence interval coverage and is thus our preferred estimator in the presence of small-study effects.
Unfortunately, neither the proposed nor the existing estimators performed acceptably in all situations, that is, both when small-study effects were present and when they were absent. In practice, analysts must therefore decide which estimator to use. To support this decision, we advocate one of the more recent tests for publication bias. A useful summary is given in Chapter 10, Section 4, of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Green, 2009). We acknowledge that some authors take a censorious attitude to such tests, believing them misleading (Tang and Liu, 2000; Terrin and others, 2003; Lau and others, 2006), stigmatizing them as "pseudo tests" (Ioannidis, 2008), and questioning whether funnel plots are a suitable means at all for judging small-study effects (Terrin and others, 2005). However, when applied following a prespecified analysis protocol, with their limitations duly acknowledged, we argue that such concerns (Ioannidis and Trikalinos, 2007) are minimized and that testing is a useful aid to researchers in judging funnel plots. After all, adjusted treatment-effect estimates were used successfully to predict the effect over the whole database of antidepressant trials in the Food and Drug Administration registry from a biased subset of published trials (Moreno and others, 2009b).
In this paper, we have not considered the source of small-study effects, be it publication bias or heterogeneity arising from differing patient or other study-specific characteristics. When adjusting the treatment-effect estimate for small-study effects, it does not matter where the small-study effect comes from (Moreno and others, 2009a). The limit meta-analysis can readily be extended to adjust for any covariates that explain heterogeneity; it would then address the remaining unexplained small-study effects.
An even more provocative question was raised by Stanley and others (2010): whether it "could be better to discard 90% of the data," arguing that in the presence of small-study effects all adjusting methods lead to estimates that are very similar to the results of the one or 2 largest studies. However, the largest studies may themselves disagree, as illustrated by the passive smoking data example.
Of course, the above process may not explain all the heterogeneity, and we propose the test statistic Q′ and the measure G2 to assess and quantify, respectively, the remaining heterogeneity after adjusting for small-study effects. If we believe the principal source of small-study effects is publication bias, then detecting and investigating heterogeneity “after this has been accounted for” is arguably of greater scientific relevance—as it relates directly to factors affecting the efficacy of the intervention in practical settings.
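For reference, the standard building blocks here are Cochran's Q statistic and the familiar I² measure, computed below. The paper's Q′ and G² are the analogous quantities evaluated on the limit meta-analysis, i.e. after adjusting for small-study effects; this generic Python sketch does not reproduce that adjustment.

```python
import numpy as np

def q_and_i2(y, s2):
    """Cochran's Q and the I^2 heterogeneity measure for a fixed-effect
    meta-analysis. The paper's Q' and G^2 are the analogous quantities
    on the limit meta-analysis (after small-study adjustment)."""
    w = 1.0 / s2
    mu_f = np.sum(w * y) / np.sum(w)     # fixed-effect pooled estimate
    Q = np.sum(w * (y - mu_f) ** 2)      # Cochran's Q
    k = len(y)
    I2 = max(0.0, (Q - (k - 1)) / Q) if Q > 0 else 0.0
    return Q, I2
```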
A potential drawback of our approach is its dependence on the estimation of τ2, for which a number of competing estimators are given in the literature. In this article, we have used the method-of-moments estimator (DerSimonian and Laird, 1986), which is both the most widely accepted and the most widely used; it is implemented in the Review Manager software for Cochrane reviewers (The Cochrane Collaboration, 2009). Unfortunately, the differences between the competing estimators tend to be greater the smaller the number of studies in the meta-analysis and the smaller the true heterogeneity. We therefore revisited our analysis of both the thrombolytic therapy and the passive smoking meta-analyses, using the 7 options for estimating τ2 available in the R package metafor (R Development Core Team, 2008). We found that while our 3 pooled effect estimators were relatively robust, the estimate of G2 varied considerably. We therefore prefer to use the test statistic Q′ to assess heterogeneity and to report G2 as a measure of such heterogeneity, possibly also reporting the latter for a range of estimates of τ2.
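The method-of-moments estimator referred to above has a simple closed form, sketched here in Python (the paper's own computations used R): τ² is the excess of Q over its degrees of freedom, rescaled by the weights and truncated at zero.

```python
import numpy as np

def tau2_dl(y, s2):
    """Method-of-moments (DerSimonian-Laird) estimator of the
    between-study variance tau^2, truncated at zero."""
    w = 1.0 / s2
    mu_f = np.sum(w * y) / np.sum(w)          # fixed-effect pooled estimate
    Q = np.sum(w * (y - mu_f) ** 2)           # Cochran's Q
    k = len(y)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (Q - (k - 1)) / c)        # truncate negative values at 0
```

Homogeneous data (Q below its degrees of freedom) yield τ² = 0, which is exactly the situation in which the competing estimators diverge most.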
Another issue is the use of the Copas selection model for generating the data in the simulation study. Strictly, in using this model, we are generating data from a slightly different model than we are fitting to the data. However, if a method is reliable in this setting, this provides reassurance for its use in practice, where we cannot know the data generation model.
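A minimal sketch of a Copas-type selection mechanism, for intuition only: a study is "published" when a latent propensity that increases with precision, and is correlated with the study's own sampling error, exceeds zero, so small studies with favourable results are over-represented. All parameter values below (γ0, γ1, ρ, the range of standard errors) are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

def simulate_copas(k, mu, tau2, gamma0=-0.5, gamma1=0.1, rho=0.8, rng=None):
    """Simulate a meta-analysis under a Copas-type selection model:
    study i is published when gamma0 + gamma1/s_i + delta_i > 0, where
    delta_i has correlation rho with the study's standardized sampling
    error. Parameter values are illustrative assumptions only."""
    rng = rng or np.random.default_rng(0)
    s = rng.uniform(0.1, 0.8, size=k)                  # standard errors
    theta = mu + rng.normal(0, np.sqrt(tau2), size=k)  # true study effects
    eps = rng.normal(size=k)                           # standardized error
    delta = rho * eps + np.sqrt(1 - rho ** 2) * rng.normal(size=k)
    y = theta + s * eps                                # observed effects
    published = gamma0 + gamma1 / s + delta > 0        # selection rule
    return y[published], s[published]
```

Fitting an unadjusted pooled estimate to the published subset then exhibits the upward small-study bias that the adjusted estimators are designed to remove.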
To conclude, we have introduced the idea of a limit meta-analysis, which we believe is a promising approach for finding shrunken, empirical Bayes estimates of study effects in the presence of small-study bias. This led to 3 proposed estimators for the overall effect in the presence of small-study bias. Our simulation study suggested that all 3 methods had smaller bias and mean squared error than estimators that did not account for small-study bias. One of these 3 methods, the "expected limit estimate," also had good confidence interval coverage and is our preferred method for use in practice. We have also described an approach for assessing heterogeneity after accounting for small-study effects and illustrated its utility with a reanalysis of data on the effects of passive smoking.
SOFTWARE
All calculations were carried out using the freely available software R (version 2.10.1), in particular the packages meta (Schwarzer, 2007) and metafor (R Development Core Team, 2008). R code for the calculation of all estimates given in this paper can be obtained from the first author.
FUNDING
Deutsche Forschungsgemeinschaft (FOR 534 Schw 821/2-2 to G.R. and J.R.C).
Conflict of Interest: None declared.