Abstract

Background

Numerous biomarkers have been associated with cancer risk. We assessed whether the results of cancer biomarker studies show evidence of excess statistical significance, which would suggest biases.

Methods

We systematically searched PubMed for meta-analyses of nongenetic biomarkers and cancer risk. The number of observed studies with statistically significant results was compared with the expected number, based on the statistical power of each study under different assumptions for the plausible effect size. We also evaluated small-study effects using asymmetry tests. All statistical tests were two-sided.

Results

We included 98 meta-analyses with 847 studies. Forty-three meta-analyses (44%) found nominally statistically significant summary effects (random effects). The proportion of meta-analyses with statistically significant effects was highest for infectious agents (86%), inflammatory (67%), and insulin-like growth factor (IGF)/insulin system (52%) biomarkers. Overall, 269 (32%) individual studies observed nominally statistically significant results. A statistically significant excess of the observed over the expected number of studies with statistically significant results was seen in 20 meta-analyses. An excess of observed over expected positive studies was found in studies of the IGF/insulin (P ≤ .04) and inflammation (P ≤ .02) systems. Only 12 meta-analyses (12%) had a statistically significant summary effect size, more than 1000 case patients, and no hints of small-study effects or excess statistical significance; only four of them had large effect sizes, three of which pertained to infectious agents (Helicobacter pylori, hepatitis viruses, and human papillomavirus).

Conclusions

Most well-documented biomarkers of cancer risk without evidence of bias pertain to infectious agents. Conversely, an excess of statistically significant findings was observed in studies of IGF/insulin and inflammation systems, suggesting reporting biases.

Since the 1980s, biomarkers have frequently been used to refine exposure assessment and to evaluate potential associations with cancer incidence more accurately. Biomarkers may also further our understanding of the mechanisms of carcinogenesis. However, biomarker-based epidemiologic studies are not free from biases. Empirical evidence from diverse fields suggests that the biomarker literature may sometimes highlight strong effects that are irreproducible or that shrink when larger studies are performed (1,2). One particular concern is that this literature seems to suffer from selective reporting biases favoring statistically significant (“positive”) results (3,4). Studies of biomarkers of cancer prognosis almost always report some statistically significant result (5). However, it is unclear whether this applies equally to studies that try to identify associations of biomarkers with cancer risk rather than prognosis.

Bias in favor of positive results may arise through several mechanisms (6). First, negative results may not be published, or may be published only after considerable delay (7). Second, selective analysis and outcome reporting bias may emerge when many analyses can be performed (using, for example, different outcome definitions, different adjustments for confounders, or models with different statistical terms for exposures and confounders) but only the analysis with the “best” results is presented (8–11). Third, in theory, positive results may be entirely faked, although fraud and data fabrication are unlikely to be nearly as common as the other mechanisms. All of these mechanisms eventually produce a literature with an inflated proportion of published studies reporting positive results.

Detecting these biases is not straightforward. Several statistical methods try to probe for publication bias in studies included in meta-analyses; the most popular are asymmetry tests that evaluate whether small studies give different results from larger ones (12). However, these methods may be neither very sensitive nor very specific for detecting such biases, especially when a meta-analysis includes a limited number of studies (12–14). An alternative approach is to examine whether single studies report too many statistically significant results compared with what would be expected under different assumptions about the plausible effect size of each association (15). An added advantage of this excess statistical significance test is that it can be applied not only to a single meta-analysis but also to many meta-analyses across a given field. Thus, power is optimized to detect biases that pertain to larger fields and disciplines rather than just single associations. The test has previously found an excess of statistically significant findings in randomized trials of neuroleptic drugs, genetic association studies of Alzheimer’s disease, and studies of brain volume abnormalities (15–17).

The literature on biomarkers and cancer risk is rapidly expanding and is expected to grow even more with the incorporation of transcriptomics, proteomics, and metabolomics (18). As multiple associations accumulate in this literature, it is important to understand the extent of potential biases in the field. Therefore, in this paper, we probed whether there is evidence for excess statistical significance in studies of biomarkers and cancer risk, and we evaluated how many of the associations previously synthesized in meta-analyses are supported by robust evidence.

Methods

Study Identification

We systematically searched PubMed from 1966 through December 20, 2010, to identify meta-analyses of epidemiologic studies of the association between any biomarker and cancer risk. The search strategy used the keywords “cancer or carcinoma or tumor or neoplasm or neoplasia or maligna*,” which had to appear in the title of the paper; the search was limited to meta-analyses of human studies published in English. Two investigators (S. I. Papatheodorou and K. K. Tsilidis) independently scrutinized the full text of potentially eligible articles. We excluded meta-analyses that investigated the association between genetic markers and cancer risk, as well as studies in which biomarkers were used for screening, diagnostic, or prognostic purposes. Articles were retained if they included at least one meta-analysis that reported, for each included study, a measure of association between the biomarker and cancer risk with its standard error, and the number of cancer cases and controls. When more than one meta-analysis on the same research question was eligible, the one with the largest number of component studies was retained for the main analysis. In a sensitivity analysis, we compared the summary effects and 95% confidence intervals of older and newer meta-analyses on the same question. We examined how many of these associations changed nominal statistical significance (statistically significant becoming statistically nonsignificant and vice versa), or retained nominal statistical significance but had an effect that was more than halved or more than doubled in the newer vs the older meta-analysis.
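
The sensitivity-analysis rule above can be made concrete with a short Python sketch. It is illustrative only: the `Summary` class and `compare_old_new` function are hypothetical names, nominal significance is judged from whether the 95% confidence interval excludes 1, and "halved" or "doubled" is assumed to refer to the log odds ratio (the distance from the null), which the text does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical container for a summary odds ratio and its 95% CI."""
    odds_ratio: float
    ci_low: float
    ci_high: float

    def significant(self) -> bool:
        # Nominally significant (alpha = .05) if the 95% CI excludes OR = 1.
        return self.ci_low > 1.0 or self.ci_high < 1.0


def compare_old_new(old: Summary, new: Summary) -> str:
    """Classify the change from an older to a newer meta-analysis (sketch)."""
    if old.significant() != new.significant():
        return "lost significance" if old.significant() else "gained significance"
    if old.significant():
        # Assumption: effect size is the log OR, i.e., the distance from the null.
        ratio = math.log(new.odds_ratio) / math.log(old.odds_ratio)
        if ratio < 0:
            return "reversed direction"
        if ratio < 0.5:
            return "more than halved"
        if ratio > 2.0:
            return "more than doubled"
    return "no material change"


# Made-up summaries for illustration only.
print(compare_old_new(Summary(2.0, 1.5, 2.6), Summary(1.2, 1.05, 1.4)))
```

Measuring the change on the log scale treats an odds ratio of 0.5 and one of 2.0 as equally far from the null; a reader who prefers the raw odds ratio scale would only need to change the `ratio` line.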

Estimation of Summary Effect and Heterogeneity

For each meta-analysis, we estimated the summary effect size and its confidence interval using both fixed and random effects models (19). All calculations used the odds ratio; one meta-analysis that provided a standardized mean difference as its measure of association was converted to the odds ratio scale using an established formula (20,21). We also tested for between-study heterogeneity, reporting the P value of the χ2-based Cochran Q test and the I2 metric of inconsistency. The Q statistic is the weighted sum of the squared differences between the observed effect in each study and the fixed effects summary effect (22). The I2 metric ranges from 0 to 100% and is the ratio of the between-study variance to the sum of the within- and between-study variances (23). Its 95% confidence intervals were calculated as described by Ioannidis et al. (24).
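
A minimal Python sketch of these calculations follows. It assumes inverse-variance weighting on the log odds ratio scale and the DerSimonian-Laird moment estimator for the between-study variance, a common choice for random effects models that is assumed here rather than confirmed by the text. I2 is computed with the usual (Q − df)/Q estimator of the variance ratio described above; the Q test P value and the I2 confidence interval are omitted. The `smd_to_log_or` helper assumes the widely used Chinn conversion (log OR ≈ SMD × π/√3), since the text cites an established formula without stating it. All function names are hypothetical.

```python
import math


def inverse_variance_meta(log_ors, ses):
    """Fixed and random effects (DerSimonian-Laird) summaries of log odds ratios.

    Returns summary ORs with 95% CIs, Cochran's Q, tau^2 and I^2 (in %).
    """
    k = len(log_ors)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed effects) weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w
    se_fixed = math.sqrt(1.0 / sum_w)

    # Cochran's Q: weighted squared deviations from the fixed effects summary.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = k - 1

    # DerSimonian-Laird moment estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    random_effect = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum_w_re
    se_random = math.sqrt(1.0 / sum_w_re)

    # I^2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0

    def to_or(est, se):
        # Back-transform the estimate and its 95% CI to the odds ratio scale.
        return math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)

    return {"fixed": to_or(fixed, se_fixed), "random": to_or(random_effect, se_random),
            "Q": q, "df": df, "tau2": tau2, "I2": i2}


def smd_to_log_or(smd, se_smd):
    """Assumed Chinn conversion: log OR = SMD * pi / sqrt(3), applied to the SE too."""
    factor = math.pi / math.sqrt(3)
    return smd * factor, se_smd * factor
```

When the studies agree closely, tau^2 is truncated at zero and the random effects summary coincides with the fixed effects one; heterogeneity widens the random effects confidence interval.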

Asymmetry Tests for Small-Study Effects

We evaluated whether there was evidence for small-study effects (ie, whether smaller studies tend to give substantially larger estimates of effect size than larger studies). Small-study effects may hint at publication and other selective reporting biases, but they may also reflect genuine heterogeneity, chance, or other reasons for differences between small and large studies (12). We used the regression asymmetry test proposed by Egger et al. (25). When odds ratios are derived from 2×2 tables, this test may be modestly biased, and modified variants of the test have been proposed (26). However, many meta-analyses did not report the per-study 2×2 table data that the modified regression tests require, and most meta-analyses used confounder-adjusted effect estimates from single studies rather than unadjusted odds ratios calculated from 2×2 tables; we therefore considered the standard Egger test appropriate. We used the modified variant of the Egger test in the six meta-analyses that had an Egger test P value less than .25 and 2×2 table data available for every component study (27–32). A P value less than .10, with a more conservative effect in larger studies, was considered evidence of small-study effects.
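
The Egger test can be sketched as an ordinary least-squares regression of each study's standardized effect (log OR divided by its standard error) on its precision (1/SE); an intercept far from zero indicates funnel-plot asymmetry. The Python below is a hypothetical, dependency-free sketch that returns the intercept and its t statistic on k − 2 degrees of freedom; it does not implement the modified 2×2-table variants mentioned above. Under the criterion in the text, asymmetry counts as a small-study effect only if the two-sided P value is below .10 and smaller studies show the larger effects (for a positive summary log OR, a positive intercept).

```python
import math


def egger_test(log_ors, ses):
    """Egger regression asymmetry test (sketch).

    Regresses the standardized effect (log OR / SE) on precision (1 / SE)
    by ordinary least squares. Returns the intercept, slope, t statistic of
    the intercept, and its degrees of freedom (k - 2).
    """
    k = len(log_ors)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1.0 / se for se in ses]                      # precision
    y = [lo / se for lo, se in zip(log_ors, ses)]     # standardized effect
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance and the standard error of the intercept.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / k + mean_x ** 2 / sxx))
    if se_intercept > 0:
        t = intercept / se_intercept
    else:
        t = math.copysign(math.inf, intercept)   # perfect fit: degenerate case
    return {"intercept": intercept, "slope": slope, "t": t, "df": k - 2}
```

If SciPy is available, the two-sided P value would be `2 * scipy.stats.t.sf(abs(result["t"]), result["df"])`.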

Evaluation of Excess Statistical Significance

We applied the excess statistical significance test, which evaluates whether the observed number of studies with nominally statistically significant results (positive studies, P < .05) differs from the expected number. We used a binomial test, as previously described in detail (15). This test evaluates whether the number of positive studies in a meta-analysis is too large given the power that these studies have to detect plausible effects at α = 0.05. The observed vs expected comparison is performed separately for each meta-analysis, and it is also extended to groups of meta-analyses by summing the observed and expected numbers across the meta-analyses in each group.

In each meta-analysis, the expected number of positive studies is the sum of the statistical power estimates of its component studies. The power of each component study depends on the plausible effect size for the tested biomarker–cancer association. The true effect size for any meta-analysis is not known, but in the absence of bias it can be assumed to equal the observed summary effect size. We conducted sensitivity analyses using several plausible effect sizes: the fixed effects summary, the random effects summary, the effect size of the largest study (smallest standard error) in the meta-analysis, and the most conservative of these three estimates. In the presence of bias, the summary effects are likely to be larger than the true effect, and the same may hold for the effect of the largest study. Therefore, all of these assumptions tend to be very conservative when testing for excess statistical significance.
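
The expected number of positive studies can be sketched in Python. This sketch swaps in a normal (Wald) approximation for the power of a two-sided test of log OR = 0 at α = .05, rather than the noncentral t algorithm the paper used, so its values will differ somewhat. The helper names are hypothetical, and the "most conservative" candidate is assumed to be the one closest to the null on the log scale.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def study_power(true_log_or, se, alpha=0.05):
    """Approximate power of a two-sided Wald test of log OR = 0.

    Assumption: normal approximation, not the paper's noncentral t algorithm.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(true_log_or) / se
    # Probability of rejecting in either tail when the true effect is true_log_or.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def expected_positive(true_log_or, ses, alpha=0.05):
    """Expected number of positive studies: the sum of per-study power."""
    return sum(study_power(true_log_or, se, alpha) for se in ses)


def plausible_effects(fixed, random, largest):
    """Candidate plausible effects on the log OR scale.

    Assumption: the 'most conservative' candidate is the one closest to the null.
    """
    return {
        "fixed": fixed,
        "random": random,
        "largest": largest,
        "conservative": min((fixed, random, largest), key=abs),
    }


# Hypothetical standard errors for four studies and a plausible OR of 1.3.
ses = [0.15, 0.25, 0.30, 0.40]
print(round(expected_positive(math.log(1.3), ses), 2))
```

Because power shrinks as the assumed effect approaches the null, using the most conservative candidate lowers the expected count and makes an observed excess easier to detect.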

The power of each study was calculated with an algorithm using a noncentral t distribution. When we repeated the power calculations in a subset of our sample using logistic regression models in freely available software (33), we obtained very similar estimates. Excess statistical significance for single meta-analyses was claimed at P less than .10 [one-sided P < .05, with observed > expected, as previously proposed (15)].
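
The binomial comparison of observed and expected positive studies, including pooling across a group of meta-analyses, can be sketched as follows. Under the null, each of the n studies is taken to be positive with probability E/n. How the two-sided P value is formed is an assumption in this sketch (the one-sided upper-tail P, doubled and capped at 1); other conventions exist. Function names are hypothetical.

```python
from math import comb


def excess_significance_test(observed, expected, n):
    """Binomial test of observed (O) vs expected (E) positive studies among n.

    Under the null, each study is positive with probability E / n.
    Returns (one-sided upper-tail P(X >= O), two-sided P).
    Assumption: two-sided P = 2 x one-sided P, capped at 1.
    """
    p = expected / n
    upper = sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
                for x in range(observed, n + 1))
    upper = min(1.0, upper)  # guard against floating-point overshoot
    return upper, min(1.0, 2 * upper)


def flag_excess(observed, expected, n, threshold=0.10):
    """Excess significance: O > E and two-sided P < .10 (one-sided P < .05)."""
    _, two_sided = excess_significance_test(observed, expected, n)
    return observed > expected and two_sided < threshold


def pool(meta_analyses):
    """Sum (observed, expected, n) tuples over a group of meta-analyses."""
    observed = sum(m[0] for m in meta_analyses)
    expected = sum(m[1] for m in meta_analyses)
    n = sum(m[2] for m in meta_analyses)
    return observed, expected, n


# Counts reported in Table 2 for IGF-1 and prostate cancer (fixed effects assumption).
print(flag_excess(17, 6.3, 42))
```

Applied to those counts (42 studies, 17 observed, 6.3 expected), `flag_excess` returns True; its P value need not match the one in Table 2 because the power estimates and the two-sided convention may differ. For a category-level test, the tuples of its meta-analyses would be passed through `pool` first.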

We classified biomarkers into the following categories based on the biological pathways or types of exposure involved: insulin-like growth factor (IGF)/insulin system, sex hormones, diet, inflammation, infectious agents, and environment. We assessed excess statistical significance separately in each category because selective reporting bias may affect different research domains to different extents. These domains may also have different typical magnitudes of effect sizes and different analytical biases, even though research is sometimes conducted by the same teams across domains.

We also performed subgroup analyses. The excess statistical significance test was performed separately for meta-analyses with I2 values less than or equal to 50% and greater than 50%, because values exceeding 50% are typically considered evidence of large heterogeneity beyond chance (34). Analyses were also performed according to whether there was evidence of small-study effects, whether the eligible study was a meta-analysis or an analysis of shared data through a consortium, and whether the summary effect of the meta-analysis was nominally statistically significant per random effects calculations. All statistical tests were two-sided.

Results

Description of Eligible Meta-analyses

The search identified 2840 items, of which 2775 were excluded after title and abstract review (Figure 1). Of the remaining 65 articles that entered full-text review, 16 pertained to associations for which a more recent meta-analysis was available, 11 did not report sufficient information for calculating excess statistical significance, and one did not use a biomarker. Therefore, 37 articles were selected (20,27–32,35–64), which included data on 98 meta-analyses (comparisons) in six broad areas of biomarkers for cancer risk (IGF/insulin system [n = 21 comparisons], sex hormones [n = 13 comparisons], diet [n = 31 comparisons], inflammation [n = 3 comparisons], infectious agents [n = 22 comparisons], and environment [n = 8 comparisons]).

Figure 1.

Flow diagram of the study selection process

Supplementary Table 1 (available online) summarizes these 98 meta-analyses that included 847 studies. There were 2 to 42 studies per meta-analysis, with a median of seven studies. The median number of case and control subjects in each study was 120 and 229, respectively, whereas the median number of case and control subjects in each meta-analysis was 1157 and 2119, respectively. Fifty-seven meta-analyses included more than 1000 case patients. Overall, 269 (32%) individual studies observed nominally statistically significant results.

Table 1.

Description of the 43 meta-analyses with nominally statistically significant effect according to random effects calculations*

      Summary odds ratio (95% confidence interval) 
Meta-analyses  Comparison  No. of case/control subjects  Fixed effects  Random effects  Largest study† 
Rinaldi et al, 2010 (35)  IGF-1 and CRC  2862/4966  1.07 (1.01 to 1.14)  1.07 (1.01 to 1.14)  1.04 (0.94 to 1.14) 
Morris et al, 2006 (36)  IGF-2 and CRC  384/1301  1.95 (1.26 to 3.00)  1.95 (1.26 to 3.00)  2.09 (1.14 to 3.82) 
Key et al, 2010 (37)  IGF-1 and postmenopausal BrCA  2853/5332  1.30 (1.13 to 1.49)  1.30 (1.13 to 1.49)  1.48 (1.11 to 1.97) 
Key et al, 2010 (37)  IGFBP-3 and postmenopausal BrCA  2816/5196  1.21 (1.04 to 1.41)  1.22 (1.01 to 1.49)  1.12 (0.76 to 1.65) 
Rowlands et al, 2009 (39)  IGF-1 and PrCA  7481/11866  1.18 (1.14 to 1.23)  1.21 (1.07 to 1.36)  1.05 (0.92 to 1.19) 
Rowlands et al, 2009 (39)  IGFBP-3 and PrCA  6676/10484  0.97 (0.93 to 1.01)  0.88 (0.79 to 0.98)  1.23 (1.07 to 1.42) 
Pisani, 2008 (40)  C-peptide and CRC  1309/4233  1.36 (1.15 to 1.62)  1.51 (1.14 to 1.99)  1.37 (1.00 to 1.88) 
Pisani, 2008 (40)  C-peptide and BrCA  1403/2114  1.26 (1.07 to 1.48)  1.35 (1.01 to 1.81)  1.07 (0.80 to 1.44) 
Pisani, 2008 (40)  C-peptide and pancreatic CA  209/483  1.70 (1.11 to 2.61)  1.70 (1.11 to 2.61)  1.52 (0.87 to 2.64) 
Pisani, 2008 (40)  Glucose and CRC  1741/1380000  1.19 (1.07 to 1.32)  1.28 (1.06 to 1.54)  1.13 (0.98 to 1.30) 
Pisani, 2008 (40)  Glucose and pancreatic CA  298/1330000  1.98 (1.67 to 2.35)  1.98 (1.67 to 2.35)  2.09 (1.70 to 2.58) 
Roddam et al, 2008 (41)  SHBG and PrCA  3704/5998  0.86 (0.76 to 0.97)  0.86 (0.76 to 0.97)  0.81 (0.61 to 1.08) 
Key et al, 2002 (42)  E2 and postmenopausal BrCA  656/1709  1.29 (1.14 to 1.45)  1.26 (1.07 to 1.49)  1.16 (0.90 to 1.48) 
Barba et al, 2009 (43)  16α-OHE1 and PrCA  122/414  1.82 (1.08 to 3.05)  1.82 (1.08 to 3.05)  1.73 (0.96 to 3.12) 
Barba et al, 2009 (43)  2-OHE1/16α-OHE1 and PrCA  122/414  0.52 (0.31 to 0.89)  0.52 (0.31 to 0.89)  0.54 (0.30 to 0.99) 
Chen et al, 2010 (46)  25(OH) vitamin D and BrCA  5489/5841  0.58 (0.51 to 0.66)  0.55 (0.38 to 0.80)  0.31 (0.23 to 0.41) 
Gallicchio et al, 2008 (47)  Lycopene and lung CA  727/4567  0.71 (0.51 to 0.99)  0.71 (0.51 to 0.99)  0.86 (0.52 to 1.43) 
Saadatian-Elahi et al, 2004 (49)  Eicosapentaenoic acid and BrCA  931/1360  0.91 (0.87 to 0.95)  0.91 (0.87 to 0.95)  0.91 (0.87 to 0.95) 
Simon et al, 2009 (51)  α-Linolenic acid and PrCA  1091/1270  1.51 (1.17 to 1.94)  1.54 (1.16 to 2.06)  1.31 (0.89 to 1.95) 
Collin et al, 2010 (53)  Vitamin B12 and PrCA  2906/6495  1.09 (1.03 to 1.14)  1.10 (1.01 to 1.19)  1.07 (0.99 to 1.15) 
Larsson et al, 2010 (54)  Vitamin B6 and CRC  883/1424  0.52 (0.38 to 0.71)  0.52 (0.38 to 0.71)  0.52 (0.29 to 0.92) 
Tsilidis et al, 2008 (55)  C-reactive protein and CRC  1159/37986  1.10 (1.02 to 1.18)  1.12 (1.01 to 1.25)  1.11 (0.95 to 1.30) 
Heikkila et al, 2009 (56)  C-reactive protein and CA  4438/70107  1.09 (1.05 to 1.13)  1.10 (1.02 to 1.18)  1.00 (0.94 to 1.07) 
Wang et al, 2007 (27)  H. pylori and early gastric CA  2722/13976  4.83 (4.27 to 5.48)  3.38 (2.15 to 5.32)  6.38 (5.50 to 7.41) 
Huang et al, 2003 (57)  H. pylori and gastric CA  2284/2770  2.05 (1.79 to 2.35)  2.29 (1.71 to 3.05)  2.07 (1.44 to 2.98) 
Huang et al, 2003 (57)  cagA and gastric CA  1707/2124  2.65 (2.29 to 3.05)  2.87 (1.95 to 4.22)  4.12 (2.97 to 5.72) 
Zhao et al, 2008 (58)  H. pylori and CRC  1709/1872  1.41 (1.22 to 1.65)  1.49 (1.16 to 1.90)  1.02 (0.69 to 1.50) 
Zhuo et al, 2008 (28)  H. pylori and laryngeal CA  108/249  2.02 (1.27 to 3.23)  2.02 (1.27 to 3.23)  1.74 (0.84 to 3.58) 
Islami and Kamangar, 2008 (59)  H. pylori and esophageal adeno CA  840/2890  0.56 (0.48 to 0.67)  0.57 (0.47 to 0.69)  0.48 (0.33 to 0.69) 
Islami and Kamangar, 2008 (59)  cagA and esophageal adeno CA  275/1197  0.41 (0.29 to 0.59)  0.41 (0.28 to 0.62)  0.26 (0.14 to 0.49) 
Zhuo et al, 2009 (29)  H. pylori and lung CA  199/231  2.31 (1.46 to 3.65)  3.24 (1.11 to 9.41)  1.24 (0.63 to 2.43) 
Gutierrez et al, 2006 (60)  HPV (DNA) and bladder CA  478/179  2.29 (1.37 to 3.84)  2.30 (1.33 to 4.00)  2.44 (0.94 to 6.33) 
Gutierrez et al, 2006 (60)  HPV (no DNA) and bladder CA  176/203  2.98 (1.65 to 5.40)  2.98 (1.65 to 5.40)  2.59 (1.33 to 4.87) 
Hobbs et al, 2006 (31)  HPV and oral CA  1641/2335  1.68 (1.36 to 2.08)  1.99 (1.17 to 3.38)  1.50 (1.20 to 2.00) 
Hobbs et al, 2006 (31)  HPV and oropharynx CA  383/1816  3.01 (2.11 to 4.30)  4.31 (2.07 to 8.95)  2.50 (1.60 to 3.80) 
Hobbs et al, 2006 (31)  HPV and tonsil CA  217/163  15.1 (6.78 to 33.4)  15.1 (6.78 to 33.4)  10.2 (2.80 to 37.0) 
Taylor et al, 2005 (32)  HPV and PrCA  2242/2622  1.37 (1.11 to 1.69)  1.52 (1.12 to 2.06)  1.07 (0.75 to 1.54) 
Mandelblatt et al, 1999 (61)  HPV and cervical CA  589/3068  8.07 (6.49 to 10.0)  8.08 (6.04 to 10.8)  8.10 (5.50 to 11.2) 
Donato et al, 1998 (30)  HBV (HCV-) and liver CA  2881/6318  17.9 (15.7 to 20.5)  21.9 (14.9 to 32.3)  37.8 (26.2 to 54.5) 
Donato et al, 1998 (30)  HCV (HBV-) and liver CA  2079/5615  16.8 (14.1 to 20.0)  20.3 (12.2 to 33.7)  2.40 (1.70 to 4.80) 
Donato et al, 1998 (30)  HCV+HBV and liver CA  567/1870  65.0 (35.0 to 121)  61.2 (27.0 to 139)  40.1 (12.6 to 128) 
Zhang and Begg, 1994 (62)  T. vaginalis and cervical CA  642/65122  1.88 (1.29 to 2.74)  1.88 (1.29 to 2.74)  2.10 (1.30 to 3.40) 
Veglia et al, 2008 (20)  DNA adducts and CA (current smokers)  509/407  3.88 (3.31 to 4.54)  3.76 (1.75 to 8.05)  7.46 (5.89 to 9.43) 

* BrCA, breast cancer; CA, cancer; CRC, colorectal cancer; E1, estrone; E2, estradiol; H. pylori, Helicobacter pylori; HBV, hepatitis B virus; HCV, hepatitis C virus; HPV, human papillomavirus; IGF, insulin-like growth factor; IGFBP, insulin-like growth factor binding protein; PrCA, prostate cancer; SHBG, sex hormone binding globulin; T. vaginalis, Trichomonas vaginalis.

† Odds ratio and 95% confidence interval of the largest study (smallest standard error) in each meta-analysis.

Summary Effect Sizes.

Of the 98 meta-analyses, 52 (53%) had nominally statistically significant findings using the fixed effects method; 37 of these reported increased risks and 15 showed decreased risks of cancer. A total of 43 (44%) meta-analyses reported statistically significant findings using the random effects method, of which 34 showed elevated risks. Table 1 shows the fixed and random effects summary estimates and the results of the largest study for these 43 meta-analyses; Supplementary Table 1 (available online) provides this information for all 98 meta-analyses. Only 30 of the 98 associations were nominally statistically significant based on the largest study estimate. Figure 2 plots the estimates of the largest studies against the random effects summary estimates. Among the 41 meta-analyses with summary estimates suggesting decreased risk, 24 showed a weaker decrease or even an increase in risk in the largest study; among the 57 meta-analyses with summary estimates suggesting increased risk, 38 showed a weaker increase or even a decrease in risk in the largest study. Thus, the largest study was more conservative than the summary effect of the meta-analysis in 62 (63%) of the 98 meta-analyses. Most of the differences in effect size between the largest study and the meta-analyses were modest: among the 43 associations in Table 1, the ratio of the odds ratio of the meta-analysis to that of the largest study exceeded 1.5 in either direction for nine (five for which the meta-analysis result was more conservative and four for which the largest study result was more conservative).
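
The two comparisons in the paragraph above (whether the largest study is more conservative than the summary, and whether the two odds ratios differ by more than 1.5-fold in either direction) reduce to simple arithmetic. The Python sketch below applies them to ten rows copied from Table 1 (random effects summary vs largest study): the nine associations that exceed the 1.5-fold threshold and one borderline row (HPV and tonsil cancer, ratio about 1.48). The function name `compare_to_largest` is hypothetical.

```python
def compare_to_largest(summary_or, largest_or):
    """Return (largest_is_more_conservative, fold_difference).

    'More conservative' means closer to or beyond the null: for a summary
    OR >= 1, a smaller largest-study OR; for a summary OR < 1, a larger one.
    The fold difference is the ratio of the two ORs in whichever direction
    is >= 1.
    """
    if summary_or >= 1:
        largest_more_conservative = largest_or < summary_or
    else:
        largest_more_conservative = largest_or > summary_or
    ratio = summary_or / largest_or
    return largest_more_conservative, max(ratio, 1 / ratio)


# (random effects summary OR, largest study OR), copied from Table 1.
TABLE1_ROWS = {
    "HBV (HCV-) and liver CA": (21.9, 37.8),
    "HCV (HBV-) and liver CA": (20.3, 2.40),
    "HCV+HBV and liver CA": (61.2, 40.1),
    "H. pylori and early gastric CA": (3.38, 6.38),
    "H. pylori and lung CA": (3.24, 1.24),
    "cagA and esophageal adeno CA": (0.41, 0.26),
    "HPV and oropharynx CA": (4.31, 2.50),
    "25(OH) vitamin D and BrCA": (0.55, 0.31),
    "DNA adducts and CA (current smokers)": (3.76, 7.46),
    "HPV and tonsil CA": (15.1, 10.2),
}

# Keep associations whose ORs differ by more than 1.5-fold in either direction.
flagged = {}
for name, (summary_or, largest_or) in TABLE1_ROWS.items():
    conservative, fold = compare_to_largest(summary_or, largest_or)
    if fold > 1.5:
        flagged[name] = "largest study" if conservative else "meta-analysis"

for name, which in flagged.items():
    print(f"{name}: more conservative in the {which}")
```

Run on these rows, the sketch flags nine associations, four of which are more conservative in the largest study, consistent with the counts reported above.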

Figure 2.

Odds ratio in each meta-analysis (random effects summary estimate) and in the largest study on each association. The size of each point is proportional to the weight of the evidence (inverse of the variance) of the largest study in each meta-analysis.

There were marked differences in the proportion of associations that had nominally statistically significant summary effects across the six categories of biomarkers. Based on random effects calculations, 86%, 67%, and 52% of the meta-analyses on infectious agents, inflammation, and IGF/insulin systems, respectively, found nominally statistically significant summary effects, whereas this was seen only in 31%, 19%, and 13% of the meta-analyses on sex hormones, diet, and environment, respectively.

When we compared the summary effects of the 24 meta-analyses (from 16 older articles) that had been excluded because a more recent meta-analysis was available with those of the corresponding more recent meta-analyses, we found that in five instances the association was statistically significant in the older meta-analysis but became statistically nonsignificant in the newer one; in two instances the opposite happened; and in two further instances the effect remained nominally statistically significant but its size decreased to less than half in the newer vs the older meta-analysis.

Between-Study Heterogeneity

There was nominally statistically significant heterogeneity in 40 of the 98 meta-analyses (Supplementary Table 1, available online). The highest proportions of statistically significant heterogeneity were observed in meta-analyses of biomarkers of inflammation (67%) and the IGF/insulin system (57%). I2 values exceeding 50% were noted in 35 of 98 meta-analyses (36%), and 16 of those had values exceeding 75%. However, many of the 95% confidence intervals of this heterogeneity metric were wide, especially when only a limited number of studies was available.

Small-Study Effects

Evidence of statistically significant small-study effects was noted in 9 of 98 meta-analyses (Supplementary Table 1, available online). These included four meta-analyses on the IGF/insulin system (IGF-1 and premenopausal breast cancer, IGF binding protein 3 [IGFBP-3] and prostate cancer, IGFBP-2 and prostate cancer, and C-peptide and colorectal cancer), three on infectious agents (H. pylori and gastric cancer, H. pylori and laryngeal cancer, and hepatitis B virus and liver cancer), one on a dietary exposure (docosahexaenoic acid and breast cancer), and one on an inflammatory biomarker (C-reactive protein [CRP] and colorectal cancer).

Test of Excess Statistical Significance

Fifteen, 10, and 16 meta-analyses had evidence of a statistically significant excess of positive studies when the plausible effect was assumed to equal the fixed effects summary, the random effects summary, and the result of the largest study, respectively. Table 2 shows the results of the excess statistical significance test for the 20 meta-analyses with a statistically significant excess of positive studies under any of these three assumptions for the plausible effect size. Eight of them pertained to the IGF/insulin system, four to diet, four to infectious agents, two to inflammation, one to sex hormones, and one to an environmental exposure.

Table 2.

Observed and expected number of positive studies in the 20 meta-analyses with a statistically significant excess of positive studies under any assumption for the plausible effect size*

Meta-analysis Comparison No. of studies Observed positive Expected positive (fixed)† P (fixed)‡  Expected positive (random)§ P (random)‡  Expected positive (largest)‖ P (largest)‡  
Morris et al, 2006 (36) IGFBP-3 and CRC 0.4 .04 0.4 .05 1.5 .64 
Rowlands et al, 2009 (39) IGF-1 and PrCA 42 17 6.3        5.6×10 −5 7.3        3.5×10 −4 2.4        <1×10 −9 
Rowlands et al, 2009 (39) IGF-2 and PrCA 10 1.4 .04 1.0 .01 1.7 .08 
Rowlands et al, 2009 (39) IGFBP-3 and PrCA 29 1.6 7.3×10⁻⁴ 3.7 .09 7.2 1.00
Rowlands et al, 2009 (39) IGF-1/BP-3 and PrCA 11 0.9 .06 1.2 .12 1.0 .07 
Rowlands et al, 2009 (39) IGFBP-1 and PrCA 0.2 .01 0.5 .09 0.3 .02 
Rowlands et al, 2009 (39) IGFBP-2 and PrCA 0.3 .04 0.8 .17 0.3 .03 
Pisani, 2008 (40) C-peptide and BrCA 11 2.1 .45 3.1 1.00 0.7 .03 
Key et al, 2002 (42) E2 and postmenopausal BrCA 1.5 .05 1.4 .04 0.8 .01 
Saadatian-Elahi et al, 2004 (49) Oleic acid and BrCA 1.5 .18 0.5 .01 5.6 .09 
Saadatian-Elahi et al, 2004 (49) Linoleic acid and BrCA 0.8 .04 1.1 .08 0.7 .02 
Buck et al, 2010 (50) Enterolactone and BrCA 12 2.6 .03 4.3 .37 6.4 1.00 
Collin et al, 2010 (53) Folate and PrCA 0.5 .07 1.0 .27 0.4 .05 
Tsilidis et al, 2008 (55) C-reactive protein and CRC 0.7 .02 0.9 .04 0.8 .03 
Heikkila et al, 2009 (56) C-reactive protein and CA 14 1.6 .07 1.9 .11 0.7 .004 
Zhao et al, 2008 (58) H. pylori and CRC 14 4.2 .57 5.1 1.00 0.7 4.6×10⁻⁴
Islami and Kamangar, 2008 (59) H. pylori and ESCC 0.6 .02 0.7 .02 0.7 .03 
Zhuo et al, 2009 (29) H. pylori and lung CA 2.5 .64 3.4 .10 0.4 .04 
Hobbs et al, 2006 (31) HPV and larynx CA 2.6 1.00 3.7 .30 0.5 .08
Veglia et al, 2008 (20) DNA adducts and CA (never smokers) 0.6 .02 1.5 .18 0.9 .05 

* BrCA, breast cancer; CA, cancer; CRC, colorectal cancer; E2, estradiol; ESCC, esophageal squamous cell carcinoma; H. pylori, Helicobacter pylori; HPV, human papillomavirus; IGF, insulin-like growth factor; IGFBP, insulin-like growth factor binding protein; PrCA, prostate cancer.

† Expected number of statistically significant studies using the summary fixed effects estimate of each meta-analysis as the plausible effect size.

‡ P value of the excess statistical significance test. All statistical tests were two-sided.

§ Expected number of statistically significant studies using the summary random effects estimate of each meta-analysis as the plausible effect size.

‖ Expected number of statistically significant studies using the effect of the largest study of each meta-analysis as the plausible effect size.

Table 3 shows aggregate data for all the meta-analyses combined and by category of biomarker. Overall, there was no evidence that the observed number of positive studies exceeded the expected number under any of the assumptions about the plausible effect size. The patterns, however, differed across categories of biomarkers. An excess of statistically significant studies was observed in meta-analyses of the association between IGF/insulin concentrations and cancer risk (P ≤ .04 regardless of assumptions) and in studies of inflammatory biomarkers and risk (P ≤ .02 regardless of assumptions). However, the excess in the IGF/insulin field was driven primarily by meta-analyses of IGF system concentrations and risk of prostate cancer (Table 2). There was no evidence for excess statistical significance in studies of other types of biomarkers.
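Aggregating over a field amounts to summing the numbers of studies, observed positive studies, and expected positive studies across its meta-analyses, and then testing whether the observed count exceeds the expected one. The Python sketch below assumes an exact two-sided binomial test with per-study success probability E/n, in the spirit of Ioannidis and Trikalinos (15), and follows the table convention of not testing (NP) when the expected count is at least the observed count. It is illustrative only and is not claimed to reproduce the P values in Table 3; the function names and inputs are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """Binomial probability mass, computed on the log scale so that fields with
    hundreds of studies do not overflow. Requires 0 < p < 1."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log1p(-p))


def excess_significance_p(observed, n, expected):
    """Exact two-sided binomial P for `observed` positives among n studies, with
    expected / n as the per-study probability of a positive result.
    Returns None ("NP") when there is no excess to test."""
    if observed <= expected:
        return None
    p = expected / n
    cutoff = binom_pmf(observed, n, p) * (1 + 1e-7)  # tolerance for floating-point ties
    # Sum the probabilities of all outcomes no more likely than the observed one.
    total = sum(q for q in (binom_pmf(k, n, p) for k in range(n + 1)) if q <= cutoff)
    return min(1.0, total)


def field_test(meta_analyses):
    """meta_analyses: list of (n_studies, observed_positive, expected_positive)."""
    n = sum(m[0] for m in meta_analyses)
    observed = sum(m[1] for m in meta_analyses)
    expected = sum(m[2] for m in meta_analyses)
    return observed, expected, excess_significance_p(observed, n, expected)


# Hypothetical field with three meta-analyses.
print(field_test([(12, 6, 2.1), (9, 4, 1.5), (20, 5, 4.8)]))
```

Pooling in this way can reveal an excess that no single small meta-analysis would show, but it also averages over meta-analyses that may be biased to very different degrees.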

Table 3.

Observed and expected number of positive studies by type of biomarker*

Area No. of studies Observed positive Expected positive (fixed)† P (fixed)‡ Expected positive (random)§ P (random)‡ Expected positive (largest)‖ P (largest)‡ Expected positive (composite)¶ P (composite)‡
All 847 269 274.9 NP 295.1 NP 288.7 NP 274.9 NP 
IGF/insulin system 230 63 38.9 6.6×10⁻⁵ 47.2 .01 49.8 .04 38.9 6.6×10⁻⁵
Sex hormones 106 14.1 NP 14.7 NP 17.9 NP 14.1 NP 
Diet 195 42 47.0 NP 51.6 NP 49.1 NP 47.0 NP 
Inflammation 26 2.5 .01 2.9 .02 1.7 .001 1.7 .001 
Infectious agents 223 134 159.0 NP 165.0 NP 144.5 NP 144.5 NP 
Environment 67 14 13.5 .88 13.7 .88 25.8 NP 13.5 .88 

* IGF, insulin-like growth factor; NP, not pertinent, because the expected number of positive studies is larger than the observed number, so there is no evidence of excess statistical significance under the assumption made for the plausible effect size.

† Expected number of statistically significant studies using the summary fixed effects estimate of each meta-analysis as the plausible effect size.

‡ P value of the excess statistical significance test. All statistical tests were two-sided.

§ Expected number of statistically significant studies using the summary random effects estimate of each meta-analysis as the plausible effect size.

‖ Expected number of statistically significant studies using the effect of the largest study of each meta-analysis as the plausible effect size.

¶ Expected number of statistically significant studies using the most conservative of the three estimates (fixed effects summary, random effects summary, largest study) of each meta-analysis as the plausible effect size.

Table 4 shows the observed and expected number of positive studies in different subgroups. Overall, the excess of positive results was driven by meta-analyses with large estimates of heterogeneity (I2 >50%). The results did not differ according to whether or not the meta-analyses had statistically significant summary effects. There were only nine meta-analyses with evidence of small-study effects and 14 articles with consortia-based analyses, which limits inferences regarding these factors.

Table 4.

Observed and expected number of positive studies in subgroups*

Area No. of studies Observed positive Expected positive (fixed)† P (fixed)‡  Expected positive (random)§ P (random)‡  Expected positive (largest)‖ P (largest)‡  Expected positive (composite)¶ P (composite)‡  
All           
  Consortia 152 16 21.7 NP 23.0 NP 27.5 NP 21.7 NP 
  Nonconsortia 695 253 253.3 NP 272.1 NP 261.2 NP 253.3 NP 
  I2 ≤50% 469 87 124.7 NP 131.2 NP 132.6 NP 124.7 NP
  I2 >50% 378 182 150.2 .001 164.0 .06 156.1 .01 150.2 .001
  Small-study effects 99 39 30.8 .08 38.7 1.00 31.2 .10 30.8 .08 
  No small-study effects 736 226 237.9 NP 249.9 NP 251.3 NP 237.9 NP 
  Meta-analysis statistically significant# 448 204 220.3 NP 233.2 NP 206.6 NP 206.6 NP 
  Meta-analysis not statistically significant# 399 65 54.7 .14 62.0 .68 82.1 NP 54.7 .14 

* NP, not pertinent, because the expected number of positive studies is larger than the observed number, so there is no evidence of excess statistical significance under the assumption made for the plausible effect size.

† Expected number of statistically significant studies using the summary fixed effects estimate of each meta-analysis as the plausible effect size.

‡ P value of the excess statistical significance test. All statistical tests were two-sided.

§ Expected number of statistically significant studies using the summary random effects estimate of each meta-analysis as the plausible effect size.

‖ Expected number of statistically significant studies using the effect of the largest study of each meta-analysis as the plausible effect size.

¶ Expected number of statistically significant studies using the most conservative of the three estimates (fixed effects summary, random effects summary, largest study) of each meta-analysis as the plausible effect size.

# According to random effects calculations.

Biomarkers With Strong Evidence of Association

Of the 98 meta-analyses, only 30 had nominally statistically significant summary associations per random effects calculations and had neither evidence of small-study effects nor evidence for excess statistical significance (Supplementary Table 1, available online). Of those, only 12 (12% of all 98) had also compiled evidence on more than 1000 case patients: four pertained to infectious agents (cagA strains of H. pylori and gastric cancer, human papilloma virus [HPV] and oral cancer, HPV and prostate cancer, and hepatitis C virus and liver cancer), four pertained to IGF/insulin system concentrations (IGF-1 and colorectal cancer, IGF-1 and postmenopausal breast cancer, IGFBP-3 and postmenopausal breast cancer, and glucose and colorectal cancer), and four pertained to other biomarkers (sex hormone binding globulin and prostate cancer, vitamin D and breast cancer, α-linolenic acid and prostate cancer, and vitamin B12 and prostate cancer). Across these 12 associations, the effect sizes were small or moderate (odds ratio = 0.86 to 1.54) for eight; larger effect sizes were seen only for three associations of infectious agents (cagA strains of H. pylori and gastric cancer, HPV and oral cancer, and hepatitis C virus and liver cancer) and for the association of vitamin D and breast cancer.
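The grading used in this section combines four criteria. A minimal Python sketch of that filter follows; the record fields are hypothetical, and "nominally statistically significant" is assumed here to mean a random effects P value below .05.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysisSummary:
    """Hypothetical per-meta-analysis record; field names are illustrative."""
    label: str
    random_effects_p: float     # P value of the random effects summary
    small_study_effects: bool   # asymmetry test flagged small-study effects
    excess_significance: bool   # excess under any plausible-effect assumption
    n_cases: int                # total case patients across included studies


def strong_evidence(records, alpha=0.05, min_cases=1000):
    """Keep associations that are nominally significant, show no hint of bias,
    and rest on more than min_cases case patients."""
    return [
        r for r in records
        if r.random_effects_p < alpha
        and not r.small_study_effects
        and not r.excess_significance
        and r.n_cases > min_cases
    ]


records = [
    MetaAnalysisSummary("A", 0.001, False, False, 2400),
    MetaAnalysisSummary("B", 0.010, False, True, 3100),   # excess significance
    MetaAnalysisSummary("C", 0.030, False, False, 650),   # too few case patients
]
print([r.label for r in strong_evidence(records)])  # prints ['A']
```

Effect size is deliberately not a filter criterion; it is reported separately, as in the text, to distinguish small from large effects among the associations that pass.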

Discussion

This empirical evaluation of the literature on associations between nongenetic biomarkers and cancer risk examined 98 meta-analyses with 847 studies. We showed that only a minority of these biomarker associations have statistically significant results and no suggestion of bias, as inferred from small-study effects and excess statistical significance tests. Most of these associations pertain to infectious agents such as H. pylori, hepatitis viruses, and human papilloma virus; the associations of these infectious agents with the risk of specific malignancies are very strong and uncontestable. Conversely, we found a statistically significant excess of positive studies among articles that investigated the association between biomarkers in the IGF/insulin or inflammation systems and cancer risk. This suggests the presence of potentially major selective reporting biases in these fields. Finally, most meta-analyses of biomarkers of dietary factors, environmental factors, or sex hormones found no statistically significant association with cancer risk.

The excess of studies with positive results was driven predominantly by meta-analyses with large between-study heterogeneity. This has also been seen in empirical evaluations in other research disciplines [eg, brain volume studies (16)]. Heterogeneity may often be a manifestation of bias in some studies of a meta-analysis, and this may be difficult to dissect and differentiate from genuine differences across studies (65). In some cancer fields, several heterogeneous meta-analyses with evidence for excess statistical significance clustered together. Specific biomarkers and/or specific cancer associations may be affected more than others by biases, even within the same field.

For example, in the IGF/insulin system field, six of the eight meta-analyses with a statistically significant excess of positive studies came from the same article and pertained to prostate cancer risk (39). That article reported a positive association between IGF-1 concentrations and total prostate cancer risk and an inverse association between IGFBP-3 and risk, both of which were statistically significant only in case–control studies, as well as null but highly heterogeneous results for IGF-2, IGFBP-1, and IGFBP-2 (39). Therefore, the excess of statistically significant findings in the literature on IGF pathway biomarkers and prostate cancer risk is probably due mostly to biases in small, primarily case–control investigations. An empirical survey showed that when the results of the IGF-1 meta-analysis on prostate cancer were sent to the authors of the primary studies and to prominent methodologists, the former group claimed stronger evidence for an association than the latter (66). Methodologists consistently argued that the data are consistent with no effect at all or a small effect, whereas primary authors favored the presence of an association only when they themselves had authored studies with statistically significant results. This does not necessarily mean that these biomarkers are not associated with cancer. In fact, we found several IGF system associations with colorectal and breast cancer that were statistically significant without any hints of bias. However, the effect sizes in these associations were consistently small; effects of similarly small magnitude may also exist for prostate cancer, and they may be slightly stronger for advanced than for indolent disease, as some (39), but not all (67), reports have shown.

We also observed an excess of statistically significant studies in the literature on inflammatory biomarkers, and these excesses pertained specifically to meta-analyses of CRP (55, 56). CRP has been proposed as a biomarker for a vast array of diseases and outcomes beyond cancer. The overall evidence for an association with cancer risk seems weak, and evidence for other previously considered major associations, such as with cardiovascular events, has also weakened over time (68). Eight studies investigated the association between circulating concentrations of CRP and risk of colorectal cancer; three observed nominally statistically significant findings, and the meta-analysis showed a weak but statistically significantly elevated risk (55). However, data from the Women’s Health Study (69), a randomized clinical trial of low-dose aspirin and vitamin E, showed null results, with, if anything, a trend in the opposite direction. There is evidence suggesting that chronic low-grade colonic inflammation may be responsible for the development of colorectal neoplasia (70–72), but it is unclear whether circulating CRP correlates well with colonic inflammation. Further evidence against a causal link between CRP and colorectal neoplasia comes from genetic association studies and Mendelian randomization analyses, which have generally reported null results (73–75). It is thus probable that the statistically significant studies in this field are subject to selective reporting or other biases (eg, varying degrees of laboratory error in measuring certain biomarkers).

We did not observe any evidence of an excess of statistically significant studies in the fields of sex hormones, dietary biomarkers, and environmental factors when the meta-analyses in each field were considered together. Overall, the large majority of meta-analyses in these fields provide no evidence for statistically significant associations. In most of the remaining meta-analyses with nominally statistically significant summary results, there are as yet no sufficiently large studies, and the total evidence is based on fewer than 1000 cancer cases. Thus, even these few positive results may be spurious. Several large prospective cohort studies and randomized trials have generally found little or no evidence to support previously held nutritional associations (76). Consortia-based analyses of sex hormones have also found mostly null associations with cancer (41). Finally, there was only one statistically significant meta-analysis for environmental factors, which reported a strong association between DNA adducts and cancer risk in current smokers (20), but this evidence is still limited, and no large-scale study has been published to date.

In contrast, there are several very strong associations in the literature on infectious agents and cancer risk, such as those between cagA strains of H. pylori and gastric cancer, HPV and oral cancer, and hepatitis C virus and liver cancer. These associations are uncontestable: they have a clear causal hypothesis, and their effect sizes are typically much larger than the more uncertain effects described for other types of biomarkers. However, even in the field of infectious agents, we found some specific cancer associations with evidence of excess statistical significance: the meta-analyses suggesting increased risk of colorectal and lung cancer and decreased risk of esophageal squamous cell carcinoma with H. pylori, and increased risk of laryngeal cancer with HPV. The summary effects for these potentially spurious associations were not as strong as those for H. pylori with gastric cancer or HPV with oral cancer. This situation may reflect a type of bias in which, once a strong risk factor has been documented for one type of cancer, a collateral literature of lesser credibility is built for other types of cancer.

Several limitations and caveats should be considered in interpreting our findings. First, both asymmetry and excess statistical significance tests offer hints of bias, not definitive proof thereof. The frequency of meta-analyses with small-study effects was not high (9%), and this rate is commensurate with chance. Our finding of an excess of statistically significant findings in studies of the IGF/insulin and inflammation systems may be challenged by the fact that in none of these meta-analyses was the random effects summary dramatically larger than the effect of the largest study in the meta-analysis. However, when effects are mostly small, dramatic differences are unlikely to be seen even in biased evidence. Not surprisingly, we did not observe many situations where the effect sizes of the largest studies and of the meta-analyses differed by more than 1.5-fold.

Second, most individual studies were relatively small, with approximately 100 cancer cases and 200 matched control subjects, and the median number of included studies in each meta-analysis was only seven. Therefore, the excess statistical significance test for a single meta-analysis, especially one with few studies, should be interpreted very cautiously, because a negative test does not exclude the potential for bias (15). The test is better suited to providing an overall impression of the average level of reporting bias affecting larger fields with several meta-analyses. However, different meta-analyses in a field may not all be affected by bias to the same extent.
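To see why a study of about 100 cases and 200 controls has limited power, consider a hypothetical dichotomized biomarker (eg, above vs below the control median, so that 50% of controls are "exposed") with a true odds ratio of 1.3. The Python sketch below computes expected cell counts, Woolf's standard error of the log odds ratio, and the power of a two-sided test at α = .05 under a normal approximation. All inputs are illustrative assumptions rather than data from the reviewed studies, and the sketch ignores matching.

```python
import math
from statistics import NormalDist


def case_control_power(odds_ratio, n_cases, n_controls, control_exposure, alpha=0.05):
    """Approximate power for a binary exposure in an unmatched case-control design,
    using expected cell counts and Woolf's variance of the log odds ratio."""
    control_odds = control_exposure / (1 - control_exposure)
    case_exposure = odds_ratio * control_odds / (1 + odds_ratio * control_odds)
    cells = [
        n_cases * case_exposure,              # exposed cases
        n_cases * (1 - case_exposure),        # unexposed cases
        n_controls * control_exposure,        # exposed controls
        n_controls * (1 - control_exposure),  # unexposed controls
    ]
    se = math.sqrt(sum(1.0 / c for c in cells))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = math.log(odds_ratio) / se
    power = NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)
    return se, power


se, power = case_control_power(1.3, 100, 200, 0.5)
print(f"SE(log OR) = {se:.3f}, power = {power:.2f}")  # prints: SE(log OR) = 0.246, power = 0.19
```

Under these assumptions, a real odds ratio of 1.3 would reach nominal significance only about one time in five, so a literature of such studies in which most results are positive is hard to reconcile with small true effects.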

Third, the results of studies included in a meta-analysis may have already been standardized to some extent (eg, cleaned or made to follow consistent definitions and adjustments) compared with the results presented in each study’s original paper. However, such standardization efforts are likely to reduce, if anything, inconsistency and selective reporting bias. Selective reporting may be more prominent in the primary study reports.

Finally, the exact estimation of excess statistical significance is influenced by the choice of plausible effect size and/or by miscalculation of power. We performed sensitivity analyses using different plausible effect sizes, which yielded similar findings. If anything, our estimates of the plausible effect size are likely often larger than the true effect size, because bias tends to inflate the estimates of the summary effect. Effect inflation may affect even the results of the largest studies, because these studies were often not very large and/or may have had inherent biases themselves. Thus, our estimates of the extent of excess statistical significance are conservative, and the problem may be more severe. We also calculated power estimates under different assumptions and with different software programs and again observed very similar estimates.
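Why an inflated plausible effect makes the test conservative can be seen from the expected count itself. Assuming, for this illustration only, a normal approximation with study standard errors s_i, two-sided critical value z = z_{1−α/2}, standard normal density φ and distribution function Φ, and a plausible log odds ratio θ > 0 (protective effects follow by symmetry):

```latex
E(\theta) = \sum_{i=1}^{n} \left[ \Phi\!\left(\frac{\theta}{s_i} - z\right)
          + \Phi\!\left(-\frac{\theta}{s_i} - z\right) \right],
\qquad
\frac{dE}{d\theta} = \sum_{i=1}^{n} \frac{1}{s_i}
          \left[ \phi\!\left(\frac{\theta}{s_i} - z\right)
               - \phi\!\left(\frac{\theta}{s_i} + z\right) \right] > 0 ,
```

because |θ/s_i − z| < θ/s_i + z whenever θ > 0, and φ decreases in |x|. Since E increases with θ, overestimating the plausible effect raises E and shrinks the gap between observed and expected positive studies, so any excess that is still detected is, if anything, understated.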

Acknowledging these caveats, our evaluation maps the status of the evidence on 98 associations between nongenetic biomarkers and cancer risk. There is substantial diversity in the strength of the evidence across fields, ranging from highly credible, uncontestable associations for several infectious agents to potentially spurious associations for several other popular biomarkers. Meta-analyses may help us understand this diversity, but they are also subject to biases. It is possible that the evidence eventually corrects itself, but data from more studies are needed to support this statement. A substantial number of cancer biomarkers may be genuine but have small effect sizes, as has been documented recently for well-validated genetic biomarkers (77), which means that great care is needed to separate genuine small effects from the noise of bias. There are several ways to improve this evidence in the future. First, many of the biases in the scientific literature may be substantially lessened if studies are reported more completely and transparently according to published guidelines, such as the Strengthening the Reporting of Observational Studies in Epidemiology statement and its extension for Molecular Epidemiology (78–80). Second, statistical significance should not be used as a criterion for publication of biomarker studies. Third, large prospective and multicenter studies and collaborative consortia should be encouraged for biomarker associations that currently have limited evidence. The use of standardized definitions and protocols for exposures, outcomes, and statistical analyses may diminish the threat of biases and improve the reliability of this important literature.

Funding

This work was supported by the Seventh Framework Programme of the European Union (PIEF-GA-2010–276017 to KKT) and by an unrestricted gift from Sue and Bob O’Donnell to the Stanford Prevention Research Center.

Notes

The study sponsors had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the systematic review for publication.

References

1.
Ioannidis
JP
Panagiotou
OA.
Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses
JAMA
 
2011
305
(
21
):
2200
2210
2.
Bossuyt
PM.
The thin line between hope and hype in biomarker research
JAMA
 
2011
305
(
21
):
2229
2230
3.
Rifai
N
Altman
DG
Bossuyt
PM.
Reporting bias in diagnostic and prognostic studies: time for action
Clin Chem
 
2008
54
(
7
):
1101
1103
4.
Moons
KG
Altman
DG
Vergouwe
Y
Royston
P.
Prognosis and prognostic research: application and impact of prognostic models in clinical practice
BMJ
 
2009
338
b606
5.
Kyzas
PA
Denaxa-Kyza
D
Ioannidis
JP.
Almost all articles on cancer prognostic markers report statistically significant results
Eur J Cancer
 
2007
43
(
17
):
2559
2579
6.
Ioannidis
JP.
Why most published research findings are false
PLoS Med
 
2005
2
(
8
):
e124
7.
Dwan
K
Altman
DG
Arnaiz
JA
et al
Systematic review of the empirical evidence of study publication bias and outcome reporting bias
PLoS One
 
2008
3
(
8
):
e3081
8.
Ioannidis
JP.
Why most discovered true associations are inflated
Epidemiology
 
2008
19
(
5
):
640
648
9.
Chan
AW
Altman
DG.
Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors
BMJ
 
2005
330
(
7494
):
753
10.
Chan
AW
Hrobjartsson
A
Haahr
MT
Gotzsche
PC
Altman
DG.
Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles
JAMA
 
2004
291
(
20
):
2457
2465
11.
Chan
AW
Krleza-Jeric
K
Schmid
I
Altman
DG.
Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research
CMAJ
 
2004
171
(
7
):
735
740
12.
Sterne
JA
Sutton
AJ
Ioannidis
JP
et al
Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials
BMJ
 
2011
343
d4002
13.
Ioannidis
JP
Trikalinos
TA.
The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey
CMAJ
 
2007
176
(
8
):
1091
1096
14.
Lau
J
Ioannidis
JP
Terrin
N
Schmid
CH
Olkin
I.
The case of the misleading funnel plot
BMJ
 
2006
333
(
7568
):
597
600
15.
Ioannidis
JP
Trikalinos
TA.
An exploratory test for an excess of significant findings
Clin Trials
 
2007
4
(
3
):
245
253
16.
Ioannidis
JP.
Excess significance bias in the literature on brain volume abnormalities
Arch Gen Psychiatry
 
2011
68
(
8
):
773
780
17.
Kavvoura
FK
McQueen
MB
Khoury
MJ
Tanzi
RE
Bertram
L
Ioannidis
JP.
Evaluation of the potential excess of statistically significant findings in published genetic association studies: application to Alzheimer’s disease
Am J Epidemiol
 
2008
168
(
8
):
855
865
18.
Spitz
MR
Bondy
ML.
The evolving discipline of molecular epidemiology of cancer
Carcinogenesis
 
2010
31
(
1
):
127
134
19.
DerSimonian
R
Laird
N.
Meta-analysis in clinical trials
Control Clin Trials
 
1986
7
(
3
):
177
188
20.
Veglia
F
Loft
S
Matullo
G
et al
DNA adducts and cancer risk in prospective studies: a pooled analysis and a meta-analysis
Carcinogenesis
 
2008
29
(
5
):
932
936
21.
Chinn
S.
A simple method for converting an odds ratio to effect size for use in meta-analysis
Stat Med
 
2000
19
(
22
):
3127
3131
22.
Cochran
WG.
The combination of estimates from different experiments
Biometrics
 
1954
10
101
129
23.
Higgins
JP
Thompson
SG.
Quantifying heterogeneity in a meta-analysis
Stat Med
 
2002
21
(
11
):
1539
1558
24.
Ioannidis
JP
Patsopoulos
NA
Evangelou
E.
Uncertainty in heterogeneity estimates in meta-analyses
BMJ
 
2007
335
(
7626
):
914
916
25.
Egger
M
Davey Smith
G
Schneider
M
Minder
C.
Bias in meta-analysis detected by a simple, graphical test
BMJ
 
1997
315
(
7109
):
629
634
26.
Harbord
RM
Egger
M
Sterne
JA.
A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints
Stat Med
 
2006
25
(
20
):
3443
3457
27.
Wang
C
Yuan
Y
Hunt
RH.
The association between Helicobacter pylori infection and early gastric cancer: a meta-analysis
Am J Gastroenterol
 
2007
102
(
8
):
1789
1798
28.
Zhuo
XL
Wang
Y
Zhuo
WL
Zhang
XY.
Possible association of Helicobacter pylori infection with laryngeal cancer risk: an evidence-based meta-analysis
Arch Med Res
 
2008
39
(
6
):
625
628
29.
Zhuo
WL
Zhu
B
Xiang
ZL
Zhuo
XL
Cai
L
Chen
ZT.
Assessment of the relationship between Helicobacter pylori and lung cancer: a meta-analysis
Arch Med Res
 
2009
40
(
5
):
406
410
30.
Donato
F
Boffetta
P
Puoti
M.
A meta-analysis of epidemiological studies on the combined effect of hepatitis B and C virus infections in causing hepatocellular carcinoma
Int J Cancer
 
1998
75
(
3
):
347
354
31.
Hobbs
CG
Sterne
JA
Bailey
M
Heyderman
RS
Birchall
MA
Thomas
SJ.
Human papillomavirus and head and neck cancer: a systematic review and meta-analysis
Clin Otolaryngol
 
2006
31
(
4
):
259
266
32.
Taylor
ML
Mainous
AG
3rd
Wells
BJ.
Prostate cancer and sexually transmitted diseases: a meta-analysis
Fam Med
 
2005
37
(
7
):
506
512
33.
Lubin
JH
Gail
MH.
On power and sample size for studying features of the relative odds of disease
Am J Epidemiol
 
1990
131
(
3
):
552
566
34.
Higgins
JP
Thompson
SG
Deeks
JJ
Altman
DG.
Measuring inconsistency in meta-analyses
BMJ
 
2003
327
(
7414
):
557
560
35.
Rinaldi
S
Cleveland
R
Norat
T
et al
Serum levels of IGF-I, IGFBP-3 and colorectal cancer risk: results from the EPIC cohort, plus a meta-analysis of prospective studies
Int J Cancer
 
2010
126
(
7
):
1702
1715
36.
Morris
JK
George
LM
Wu
T
Wald
NJ.
Insulin-like growth factors and cancer: no role in screening. Evidence from the BUPA study and meta-analysis of prospective epidemiological studies
Br J Cancer
 
2006
95
(
1
):
112
117
37.
Key
TJ
Appleby
PN
Reeves
GK
Roddam
AW.
Insulin-like growth factor 1 (IGF1), IGF binding protein 3 (IGFBP3), and breast cancer risk: pooled individual data analysis of 17 prospective studies
Lancet Oncol
 
2010
11
(
6
):
530
542
38.
Chen
B
Liu
S
Xu
W
Wang
X
Zhao
W
Wu
J.
IGF-I and IGFBP-3 and the risk of lung cancer: a meta-analysis based on nested case-control studies
J Exp Clin Cancer Res
 
2009
28
(
1
)
89
39.
Rowlands
MA
Gunnell
D
Harris
R
Vatten
LJ
Holly
JM
Martin
RM.
Circulating insulin-like growth factor peptides and prostate cancer risk: a systematic review and meta-analysis
Int J Cancer
 
2009
124
(
10
):
2416
2429
40.
Pisani
P.
Hyper-insulinaemia and cancer, meta-analyses of epidemiological studies
Arch Physiol Biochem
 
2008
114
(
1
):
63
70
41.
Roddam
AW
Allen
NE
Appleby
P
Key
TJ.
Endogenous sex hormones and prostate cancer: a collaborative analysis of 18 prospective studies
J Natl Cancer Inst
 
2008
100
(
3
):
170
183
42.
Key
T
Appleby
P
Barnes
I
Reeves
G.
Endogenous sex hormones and breast cancer in postmenopausal women: reanalysis of nine prospective studies
J Natl Cancer Inst
 
2002
94
(
8
):
606
616
43.
Barba
M
Yang
L
Schunemann
HJ
et al
Urinary estrogen metabolites and prostate cancer: a case-control study and meta-analysis
J Exp Clin Cancer Res
 
2009
28
(
1
):
135
44.
Yin
L
Grandi
N
Raum
E
Haug
U
Arndt
V
Brenner
H.
Meta-analysis: longitudinal studies of serum vitamin D and colorectal cancer risk
Aliment Pharmacol Ther
 
2009
30
(
2
):
113
125
45.
Yin
L
Raum
E
Haug
U
Arndt
V
Brenner
H.
Meta-analysis of longitudinal studies: serum vitamin D and prostate cancer risk
Cancer Epidemiol
 
2009
33
(
6
):
435
445
46.
Chen
P
Hu
P
Xie
D
Qin
Y
Wang
F
Wang
H.
Meta-analysis of vitamin D, calcium and the prevention of breast cancer
Breast Cancer Res Treat
 
2010
121
(
2
):
469
477
47.
Gallicchio
L
Boyd
K
Matanoski
G
et al
Carotenoids and the risk of developing lung cancer: a systematic review
Am J Clin Nutr
 
2008
88
(
2
):
372
383
48. Zhuo H, Smith AH, Steinmaus C. Selenium and lung cancer: a quantitative analysis of heterogeneity in the current epidemiological literature. Cancer Epidemiol Biomarkers Prev. 2004;13(5):771–778.
49. Saadatian-Elahi M, Norat T, Goudable J, Riboli E. Biomarkers of dietary fatty acid intake and the risk of breast cancer: a meta-analysis. Int J Cancer. 2004;111(4):584–591.
50. Buck K, Zaineddin AK, Vrieling A, Linseisen J, Chang-Claude J. Meta-analyses of lignans and enterolignans in relation to breast cancer risk. Am J Clin Nutr. 2010;92(1):141–153.
51. Simon JA, Chen YH, Bent S. The relation of alpha-linolenic acid to the risk of prostate cancer: a systematic review and meta-analysis. Am J Clin Nutr. 2009;89(5):1558S–1564S.
52. Larsson SC, Giovannucci E, Wolk A. Folate and risk of breast cancer: a meta-analysis. J Natl Cancer Inst. 2007;99(1):64–76.
53. Collin SM, Metcalfe C, Refsum H, et al. Circulating folate, vitamin B12, homocysteine, vitamin B12 transport proteins, and risk of prostate cancer: a case-control study, systematic review, and meta-analysis. Cancer Epidemiol Biomarkers Prev. 2010;19(6):1632–1642.
54. Larsson SC, Orsini N, Wolk A. Vitamin B6 and risk of colorectal cancer: a meta-analysis of prospective studies. JAMA. 2010;303(11):1077–1083.
55. Tsilidis KK, Branchini C, Guallar E, Helzlsouer KJ, Erlinger TP, Platz EA. C-reactive protein and colorectal cancer risk: a systematic review of prospective studies. Int J Cancer. 2008;123(5):1133–1140.
56. Heikkila K, Harris R, Lowe G, et al. Associations of circulating C-reactive protein and interleukin-6 with cancer risk: findings from two prospective cohorts and a meta-analysis. Cancer Causes Control. 2009;20(1):15–26.
57. Huang JQ, Zheng GF, Sumanac K, Irvine EJ, Hunt RH. Meta-analysis of the relationship between cagA seropositivity and gastric cancer. Gastroenterology. 2003;125(6):1636–1644.
58. Zhao YS, Wang F, Chang D, Han B, You DY. Meta-analysis of different test indicators: Helicobacter pylori infection and the risk of colorectal cancer. Int J Colorectal Dis. 2008;23(9):875–882.
59. Islami F, Kamangar F. Helicobacter pylori and esophageal cancer risk: a meta-analysis. Cancer Prev Res (Phila). 2008;1(5):329–338.
60. Gutierrez J, Jimenez A, de Dios Luna J, Soto MJ, Sorlozano A. Meta-analysis of studies analyzing the relationship between bladder cancer and infection by human papillomavirus. J Urol. 2006;176(6, pt 1):2474–2481.
61. Mandelblatt JS, Kanetsky P, Eggert L, Gold K. Is HIV infection a cofactor for cervical squamous cell neoplasia? Cancer Epidemiol Biomarkers Prev. 1999;8(1):97–106.
62. Zhang ZF, Begg CB. Is Trichomonas vaginalis a cause of cervical neoplasia? Results from a combined analysis of 24 studies. Int J Epidemiol. 1994;23(4):682–690.
63. Khanjani N, Hoving JL, Forbes AB, Sim MR. Systematic review and meta-analysis of cyclodiene insecticides and breast cancer. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2007;25(1):23–52.
64. Lopez-Cervantes M, Torres-Sanchez L, Tobias A, Lopez-Carrillo L. Dichlorodiphenyldichloroethane burden and breast cancer risk: a meta-analysis of the epidemiologic evidence. Environ Health Perspect. 2004;112(2):207–214.
65. Ioannidis JP. Interpretation of tests of heterogeneity and bias in meta-analysis. J Eval Clin Pract. 2008;14(5):951–957.
66. Panagiotou OA, Ioannidis JP. Primary study authors and methodologists differ in their interpretations of heterogeneous meta-analysis results. J Clin Epidemiol. 2012;65(7):740–747.
67. Roddam AW, Allen NE, Appleby P, et al. Insulin-like growth factors, their binding proteins, and prostate cancer risk: analysis of individual patient data from 12 prospective studies. Ann Intern Med. 2008;149(7):461–471.
68. Ioannidis JP, Tzoulaki I. Minimal and null predictive effects for the most popular blood biomarkers of cardiovascular disease. Circ Res. 2012;110(5):658–662.
69. Zhang SM, Buring JE, Lee IM, Cook NR, Ridker PM. C-reactive protein levels are not associated with increased risk for colorectal cancer in women. Ann Intern Med. 2005;142(6):425–432.
70. Bertagnolli MM, Eagle CJ, Zauber AG, et al. Celecoxib for the prevention of sporadic colorectal adenomas. N Engl J Med. 2006;355(9):873–884.
71. Corpet DE, Pierre F. Point: from animal models to prevention of colon cancer. Systematic review of chemoprevention in min mice and choice of the model system. Cancer Epidemiol Biomarkers Prev. 2003;12(5):391–400.
72. Itzkowitz SH, Yio X. Inflammation and cancer IV. Colorectal cancer in inflammatory bowel disease: the role of inflammation. Am J Physiol Gastrointest Liver Physiol. 2004;287(1):G7–17.
73. Heikkila K, Silander K, Salomaa V, et al. C-reactive protein-associated genetic variants and cancer risk: findings from FINRISK 1992, FINRISK 1997 and Health 2000 studies. Eur J Cancer. 2011;47(3):404–412.
74. Poole EM, Bigler J, Whitton J, Sibert JG, Potter JD, Ulrich CM. C-reactive protein genotypes and haplotypes, polymorphisms in NSAID-metabolizing enzymes, and risk of colorectal polyps. Pharmacogenet Genomics. 2009;19(2):113–120.
75. Shinohara RT, Frangakis CE, Platz EA, Tsilidis K. Designs combining instrumental variables with case–control: estimating principal strata causal effects. Int J Biostat. 2012;8(1):1–21.
76. Boyle P, Boffetta P, Autier P. Diet, nutrition and cancer: public, media and scientific confusion. Ann Oncol. 2008;19(10):1665–1667.
77. Ioannidis JP, Castaldi P, Evangelou E. A compendium of genome-wide associations for cancer: critical synopsis and reappraisal. J Natl Cancer Inst. 2010;102(12):846–858.
78. Altman DG, Simera I, Hoey J, Moher D, Schulz K. EQUATOR: reporting guidelines for health research. Lancet. 2008;371(9619):1149–1150.
79. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4(10):e296.
80. Gallo V, Egger M, McCormack V, et al. STrengthening the Reporting of OBservational studies in Epidemiology–Molecular Epidemiology (STROBE-ME): an extension of the STROBE Statement. PLoS Med. 2011;8(10):e1001117.