ABSTRACT

Background: Nutritional epidemiology is a highly prolific field. Debates on associations of nutrients with disease risk are common in the literature and attract attention in public media.

Objective: We aimed to examine the conclusions, statistical significance, and reproducibility in the literature on associations between specific foods and cancer risk.

Design: We selected 50 common ingredients from random recipes in a cookbook. PubMed queries identified recent studies that evaluated the relation of each ingredient to cancer risk. Information regarding author conclusions and relevant effect estimates was extracted. When >10 articles were found, we focused on the 10 most recent articles.

Results: Forty ingredients (80%) had articles reporting on their cancer risk. Of 264 single-study assessments, 191 (72%) concluded that the tested food was associated with an increased (n = 103) or a decreased (n = 88) risk; 75% of the risk estimates had weak (0.05 > P ≥ 0.001) or no statistical (P > 0.05) significance. Statistically significant results were more likely than nonsignificant findings to be published in the study abstract than in only the full text (P < 0.0001). Meta-analyses (n = 36) presented more conservative results; only 13 (36%) reported an increased (n = 4) or a decreased (n = 9) risk (6 had more than weak statistical support). The median RRs (IQRs) for studies that concluded an increased or a decreased risk were 2.20 (1.60, 3.44) and 0.52 (0.39, 0.66), respectively. The RRs from the meta-analyses were on average null (median: 0.96; IQR: 0.85, 1.10).

Conclusions: Associations with cancer risk or benefits have been claimed for most food ingredients. Many single studies highlight implausibly large effects, even though evidence is weak. Effect sizes shrink in meta-analyses.

See corresponding editorial on page 5

INTRODUCTION

Thousands of nutritional epidemiology studies are conducted and published annually in the quest to identify dietary factors that affect major health outcomes, including cancer risk (1). These studies influence dietary guidelines and at times public health policy (2) and receive wide attention in news media (3). However, interpretation of the multitude of studies in this area is difficult (1, 4) and is critically dependent on accurate assessments of the credibility of published data. Randomized trials have repeatedly failed to find treatment effects for nutrients in which observational studies had previously proposed strong associations (5–8), and such discrepancies in the evidence have fueled hot debates (9–12) rife with emotional and sensational rhetoric that can subject the general public to increased anxiety and contradictory advice (13, 14). One wonders whether this highly charged atmosphere and intensive testing of food-related associations may create a plethora of false-positive findings (15) and questionable research practices, especially when the research is highly exploratory, the analyses and protocols are not preregistered, and the findings are selectively reported. It was previously shown in a variety of other fields that “negative” results are either less likely to be published (16–21) or misleadingly interpreted (19, 22). Studies may spuriously highlight results that barely achieve statistical significance (15, 23) or report effect estimates that either are overblown (24, 25) or cannot be replicated in other studies (24, 26, 27).

To better evaluate the extent to which these factors may affect studies investigating dietary risk factors for malignancy, we surveyed recently published studies and meta-analyses that addressed the potential association between a large random sample of food ingredients and the risk of any type of malignancy.

SUBJECTS AND METHODS

Random ingredient selection

We selected ingredients from random recipes included in The Boston Cooking-School Cook Book (28), available online at http://archive.org/details/bostoncookingsch00farmrich. A copy of the book was obtained in portable document format and viewed by using Skim version 1.3.17 (http://skim-app.sourceforge.net). The recipes (see Supplementary Table 1 under “Supplemental data” in the online issue) were selected at random by generating random numbers corresponding to cookbook page numbers using Microsoft Excel (Microsoft Corporation). The first recipe on each page selected was used; the page was passed over if there was no recipe. All unique ingredients within selected recipes were chosen for analysis. This process was repeated until 50 unique ingredients were selected.
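The selection procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `recipes_by_page` mapping, the function name, and the toy data are hypothetical, pages are visited in a shuffled order without replacement (the text does not say whether pages could be redrawn), and the list is truncated at the target count if the last recipe overshoots it.

```python
import random

def select_ingredients(recipes_by_page, n_pages, target=50, seed=None):
    """Collect unique ingredients from the first recipe on randomly drawn pages.

    recipes_by_page: hypothetical dict mapping page number -> list of recipes,
    where each recipe is a list of ingredient names.
    """
    rng = random.Random(seed)
    pages = list(range(1, n_pages + 1))
    rng.shuffle(pages)  # assumption: random page order, no page drawn twice
    selected = []
    for page in pages:
        recipes = recipes_by_page.get(page)
        if not recipes:
            continue  # the page is passed over if it has no recipe
        for ingredient in recipes[0]:  # only the first recipe on the page is used
            if ingredient not in selected:
                selected.append(ingredient)
        if len(selected) >= target:
            break
    return selected[:target]  # assumption: truncate if the last recipe overshoots

# Toy cookbook with 5 pages, 3 of which contain recipes.
toy_cookbook = {
    1: [["egg", "milk"]],
    3: [["milk", "flour"], ["beef"]],
    5: [["salt"]],
}
picked = select_ingredients(toy_cookbook, n_pages=5, target=3, seed=0)
```

In the toy run, "beef" can never be selected because it appears only in the second recipe on its page.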

Study searches

We performed literature searches using PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) for studies investigating the relation of the selected ingredients to cancer risk using the following search terms: “risk factors”[MeSH Terms] AND “cancer”[sb] AND the singular and/or plural forms of the selected ingredient restricted to the title or abstract. Titles and abstracts of retrieved articles were then reviewed to select the 10 most recently published cohort or case-control studies investigating the relation between the ingredients and cancer risk. Ingredient derivatives and components (eg, orange juice) were considered, as were ingredients analyzed as part of a broader diet when they were specifically mentioned as a component of that diet. Whenever <10 studies were retrieved for a given ingredient, an attempt was made to obtain additional studies by searching for ingredient synonyms (eg, mutton for lamb, thymol for thyme), using articles explicitly referred to by the previously retrieved material, and broadening the original searches (searching simply by ingredient name AND “cancer”).
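As an illustration of the search strategy, the query for one ingredient could be assembled as follows. The function name is hypothetical, and the use of the `[Title/Abstract]` field tag with an OR between the singular and plural forms is an assumption about how the restriction was expressed; the exact query strings used are not given in the text.

```python
def build_pubmed_query(singular, plural=None):
    """Assemble a PubMed query string for one ingredient (illustrative only)."""
    forms = [singular] + ([plural] if plural else [])
    # Assumption: each form is quoted and restricted with the Title/Abstract tag.
    ingredient_clause = " OR ".join(f'"{form}"[Title/Abstract]' for form in forms)
    return f'"risk factors"[MeSH Terms] AND "cancer"[sb] AND ({ingredient_clause})'

query = build_pubmed_query("egg", "eggs")
```

The resulting string can be pasted into the PubMed search box; the broadened fallback search described above would drop the first two terms and the field restriction.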

Searches for relevant meta-analyses were performed in the same manner as for single studies, but adding the PubMed “meta-analysis” filter. For each ingredient, the most recent meta-analysis investigating the relation with a particular cancer was selected for analysis. In 2 meta-analyses that separately investigated associations with more than one different type or subtype of cancer, only the first type mentioned in the abstract was considered.

Data extraction

From each retrieved study or meta-analysis, data were extracted from the abstract regarding the ingredient and cancer type, authors’ conclusions regarding the risk of malignancy (increased risk, decreased risk, no effect, or borderline/other effect), the respective RR estimate (typically the HR for cohort studies or OR for case-control studies), and the exposure contrast to which it pertained, its 95% CI, and P value. When available, we used P values that were explicitly reported, including P values for trends. Standard reporting of these P values did not adjust for potential multiple testing. When not available, we estimated the P values from the reported point estimates and CIs of the effects, assuming no testing for trends across multiple different exposure levels. Whenever the effect estimate and P value were not available and could not be approximated from data available in the article abstract, the full text was then retrieved and examined in an attempt to obtain this information.
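One common way to approximate a P value from a ratio estimate and its 95% CI is to recover the standard error on the log scale and apply a normal approximation. The sketch below assumes that approach; the text does not specify the exact calculation used, and the function name and example numbers are illustrative.

```python
import math

def p_value_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate z score and two-sided P value for a ratio measure (RR, OR, HR).

    Assumes the 95% CI is symmetric on the log scale, so that
    SE = (ln(upper) - ln(lower)) / (2 * 1.96).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Illustrative values: an RR of 2.0 whose CI just touches the null (1.0, 4.0).
z, p = p_value_from_ci(2.0, 1.0, 4.0)
```

An interval whose lower bound sits exactly at 1.0 corresponds to a P value at the 0.05 threshold, which is a convenient sanity check for the approximation.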

When multiple potentially relevant effect estimates were available from a given study, the following criteria were applied in order of priority: the estimates most specific for the ingredient, the most broadly defined definition of malignancy (eg, colorectal compared with colon or rectal cancers), the most general patient subgroup, the most adjusted estimate, and that corresponding to the most extreme reported exposure contrast (ie, highest compared with lowest level of exposure). This allowed us to better compare effect estimates across ingredients and from individual studies with meta-analyses, because it is common practice in the literature to report comparisons of extreme exposure levels (22). In the case of estimates reported for multiple malignancies or patient subgroups of similar magnitude, the estimate or conclusion referred to first in the abstract was chosen for further analysis. If no estimate or conclusion was specifically referred to in the abstract, the same criteria described above were applied to the full text.

Whenever available, effect estimates limited to the analysis of prospective cohort data were also separately extracted from the retrieved meta-analyses. One author performed data extraction (JDS) and discussed any uncertainties with the other author (JPAI) for arbitration.

Statistical analyses

We aimed to examine whether results and their interpretations were generally more conservative in the meta-analyses than in the single studies and whether there were any hints of biases in the overall evidence. We summarized and compared data from the retrieved single studies and from the meta-analyses on the conclusions of the authors and on whether these were congruent with the presence of nominal statistical significance (P < 0.05) without adjustment for potential multiple testing. We also assessed the types and consistency of exposure contrasts used. Finally, to help highlight trends in the literature and potential biases, we evaluated the distribution of P values (and corresponding standardized z scores from the normal distribution) to examine whether there were any peaks of frequently reported P values and troughs of infrequently reported P values, and we evaluated the distribution of RRs to examine the median and IQR of reported effect sizes. For P values (z scores), we also examined whether the results listed in the abstract differed from those listed only in the full text using a chi-square test.
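The conversion from a two-sided P value to a signed standardized score can be sketched as follows. Assigning a negative sign to estimates below 1 (decreased risk) and a positive sign otherwise is an assumption made for this illustration, and the function name is hypothetical.

```python
from statistics import NormalDist

def signed_z_score(p_value, relative_risk):
    """Convert a two-sided P value into a z score signed by the effect direction."""
    magnitude = NormalDist().inv_cdf(1 - p_value / 2)
    # Assumption: RR < 1 (decreased risk) maps to a negative z score.
    return magnitude if relative_risk >= 1 else -magnitude

z_increase = signed_z_score(0.05, 2.20)
z_decrease = signed_z_score(0.05, 0.52)
```

With this convention, a P value of 0.05 maps to a score of about 1.96 for an increased risk and about −1.96 for a decreased risk.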

P values >0.05 are considered not nominally significant, whereas P values between 0.05 and 0.001 are considered to offer weak support, as previously proposed for epidemiologic analyses (29). In Bayesian terms, such P values generally do not correspond to very strong support, regardless of prior assumptions (23, 30).
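The three-level grading of statistical support can be written as a small helper. The boundary handling follows the thresholds given in the Table 1 footnote (nonsignificant: P ≥ 0.05; weak: 0.001 ≤ P < 0.05; strong: P < 0.001); the function name is hypothetical.

```python
def classify_support(p_value):
    """Grade the statistical support of an association from its P value."""
    if p_value >= 0.05:
        return "nonsignificant"
    if p_value >= 0.001:
        return "weak"
    return "strong"

# Median P values reported in Table 1 for single studies, plus one strong case.
labels = [classify_support(p) for p in (0.008, 0.010, 0.510, 0.0005)]
```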

The main analyses evaluated all retrieved data from single studies and from meta-analyses. Sensitivity analyses focused on comparisons of meta-analyses against single studies on the same ingredient-cancer pairs (excluding single studies on associations for which no meta-analysis had been found and meta-analyses on associations for which no single study had been among the 10 most recent captured studies) and assessment of meta-analysis data only from prospective cohort studies. JMP version 9.0 (SAS Institute) was used to generate summary statistics, calculate z scores from the normal distribution, perform chi-square analysis, and draft figures.

RESULTS

Ingredients studied in relation to cancer

At least one study investigating the relation to cancer risk was identified for 80% (n = 40) of the ingredients selected from random recipes: veal, salt, pepper spice, flour, egg, bread, pork, butter, tomato, lemon, duck, onion, celery, carrot, parsley, mace, sherry, olive, mushroom, tripe, milk, cheese, coffee, bacon, sugar, lobster, potato, beef, lamb, mustard, nuts, wine, peas, corn, cinnamon, cayenne, orange, tea, rum, and raisin. These ingredients include many of the most common sources of vitamins and nutrients in the United States diet (31, 32); in contrast, the 10 ingredients for which a relevant cancer risk study was not identified were generally more obscure: bay leaf, cloves, thyme, vanilla, hickory, molasses, almonds, baking soda, ginger, and terrapin. Of the 40 ingredients for which at least one study was identified, 50% (n = 20) had ≥10 studies, 15% (n = 6) had 6–9 studies, and 35% (n = 14) had 1–5 studies. The identified studies provided 264 relevant effect estimates described in 216 publications dating from February 1976 to December 2011 (see Supplementary Table 1 under “Supplemental data” in the online issue). One hundred fifty-four (71%) and 184 (85%) articles were published in or after 2005 and 2000, respectively.

Author conclusions and reported effect estimates

Author conclusions reported in the abstract and manuscript text and relevant effect estimates are summarized in Table 1. Thirty-nine percent of studies concluded that the studied ingredient conferred an increased risk of malignancy; 33% concluded that there was a decreased risk, 5% concluded that there was a borderline statistically significant effect, and 23% concluded that there was no evidence of a clearly increased or decreased risk. Thirty-six of the 40 ingredients for which at least one study was identified had at least one study concluding increased or decreased risk of malignancy: veal, salt, pepper spice, egg, bread, pork, butter, tomato, lemon, duck, onion, celery, carrot, parsley, mace, olive, mushroom, tripe, milk, cheese, coffee, bacon, sugar, lobster, potato, beef, lamb, mustard, nuts, wine, peas, corn, cayenne, orange, tea, and rum.

TABLE 1

Author conclusions from retrieved articles and meta-analyses in relation to the statistical significance of the associations and effect estimates

| Author conclusion | n (%)^1 | Statistical significance of association^2 | Median effect estimate (IQR) | Median P value (IQR) |
|---|---|---|---|---|
| Individual studies | | | | |
| Increased risk | 103 (39) | Nonsignificant: 13 (13%); Weak: 64 (62%); Strong: 25 (24%); Missing: 1 (1%) | 2.20 (1.60, 3.44) | 0.008 (0.001, 0.030) |
| Decreased risk | 88 (33) | Nonsignificant: 7 (8%); Weak: 60 (68%); Strong: 17 (19%); Missing: 4 (5%) | 0.52 (0.39, 0.66) | 0.010 (0.002, 0.030) |
| No effect | 61 (23) | Nonsignificant: 58 (95%); Weak: 1 (2%); Missing: 2 (3%) | 1.03 (0.91, 1.14) | 0.510 (0.294, 0.701) |
| Borderline effect | 12 (5) | Nonsignificant: 11 (92%); Strong: 1 (8%) | 0.80 (0.60, 1.50) | 0.075 (0.060, 0.275) |
| Meta-analyses | | | | |
| Increased risk | 4 (11) | Nonsignificant: 1 (25%); Weak: 2 (50%); Strong: 1 (25%) | 1.33 (1.20, 1.69) | 0.017 (0.003, 0.119) |
| Decreased risk | 9 (25) | Weak: 4 (44%); Strong: 5 (56%) | 0.68 (0.61, 0.81) | 0.0005 (0.0001, 0.0133) |
| No effect | 13 (36) | Nonsignificant: 11 (85%); Weak: 2 (15%) | 1.07 (0.98, 1.21) | 0.38 (0.11, 0.55) |
| Borderline or complex^3 effect | 10 (28) | Nonsignificant: 6 (60%); Weak: 2 (20%); Strong: 2 (20%) | 0.84^4 (0.72, 0.99) | 0.061^4 (0.044, 0.142) |

^1 n = 264 for individual studies and n = 36 for meta-analyses. Among the individual studies, effect estimates were missing from 9 studies, and specific P values were missing from 7 studies.

^2 Nonsignificant (P ≥ 0.05), weak (0.001 ≤ P < 0.05), and strong (P < 0.001); P values with inequalities were imputed as equal to the reported threshold when median values were calculated (eg, P < 0.001 was considered a strong association but was used as P = 0.001 to calculate medians).

^3 J-shaped or dependent on study type.

^4 Borderline effects only.

The statistical support of the effects was weak (0.001 ≤ P < 0.05) or not even nominally significant (P > 0.05) in 80% of the studies. Support was likewise weak or not nominally significant in 75% of the studies that claimed an increased risk and in 76% of the studies that claimed a decreased risk (Table 1).

RRs compared the lowest with the highest categories of consumption in 172 (65%) estimates. There was wide variability in how exposure contrasts were defined: highest compared with lowest tertiles, quartiles, or quintiles were compared in 32, 36, and 22 studies, respectively. However, other studies used more arbitrary definitions for extremes, eg, ≥5 cups/d compared with <1 cup/d (33), ≥5 servings/wk compared with <1 serving/wk (34, 35), ≥30 g/d compared with 0.1–4.9 g/d (36), ≥43 drinks/wk compared with zero drinks (37), and “often” compared with “never” (38). Contrasts used for the remainder of the estimates were compared with no consumption (n = 36, 13%) or intermediate or incremental levels of consumption (n = 45, 17%) or could not be determined (n = 11, 4%). The median RRs were 2.20 (IQR: 1.60, 3.44) and 0.52 (IQR: 0.39, 0.66) in studies that concluded increased and decreased risks, respectively.

The effect estimates are shown in Figure 1 by malignancy type or by ingredient for the 20 ingredients for which ≥10 articles were identified. Gastrointestinal malignancies were the most commonly studied (45%), followed by genitourinary (14%), breast (14%), head and neck (9%), lung (5%), and gynecologic (5%) malignancies.

FIGURE 1.

Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies are shown. Three outliers are not shown (effect estimates >10).

The distribution of standardized (z) scores associated with P values was bimodal, with peaks corresponding to nominally statistically significant results and a trough in the middle corresponding to the sparse nonsignificant results (Figure 2, left panel). The bimodal peaks and middle trough pattern were even more prominent for results reported in the abstracts: 62% of the nominally statistically significant effect estimates were reported in abstracts, whereas most (70%) of the nonsignificant results appeared only in the full text and not in the abstracts (P < 0.0001).

FIGURE 2.

Standardized (z) scores associated with effect estimates for ingredients from individual studies (left) and meta-analyses (right). Scores available from article abstracts are shown in black, whereas those found in the full text are in gray. For reference, a P value of 0.05 has a z score of −1.96 for an association with a decreased cancer risk and 1.96 for an association with an increased cancer risk.

Meta-analyses

Thirty-six relevant effect estimates were obtained from meta-analyses (see Supplementary Table 2 under “Supplemental data” in the online issue). Author conclusions and the respective effect estimates are summarized in Table 1.

Thirty-three (92%) of the 36 estimates pertained to comparisons of the lowest with the highest levels of consumption, but most of these meta-analyses combined studies that had different exposure contrasts. For example, one meta-analysis (39) combined studies that compared the highest with the lowest consumption and others that compared the fourth with the first quartile. Only 13 meta-analysis estimates were obtained by combining data on the same exact contrast across all studies.

Thirteen meta-analyses concluded that there was an increased (n = 4) or a decreased (n = 9) risk of malignancy, and 6 of them had more than weak statistical support. The remaining meta-analyses concluded that there was no effect (36%, n = 13) or an effect that was borderline (n = 6), potentially J-shaped (n = 2), seen in case-control but not in cohort studies (n = 1), or seen in cohort but not in case-control studies (n = 1).

The distribution of standardized (z) scores associated with P values was bimodal also for meta-analyses (Figure 2, right panel), with a trough in the middle for z values −1 to 0.5. However, in contrast with single studies, the peaks corresponded to nonstatistically significant results or were of borderline significance. Only 6 estimates came from the full text and not from the abstract.

As shown in Figure 3, the distribution of the effect sizes in the meta-analyses appeared normal, centered around the null, and generally showed small effects on both sides of the distribution (median RR: 0.96; IQR: 0.85, 1.10). Median effect estimates were 1.33 (IQR: 1.20, 1.69) for studies that concluded an increased risk and 0.68 (IQR: 0.61, 0.81) for studies that concluded a decreased risk; these estimates were, in general, more conservative than those predicted by the individual studies (Figure 3, lower panel).

FIGURE 3.

Effect estimates from the meta-analyses (n = 36) and individual studies (n = 255; effect estimates were missing in 9 of the 264 studies).

Sensitivity analyses

When the comparison was limited to ingredient-malignancy category pairs for which both an individual study and a meta-analysis were available, the median effect estimates for studies that concluded an increased or decreased risk were 2.69 (95% CI: 1.60, 4.85) and 0.51 (0.36, 0.64) for individual studies compared with 1.33 (95% CI: 1.20, 1.69) and 0.66 (0.56, 0.82) for meta-analyses, respectively. In 43 of the 64 pairwise comparisons between results from individual studies and meta-analyses that reported on the same ingredient-malignancy category (eg, coffee with gastrointestinal malignancies; Table 2), the effect was closer to the null (RR: 1.00) in the meta-analysis than in the respective original study (P = 0.009; McNemar's test for paired comparison).
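A paired comparison of this kind can be checked with an exact two-sided binomial (sign) test on the 64 pairs, which is what an exact McNemar test reduces to when only the direction of each pair is counted. The sketch below assumes that formulation and a null probability of 0.5; the original computation may have used a different variant, so the result should only be expected to be of similar magnitude to the reported P value. The function name is hypothetical.

```python
from math import comb

def exact_sign_test(n_closer, n_total):
    """Two-sided exact binomial test of H0: either direction is equally likely."""
    k = max(n_closer, n_total - n_closer)
    # Upper-tail probability of observing k or more successes out of n_total.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)

# 43 of 64 pairs had the meta-analysis estimate closer to the null.
p_paired = exact_sign_test(43, 64)
```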

TABLE 2

Strength and direction of associations obtained from individual studies and meta-analyses based on reported effect estimates and reported or calculated P values

The first two data columns are restricted to ingredient-malignancy pairs with data from both individual studies and meta-analyses. Values are n (%).

| Association^1 | Individual studies (n = 50) | Meta-analyses (n = 30) | All meta-analyses (n = 36) | Meta-analyses using data from prospective cohort studies (n = 26) |
|---|---|---|---|---|
| Nonsignificant association | 22 (44) | 17 (57) | 18 (50) | 19 (73) |
| Weakly increased risk | 9 (18) | 4 (13) | 4 (11) | 2 (8) |
| Weakly decreased risk | 11 (22) | 5 (17) | 6 (17) | 3 (12) |
| Strongly increased risk | 2 (4) | 1 (3) | 2 (6) | |
| Strongly decreased risk | 6 (12) | 3 (10) | 6 (17) | 2 (8) |

^1 Nonsignificant (P ≥ 0.05), weak (0.001 ≤ P < 0.05), and strong (P < 0.001).

The percentage of studies with a nonnominally significant result (P ≥ 0.05) was greater among the meta-analyses (57%) than among the individual studies (44%), although this difference was not statistically significant (P = 0.356; Fisher's exact test). However, a significantly greater percentage of studies with nonnominally significant results (73%) was found when effect estimates were limited to meta-analyses of prospective cohorts obtained from the retrieved studies as compared with individual studies (P = 0.028; Table 2).
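The two comparisons above can be approximated from the Table 2 counts with a two-sided Fisher's exact test. The implementation below is a minimal sketch that sums the hypergeometric probabilities of all tables no more likely than the observed one; this two-sided convention is an assumption, since statistical packages differ in how they define the two-sided P value, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)

# Rows: meta-analyses, individual studies. Columns: nonsignificant, significant.
p_paired_meta = fisher_exact_two_sided(17, 13, 22, 28)  # 57% compared with 44%
p_cohort_meta = fisher_exact_two_sided(19, 7, 22, 28)   # 73% compared with 44%
```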

DISCUSSION

In this survey of published literature regarding the relation between food ingredients and malignancy, we found that 80% of ingredients from randomly selected recipes had been studied in relation to malignancy and the large majority of these studies were interpreted by their authors as offering evidence for increased or decreased risk of cancer. However, the vast majority of these claims were based on weak statistical evidence. Many statistically nonsignificant “negative” and weak results were relegated to the full text rather than to the study abstract. Individual studies reported larger effect sizes than did the meta-analyses. There was no standardized, consistent selection of exposure contrasts for the reported risks. A minority of associations had more than weak support in meta-analyses, and summary effects in meta-analyses were consistent with a null average and relatively limited variance.

We should acknowledge that our searches for eligible studies were not exhaustive. Covering the entire nutritional epidemiology literature would be impossible. However, our search approach was representative of the studies that might be encountered by a researcher, physician, patient, or consumer embarking on a review of this literature. We preferentially analyzed the first effect estimate mentioned in individual studies and meta-analyses, which, although not random, are likely to be the first encountered by the reader. Moreover, application of this rule allowed for consistency and avoidance of subjectivity in selecting effect estimates. Given that we examined the effects most specific to individual ingredients, we did not examine more complex analyses involving nutritional pathways, biochemical nutritional measurements, and metabolites or combinations of ingredients. However, we hypothesize that similar patterns of research conduct and reporting also apply to these other aspects of nutritional epidemiology. Moreover, most ingredients for which a human study was not identified by our search have been studied in animal cancer models, eg, eugenol from bay leaf (40), cloves (41), terpenoids from thyme (42), vanillin from vanilla (43), ginger (44), and almonds (45).

We found great variability in the types of exposure contrasts. Moreover, the meta-analyses were often forced to merge data from studies that had used different exposure contrasts. We found that, if anything, a greater percentage of effect estimates from meta-analyses (92%) than from individual studies (65%) contrasted the lowest compared with the highest levels of consumption. This suggests that the more extreme risk estimates reported in the single studies may not represent simply a choice of more extreme exposure contrasts. However, the lack of standardization in definitions and choice of exposure contrasts allows widely different (and potentially biased) estimates of effect sizes to be reported, at the discretion of the investigators. We focused on the more extreme risk estimates, because previous evidence suggests that these are selectively used to present data when effect sizes are more modest (22). Whereas this is probably understood by expert scientists, the larger reported risks may be misleading to the nonmethodologist reader or general public (22).

Nutritional epidemiology is a valuable field that can identify potentially modifiable risk factors related to diet. However, the credibility of studies in this and other fields is subject to publication and other selective outcome and analysis reporting biases (16–21), whenever the pressure to publish (46) fosters a climate in which “negative” results are undervalued and not reported (47). Ingredients viewed as “unhealthy” may be demonized, leading to subsequent biases in the design, execution and reporting of studies (48). Some studies that narrowly meet criteria for statistical significance may represent spurious results (15), especially when there is large flexibility in analyses, selection of contrasts, and reporting. When results are overinterpreted, the emerging literature can skew perspectives (49) and potentially obfuscate other truly significant findings. This issue may be especially problematic in areas such as cancer epidemiology, where randomized trials may be exceedingly difficult and expensive to conduct (50, 51); therefore, more reliance is placed on observational studies, but with a considerable risk of trusting false-positive or inflated results (52).

Some meta-analyses may yield more reliable results, synthesize the available evidence, and control for potential confounding factors (53); however, even these analyses can be biased or misinterpreted (21, 54). Our findings support previous evidence suggesting that effect sizes are likely to trend closer to the null as more data are accumulated (55). However, fragmented efforts from multiple teams may be difficult to integrate with meta-analyses after the fact. To enhance further progress, the field of nutritional epidemiology may need to consider practices such as advance registration. When research is exploratory, protocols and analyses are modified in an iterative fashion, and these changes should be documented. If protocols are not registered up front, one should consider, at a minimum, upfront registration of the data sets that are used for such analyses (what variables are available for analyses) (56). This would allow mapping the space of what analyses could have been performed, regardless of what was eventually reported. Prospective consortium meta-analyses addressing all nutrients and foods in the same project and using standardized exposure contrasts can also address the heterogeneity of how studies are conducted or how the results are analyzed. In addition, comprehensive documentation of analyses and reporting (49) rather than testing and reporting one association at a time (50) would be of value. Such approaches, in combination, may facilitate a more accurate interpretation of the evidence linking foods to the risk of developing cancer.

The authors’ responsibilities were as follows—JDS and JPAI (guarantor): designed the study, interpreted the data and the analyses, wrote the manuscript, and approved the final version; and JDS: performed the data extraction and the statistical analyses with help and supervision from JPAI. No conflicts of interest were reported by either author.

FOOTNOTES

2 There was no funding for this study.

REFERENCES

1. Kushi LH, Doyle C, McCullough M, Rock CL, Demark-Wahnefried W, Bandera EV, Gapstur S, Patel AV, Andrews K, Gansler T, et al. American Cancer Society guidelines on nutrition and physical activity for cancer prevention: reducing the risk of cancer with healthy food choices and physical activity. CA Cancer J Clin 2012;62:30–67.

2. Brownell KD, Warner KE. The perils of ignoring history: Big Tobacco played dirty and millions died. How similar is Big Food? Milbank Q 2009;87:259–94.

3. Bartlett C, Sterne J, Egger M. What is newsworthy? Longitudinal study of the reporting of medical research in two British newspapers. BMJ 2002;325:81–4.

4. AICR. Food, nutrition, physical activity, and the prevention of cancer: a global perspective. Washington, DC: AICR, 2007:1–537.

5. Gaziano JM, Glynn RJ, Christen WG, Kurth T, Belanger C, MacFadyen J, Bubes V, Manson JE, Sesso HD, Buring JE. Vitamins E and C in the prevention of prostate and total cancer in men: the Physicians’ Health Study II randomized controlled trial. JAMA 2009;301:52–62.

6. Klein EA, Thompson IM, Tangen CM, Crowley JJ, Lucia MS, Goodman PJ, Minasian LM, Ford LG, Parnes HL, Gaziano JM, et al. Vitamin E and the risk of prostate cancer: the Selenium and Vitamin E Cancer Prevention Trial (SELECT). JAMA 2011;306:1549–56.

7. Lee I-M, Cook NR, Gaziano JM, Gordon D, Ridker PM, Manson JE, Hennekens CH, Buring JE. Vitamin E in the primary prevention of cardiovascular disease and cancer: the Women's Health Study: a randomized controlled trial. JAMA 2005;294:56–65.

8. Omenn GS, Goodman GE, Thornquist MD, Balmes J, Cullen MR, Glass A, Keogh JP, Meyskens FL, Valanis B, Williams JH, et al. Effects of a combination of beta carotene and vitamin A on lung cancer and cardiovascular disease. N Engl J Med 1996;334:1150–5.

9. Gann PH. Randomized trials of antioxidant supplementation for cancer prevention: first bias, now chance–next, cause. JAMA 2009;301:102–3.

10. Hoffman RM. ACP Journal Club. Vitamin E supplementation increased risk for prostate cancer in healthy men at a median of 7 years. Ann Intern Med 2012;156:JC2–03.

11. Jacobs EJ, Thun MJ. Low-dose aspirin and vitamin E: challenges and opportunities in cancer prevention. JAMA 2005;294:105–6.

12. Martínez ME, Jacobs ET, Baron JA, Marshall JR, Byers T. Dietary supplements and cancer prevention: balancing potential benefits against proven harms. J Natl Cancer Inst 2012;104(10):732–9.

13. Macdonald I. Nonsense and non-science in nutrition. Proc Nutr Soc 1983;42:513–23.

14. Taubes G. Epidemiology faces its limits. Science 1995;269:164–9.

15. Ioannidis JPA. Why most published research findings are false. PLoS Med 2005;2:e124.

16. Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan A-W, Cronin E, Decullier E, Easterbrook PJ, Von Elm E, Gamble C, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 2008;3:e3081.

17. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867–72.

18. Jennions MD, Møller AP. Publication bias in ecology and evolution: an empirical assessment using the ‘trim and fill’ method. Biol Rev Camb Philos Soc 2002;77:211–22.

19. Kyzas PA, Denaxa-Kyza D, Ioannidis JPA. Almost all articles on cancer prognostic markers report statistically significant results. Eur J Cancer 2007;43:2559–79.

20. Kyzas PA, Loizou KT, Ioannidis JPA. Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst 2005;97:1043–55.

21. Song F, Parekh S, Hooper L, Loke YK, Ryder J, Sutton AJ, Hing C, Kwok CS, Pang C, Harvey I. Dissemination and publication of research findings: an updated review of related biases. Health Technol Assess 2010;14(8):iii, ix–xi, 1–193.

22. Kavvoura FK, Liberopoulos G, Ioannidis JPA. Selection in reported epidemiological risks: an empirical assessment. PLoS Med 2007;4:e79.

23. Ioannidis JPA. Effect of formal statistical significance on the credibility of observational associations. Am J Epidemiol 2008;168:374–83, discussion 84–90.

24. Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA 2005;294:218–28.

25. Trikalinos TA, Churchill R, Ferri M, Leucht S, Tuunainen A, Wahlbeck K, Ioannidis JPA. project E-P. Effect sizes in cumulative meta-analyses of mental health randomized trials evolved over time. J Clin Epidemiol 2004;57:1124–30.

26. Begley CG, Ellis LM. Drug development: raise standards for preclinical cancer research. Nature 2012;483:531–3.

27. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov 2011;10:712.

28. Farmer FM. The Boston cooking-school cookbook. Boston, MA: Little, Brown and Company, 1896.

29. Sterne JA, Davey Smith G. Sifting the evidence-what's wrong with significance tests? BMJ 2001;322:226–31.

30. Boos DD, Stefanski LA. P-value precision and reproducibility. Am Stat 2011;65:213–21.

31. Block G, Dresser CM, Hartman AM, Carroll MD. Nutrient sources in the American diet: quantitative data from the NHANES II survey. I. Vitamins and minerals. Am J Epidemiol 1985;122:13–26.

32. Block G, Dresser CM, Hartman AM, Carroll MD. Nutrient sources in the American diet: quantitative data from the NHANES II survey. II. Macronutrients and fats. Am J Epidemiol 1985;122:27–40.

33. Michikawa T, Inoue M, Shimazu T, Sasazuki S, Iwasaki M, Sawada N, Yamaji T, Tsugane S. Green tea and coffee consumption and its association with thyroid cancer risk: a population-based cohort study in Japan. Cancer Causes Control 2011;22:985–93.

34. Grieb SM, Theis RP, Burr D, Benardot D, Siddiqui T, Asal NR. Food groups and renal cell carcinoma: results from a case-control study. J Am Diet Assoc 2009;109(4):656–67.

35. Kiani F, Knutsen S, Singh P, Ursin G, Fraser G. Dietary risk factors for ovarian cancer: the Adventist Health Study (United States). Cancer Causes Control 2006;17:137–46.

36. Duell EJ, Travier N, Lujan-Barroso L, Clavel-Chapelon F, Boutron-Ruault MC, Morois S, Palli D, Krogh V, Panico S, Tumino R, et al. Alcohol consumption and gastric cancer risk in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Am J Clin Nutr 2011;94(5):1266–75.

37. Huang WY, Winn DM, Brown LM, Gridley G, Bravo-Otero E, Diehl SR, Fraumeni JF Jr, Hayes RB. Alcohol concentration and risk of oral cancer in Puerto Rico. Am J Epidemiol 2003;157:881–7.

38. Setiawan VW, Yu GP, Lu QY, Lu ML, Yu SZ, Mu L, Zhang JG, Kurtz RC, Cai L, Hsieh CC, et al. Allium vegetables and stomach cancer risk in China. Asian Pac J Cancer Prev 2005;6:387–95.

39. Alexander DD, Cushing CA. Quantitative assessment of red meat or processed meat consumption and kidney cancer. Cancer Detect Prev 2009;32(5–6):340–51.

40. Rompelberg CJ, Vogels JT, de Vogel N, Bruijntjes-Rozier GC, Stenhuis WH, Bogaards JJ, Verhagen H. Effect of short-term dietary administration of eugenol in humans. Hum Exp Toxicol 1996;15:129–35.

41. Banerjee S, Panda CK, Das S. Clove (Syzygium aromaticum L.), a potential chemopreventive agent for lung cancer. Carcinogenesis 2006;27:1645–54.

42. Sertel S, Eichhorn T, Plinkert PK, Efferth T. Cytotoxicity of Thymus vulgaris essential oil towards human oral cavity squamous cell carcinoma. Anticancer Res 2011;31:81–7.

43. Ho K, Yazan LS, Ismail N, Ismail M. Apoptosis and cell cycle arrest of human colorectal cancer cell line HT-29 induced by vanillin. Cancer Epidemiol 2009;33:155–60.

44. Baliga MS, Haniadka R, Pereira MM, D'Souza JJ, Pallaty PL, Bhat HP, Popuri S. Update on the chemopreventive effects of ginger and its phytochemicals. Crit Rev Food Sci Nutr 2011;51:499–523.

45. Davis PA, Iwahashi CK. Whole almonds and almond fractions reduce aberrant crypt foci in a rat model of colon carcinogenesis. Cancer Lett 2001;165(1):27–33.

46. Anderson MS, Ronning EA, De Vries R, Martinson BC. The perverse effects of competition on scientists’ work and relationships. Sci Eng Ethics 2007;13:437–61.

47. Fanelli D. Do pressures to publish increase scientists’ bias? An empirical support from US States Data. PLoS ONE 2010;5:e10271.

48. Cope MB, Allison DB. White hat bias: examples of its presence in obesity research and a call for renewed commitment to faithfulness in research reporting. Int J Obes (Lond) 2010;34:84–8, discussion 3.

49. Tatsioni A, Bonitsis NG, Ioannidis JPA. Persistence of contradicted claims in the literature. JAMA 2007;298:2517–26.

50. Blumberg J, Heaney RP, Huncharek M, Scholl T, Stampfer M, Vieth R, Weaver CM, Zeisel SH. Evidence-based criteria in the nutritional context. Nutr Rev 2010;68:478–84.

51. Mann J. Discrepancies in nutritional recommendations: the need for evidence based nutrition. Asia Pac J Clin Nutr 2002;11(Suppl 3):S510–5.

52. Mayes LC, Horwitz RI, Feinstein AR. A collection of 56 topics with contradictory results in case-control research. Int J Epidemiol 1988;17:680–5.

53. Alexander DD, Weed DL, Cushing CA, Lowe KA. Meta-analysis of prospective studies of red meat consumption and colorectal cancer. Eur J Cancer Prev 2011;20:293–307.

54. Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. BMJ 1998;316:140–4.

55. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology 2008;19:640–8.

56. Ioannidis JP. The importance of potential studies that have not existed and registration of observational data sets. JAMA 2012;308:575–6.
