Jonathan D Schoenfeld, John PA Ioannidis, Is everything we eat associated with cancer? A systematic cookbook review, The American Journal of Clinical Nutrition, Volume 97, Issue 1, January 2013, Pages 127–134, https://doi.org/10.3945/ajcn.112.047142
ABSTRACT
Background: Nutritional epidemiology is a highly prolific field. Debates on associations of nutrients with disease risk are common in the literature and attract attention in public media.
Objective: We aimed to examine the conclusions, statistical significance, and reproducibility in the literature on associations between specific foods and cancer risk.
Design: We selected 50 common ingredients from random recipes in a cookbook. PubMed queries identified recent studies that evaluated the relation of each ingredient to cancer risk. Information regarding author conclusions and relevant effect estimates was extracted. When >10 articles were found, we focused on the 10 most recent articles.
Results: Forty ingredients (80%) had articles reporting on their cancer risk. Of 264 single-study assessments, 191 (72%) concluded that the tested food was associated with an increased (n = 103) or a decreased (n = 88) risk; 75% of the risk estimates had weak (0.05 > P ≥ 0.001) or no statistical (P > 0.05) significance. Statistically significant results were more likely than nonsignificant findings to be published in the study abstract than in only the full text (P < 0.0001). Meta-analyses (n = 36) presented more conservative results; only 13 (36%) reported an increased (n = 4) or a decreased (n = 9) risk (6 had more than weak statistical support). The median RRs (IQRs) for studies that concluded an increased or a decreased risk were 2.20 (1.60, 3.44) and 0.52 (0.39, 0.66), respectively. The RRs from the meta-analyses were on average null (median: 0.96; IQR: 0.85, 1.10).
Conclusions: Associations with cancer risk or benefits have been claimed for most food ingredients. Many single studies highlight implausibly large effects, even though evidence is weak. Effect sizes shrink in meta-analyses.
See corresponding editorial on page 5
INTRODUCTION
Thousands of nutritional epidemiology studies are conducted and published annually in the quest to identify dietary factors that affect major health outcomes, including cancer risk (1). These studies influence dietary guidelines and at times public health policy (2) and receive wide attention in news media (3). However, interpretation of the multitude of studies in this area is difficult (1, 4) and is critically dependent on accurate assessments of the credibility of published data. Randomized trials have repeatedly failed to find treatment effects for nutrients in which observational studies had previously proposed strong associations (5–8), and such discrepancies in the evidence have fueled hot debates (9–12) rife with emotional and sensational rhetoric that can subject the general public to increased anxiety and contradictory advice (13, 14). One wonders whether this highly charged atmosphere and intensive testing of food-related associations may create a plethora of false-positive findings (15) and questionable research practices, especially when the research is highly exploratory, the analyses and protocols are not preregistered, and the findings are selectively reported. It was previously shown in a variety of other fields that “negative” results are either less likely to be published (16–21) or misleadingly interpreted (19, 22). Studies may spuriously highlight results that barely achieve statistical significance (15, 23) or report effect estimates that either are overblown (24, 25) or cannot be replicated in other studies (24, 26, 27).
To better evaluate the extent to which these factors may affect studies investigating dietary risk factors for malignancy, we surveyed recently published studies and meta-analyses that addressed the potential association between a large random sample of food ingredients and cancer risk of any type of malignancy.
SUBJECTS AND METHODS
Random ingredient selection
We selected ingredients from random recipes included in The Boston Cooking-School Cook Book (28), available online at http://archive.org/details/bostoncookingsch00farmrich. A copy of the book was obtained in portable document format and viewed by using Skim version 1.3.17 (http://skim-app.sourceforge.net). The recipes (see Supplementary Table 1 under “Supplemental data” in the online issue) were selected at random by generating random numbers corresponding to cookbook page numbers using Microsoft Excel (Microsoft Corporation). The first recipe on each page selected was used; the page was passed over if there was no recipe. All unique ingredients within selected recipes were chosen for analysis. This process was repeated until 50 unique ingredients were selected.
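The selection procedure described above can be sketched in a few lines of Python. This is an illustration only: the authors used random numbers generated in Microsoft Excel, and the `recipes_by_page` structure, the function name, the seed, and the choice to stop as soon as the target count is reached are all assumptions made for the example.

```python
import random

def select_ingredients(recipes_by_page, n_pages, target=50, seed=0):
    """Visit pages in random order until `target` unique ingredients are collected.

    recipes_by_page (hypothetical structure) maps a page number to the list of
    recipes on that page; each recipe is a list of ingredient names.
    """
    rng = random.Random(seed)
    pages = list(range(1, n_pages + 1))
    rng.shuffle(pages)                      # random page order, no repeats (assumption)
    selected = []                           # unique ingredients in order of selection
    for page in pages:
        recipes = recipes_by_page.get(page, [])
        if not recipes:
            continue                        # page is passed over: no recipe on it
        for ingredient in recipes[0]:       # only the first recipe on the page is used
            if ingredient not in selected:
                selected.append(ingredient)
            if len(selected) == target:     # stopping rule is an assumption
                return selected
    return selected

# Toy cookbook with recipes on three of five pages.
toy = {1: [["egg", "flour"], ["salt"]], 3: [["milk", "egg"]], 4: [["sugar"]]}
picked = select_ingredients(toy, n_pages=5, target=3)
```

In the toy cookbook, "salt" can never be picked because it appears only in the second recipe on its page.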
Study searches
We performed literature searches using PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) for studies investigating the relation of the selected ingredients to cancer risk using the following search terms: “risk factors”[MeSH Terms] AND “cancer”[sb] AND the singular and/or plural forms of the selected ingredient restricted to the title or abstract. Titles and abstracts of retrieved articles were then reviewed to select the 10 most recently published cohort or case-control studies investigating the relation between the ingredients and cancer risk. Ingredient derivatives and components (eg, orange juice) were considered, as were ingredients analyzed as part of a broader diet when they were specifically mentioned as a component of that diet. Whenever <10 studies were retrieved for a given ingredient, an attempt was made to obtain additional studies by searching for ingredient synonyms (eg, mutton for lamb, thymol for thyme), using articles explicitly referred to by the previously retrieved material, and broadening the original searches (searching simply by ingredient name AND “cancer”).
Searches for relevant meta-analyses were performed in the same manner as for single studies, but adding the PubMed “meta-analysis” filter. For each ingredient, the most recent meta-analysis investigating the relation with a particular cancer was selected for analysis. In 2 meta-analyses that separately investigated associations with more than one different type or subtype of cancer, only the first type mentioned in the abstract was considered.
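As an illustration of the search strings described in this section, the sketch below assembles a query for one ingredient. The `[Title/Abstract]` field tag and the `meta-analysis[Publication Type]` clause are assumptions about how the title/abstract restriction and the meta-analysis filter might be written; the paper does not give the exact syntax, and the function name is hypothetical.

```python
def pubmed_query(singular, plural=None, meta_analysis=False):
    """Build a PubMed search string for one ingredient (illustrative only)."""
    # Naive default plural; pass `plural` explicitly for irregular forms.
    plural = plural or singular + "s"
    forms = list(dict.fromkeys([singular, plural]))     # drop duplicates, keep order
    ingredient = " OR ".join(f'"{form}"[Title/Abstract]' for form in forms)
    query = f'"risk factors"[MeSH Terms] AND "cancer"[sb] AND ({ingredient})'
    if meta_analysis:
        # Assumed spelling of the PubMed meta-analysis filter.
        query += " AND meta-analysis[Publication Type]"
    return query

egg_query = pubmed_query("egg")
tomato_meta_query = pubmed_query("tomato", plural="tomatoes", meta_analysis=True)
```

The broadened fallback search mentioned above (ingredient name AND “cancer”) would simply drop the MeSH and subset clauses.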
Data extraction
From each retrieved study or meta-analysis, data were extracted from the abstract regarding the ingredient and cancer type, authors’ conclusions regarding the risk of malignancy (increased risk, decreased risk, no effect, or borderline/other effect), the respective RR estimate (typically the HR for cohort studies or OR for case-control studies), and the exposure contrast to which it pertained, its 95% CI, and P value. When available, we used P values that were explicitly reported, including P values for trends. Standard reporting of these P values did not adjust for potential multiple testing. When not available, we estimated the P values from the reported point estimates and CIs of the effects, assuming no testing for trends across multiple different exposure levels. Whenever the effect estimate and P value were not available and could not be approximated from data available in the article abstract, the full text was then retrieved and examined in an attempt to obtain this information.
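The paper does not state how P values were estimated from point estimates and CIs. One common approach, shown here as an assumption rather than as the authors' method, recovers the standard error on the log scale from the width of the 95% CI and applies a normal approximation. The function name and the example numbers are hypothetical.

```python
import math

def p_from_ci(rr, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from a ratio estimate and its 95% CI.

    Assumes the CI is symmetric on the log scale (normal approximation).
    Returns the z score and the P value.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(RR)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical estimate whose lower CI bound sits exactly on the null.
z_example, p_example = p_from_ci(2.0, 1.0, 4.0)
```

An estimate whose CI just touches 1.00 yields a P value of about 0.05, which is a quick sanity check on the formula.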
When multiple potentially relevant effect estimates were available from a given study, the following criteria were applied in order of priority: the estimates most specific for the ingredient, the most broadly defined definition of malignancy (eg, colorectal compared with colon or rectal cancers), the most general patient subgroup, the most adjusted estimate, and that corresponding to the most extreme reported exposure contrast (ie, highest compared with lowest level of exposure). This allowed us to better compare effect estimates across ingredients and from individual studies with meta-analyses, because it is common practice in the literature to report comparisons of extreme exposure levels (22). In the case of estimates reported for multiple malignancies or patient subgroups of similar magnitude, the estimate or conclusion referred to first in the abstract was chosen for further analysis. If no estimate or conclusion was specifically referred to in the abstract, the same criteria described above were applied to the full text.
Whenever available, effect estimates limited to the analysis of prospective cohort data were also separately extracted from the retrieved meta-analyses. One author performed data extraction (JDS) and discussed any uncertainties with the other author (JPAI) for arbitration.
Statistical analyses
We aimed to examine whether results and their interpretations were generally more conservative in the meta-analyses than in the single studies and whether there were any hints of biases in the overall evidence. We summarized and compared data from the retrieved single studies and from the meta-analyses on the conclusions of the authors and on whether these were congruent with the presence of nominal statistical significance (P < 0.05) without adjustment for potential multiple testing. We also assessed the types and consistency of exposure contrasts used. Finally, we evaluated the distribution of P values (and corresponding standardized z scores from the normal distribution) to examine whether there were any peaks of frequently reported P values and troughs of infrequently reported P values, and the distribution of RRs to examine the median and IQR of reported effect sizes, to help highlight trends in the literature and potential biases. For P values (z scores), we also examined whether the results listed in the abstract differed from those listed only in the full text using a chi-square test.
P values >0.05 are considered not nominally significant, whereas P values between 0.05 and 0.001 are considered to offer weak support, as previously proposed for epidemiologic analyses (29). In Bayesian terms, such P values generally do not correspond to very strong support, regardless of prior assumptions (23, 30).
The main analyses evaluated all retrieved data from single studies and from meta-analyses. Sensitivity analyses focused on comparisons of meta-analyses against single studies on the same ingredient-cancer pairs (excluding single studies on associations for which no meta-analysis had been found and meta-analyses on associations for which no single study had been among the 10 most recent captured studies) and assessment of meta-analysis data only from prospective cohort studies. JMP version 9.0 (SAS Institute) was used to generate summary statistics, calculate z scores from the normal distribution, perform chi-square analysis, and draft figures.
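The P value categories and the conversion to standardized scores can be illustrated as follows. The authors used JMP; this Python sketch is only an equivalent under stated assumptions. It uses the thresholds given in the footnote to Table 1 (P ≥ 0.05 nonsignificant, 0.001 ≤ P < 0.05 weak, P < 0.001 strong) and signs the z score by the direction of the effect, with function names chosen for the example.

```python
import math
from statistics import NormalDist

def classify_p(p):
    """Label a P value using the Table 1 footnote thresholds."""
    if p >= 0.05:
        return "nonsignificant"
    if p >= 0.001:
        return "weak"
    return "strong"

def signed_z(p, effect):
    """Convert a two-sided P value (0 < p <= 1) to a z score.

    The sign follows the direction of the effect estimate:
    negative for RR < 1 (decreased risk), positive for RR > 1.
    """
    z = NormalDist().inv_cdf(1 - p / 2)
    return math.copysign(z, math.log(effect))

z_protective = signed_z(0.05, 0.5)   # close to -1.96
z_harmful = signed_z(0.05, 2.0)      # close to +1.96
```

A P value of exactly 0 is outside the domain of `inv_cdf`, so values reported as inequalities would need to be imputed first, as the table footnote describes for medians.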
RESULTS
Ingredients studied in relation to cancer
At least one study investigating the relation to cancer risk was identified for 80% (n = 40) of the ingredients selected from random recipes: veal, salt, pepper spice, flour, egg, bread, pork, butter, tomato, lemon, duck, onion, celery, carrot, parsley, mace, sherry, olive, mushroom, tripe, milk, cheese, coffee, bacon, sugar, lobster, potato, beef, lamb, mustard, nuts, wine, peas, corn, cinnamon, cayenne, orange, tea, rum, and raisin. The ingredients studied include many of the most common sources of vitamins and nutrients in the United States diet (31, 32); in contrast, the 10 ingredients for which a relevant cancer risk study was not identified were generally more obscure: bay leaf, cloves, thyme, vanilla, hickory, molasses, almonds, baking soda, ginger, and terrapin. Of the 40 ingredients for which at least one study was identified, 50% (n = 20) had ≥10 studies, 15% (n = 6) had 6–10 studies, and 35% (n = 14) had 1–5 studies. The identified studies provided 264 relevant effect estimates described in 216 publications dating from February 1976 to December 2011 (see Supplementary Table 1 under “Supplemental data” in the online issue). One hundred fifty-four (71%) and 184 (85%) articles were published in or after 2005 and 2000, respectively.
Author conclusions and reported effect estimates
Author conclusions reported in the abstract and manuscript text and relevant effect estimates are summarized in Table 1. Thirty-nine percent of studies concluded that the studied ingredient conferred an increased risk of malignancy; 33% concluded that there was a decreased risk, 5% concluded that there was a borderline statistically significant effect, and 23% concluded that there was no evidence of a clearly increased or decreased risk. Thirty-six of the 40 ingredients for which at least one study was identified had at least one study concluding increased or decreased risk of malignancy: veal, salt, pepper spice, egg, bread, pork, butter, tomato, lemon, duck, onion, celery, carrot, parsley, mace, olive, mushroom, tripe, milk, cheese, coffee, bacon, sugar, lobster, potato, beef, lamb, mustard, nuts, wine, peas, corn, cayenne, orange, tea, and rum.
TABLE 1. Author conclusions from retrieved articles and meta-analyses in relation to the statistical significance of the associations and effect estimates

| Author conclusion | n (%)¹ | Statistical significance of association² | Median effect estimate (IQR) | Median P value (IQR) |
| --- | --- | --- | --- | --- |
| Individual studies | | | | |
| Increased risk | 103 (39) | Nonsignificant: 13 (13%); weak: 64 (62%); strong: 25 (24%); missing: 1 (1%) | 2.20 (1.60, 3.44) | 0.008 (0.001, 0.030) |
| Decreased risk | 88 (33) | Nonsignificant: 7 (8%); weak: 60 (68%); strong: 17 (19%); missing: 4 (5%) | 0.52 (0.39, 0.66) | 0.010 (0.002, 0.030) |
| No effect | 61 (23) | Nonsignificant: 58 (95%); weak: 1 (2%); missing: 2 (3%) | 1.03 (0.91, 1.14) | 0.510 (0.294, 0.701) |
| Borderline effect | 12 (5) | Nonsignificant: 11 (92%); strong: 1 (8%) | 0.80 (0.60, 1.50) | 0.075 (0.060, 0.275) |
| Meta-analyses | | | | |
| Increased risk | 4 (11) | Nonsignificant: 1 (25%); weak: 2 (50%); strong: 1 (25%) | 1.33 (1.20, 1.69) | 0.017 (0.003, 0.119) |
| Decreased risk | 9 (25) | Weak: 4 (44%); strong: 5 (56%) | 0.68 (0.61, 0.81) | 0.0005 (0.0001, 0.0133) |
| No effect | 13 (36) | Nonsignificant: 11 (85%); weak: 2 (15%) | 1.07 (0.98, 1.21) | 0.38 (0.11, 0.55) |
| Borderline or complex³ effect | 10 (28) | Nonsignificant: 6 (60%); weak: 2 (20%); strong: 2 (20%) | 0.84⁴ (0.72, 0.99) | 0.061⁴ (0.044, 0.142) |

¹ n = 264 for individual studies and n = 36 for meta-analyses. Among the individual studies, effect estimates were missing from 9 studies, and specific P values were missing from 7 studies.
² Nonsignificant (P ≥ 0.05), weak (0.001 ≤ P < 0.05), and strong (P < 0.001); P values with inequalities were imputed as equal to the reported threshold when median values were calculated (eg, P < 0.001 was considered a strong association but was used as P = 0.001 to calculate medians).
³ J-shaped or dependent on study type.
⁴ Borderline effects only.
The statistical support of the effects was weak (0.001 ≤ P < 0.05) or not even nominally significant (P > 0.05) in 80% of the studies. Support was likewise weak or not nominally significant in 75% of the studies that claimed an increased risk and in 76% of the studies that claimed a decreased risk (Table 1).
RRs compared the lowest with the highest categories of consumption in 172 (65%) estimates. There was wide variability in how exposure contrasts were defined: highest compared with lowest tertiles, quartiles, or quintiles were compared in 32, 36, and 22 studies, respectively. However, other studies used more arbitrary definitions for extremes, eg, ≥5 cups/d compared with <1 cup/d (33), ≥5 servings/wk compared with <1 serving/wk (34, 35), ≥30 g/d compared with 0.1–4.9 g/d (36), ≥43 drinks/wk compared with zero drinks (37), and “often” compared with “never” (38). Contrasts used for the remainder of the estimates were compared with no consumption (n = 36, 13%) or intermediate or incremental levels of consumption (n = 45, 17%) or could not be determined (n = 11, 4%). The median RRs were 2.20 (IQR: 1.60, 3.44) and 0.52 (IQR: 0.39, 0.66) in studies that concluded increased and decreased risks, respectively.
The effect estimates are shown in Figure 1 by malignancy type or by ingredient for the 20 ingredients for which ≥10 articles were identified. Gastrointestinal malignancies were the most commonly studied (45%), followed by genitourinary (14%), breast (14%), head and neck (9%), lung (5%), and gynecologic (5%) malignancies.
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies are shown. Three outliers are not shown (effect estimates >10).
The distribution of standardized (z) scores associated with P values was bimodal, with peaks corresponding to nominally statistically significant results and a trough in the middle corresponding to the sparse nonsignificant results (Figure 2, left panel). The bimodal peaks and middle trough pattern were even more prominent for results reported in the abstracts: 62% of the nominally statistically significant effect estimates were reported in abstracts, whereas most (70%) of the nonsignificant results appeared only in the full text and not in the abstracts (P < 0.0001).
FIGURE 2. Standardized (z) scores associated with effect estimates for ingredients from individual studies (left) and meta-analyses (right). Scores available from article abstracts are shown in black, whereas those found in the full text are in gray. For reference, a P value of 0.05 has a z score of −1.96 for an association with a decreased cancer risk and 1.96 for an association with an increased cancer risk.
Meta-analyses
Thirty-six relevant effect estimates were obtained from meta-analyses (see Supplementary Table 2 under “Supplemental data” in the online issue). Author conclusions and the respective effect estimates are summarized in Table 1.
Thirty-three (92%) of the 36 estimates pertained to comparisons of the lowest with the highest levels of consumption, but most of these meta-analyses combined studies that had different exposure contrasts. For example, one meta-analysis (39) combined studies that compared the highest with the lowest consumption and others that compared the fourth with the first quartile. Only 13 meta-analysis estimates were obtained by combining data on the same exact contrast across all studies.
Thirteen meta-analyses concluded that there was an increased (n = 4) or a decreased (n = 9) risk of malignancy, and 6 of them had more than weak statistical support. The remaining meta-analyses concluded that there was no effect (36%, n = 13) or an effect that was borderline (n = 6), potentially J-shaped (n = 2), seen in case-control but not in cohort studies (n = 1), or seen in cohort but not in case-control studies (n = 1).
The distribution of standardized (z) scores associated with P values was bimodal also for meta-analyses (Figure 2, right panel), with a trough in the middle for z values −1 to 0.5. However, in contrast with single studies, the peaks corresponded to nonstatistically significant results or were of borderline significance. Only 6 estimates came from the full text and not from the abstract.
As shown in Figure 3, the distribution of the effect sizes in the meta-analyses appeared normal, centered around the null, and generally showed small effects on both sides of the distribution (median RR: 0.96; IQR: 0.85, 1.10). Median effect estimates were 1.33 (IQR: 1.20, 1.69) for studies that concluded an increased risk and 0.68 (IQR: 0.61, 0.81) for studies that concluded a decreased risk; these estimates were, in general, more conservative than those predicted by the individual studies (Figure 3, lower panel).
FIGURE 3. Effect estimates from the meta-analyses (n = 36) and individual studies (n = 255; effect estimates were missing in 9 of the 264 studies).
Sensitivity analyses
When the comparison was limited to ingredient-malignancy category pairs for which both individual studies and a meta-analysis were available, the median effect estimates for studies that concluded an increased or decreased risk were 2.69 (95% CI: 1.60, 4.85) and 0.51 (0.36, 0.64) for individual studies compared with 1.33 (95% CI: 1.20, 1.69) and 0.66 (0.56, 0.82) for meta-analyses, respectively. In 43 of the 64 pairwise comparisons between results from individual studies and meta-analyses that reported on the same ingredient-malignancy category (eg, coffee with gastrointestinal malignancies; Table 2), the effect was closer to the null (RR: 1.00) in the meta-analysis than in the respective original study (P = 0.009; McNemar's test for paired comparison).
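The paired comparison above (43 of 64 pairs closer to the null in the meta-analysis) can be checked with an exact two-sided binomial test against a 50:50 split, which is what an exact McNemar-type test reduces to when only the direction of each pair is counted. This is a sketch of one plausible calculation, not necessarily the computation the authors ran in JMP; the function name is hypothetical.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided P value for k of n pairs favoring one side under p = 0.5."""
    tail = max(k, n - k)
    # Probability of a split at least as lopsided as the observed one, in one direction.
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)   # the null is symmetric, so double the tail

p_pairs = two_sided_binomial_p(43, 64)
```

Under these assumptions the result is of the same order as the reported P = 0.009; small differences would be expected if a continuity-corrected chi-square form of the test was used instead.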
TABLE 2. Strength and direction of associations obtained from individual studies and meta-analyses based on reported effect estimates and reported or calculated P values

| Association¹ | Individual studies² (n = 50), n (%) | Meta-analyses² (n = 30), n (%) | All meta-analyses (n = 36), n (%) | Meta-analyses using data from prospective cohort studies (n = 26), n (%) |
| --- | --- | --- | --- | --- |
| Nonsignificant association | 22 (44) | 17 (57) | 18 (50) | 19 (73) |
| Weakly increased risk | 9 (18) | 4 (13) | 4 (11) | 2 (8) |
| Weakly decreased risk | 11 (22) | 5 (17) | 6 (17) | 3 (12) |
| Strongly increased risk | 2 (4) | 1 (3) | 2 (6) | 0 |
| Strongly decreased risk | 6 (12) | 3 (10) | 6 (17) | 2 (8) |

¹ Nonsignificant (P ≥ 0.05), weak (0.001 ≤ P < 0.05), and strong (P < 0.001).
² Ingredient-malignancy pairs with data from both individual studies and meta-analyses.
The percentage of studies with a nonnominally significant result (P ≥ 0.05) was greater among the meta-analyses (57%) than among the individual studies (44%), although this difference was not statistically significant (P = 0.356; Fisher's exact test). However, a significantly greater percentage of studies with nonnominally significant results (73%) was found when effect estimates were limited to meta-analyses of prospective cohorts obtained from the retrieved studies as compared with individual studies (P = 0.028; Table 2).
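The two comparisons in this paragraph can be reproduced approximately from the counts in Table 2 with a two-sided Fisher's exact test. The implementation below is a minimal pure-Python sketch that sums the probabilities of all 2×2 tables no more likely than the observed one; that two-sided convention, and the assumption that the second comparison sets the 26 prospective-cohort meta-analyses against the 50 paired individual studies, are mine rather than stated in the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: individual studies, meta-analyses; columns: nonsignificant, significant.
p_paired = fisher_exact_two_sided(22, 28, 17, 13)   # 44% vs 57% nonsignificant
p_cohort = fisher_exact_two_sided(22, 28, 19, 7)    # 44% vs 73% nonsignificant
```

With these inputs, the first comparison is clearly nonsignificant and the second falls below 0.05, in line with the values reported in the text.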
DISCUSSION
In this survey of published literature regarding the relation between food ingredients and malignancy, we found that 80% of ingredients from randomly selected recipes had been studied in relation to malignancy and the large majority of these studies were interpreted by their authors as offering evidence for increased or decreased risk of cancer. However, the vast majority of these claims were based on weak statistical evidence. Many statistically insignificant “negative” and weak results were relegated to the full text rather than to the study abstract. Individual studies reported larger effect sizes than did the meta-analyses. There was no standardized, consistent selection of exposure contrasts for the reported risks. A minority of associations had more than weak support in meta-analyses, and summary effects in meta-analyses were consistent with a null average and relatively limited variance.
We should acknowledge that our searches for eligible studies were not exhaustive. Covering the entire nutritional epidemiology literature would be impossible. However, our search approach was representative of the studies that might be encountered by a researcher, physician, patient, or consumer embarking on a review of this literature. We preferentially analyzed the first effect estimate mentioned in individual studies and meta-analyses, which, although not random, are likely to be the first encountered by the reader. Moreover, application of this rule allowed for consistency and avoidance of subjectivity in selecting effect estimates. Given that we examined the effects most specific to individual ingredients, we did not examine more complex analyses involving nutritional pathways, biochemical nutritional measurements, and metabolites or combinations of ingredients. However, we hypothesize that similar patterns of research conduct and reporting also apply to these other aspects of nutritional epidemiology. Moreover, most ingredients for which a human study was not identified by our search have been studied in animal cancer models, eg, eugenol from bay leaf (40), cloves (41), terpenoids from thyme (42), vanillin from vanilla (43), ginger (44), and almonds (45).
We found great variability in the types of exposure contrasts. Moreover, the meta-analyses were often forced to merge data from studies that had used different exposure contrasts. We found that, if anything, a greater percentage of effect estimates from meta-analyses (92%) than from individual studies (65%) contrasted the lowest with the highest levels of consumption. This suggests that the more extreme risk estimates reported in the single studies may not represent simply a choice of more extreme exposure contrasts. However, the lack of standardization in definitions and choice of exposure contrasts allows widely different (and potentially biased) estimates of effect sizes to be reported, at the discretion of the investigators. We focused on the more extreme risk estimates, because previous evidence suggests that these are selectively used to present data when effect sizes are more modest (22). Whereas this is probably understood by expert scientists, the larger reported risks may be misleading to the nonmethodologist reader or general public (22).
Nutritional epidemiology is a valuable field that can identify potentially modifiable risk factors related to diet. However, the credibility of studies in this and other fields is subject to publication and other selective outcome and analysis reporting biases (16–21), whenever the pressure to publish (46) fosters a climate in which “negative” results are undervalued and not reported (47). Ingredients viewed as “unhealthy” may be demonized, leading to subsequent biases in the design, execution and reporting of studies (48). Some studies that narrowly meet criteria for statistical significance may represent spurious results (15), especially when there is large flexibility in analyses, selection of contrasts, and reporting. When results are overinterpreted, the emerging literature can skew perspectives (49) and potentially obfuscate other truly significant findings. This issue may be especially problematic in areas such as cancer epidemiology, where randomized trials may be exceedingly difficult and expensive to conduct (50, 51); therefore, more reliance is placed on observational studies, but with a considerable risk of trusting false-positive or inflated results (52).
Some meta-analyses may yield more reliable results, synthesize the available evidence, and control for potential confounding factors (53); however, even these analyses can be biased or misinterpreted (21, 54). Our findings support previous evidence suggesting that effect sizes are likely to trend closer to the null as more data are accumulated (55). However, fragmented efforts from multiple teams may be difficult to integrate with meta-analyses after the fact. To enhance further progress, the field of nutritional epidemiology may need to consider practices such as advance registration. When research is exploratory, protocols and analyses are modified in iterative fashion, and these changes should be documented. If protocols are not registered up front, one should consider, at a minimum, upfront registration of the data sets that are used for such analyses (what variables are available for analyses) (56). This would allow mapping the space of what analyses could have been performed, regardless of what was eventually reported. Prospective consortium meta-analyses addressing all nutrients and foods in the same project and using standardized exposure contrasts can also address the heterogeneity of how studies are conducted or how the results are analyzed. In addition, comprehensive documentation of analyses and reporting (49) rather than testing and reporting one association at a time (50) would be of value. Such approaches, in combination, may facilitate a more accurate interpretation of the evidence linking foods to the risk of developing cancer.
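As a rough illustration of why pooled estimates tend to sit closer to the null than individual small studies, the sketch below applies standard inverse-variance (fixed-effect) pooling to log relative risks. It is not the procedure used in this study or in any of the meta-analyses reviewed; the function name `pool_fixed_effect` and the three input studies are hypothetical and were chosen only to show how a large, precise study can dominate two small studies with extreme estimates.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of relative risks.

    `estimates` is a list of (rr, ci_low, ci_high) tuples, where the
    interval is assumed to be a 95% CI that is symmetric on the log scale.
    Returns (pooled_rr, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rr, ci_low, ci_high in estimates:
        log_rr = math.log(rr)
        # Recover the standard error of log(RR) from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2  # precise studies get larger weights
        weighted_sum += weight * log_rr
        weight_total += weight
    pooled_log_rr = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log_rr),
        math.exp(pooled_log_rr - Z_95 * pooled_se),
        math.exp(pooled_log_rr + Z_95 * pooled_se),
    )


# Hypothetical inputs, not data from any real study.
studies = [
    (2.20, 1.10, 4.40),  # small study, wide interval, increased risk
    (0.52, 0.27, 1.00),  # small study, wide interval, decreased risk
    (1.02, 0.92, 1.13),  # large study, narrow interval, near null
]

pooled_rr, pooled_low, pooled_high = pool_fixed_effect(studies)
print(f"Pooled RR: {pooled_rr:.2f} ({pooled_low:.2f}, {pooled_high:.2f})")
```

Because each weight is the inverse of the squared standard error, the two small studies contribute little, and the pooled estimate lands near the large study's value. A random-effects model would weight the studies more evenly and could be more appropriate when exposure contrasts differ across studies, as discussed above.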
The authors’ responsibilities were as follows—JDS and JPAI (guarantor): designed the study, interpreted the data and the analyses, wrote the manuscript, and approved the final version; and JDS: performed the data extraction and the statistical analyses with help and supervision from JPAI. No conflicts of interest were reported by either author.
FOOTNOTES
There was no funding for this study.



