Preventable Cancer Burden Associated With Poor Diet in the United States

Abstract

Background: Diet is an important risk factor for cancer that is amenable to intervention. Estimating the cancer burden associated with diet informs evidence-based priorities for nutrition policies to reduce cancer burden in the United States.

Methods: Using a comparative risk assessment model that incorporated nationally representative data on dietary intake, national cancer incidence, and estimated associations of diet with cancer risk from meta-analyses of prospective cohort studies, we estimated the annual number and proportion of new cancer cases attributable to suboptimal intakes of seven dietary factors among US adults ages 20 years or older, and by population subgroups.

Results: An estimated 80 110 (95% uncertainty interval [UI] = 76 316 to 83 657) new cancer cases were attributable to suboptimal diet, accounting for 5.2% (95% UI = 5.0% to 5.5%) of all new cancer cases in 2015. Of these, 67 488 (95% UI = 63 583 to 70 978) and 4.4% (95% UI = 4.2% to 4.6%) were attributable to direct associations and 12 589 (95% UI = 12 156 to 13 038) and 0.82% (95% UI = 0.79% to 0.85%) to obesity-mediated associations. By cancer type, colorectal cancer had the highest number and proportion of diet-related cases (n = 52 225, 38.3%). By diet, low consumption of whole grains (n = 27 763, 1.8%) and dairy products (n = 17 692, 1.2%) and high intake of processed meats (n = 14 524, 1.0%) contributed to the highest burden. Men, middle-aged adults (45–64 years), and racial/ethnic minorities (non-Hispanic blacks, Hispanics, and others) had a higher proportion of diet-associated cancer burden than other age, sex, and race/ethnicity groups.

Conclusions: More than 80 000 new cancer cases are estimated to be associated with suboptimal diet among US adults in 2015, with middle-aged men and racial/ethnic minorities experiencing the largest proportion of diet-associated cancer burden in the United States.


Appendix 1. Comparative Risk Assessment Population Attributable Fraction
A Comparative Risk Assessment (CRA) framework has been used to estimate the proportion of cancer cases attributable to suboptimal diet, the Population Attributable Fraction (PAF). The standard PAF formula used is as follows:

$$\mathrm{PAF} = \frac{\int_{x=0}^{m} RR(x)\,P(x)\,dx - 1}{\int_{x=0}^{m} RR(x)\,P(x)\,dx}$$

where $P(x)$ is the distribution of current dietary consumption, $RR(x)$ is the relative risk of mortality at exposure level $x$, and $m$ is the maximum exposure level. As described by Vander Hoorn et al, 1,2 this is a special case of the more commonly used (for descriptive purposes) formula:

$$\mathrm{PAF} = \frac{\int_{x=0}^{m} RR(x)\,P(x)\,dx - \int_{x=0}^{m} RR(x)\,P'(x)\,dx}{\int_{x=0}^{m} RR(x)\,P(x)\,dx}$$

where the alternative distribution ($P'(x)$) is the same as the theoretical minimum risk (optimal population distribution of diet) exposure distribution.
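The step from the general formula to the special case can be written out. The sketch below assumes that the relative risk equals 1 for everyone exposed at their theoretical-minimum-risk level and that the alternative distribution integrates to 1. Here $RR(x)$ is the relative risk at exposure $x$, $P(x)$ the current exposure distribution, $P'(x)$ the alternative distribution, and $m$ the maximum exposure level.

```latex
\begin{aligned}
\int_{0}^{m} RR(x)\,P'(x)\,dx &= \int_{0}^{m} 1 \cdot P'(x)\,dx = 1, \\
\mathrm{PAF} &= \frac{\int_{0}^{m} RR(x)\,P(x)\,dx - \int_{0}^{m} RR(x)\,P'(x)\,dx}{\int_{0}^{m} RR(x)\,P(x)\,dx}
             = \frac{\int_{0}^{m} RR(x)\,P(x)\,dx - 1}{\int_{0}^{m} RR(x)\,P(x)\,dx}.
\end{aligned}
```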

$P(x)$
We estimated the mean intake and distribution of dietary intake of seven food groups and nutrients among US adults aged 20 years or older using the dietary data collected from 24-hour diet recalls in the National Health and Nutrition Examination Survey (NHANES). Dietary data from one or two 24-hour diet recalls may not represent a person's usual intake due to within-person variations in food intake. To correct for measurement errors, we applied the National Cancer Institute (NCI) method to estimate usual intake of nutrients from foods. 3 As documented in prior literature, the NCI method is the preferred method for estimating usual intake distribution from 24-hour diet recalls. 4 A 2-step approach was used to estimate usual intake in the NCI method.
The first step (MIXTRAN macro) models the amount for a daily-consumed dietary factor, and both the amount and the probability of consumption for an episodically consumed dietary factor. For dietary factors that are episodically consumed, with more than 5% of individuals reporting zero intake on a given day (such as fruits, non-starchy vegetables, whole grains, processed meats, unprocessed red meats, and sugar-sweetened beverages [SSBs]), we used a two-part model that estimates both the amount and the probability of consumption on a given day. 4 The second step of the NCI method involves estimating usual intake with parameters estimated from the first step using mixed-effect linear regression on a transformed scale with a person-specific effect (INDIVINT macro). 3 The NCI method requires that some of the participants have multiple days of nutrient intake to estimate and separate the within-person and between-person variations. 5 In our study, 8683 participants also provided a second valid diet recall (86% of the 10064 participants who provided a first valid recall). For each nutrient, the following covariates were specified in estimating usual intake: age group (20-49, 50-64, 65+ years), sex, race/ethnicity (non-Hispanic white, non-Hispanic black, Hispanic, and other), an indicator of first versus second-day diet recall, and day of the week when the recall occurred (weekday versus weekend). We further incorporated the weights from the complex survey sample design to estimate the mean intake and distribution of the dietary factors and nutrients included in this study.
Because distributions of many dietary factors are non-normal, we utilized a gamma distribution for each factor after confirming, using individual-level NHANES data, that this estimation is similar to the normal distribution for normally distributed dietary factors and closer to observed data for skewed dietary factors than normal or log-normal distributions; and that the gamma distribution also performs optimally for estimating population attributable fractions.
Specifically, based on a visual inspection of histograms, we concluded that, overall, the gamma distribution fit the NHANES data better than an alternative right-skewed distribution (the lognormal), particularly for foods where the intake is highly skewed, such as processed meat and sugar-sweetened beverages. Simulations comparing attributable mortality estimates under gamma, normal, and log-normal distributions with estimates from a non-parametric approach showed that the gamma distribution gave estimates closer to the non-parametric approach than the others. Because the mean and variance of the gamma distribution are functions of its parameters ($E[X] = k\theta$, $\mathrm{Var}[X] = k\theta^2$, where $X$ is a gamma random variable, $k$ is the shape parameter and $\theta$ is the scale parameter), estimates for the gamma parameters can be obtained from mean and variance estimates that account for survey design characteristics.
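The moment relations above can be inverted directly. The following is a minimal sketch, not the authors' code: the function name and the example mean and variance are hypothetical, and in practice the inputs would be survey-weighted estimates.

```python
def gamma_params_from_moments(mean, variance):
    """Method-of-moments gamma parameters.

    Uses mean = shape * scale and variance = shape * scale**2,
    so scale = variance / mean and shape = mean**2 / variance.
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    scale = variance / mean
    shape = mean ** 2 / variance
    return shape, scale


# Hypothetical estimates for one dietary factor (g/day); not values from the paper.
shape, scale = gamma_params_from_moments(mean=30.0, variance=450.0)
print(shape, scale)
```

Multiplying the returned shape and scale recovers the input mean, which is a quick consistency check on any fitted pair.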
$RR(x)$

$$RR(x) = \begin{cases} \exp\big(\beta\,(x - y(x))\big), & x - y(x) \ge 0 \\ 1, & x - y(x) < 0 \end{cases}$$

where $\beta$ is the change in log relative risk per unit of exposure, $x$ is the current exposure level, and $y(x)$ is the optimal (theoretical minimum risk, TMRED) exposure level. $y(x)$ is defined to be $F_{\mathrm{TMRED}}^{-1}(F_x(x))$, where $F_{\mathrm{TMRED}}$ is the cumulative distribution function of the TMRED, $F_x$ is the cumulative distribution function of the current exposure distribution, and the superscript $-1$ denotes the inverse cumulative distribution function. Implicit in how we characterize the relative risk function are some of the fundamental assumptions we make about relative risk. Namely, we assume that relative risk increases exponentially as distance from the theoretical minimum risk exposure level ($y(x)$) increases, that there is no risk associated with exposure beyond the theoretical minimum risk exposure level, and that both $x$ and the theoretical minimum risk exposure level for an individual at exposure level $x$ are the $q$-th quantile of their respective distributions (the observed exposure distribution and the TMRED, respectively). TMRED was characterized based on the optimal distribution associated with lowest disease risk, assessed by the Global Burden of Disease (GBD) 2010 with three considerations: the availability of strong evidence that supports a continuous reduction in cancer risk to optimal intake; the distribution that is feasible at the population level; and consistency with major dietary guidelines. 6 In our analyses, $m$ is defined to be $\infty$. Since the density of a gamma distribution approaches 0 as exposure, $x$, approaches infinity, and because implausibly high values of exposure should exceed the corresponding theoretical maximum exposure level, implausibly high values of exposure will make little to no contribution to the PAF.
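The quantile-matching idea can be illustrated in a few lines. This is a sketch under several assumptions that are not from the source: the TMRED is taken to be normal, the exposure distribution is the shape-1 gamma (exponential) so that its CDF has a closed form, the factor is harmful (risk rises with intake above the optimal level), and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def exposure_cdf(x, scale=30.0):
    """CDF of a gamma distribution with shape 1 (exponential), used for simplicity."""
    return 1.0 - math.exp(-x / scale) if x > 0 else 0.0


def relative_risk(x, beta, tmred):
    """RR at exposure x, with the optimal level matched to x by quantile."""
    q = exposure_cdf(x)
    # inv_cdf needs 0 < q < 1, so keep q strictly inside the open interval.
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    y = tmred.inv_cdf(q)  # optimal exposure at the same quantile as x
    excess = x - y
    return math.exp(beta * excess) if excess >= 0 else 1.0


tmred = NormalDist(mu=10.0, sigma=1.0)  # hypothetical optimal-intake distribution
rr_high = relative_risk(30.0, beta=0.01, tmred=tmred)
rr_low = relative_risk(1.0, beta=0.01, tmred=tmred)
print(rr_high, rr_low)
```

An individual whose intake sits below the matched optimal level gets a relative risk of exactly 1, which mirrors the "no risk beyond the theoretical minimum" assumption.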

Computation
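In practice, we use simple numerical integration (Riemann sums) to compute the integrals in the PAF formula. Thus, we used the categorical equivalent of the PAF formula, where the categories are determined by dividing the exposure range (chosen here to be $[0, F_x^{-1}(\Phi(6))]$, with $\Phi$ the standard normal cumulative distribution function) into 121 intervals, each of length 0.1 when converted to the standard normal scale (except for the first one). More precisely, the range of exposure group $i$ can be described as follows:

$$\text{group } i = \begin{cases} \big[\,0,\ F_x^{-1}(\Phi(-6))\,\big], & i = 1 \\ \big[\,F_x^{-1}(\Phi(-6 + 0.1\,(i-2))),\ F_x^{-1}(\Phi(-6 + 0.1\,(i-1)))\,\big], & i = 2, \ldots, 121 \end{cases}$$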
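A categorical PAF of this kind can be sketched with the standard library alone. The code below is an illustration, not the authors' implementation: the gamma CDF is computed from its power series and inverted by bisection, each interval is represented by the midpoint of its probability range, the TMRED is assumed normal, the factor is assumed harmful, and every numeric input is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used only to place interval edges


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    t = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-16 and n < 10000:
        term *= t / (shape + n)
        total += term
        n += 1
    return total * math.exp(-t + shape * math.log(t) - math.lgamma(shape))


def gamma_inv_cdf(q, shape, scale):
    """Invert gamma_cdf by bisection."""
    lo, hi = 0.0, shape * scale
    while gamma_cdf(hi, shape, scale) < q:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile_edges(step=0.1, z_max=6.0):
    """Edges on the probability scale: 0, then Phi(-6), Phi(-5.9), ..., Phi(6)."""
    n_steps = round(2 * z_max / step)
    return [0.0] + [STD_NORMAL.cdf(-z_max + i * step) for i in range(n_steps + 1)]


def categorical_paf(shape, scale, beta, tmred):
    """PAF as (sum P_i RR_i - sum P_i * 1) / sum P_i RR_i over the intervals."""
    edges = quantile_edges()
    observed = 0.0
    counterfactual = 0.0
    for q_lo, q_hi in zip(edges, edges[1:]):
        mass = q_hi - q_lo            # probability of falling in this interval
        q_mid = 0.5 * (q_lo + q_hi)   # representative quantile (an assumption)
        x = gamma_inv_cdf(q_mid, shape, scale)
        y = tmred.inv_cdf(q_mid)      # optimal exposure at the same quantile
        rr = math.exp(beta * (x - y)) if x > y else 1.0
        observed += mass * rr
        counterfactual += mass        # RR is 1 at the optimal exposure
    return (observed - counterfactual) / observed


tmred = NormalDist(mu=5.0, sigma=0.5)  # hypothetical optimal-intake distribution
paf = categorical_paf(shape=2.0, scale=15.0, beta=0.01, tmred=tmred)
print(round(paf, 3))
```

Working on the probability scale means the 121 interval masses come straight from differences of the standard normal CDF, and the gamma inverse is needed only to place one representative exposure per interval.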

Joint PAF
Because summing would overestimate joint relationships, 6
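The formula used for the joint PAF is not shown in this passage. A common choice in comparative risk assessment, assumed here purely for illustration, combines individual PAFs multiplicatively as $1 - \prod_i (1 - \mathrm{PAF}_i)$, which never exceeds the simple sum. The function name and example values are hypothetical.

```python
def joint_paf(pafs):
    """Combine PAFs as 1 - prod(1 - PAF_i), assuming independent risk factors."""
    remaining = 1.0
    for paf in pafs:
        if not 0.0 <= paf <= 1.0:
            raise ValueError("each PAF must lie in [0, 1]")
        remaining *= 1.0 - paf
    return 1.0 - remaining


# Hypothetical PAFs for two dietary factors and one cancer.
combined = joint_paf([0.10, 0.20])
print(combined)
```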

Monte Carlo Simulations
Monte Carlo simulations were used to quantify uncertainty in the PAFs, incorporating uncertainty in dietary exposure distributions, etiologic RR estimates, and, for BMI-mediated associations, prevalence of overweight/obesity. Specifically, for each diet-disease pair and stratum, we drew randomly 1,000 times from the normal distribution of the estimate of the disease-specific change in the log(RR) corresponding to a one-unit increase in intake, the normal distribution of the estimate of the exposure mean, and, where appropriate, the normal distribution of the estimate of the prevalence of obesity. Draws of proportions that were less than 0 or greater than 1 were changed to 0 or 1, respectively. Likewise, draws of mean intake that were zero or less were changed to 0.00001. Each set of random draws was used to calculate the PAFs and associated cancer incidence.
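The drawing and truncation rules can be sketched as follows. This is an illustration only: `toy_paf` is a hypothetical stand-in for the full PAF calculation (it ignores the prevalence draw), the input estimates and standard errors are invented, and taking the 2.5th and 97.5th percentiles of the draws as the 95% UI is an assumption.

```python
import math
import random


def clamp_proportion(p):
    """Proportions below 0 or above 1 are set to 0 or 1."""
    return min(max(p, 0.0), 1.0)


def floor_mean(m):
    """Mean intakes of zero or less are set to 0.00001."""
    return m if m > 0 else 0.00001


def toy_paf(beta, mean_intake):
    """Hypothetical stand-in: everyone sits at the mean, optimal intake is 0."""
    rr = math.exp(beta * mean_intake)
    return (rr - 1.0) / rr


def simulate(beta_est, beta_se, mean_est, mean_se, prev_est, prev_se,
             n_draws=1000, seed=0):
    rng = random.Random(seed)
    pafs, prevalences = [], []
    for _ in range(n_draws):
        beta = rng.gauss(beta_est, beta_se)
        mean_intake = floor_mean(rng.gauss(mean_est, mean_se))
        prevalences.append(clamp_proportion(rng.gauss(prev_est, prev_se)))
        pafs.append(toy_paf(beta, mean_intake))
    return pafs, prevalences


def percentile(ordered, p):
    """Nearest-rank percentile of an already sorted list."""
    k = max(0, min(len(ordered) - 1, math.ceil(p * len(ordered)) - 1))
    return ordered[k]


pafs, prevalences = simulate(0.01, 0.002, 30.0, 2.0, 0.35, 0.02)
ordered = sorted(pafs)
lower, median, upper = (percentile(ordered, p) for p in (0.025, 0.5, 0.975))
print(lower, median, upper)
```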

Population Attributable Fraction via Mediated Effects
In order to estimate diet-related cancer burden mediated by obesity, we associated the effect of or "probable". We included two additional cancer types (thyroid cancer and multiple myeloma) associated with body fatness that were evaluated by IARC with evidence graded as "sufficient", for a total of 15 cancers associated with body fatness (BMI).
For SSB, current evidence does not support a direct association between SSB consumption and cancer risk. 11,12 Thus, the total cancer burden attributed to high SSB intake reflects entirely the BMI-mediated associations. For fruits, vegetables, whole grains, red meats, and processed meats, direct associations with cancer risk (after adjustment for BMI) were included; and the total cancer burden attributable to these dietary factors was a combination of cancer burden attributable to direct associations and that attributable to BMI-mediated associations. 13 The strength of evidence is categorized into "convincing", "probable", "limited-suggestive", "limited-no conclusion", and "substantial effect unlikely" (eTable 2 in the Supplement). Based on the WCRF/AICR grading system, we focused on 7 dietary targets having "convincing" or "probable" evidence for effects on cancer risk: fruits, non-starchy vegetables, whole grains, processed meats, red meats, total dairy products, and sugar-sweetened beverages (SSBs) (Table 1).

Appendix 2. Etiologic Relationships of Dietary Factors with Cancer Risk
The present analysis incorporated the RR estimates from meta-analyses of prospective cohort studies with limited evidence of bias from confounding, such as the RR estimates evaluated by WCRF/AICR on whole grains, processed meats, and red meats in association with risks of colorectal cancer 14 and stomach cancer (non-cardia). 15 Both whole grains and foods high in fiber decrease the risk of colorectal cancer, 14 and whole grain foods are an important source of dietary fiber. Similarly, both dairy products and foods high in calcium decrease the risk of colorectal cancer, 14 and dairy products are an important source of calcium. We included whole grains and total dairy products but not fiber and calcium in the same analysis to avoid overestimation. SSB was not quantified as a separate food group in WCRF/AICR CUP reports for cancer risk.
However, its causal impact on adiposity 16 provides strong support to include SSB as a dietary target for estimating cancer burden (eAppendix 1). 17-19 When RR estimates were not directly available from the WCRF/AICR reports for some cancers (mouth, pharynx, and larynx cancers) or dietary target (SSB), we performed systematic searches on PubMed to identify meta-analyses that evaluated these specific cancer types and dietary targets; or when an eligible meta-analysis was not available, we conducted de novo meta-analyses of prospective cohort studies to estimate the RR (eAppendix 3 in the Supplement). Published meta-analyses were eligible if they included randomized trials or prospective cohort studies of the identified diet-disease relationships of interest. Whenever possible, we prioritized meta-analyses that characterized dose-responses using all available data (as opposed to comparisons of extreme categories, e.g., highest vs. lowest quartiles). Meta-analyses including only retrospective case-control studies were excluded due to greater potential for selection bias, recall bias, and reverse causation. When more than one meta-analysis was identified for any diet-disease relationship, we included the dose-response estimates with the greatest number of studies and clinical events.
When a meta-analysis based on randomized trials or prospective cohort studies was not available from WCRF/AICR reports, CUP, or PubMed systematic searches, we developed our own systematic review protocols and performed de novo meta-analyses (e.g., for mouth, pharynx, and larynx cancers).

Study Designs Used for RR Estimates
Both prospective cohort studies and randomized controlled trials (RCTs) provide important evidence for identification of effects of dietary factors on cancer risk. However, RRs for diet and cancer risk came mostly from prospective cohort studies not RCTs. 20  when assigning evidence categories. 22 Thus, the RR estimates for diet and cancer risk are based on long-term prospective cohort studies that have adjusted for major confounders and, when available, for bias introduced by measurement error in the exposure 23 that generally produces underestimation of the true RR. 24 Notably, this approach has also been used for estimating the effects of smoking, alcohol, adiposity, and physical inactivity on cancer risk.

Heterogeneity of RR Estimates by Population Subgroups
In WCRF reports and our work to-date, the current evidence for diet and cancer risk, both from individual studies and meta-analyses, suggests that when measured comparably, the proportional effects considered in this proposal are similar across different age, sex, and racial/ethnic groups.
Because substantial interactions of age, sex, or race/ethnicity in these RR estimates are not observed, homogeneous RR estimates were applied in population subgroups.

To systematically review and summarize the evidence from prospective cohort studies and/or randomized controlled trials on the association between fruit and vegetable intake and incidence and mortality of cancer in the mouth, pharynx and larynx in men and women.

Methods
We adapted the systematic review protocol of the WCRF/AICR CUP reports for cancers of mouth, pharynx and larynx, 25 following the recommendations of Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) and Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines during all stages of the design, implementation, and reporting of this meta-analysis.

Inclusion criteria
• Study designs: Randomized controlled trials or prospective cohort studies (including nested case-control design)
• Study population: Men, women or both 18 years or older

Exclusion criteria
• Studies in which the major outcome includes esophageal cancer.
• Studies in which the outcome is exclusively cancer of the lip and/or salivary glands.
• Study results in which the exposure is a biomarker taken at or after cancer diagnosis.
• Articles written in a language other than English, French, Spanish, Portuguese or Italian, if it is not possible to obtain a translation of the article.

Search database
The MEDLINE database will be searched using PubMed as platform. Hand search will be performed for the references of reviews and meta-analyses identified during the search.

Article selection
First, all references obtained with the searches in PubMed will be imported into the reference manager database "Abstrackr". 26 Second, two reviewers will independently screen the titles and abstracts of all references. Third, the reviewers will assess the full manuscripts of all papers for which eligibility could not be determined by reading the title and abstract. The reviewers will resolve any disagreements about the study or exposure relevance by discussion with the principal investigator.

Data extraction
The data to be extracted include, among others, the study design, name, characteristics of study population, age range, sex, country, recruitment year, methods of exposure assessment, definition of exposure, definition of outcome, method of outcome assessment, study size, number of cases, number of comparison subjects, length of follow-up, loss to follow-up, analytical methods and whether methods for correction of measurement error were used.
The ranges, means or median values for each exposure level will be extracted as reported in the paper. The reviewer will not do any calculation during data extraction. For each result, the reviewers will extract the covariates and matching variables included in the analytical models.
Measures of association, number of cases and person years for each category of exposure will be extracted for each analytical model reported.
Some studies present results for the cancers of interest as separate outcomes (mouth, pharynx and larynx), combinations of these cancers, or total results for these cancers. In some cases, esophageal and nasopharyngeal cases may also be included. The reviewer will extract the results for each cancer site and for the cancer groups relevant to the review. The reviewer will also extract the results by sex, age group, race/ethnicity and other subgroups, if reported, and for combined results when presented in the paper. The data extracted will be double-checked by a second reviewer.

Meta-analysis
Dose-response meta-analysis will be conducted to express the results of each study in the same increment unit for a given exposure, using the "best" adjusted models. The best adjusted model will usually be the most adjusted model. When the linear dose-response estimate is reported in an article, this will be used in the dose-response meta-analysis. If the results are presented only for categorical exposures/intervention (quantiles or pre-defined categories), the slope of the dose-response relationship for each study will be derived from the categorical data using generalized least-squares for trend estimation (command GLST in Stata). 27 The meta-analysis results will be shown in dose-response forest plots. For comparability, the same increment units used in the meta-analyses of WCRF/AICR CUPs (e.g., servings/d) will be used to present the linear dose-response analysis results.
If the dose-response estimates are not reported in an article, they will be derived from categorical data. This method accounts for the correlation between relative risk estimates with respect to the same reference category. 28 The dose-response model forces the fitted line to go through the origin; whenever the assigned dose corresponding to the reference group (RR=1) is different from zero, it will be rescaled to zero and the doses assigned to the other exposure categories will be rescaled accordingly.
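The rescaling and through-origin fit can be illustrated with a simplified stand-in. The sketch below uses inverse-variance weighted least squares and, unlike the GLST method named above, ignores the correlation between category-specific estimates that share a reference group; it is not a substitute for GLST. The function name and data are hypothetical.

```python
import math


def slope_through_origin(doses, log_rrs, variances, reference_dose):
    """Weighted least-squares slope of log(RR) on dose, forced through the origin.

    doses, log_rrs and variances describe the non-reference categories only.
    Doses are shifted so that the reference category sits at dose 0.
    """
    shifted = [d - reference_dose for d in doses]
    weights = [1.0 / v for v in variances]
    numerator = sum(w * x * y for w, x, y in zip(weights, shifted, log_rrs))
    denominator = sum(w * x * x for w, x in zip(weights, shifted))
    slope = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return slope, standard_error


# Hypothetical study: reference category at 20 g/day, three higher categories.
doses = [40.0, 60.0, 90.0]
log_rrs = [0.02 * (d - 20.0) for d in doses]  # exactly linear, for checking
variances = [0.04, 0.05, 0.08]
slope, se = slope_through_origin(doses, log_rrs, variances, reference_dose=20.0)
print(slope, se)
```

Because the reference category has log(RR) = 0 at the shifted dose of 0, it contributes nothing to either sum, which is why only the non-reference categories are passed in.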
Heterogeneity between studies will be quantified with the $I^2$ statistic, with cut points for $I^2$ values of 30% and 50% for low, moderate, and high degrees of heterogeneity. 29 Heterogeneity will be assessed visually from forest plots and with statistical tests (P value <0.05 will be considered statistically significant), but the interpretation will rely mainly on the $I^2$ values, as the test has low power and the number of studies will probably be low.

13 Evidence grading and RR estimates for cancers in mouth, pharynx and larynx were based on de novo meta-analysis of prospective cohort studies (eAppendix 3).
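The $I^2$ statistic used to quantify heterogeneity can be computed from Cochran's Q as $I^2 = \max(0, (Q - df)/Q) \times 100\%$. The sketch below is an illustration with hypothetical inputs; how values falling exactly on the 30% and 50% cut points are classified is an assumption.

```python
def i_squared(effects, variances):
    """I^2 (in percent) from study effects (e.g. log RR slopes) and their variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2):
    """Low below 30%, moderate from 30% to 50%, high above 50%."""
    if i2 < 30.0:
        return "low"
    if i2 <= 50.0:
        return "moderate"
    return "high"


# Hypothetical per-study estimates and variances.
i2 = i_squared([0.0, 1.0], [0.01, 0.01])
print(i2, heterogeneity_label(i2))
```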