Metabolic signature of healthy lifestyle and its relation with risk of hepatocellular carcinoma in a large European cohort.

Background
Studies using metabolomic data have identified metabolites from several compound classes that are associated with disease-related lifestyle factors.


Objective
In this study, we identified metabolic signatures reflecting lifestyle patterns and related them to the risk of hepatocellular carcinoma (HCC) in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort.


Design
Within a nested case-control study of 147 incident HCC cases and 147 matched controls, partial least squares (PLS) analysis related 7 modified healthy lifestyle index (HLI) variables (diet, BMI, physical activity, lifetime alcohol, smoking, diabetes, and hepatitis) to 132 targeted serum-measured metabolites and a liver function score. The association between the resulting PLS scores and HCC risk was examined in multivariable conditional logistic regression models, where ORs and 95% CIs were computed.


Results
The lifestyle component's PLS score was negatively associated with lifetime alcohol, BMI, smoking, and diabetes, and positively associated with physical activity. Its metabolic counterpart was positively related to the metabolites sphingomyelin (SM) (OH) C14:1, C16:1, and C22:2, and negatively related to glutamate, hexoses, and the diacyl-phosphatidylcholine PC aaC32:1. The lifestyle and metabolomics components were inversely associated with HCC risk, with the ORs for a 1-SD increase in scores equal to 0.53 (95% CI: 0.38, 0.74) and 0.28 (0.18, 0.43), and the associated AUCs equal to 0.64 (0.57, 0.70) and 0.74 (0.69, 0.80), respectively.


Conclusions
This study identified a metabolic signature reflecting a healthy lifestyle pattern which was inversely associated with HCC risk. The metabolic profile displayed a stronger association with HCC than did the modified HLI derived from questionnaire data. Measuring a specific panel of metabolites may identify strata of the population at higher risk for HCC and can add substantial discrimination compared with questionnaire data. This trial was registered at clinicaltrials.gov as NCT03356535.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the predominant form of liver cancer and is the second most frequent cause of cancer death worldwide (1). HCC incidence rates have risen dramatically in Europe in recent decades (2), along with HCC mortality, with unfavorable trends projected to 2020 (3). HCC is a multifactorial disease strongly associated with lifestyle factors (4). Although hepatitis infection remains its primary risk factor, other exposures, such as obesity, alcohol drinking, diabetes, smoking, physical activity, and some dietary factors, have been related to HCC risk (5-7). The rising prevalence of obesity, diabetes, and alcohol drinking may explain the observed trends in HCC incidence, particularly in regions where hepatitis infection rates are low (4,8). The prevalence of hepatitis B virus is estimated to be ∼0.9% and that of hepatitis C virus ∼1.1% in the European Union (compared with 1% and 0.8%, respectively, in the United States) (9,10).
The etiology of HCC is complex, and the pathways implicated in hepatocarcinogenesis entail a broad range of mechanisms that are likely affected by more than one exposure (2). Better understanding of this process calls for a comprehensive approach to exploring HCC risk factors, in particular investigating the effects of multiple modifiable lifestyle factors as well as their biological correlates. Studies investigating etiologic determinants of HCC often rely on variables ascertained in self-administered questionnaires that may be subject to various biases, including measurement error and recall bias (11). The identification and application of biomarkers in this regard might not only improve measurement of the relevant etiological determinants, but may also help elucidate the link between lifestyle exposures and disease progression by revealing insights into the underlying biological mechanisms (12,13).
Metabolomics is an evolving methodology that measures a broad spectrum of small molecules to identify causal and mechanistic pathways in disease development and etiology (14). Recently, epidemiologic studies incorporating metabolomic data have identified metabolites related to a number of disease outcomes, including different cancer subtypes (15,16), diabetes (17), alcoholic hepatitis (18), and hepatobiliary disease (12). A number of investigations have also found metabolic correlates of dietary (19,20) and lifestyle exposures, including physical activity (19), obesity (21), alcohol consumption (13,22), and smoking (23). These exposures are established HCC risk factors and are components of an established healthy lifestyle index (HLI), the adherence to which has been associated with lower risks of noncommunicable diseases, including cancer (24,25).
In this study, the lifestyle variables of a modified HLI (24,25), including diet, BMI, physical activity, smoking, alcohol intake, diabetes, and hepatitis infection, were related to specific metabolic signals acquired through targeted metabolomics. Metabolic signatures reflecting healthy lifestyle behaviors were identified and related to HCC risk within a nested case-control study of HCC in the European Prospective Investigation into Cancer and Nutrition (EPIC).

Study population
EPIC is a multinational prospective cohort study designed to investigate the links of diet, lifestyle, and environmental factors with cancer incidence and other chronic disease outcomes. Over 520,000 healthy men and women aged 25-85 y were enrolled between 1992 and 2000 across 23 EPIC administrative centers in 10 European countries: Denmark, France, Germany, Greece, Italy, Netherlands, Norway, Spain, Sweden, and the United Kingdom (26). In most of the EPIC centers, participants were recruited from the general population, with the following exceptions: in France, women were enrolled from a health insurance scheme for school and university employees; in Utrecht (Netherlands) and in Florence (Italy), participants came from breast cancer screening programs; some centers in Italy (Turin and Ragusa) and Spain recruited blood donors; and the Oxford subcohort (United Kingdom) included mostly health-conscious individuals recruited throughout the United Kingdom. Finally, the French, Norwegian, and Naples (Italy) cohorts comprised only women. Extensive details of the study design and recruitment methods have been published previously (26).

Collection of dietary and lifestyle data
During the enrollment period, participants gave informed consent and completed questionnaires on diet, lifestyle, and medical history. Biological samples were collected for approximately 80% of the cohort prior to disease onset. Serum samples were stored at the International Agency for Research on Cancer (IARC), Lyon, France in −196°C liquid nitrogen for all countries, with the exception of the samples originating from Sweden (−80°C freezers) and Denmark (−150°C nitrogen vapor). Usual diet over the previous 12 mo was assessed for each individual through validated country-specific dietary questionnaires (26). Nutrient intakes were then estimated with the use of a common harmonized food composition database across EPIC countries (EPIC Nutrient Database, ENDB) (27). Sociodemographic information, including education, smoking, and alcohol drinking histories, as well as physical activity, was gathered in lifestyle questionnaires. Anthropometric characteristics were directly measured by trained study personnel for most of the participants (26), but were self-reported in baseline questionnaires for a subset of participants from the EPIC-Oxford subcohort, although the accuracy of these self-reported data has been validated (28). Approval for this study was obtained from the ethical review boards of the participating institutions and IARC.

Nested case-control study of HCC
Within a nested case-control study of HCC (29) in EPIC, this analysis focused on 147 cases and 147 matched controls with available biological samples identified in the period between the subjects' recruitment into the cohort and 2010 (29) (Supplemental Figure 1). For each HCC case, one control (n = 147) was selected by incidence density sampling from all cohort members alive and free of cancer (except for nonmelanoma skin cancer), and matched by age at blood collection (±1 y), sex, study center, time of the day at blood collection (±3 h), and fasting status at blood collection (<3, 3-6, and >6 h); among women, the pair was additionally matched by menopausal status (pre-, peri-, and postmenopausal) and hormone replacement therapy use at time of blood collection (yes/no). Cases of HCC originated from all participating EPIC centers except for Norway and France, which were not included in this study. All subjects were cancer free at the time of blood collection.

Follow-up and case ascertainment in the nested case-control study
Follow-up started at date of entry to the study and finished at date of diagnosis, death, or last completed follow-up (from December 2004 up to June 2010), whichever came first. Cancer incidence was determined through population cancer registries or through active follow-up, as detailed elsewhere (29). Incident HCC cases were defined as first primary invasive tumours and identified through the 10th Revision of International Statistical Classification of Diseases, Injury and Causes of Death as C22.0, with morphology codes ICD-O-2 "8170/3" and "8180/3." Metastatic cases and other types of primary liver cancer were excluded.

Lifestyle variables
The lifestyle variables included in this analysis were BMI (continuous, kg/m²), average lifetime alcohol intake (continuous, grams per day), the diet score (continuous) described in the Supplemental Methods, physical activity (continuous, metabolic equivalent of task hours per week), smoking (never, former smokers quit >10 y, former smokers quit ≤10 y, current smokers ≤15 cigarettes/d, current smokers >15 cigarettes/d), hepatitis infection (yes/no), and self-reported diabetes at baseline (yes/no). These are the components of an HLI used in EPIC (24,25), modified to include hepatitis and diabetes status, as detailed in Supplemental Methods and Supplemental Table 1. Average lifetime alcohol intake was used instead of alcohol intake at recruitment to address potential bias related to reverse causality.
These seven lifestyle variables are referred to herein as the X-set. Missing values in the X-set were imputed through a simple expectation maximization (EM) algorithm using the covariance matrix of the data (30).

Metabolomic data
Concentrations of a predefined set of common endogenous metabolic biomarkers were measured in serum samples at IARC, Lyon, France, with the use of a BIOCRATES AbsoluteIDQ p180 Kit (Biocrates), with ultra-performance liquid chromatography (1290 Series HPLC, Agilent) coupled to a tandem mass spectrometer (QTrap 5500, AB Sciex). Details of the sample preparation methods and mass spectrometry analyses are provided elsewhere (29). Platform reliability has been previously assessed, with ICCs above 0.50 in 73% and 52% of the metabolites in fasting and nonfasting samples, respectively (31). Metabolites with CVs >20% for analytical replicates were excluded, resulting in 145 detected metabolites. Of these, metabolites with >40% of missing values (i.e., values below the limit of detection/quantification or above the highest calibration standards) were excluded, resulting in a total of 132 metabolites being included in the study. Measurements below the limit of detection were set to half that limit, and those below the limit of quantification were set to half the limit of quantification (applicable to a total of 16 metabolites for 0.3-29.3% of participants). Additionally, measurements above the highest concentration calibration standards were set to the highest standard values. The metabolite nomenclature has been described previously (21), and can be found in Supplemental Methods.
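The substitution and exclusion rules described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the argument names, and the reading of "highest values" as the highest calibration standard are assumptions, and the sketch assumes the limit of detection does not exceed the limit of quantification.

```python
def impute_measurement(value, lod, loq, highest_standard):
    """Apply the substitution rules to one concentration (hypothetical helper).

    Assumes lod <= loq <= highest_standard.
    """
    if value < lod:
        return lod / 2.0          # below limit of detection: half the LOD
    if value < loq:
        return loq / 2.0          # below limit of quantification: half the LOQ
    if value > highest_standard:
        return highest_standard   # above calibration range: capped (assumed reading)
    return value


def keep_metabolite(values, lod, loq, highest_standard, max_fraction=0.40):
    """Return True unless more than 40% of readings fall outside the quantifiable range."""
    # With lod <= loq, "below LOQ" also covers readings below the LOD.
    n_out = sum(1 for v in values if v < loq or v > highest_standard)
    return n_out / len(values) <= max_fraction
```

Under these assumptions, a metabolite is screened with `keep_metabolite` first, and only retained metabolites have their out-of-range readings replaced.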

Liver function score
A composite score indicative of liver function identifying the number of abnormal values for six circulating liver blood biomarker tests indicating possible underlying liver dysfunction (32) was included in the set of metabolites. The score entailed the following tests: alanine aminotransferase >55 U/L, aspartate aminotransferase >34 U/L, gamma-glutamyltransferase: men >64 U/L and women >36 U/L, alkaline phosphatase >150 U/L, albumin <35 g/L, and total bilirubin >20.5 μmol/L. The cut-off points were provided by the clinical biochemistry laboratory that conducted the analyses (Centre de Biologie République, Lyon, France) based on assay specifications, as previously described (32). These biomarkers were acquired at the same time as the metabolites from the prediagnostic blood samples collected at recruitment.
The set comprising the 132 metabolites and the liver function score is referred to as the M-set, containing the metabolomic data.
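As an illustration of how the composite score counts abnormal tests, the thresholds listed above can be applied as follows. The function and argument names are hypothetical, and the coding of sex as "M"/"F" is an assumption made for the sketch.

```python
def liver_function_score(alt, ast, ggt, alp, albumin, bilirubin, sex):
    """Count abnormal results among the six liver tests (0 to 6).

    Units follow the text: enzymes in U/L, albumin in g/L, bilirubin in umol/L.
    sex is "M" or "F" (assumed coding); it only affects the GGT threshold.
    """
    ggt_limit = 64 if sex == "M" else 36
    abnormal_flags = [
        alt > 55,          # alanine aminotransferase
        ast > 34,          # aspartate aminotransferase
        ggt > ggt_limit,   # gamma-glutamyltransferase, sex-specific
        alp > 150,         # alkaline phosphatase
        albumin < 35,      # low albumin is the abnormal direction
        bilirubin > 20.5,  # total bilirubin
    ]
    return sum(abnormal_flags)
```

A participant with all six values inside the reference limits scores 0, and each abnormal test adds one point.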

Identification of sources of systematic variability within the data
To identify and quantify the sources of systematic variability within the X-set of HLI variables and the M-set of metabolites, the Principal Component Partial R² (PC-PR2) method was used (33). In this study, PC-PR2 was applied to the X-set of 7 exposure variables, with the covariates explored for systematic variability being country, age at recruitment, and sex. With the similar objective of identifying sources of variability in the metabolomic data, another PC-PR2 analysis was run on the M-set examining the contribution to variability of the following covariates: country, age at blood collection, batch, sex, BMI, diet score, physical activity, alcohol at recruitment, smoking, hepatitis, and diabetes at baseline. PC-PR2 combines aspects of principal component analysis (PCA) with the partial R² statistic in multiple linear regression models. Briefly, PCA is performed on the set under scrutiny, and a number of components explaining an amount of total variability above a designated threshold (here, 80%) are retained. Multiple linear models are then fitted, where each component's variability is explained by regressing it on a list of relevant covariates, yielding an R² statistic for each of the latter. The R² quantifies the amount of variability each independent variable explained, conditionally on all other covariates included in the model. Finally, an overall partial R² is computed as a weighted mean for each covariate, with the eigenvalues as the components' weights. Subsequently, residuals on the variables accounting for most variability in lifestyle and metabolomic data were computed in a series of univariate linear regression models and were used in the following partial least squares (PLS) analyses.
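Two steps of the PC-PR2 procedure lend themselves to a short sketch: choosing how many components to retain for the 80% threshold, and averaging a covariate's partial R² across components with the eigenvalues as weights. The function names are hypothetical, and the per-component partial R² values are taken as given here rather than estimated from regression models.

```python
def n_components_for_threshold(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of
    total variability reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, eigenvalue in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


def overall_partial_r2(partial_r2_by_component, eigenvalues):
    """Eigenvalue-weighted mean of one covariate's partial R2 over the
    retained components (both lists in the same component order)."""
    weighted_sum = sum(r2 * w for r2, w in zip(partial_r2_by_component, eigenvalues))
    return weighted_sum / sum(eigenvalues)
```

For example, with illustrative eigenvalues 4, 3, 2, and 1, the first three components cover 90% of the variability, so three components would be retained.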

Lifestyle pattern and metabolic signature assessment
With the aim of deriving lifestyle and metabolic signatures that mirror one another while relating the entire set of exposure variables to the metabolomic data, we used PLS analysis, a multivariate method that achieves dimensionality reduction. PLS combines features of PCA with those of multiple linear regression (34). This technique extracts linear combinations, referred to as PLS factors, of predictors (the X-set of lifestyle variables) and responses (the M-set of metabolites), allowing a simultaneous decomposition of both sets with the aim of maximizing their covariance (34). The mathematical and computational details of the PLS method and its applicability have been described thoroughly in our previous study on HCC within EPIC (35). The number of PLS factors to retain was selected after carrying out a 7-fold cross-validation to minimize the predicted residual sum of squares statistic, a measure of PLS performance. Details of the process can be found elsewhere (35). PLS factor loadings, i.e., the coefficients quantifying how much each original variable contributes to the PLS factor, characterized each extracted lifestyle and metabolic profile. As the M-set was particularly dense in metabolite variables, the interpretation of the metabolic profile mainly focused on those contributing most strongly to the PLS component, reporting variables with loading values lower than the 2.5th or larger than the 97.5th percentile.
Table footnotes: 2 Metabolite variables contributing to each PLS factor were selected based on extreme loading values, i.e., below or above the 2.5th and 97.5th percentiles. 3 Results from the main analysis using residuals based on country (X- and M-sets) and batch (M-set only) (n = 294, X-set = 7, M-set = 133). 4 Results from the sensitivity analysis using the same residuals as in the main analysis but excluding the liver function score from the M-set (n = 294, X-set = 7, M-set = 132).
A sensitivity PLS analysis was performed by excluding the liver function score from the M-set. This was done to investigate the performance of the PLS-obtained signatures, ruling out any potential bias that may arise when including a composite score based on enzyme biomarker levels that was closely associated with HCC (32). Additionally, analyses were performed excluding pairs of cases and controls where cases were diagnosed within the first 2 and 4 y following blood collection and excluding case sets where cases and/or controls were hepatitis positive (Supplemental Table 2).
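To make the idea of a covariance-maximizing factor concrete, the sketch below extracts a single pair of unit-norm weight vectors, one for the X-set and one for the M-set, by power iteration on the cross-product matrix of the centered data. This is an illustrative stand-in under stated assumptions, not the SAS PROC PLS procedure used in the study: the function names are hypothetical, only one factor is extracted, no cross-validation is done, and the vectors returned are weights rather than the loadings reported by the software.

```python
import math


def center_columns(rows):
    """Subtract each column's mean from its values."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def normalize(vector):
    """Scale a vector to unit Euclidean norm (fails on a zero vector)."""
    norm = math.sqrt(sum(a * a for a in vector))
    return [a / norm for a in vector]


def first_pls_factor(x_rows, m_rows, n_iter=200):
    """Return weights (w, c) and scores (t, u) maximizing cov(Xw, Mc).

    Power iteration on S = X'M; assumes the all-ones start vector is not
    orthogonal to the dominant direction.
    """
    x, m = center_columns(x_rows), center_columns(m_rows)
    p, q = len(x[0]), len(m[0])
    s = [[sum(xr[i] * mr[j] for xr, mr in zip(x, m)) for j in range(q)]
         for i in range(p)]
    w = normalize([1.0] * p)
    for _ in range(n_iter):
        c = normalize([sum(s[i][j] * w[i] for i in range(p)) for j in range(q)])
        w = normalize([sum(s[i][j] * c[j] for j in range(q)) for i in range(p)])
    t = [sum(a * b for a, b in zip(row, w)) for row in x]  # lifestyle scores
    u = [sum(a * b for a, b in zip(row, c)) for row in m]  # metabolic scores
    return w, c, t, u
```

In this sketch the score vectors `t` and `u` play the role of the lifestyle and metabolic signatures: one value per participant, which could then be standardized and related to case-control status.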

Conditional logistic regression analyses
The associations between the modified HLI as well as the PLS scores reflecting the lifestyle and metabolic signatures and the HCC risk were evaluated separately in conditional logistic regression models. ORs and their 95% CIs were computed to express a change in HCC risk reflecting a 1-SD increase in the index and the PLS scores, respectively. The models with the modified HLI and the main PLS analysis were unadjusted, whereas the models from the sensitivity PLS analysis were adjusted for the liver function score. These conditional logistic regression models were used to build the receiver operating characteristic (ROC) curves and determine their AUCs, to evaluate the predictive performance of the models. The DeLong test was used to evaluate the difference between AUCs where applicable. Associated statistics for sensitivity, specificity, and accuracy were computed for a cut-off point, selected as the minimal distance between the ROC curve and the upper left corner of the diagram. To account for the nested case-control design, the positive and negative predictive values (36) were computed by including the HCC prevalence within EPIC (π = 0.0004) calculated over an 8-y period corresponding to the mean follow-up time (from 1992-2002 to 2010) from a total of 477,206 participants included for case identification after relevant exclusions where 191 HCC cases were identified (29).
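For 1:1 matched sets and a single unadjusted score, the conditional likelihood reduces to a logistic function of the within-pair difference, which allows a compact sketch of how an OR per unit of score and its 95% CI can be obtained. This is an assumption-laden illustration rather than the R code used in the study: the function names are hypothetical, scores are assumed to be standardized beforehand so that one unit equals 1 SD, and adjustment covariates are not handled.

```python
import math


def fit_matched_pairs(case_scores, control_scores, n_iter=25):
    """Newton-Raphson fit of beta for 1:1 matched pairs, one covariate.

    Log-likelihood: -sum(log(1 + exp(-beta * d))), d = case - control score.
    Returns (beta, standard_error). Assumes the pairs are not perfectly separated.
    """
    diffs = [a - b for a, b in zip(case_scores, control_scores)]
    beta = 0.0
    for _ in range(n_iter):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        gradient = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        beta += gradient / information
    probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
    information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
    return beta, 1.0 / math.sqrt(information)


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """OR and Wald 95% CI for a one-unit increase in the score."""
    return (math.exp(beta),
            math.exp(beta - z * standard_error),
            math.exp(beta + z * standard_error))
```

An OR below 1 from this sketch would indicate that cases tend to have lower scores than their matched controls, which is the direction reported for both signatures.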
The different analytical steps described herein are outlined in Figure 1. All statistical tests were two-sided and P values <0.05 were considered statistically significant. Statistical analyses were performed with the use of PROC PLS in SAS (Base SAS 9.4, SAS Institute Inc.) for PLS analyses and R Software version 3.3.1 (R Foundation for Statistical Computing) for PC-PR2 analysis, conditional logistic regressions, and ROC-related statistics (packages survival and OptimalCutpoints).
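The cut-off selection and the prevalence-based predictive values described in this section can be sketched as follows. The use of Bayes' theorem for the predictive values is an assumption about the cited method (36), the function names are hypothetical, and the only study quantity used is the prevalence, 191 cases among 477,206 participants.

```python
import math


def closest_to_corner(roc_points):
    """Pick the (cutoff, sensitivity, specificity) triple whose ROC point
    lies nearest the upper left corner (sensitivity = specificity = 1)."""
    return min(roc_points, key=lambda p: math.hypot(1.0 - p[1], 1.0 - p[2]))


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Prevalence as described in the text: 191 HCC cases among 477,206 participants.
HCC_PREVALENCE = 191 / 477206
```

Because the prevalence is so low, the positive predictive value obtained this way stays small even for a model with good sensitivity and specificity, whereas the negative predictive value stays close to 1.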

RESULTS
Study population characteristics by case-control status are presented in Table 1. Following the application of PC-PR2, totals of 6 and 21 principal components were retained, explaining around 80% of the total variability among the modified HLI original variables and the metabolites set, respectively. Results reported in Supplemental Figure 2 showed that the ensemble of explanatory variables accounted for 10.7% and 29.5% of the total variance within the X- and M-sets, respectively. "Country of origin" was the highest contributor to the variance, accounting for 6.2% and 13.1% in the X- and M-sets, respectively, followed by "Batch", with 7.1% in the M-set (Supplemental Figure 2). For both the X- and M-sets, residuals on country and batch (M-set only) were computed in univariate linear regression models and used in the PLS analyses. One PLS factor was retained after 7-fold cross-validation for PLS analysis. The score plot of this factor is depicted in Supplemental Figure 3. The lifestyle PLS factor identified a "healthy" behavior profile, with positive loadings for physical activity and negative loadings for BMI, lifetime alcohol consumption, and smoking (Table 2). The corresponding metabolic PLS factor was characterized by glutamic acid, hexoses, and sphingomyelins (SMs), including SM(OH) C14:1, SM(OH) C16:1, and SM(OH) C22:2. The PLS lifestyle factor was inversely associated with HCC risk, with OR = 0.53 (95% CI: 0.38, 0.74; P = 2.64 × 10⁻⁵) (Table 3); the association was stronger than that of the modified HLI score, with OR = 0.82 (95% CI: 0.76, 0.89; P = 2.3 × 10⁻⁶). The PLS metabolic signature showed a much stronger inverse association with HCC risk, with OR = 0.28 (95% CI: 0.18, 0.43; P = 8.0 × 10⁻⁹) (Table 3). The lifestyle and metabolic signatures remained virtually unchanged after excluding case sets where cases were diagnosed within the first 2 and 4 y after blood collection.
The association with HCC risk remained strong, with OR = 0.46 (95% CI: 0.32, 0.70; P = 3.03 × 10⁻⁵) for the lifestyle signature and OR = 0.27 (95% CI: 0.17, 0.43; P = 6.03 × 10⁻⁸) for the metabolic signature for the lag time of 2 y, and OR = 0.54 (95% CI: 0.37, 0.80; P = 3.64 × 10⁻⁴) and OR = 0.30 (95% CI: 0.19, 0.50; P = 1.69 × 10⁻⁶), respectively, for the lag time of 4 y. Similarly, after excluding hepatitis-positive case sets, the extracted metabolic signature remained inversely related to HCC risk, with OR = 0.33 (95% CI: 0.21, 0.52; P = 2.50 × 10⁻⁶) (Supplemental Table 2).
Table footnotes: 2 ORs are reported for a 1-SD increase either in the modified HLI or in the PLS X- and M-scores (lifestyle and metabolic signatures). The ORs from the HLI and the main PLS analysis were unadjusted, whereas the conditional regression models in the sensitivity analysis were adjusted for the liver function score. The OR results for the lifestyle and metabolic signatures were computed from two separate models, and were not mutually adjusted, in both the main and sensitivity analyses. Cases and controls were matched on age at blood collection (±1 y), sex, study center, date (±2 mo) and time of day at blood collection (±3 h), and fasting status at blood collection (<3, 3-6, >6 h); women were additionally matched on menopausal status (pre-, peri-, postmenopausal) and hormone replacement therapy. 3 Results from the main analysis using residuals based on country (X- and M-sets) and batch (M-set only) (n = 294, X-set = 7, M-set = 133). 4 Results from the sensitivity analysis using the same residuals as in the main PLS analysis but excluding the liver function score from the M-set (n = 294, X-set = 7, M-set = 132). 5 This model was adjusted for the liver function score.

Statement of principal findings
In this analysis, we employed a multivariate dimension reduction method to identify a metabolic signature reflecting a "healthy lifestyle" profile. The metabolic signature was associated with a significant 72% (95% CI: 57%, 82%) reduction in HCC risk, which was statistically significantly different from the 51% (95% CI: 32%, 65%) risk reduction associated with a lifestyle profile based on questionnaire data. The metabolic signature also discriminated better between HCC cases and controls, with AUC = 0.74 (95% CI: 0.69, 0.80) compared with AUC = 0.64 (95% CI: 0.57, 0.70) for the lifestyle profile. These findings highlight the potential for metabolic profiling to capture data on pathophysiological processes that are associated with lifestyle exposures and significantly improve identification of at-risk individuals in the general population.
Table footnotes: 1 The estimates for the lifestyle and metabolic signatures were computed from two separate models, and were not mutually adjusted, both in main and sensitivity analyses. HCC, hepatocellular carcinoma; HLI, healthy lifestyle index; LFS, liver function score; NPV, negative predictive value; PPV, positive predictive value; PLS, partial least squares; ROC, receiver operating characteristic curve. 2 P value from the DeLong test comparing the AUCs of the models including only the LFS with the models also including the lifestyle and the metabolic signature, respectively. 3 Results from the main analysis using residuals based on country (X- and M-sets) and batch (M-set only) (n = 294, X-set = 7, M-set = 133). 4 Results from the sensitivity analysis excluding the liver function score from the M-set (n = 294, X-set = 7, M-set = 132).

Strengths and limitations of the study
To our knowledge, this study is the first to derive metabolic signatures of healthy lifestyle behaviors and at the same time relate the signatures to the risk of HCC, thus offering objective information on the mechanistic processes involved. Our findings showed that metabolic profiles substantially improved discrimination of at-risk individuals compared with questionnaire-derived data and liver function tests (14). Hitherto, large-scale prospective studies examining the association between combined lifestyle factors or patterns of healthy behaviors and chronic disease (37), including cancer subtypes (24,38), have relied on indices derived from self-reported questionnaire-based variables (11), without including metabolites associated with healthy habits. The methodology described herein was tailored to HCC by examining a restricted set of 7 exposures from the modified HLI that have previously been associated with HCC risk (4-7, 39, 40) with the aim of identifying robust metabolic signals.
The derived metabolic signature remained strongly associated with a 71% decrease in HCC risk after separating out the liver function score from the M-set. In addition, the metabolic signature improved the discrimination of HCC when it was added to the liver function score, with a higher AUC of 0.83 (95% CI: 0.79, 0.88), compared with an AUC of 0.77 (95% CI: 0.72, 0.82), suggesting that metabolites can add further information to models that relate only liver function enzymes to HCC risk. Additional data from untargeted metabolomics might improve this prediction.
A limitation of the study is its small sample size, since HCC is a rare malignancy and few prospective cohorts have a sufficient number of incident cases with prediagnostic blood samples. As a consequence, both cases and controls were used in the PLS analysis to extract lifestyle and metabolic signatures, though the analysis was carried out blindly with regard to the case-control status and blood samples were obtained prior to cancer diagnosis. Despite the prospective nature of our design, we cannot rule out potential reverse causation, as the concentrations of some metabolites may have been modified by an underlying subclinical carcinogenic process. Yet, after performing sensitivity analyses excluding sets of cases and controls where cases were diagnosed within the first 2 (and 4) y after recruitment, the metabolic signatures remained stable and strongly inversely associated with HCC risk. It is noteworthy that we observed lower hepatitis positivity (∼10-15%) in our European population relative to the general population, where rates are as high as 30% for hepatitis B and/or C (41). In our analysis, missing values for hepatitis infection (∼25%) were imputed by an EM algorithm (30). Despite the small sample size, a sensitivity analysis was conducted excluding hepatitis-positive case sets, and the results indicated that although hepatitis infection plays a prominent role in HCC occurrence, signatures related to lifestyle habits can explain HCC carcinogenesis. Lastly, this study was conducted in a nested case-control sample within a European population. The association between HLI and HCC risk was marginally heterogeneous by sex (results not shown) but, given the limited sample size, this could not be investigated further.

Comparison with other studies and potential mechanisms
A number of studies have investigated the associations of individual lifestyle factors (13,17,20,23), anthropometric measures (42), and food items (20,43) with concentrations of blood metabolites measured by high-resolution nuclear magnetic resonance or mass spectrometry, yet only a few have studied metabolic signatures of a number of exposures simultaneously. More recently, metabolite phenotyping has been used to objectively assess dietary patterns by identifying metabolite profiles of healthy eating patterns (20,44).
In our study, the metabolites contributing the greatest loadings to the HLI-related metabolic signatures were glutamic acid, hexoses, sphingomyelins [SM C16:1, SM(OH) C14:1, C16:1, and C22:2], and phosphatidylcholines [LysoPC aC28:1, diacyl-phosphatidylcholine (PC aa) C32:1, and PC aeC30:2]. These patterns remained virtually unchanged in the sensitivity analysis where the score of liver enzymes was removed from the set. Results from two studies conducted in a component subcohort of EPIC (19,42) showed that PC aes and diacyl-phosphatidylcholines (PC aas) were, respectively, negatively and positively associated with BMI and other anthropometric markers of obesity, including waist and hip circumferences. PC aas and LysoPCs were inversely associated with dietary fiber sources, although LysoPCs were positively related to physical activity (19).
Ethanol has been hypothesized to disrupt the metabolism of phospholipids by inducing lipogenesis in the liver tissues (45), leading to hepatic injuries and precancerous lesions. High intakes of alcohol can result in the activation of the enzyme acid sphingomyelinase, which likely accelerates the catabolism of SMs into ceramide and PCs (46), and in turn leads to hepatotoxicity and HCC (47). Glutamic acid concentrations were positively associated with alcohol intake in Japanese men of the Tsuruoka Metabolomics Cohort Study (13). SM(OH) C22:2 was negatively associated with smoking in women in KORA S4 (23) and with alcohol intake in the CARLA study (22). PC aaC32:1 was positively related to alcohol use in CARLA (22), type 2 diabetes (T2D) in EPIC-Potsdam (17), and high prevalence of smoking in men (22) and low prevalence of smoking in women (23). Free radicals contained in cigarette smoke promote lipid peroxidation (48), which can result in liver disease (49). Smoking also lowers levels of PC aes and raises those of PC aas, impeding lipid remodeling in membranes, which results in inflammation (50), and may in turn lead to liver injury (51).
A number of studies based on targeted (17) and untargeted (52) metabolomic data identified hexoses as biomarkers of diabetes and related them to increased risk of T2D. In our study, the sum of hexoses was negatively associated with the metabolic signature. Hexoses relate to hyperglycemia, which can be monitored in diabetic patients, who are at higher risk of developing HCC (4). In a study using metabolomics for biomarker discovery of T2D (17), SM C16:1 was associated with decreased T2D risk and was previously identified as a prediabetes biomarker (53).
Lastly, some of the serum liver enzymes in the liver function score have previously been related to high ethanol intake (13), hepatitis infection (18), and low physical exercise (7), and can be used as a panel to detect abnormal conditions, such as hepatitis and cirrhosis, preceding HCC (51). A panel of metabolites, including the hexoses, the identified phospholipids, and glutamic acid, can be measured in at-risk patients to detect their propensity to develop HCC, in addition to key liver enzymes.

Conclusions
Our findings indicate that a specific panel of metabolites linked to healthy lifestyle habits was strongly associated with HCC and can substantially improve HCC risk prediction beyond questionnaire-derived variables and liver function tests. The identified metabolites may also offer insights into the underlying biological mechanisms driving HCC development. Replication of these findings in an independent setting is now warranted to enhance the understanding of the relation between lifestyle exposures and health outcomes through metabolic profiling.
The authors thank Bertrand Hémon and Carine Biessy from IARC for their help with issues related to data management.
The authors' responsibilities were as follows-NA, PF, and VV: conceptualized the study and defined the analytical strategy; NA: was responsible for the statistical analyses, provided preliminary interpretation of the findings and developed the first draft of the manuscript; NA, MJG, VV, and PF: wrote the paper; PF, VV, DCT, ML, MS, MJ, A Scalbert, and MJG contributed to the drafting of the manuscript; VC, TP, CB, M-CBR, TMS, AM, HB, A Sundkvist, TK, RT, KO, ER, and MJG substantially contributed to the interpretation of results and critically revised the content of the manuscript; and all authors: critically helped in the interpretation of results, provided relevant intellectual input, and read, revised, and approved the final manuscript. NA was financially supported by the Université Claude Bernard Lyon I through a doctoral fellowship awarded by the EDISS (Ecole Doctorale InterDisciplinaire Sciences Santé) doctoral school to complete her PhD work. NA also holds a grant from the Fondation de France (FdF) supporting her postdoctoral research (grant number: 00069254). The data on the EPIC-Hepatobiliary dataset was generated through support from the French National Cancer Institute (L'Institut National du Cancer; INCA) (grant number 2009-139; PI: MJ). The work undertaken by DCT reported in this publication was supported by the National Institutes of Health under award number P01 CA196559. RT is a co-principal investigator of the EPIC-Oxford cohort whose work is supported by Cancer Research UK under grant number C8221/A19170. None of the authors reported a conflict of interest related to the study.