Systematic investigation of genetically determined plasma and urinary metabolites to discover potential interventional targets for colorectal cancer

Abstract
Background: We aimed to identify plasma and urinary metabolites related to colorectal cancer (CRC) risk and elucidate their mediator role in the associations between modifiable risk factors and CRC.
Methods: Metabolite quantitative trait loci were derived from 2 published metabolomics genome-wide association studies, and summary-level data were extracted for 651 plasma metabolites and 208 urinary metabolites. Genetic associations with CRC were obtained from a large-scale genome-wide association study meta-analysis (100 204 cases, 154 587 controls) and the FinnGen cohort (4957 cases, 304 197 controls). Mendelian randomization and colocalization analyses were performed to evaluate the causal roles of metabolites in CRC. Druggability evaluation was employed to prioritize potential therapeutic targets. Multivariable Mendelian randomization and mediation estimation were conducted to elucidate the mediating effects of metabolites on the associations between modifiable risk factors and CRC.
Results: The study identified 30 plasma metabolites and 4 urinary metabolites for CRC. Plasma sphingomyelin and urinary lactose, which were positively associated with CRC risk, could be modulated by drug interventions (ie, olipudase alfa, tilactase). Thirteen modifiable risk factors were associated with 9 metabolites, and 8 of these modifiable risk factors were associated with CRC risk. These 9 metabolites mediated the effect of modifiable risk factors (Actinobacteria, body mass index, waist to hip ratio, fasting insulin, smoking initiation) on CRC.
Conclusion: This study identified key metabolite biomarkers associated with CRC and elucidated their mediator roles in the associations between modifiable risk factors and CRC. These findings provide new insights into the etiology and potential therapeutic targets for CRC and the etiological pathways of modifiable environmental factors with CRC.

Colorectal cancer (CRC) is the third-most common malignant tumor and the second-most common cause of cancer death in the world (1). Evidence suggests that metabolic alterations are tied to the occurrence and progression of CRC (2). Some metabolic pathways, such as glycolysis (3), fatty acid metabolism (4), and gut flora metabolism (5), are distinct between colorectal tumors and normal mucosa. A better understanding of these metabolic changes may contribute to uncovering the etiological mechanism of CRC and the development of novel strategies for CRC prevention, diagnosis, and treatment.
Metabolites are the intermediate link between genetic factors or environmental exposures and diseases, and they can accurately reflect the current health state of individuals as the endpoints of cellular pathways and biological processes (6). Alterations in plasma metabolites have been observed between patients with CRC and controls (7-13). Most of these studies were observational, however, and limited to candidate approaches with small numbers of metabolites, single biological sample sources, or small sample sizes, which restricted their ability to establish the causal role of metabolites in CRC. Furthermore, given that plasma and urine metabolites are strongly influenced by lifestyle factors such as diet, smoking, alcohol, and drugs, these metabolites could be potential mediators underlying the detrimental effect of lifestyle risk factors and hence may represent interventional targets (14).
As an analytic strategy for investigating causal relationships between exposures and outcomes by using genetic variations as instrumental variables, Mendelian randomization has the dual benefits of minimizing confounding and diminishing reverse causality (15). Here, we employed metabolome-wide Mendelian randomization and colocalization analyses to identify causal plasma and urinary metabolites of CRC by integrating human metabolome with genome data from large-scale genome-wide association studies (GWASs). We then evaluated whether the identified CRC-related metabolites can be modulated by pharmacologic or lifestyle interventions.

Methods
Figure 1 shows the overall study design. First, we performed a 2-stage (discovery and replication) metabolome-wide Mendelian randomization analysis to examine causal associations of plasma and urinary metabolites with CRC risk, using metabolite quantitative trait locus data derived from 2 large-scale metabolomics GWASs. The statistically significant associations between metabolites and CRC were further prioritized by Bayesian colocalization. Then, we evaluated the druggability of identified metabolites and systematically scanned modifiable risk factors associated with CRC-related metabolites and CRC by employing univariate Mendelian randomization. Last, multivariable Mendelian randomization and mediation analyses were performed to elucidate the metabolic mediators of the associations between modifiable risk factors and CRC.

Study population and datasets
The largest CRC GWAS dataset (100 204 cases and 154 587 controls) to date, covering 22 million single-nucleotide variations (SNVs; formerly single-nucleotide polymorphisms) (16), was employed in the discovery metabolome-wide Mendelian randomization analysis. Details of the study population, genotyping, and imputation have previously been described (16). Independent GWAS summary data covering 20 million SNVs on the basis of 4957 CRC cases and 304 197 controls from the FinnGen cohort were used in the replication stage (17). Supplementary Table 1 (available online) presents the basic characteristics of these CRC GWASs. Ethics approvals were obtained from the relevant authorities, and all participants provided informed consent (16,17).
Summary statistics of genetic associations with plasma and urinary metabolites were extracted from 2 large-scale metabolomics studies with 690 plasma metabolites (18) and 211 urinary metabolites (19), respectively. These metabolites were measured using an ultra-high-performance liquid chromatography-tandem mass spectrometry platform. Supplementary

Metabolome-wide Mendelian randomization analysis
The genetic instruments were selected using metabolite quantitative trait loci from the above-mentioned 2 metabolomics studies. Supplementary Methods (available online) show the details for SNV selection criteria. After matching and harmonizing with CRC outcome data, a total of 1237 instruments for 651 unique plasma metabolites and 233 instruments for 208 unique urinary metabolites remained. Details for all the instrumental variables are shown in Supplementary Tables 4 and 5 (available online).
The TwoSampleMR package (20) was used to conduct metabolome-wide Mendelian randomization analysis. The method details are described in Supplementary Methods (available online). A strict Bonferroni correction was adopted for multiple testing in the discovery stage to reduce false-positive findings. For statistically significant metabolites in the discovery dataset, we further conducted Mendelian randomization analysis to replicate their associations using CRC GWAS summary statistics from the FinnGen cohort. Finally, a random-effects meta-analysis was performed to obtain the combined estimate for each metabolite from the discovery and replication datasets.
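To make the estimation step concrete, the following is a minimal Python sketch of two standard Mendelian randomization estimators: the Wald ratio for a single instrument and the fixed-effect inverse-variance weighted (IVW) estimate for several instruments. It is an illustration under stated assumptions, not the TwoSampleMR implementation; the exact estimators and options used in the study are given in the Supplementary Methods. The function names and the input values in the usage note are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument estimate: outcome effect per unit of exposure effect."""
    estimate = beta_outcome / beta_exposure
    # First-order approximation that ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across instruments."""
    weights = [1.0 / s ** 2 for s in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


def two_sided_p(estimate, se):
    """Two-sided P value from a normal approximation to estimate / se."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))
```

With a single instrument the two estimators coincide, so `ivw([0.5], [0.1], [0.02])` and `wald_ratio(0.5, 0.1, 0.02)` return the same estimate and standard error.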

Bayesian colocalization analysis
Bayesian colocalization analysis was performed using the coloc package (21) on the basis of summary statistics of identified metabolites and CRC meta-GWASs to assess whether the 2 associated signals (metabolite and CRC risk) were driven by a shared causal genetic variant rather than by distinct variants in linkage disequilibrium (LD). Strong evidence of colocalization was defined as a posterior probability for hypothesis 4 (H4) greater than 80%. The method details are described in Supplementary Methods (available online).
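The logic behind the H4 posterior can be sketched in a few lines of Python. The sketch below follows the general approximate Bayes factor approach to colocalization: a per-variant log Bayes factor is computed for each trait, and the five hypotheses (H0 no association, H1 and H2 one trait only, H3 two distinct causal variants, H4 one shared causal variant) are then weighted by prior probabilities. It is not the coloc package itself. The prior probabilities, the prior standard deviations in the usage note, and all function names are assumptions for illustration, and the H3 term is enumerated directly over variant pairs, which is only practical for small regions.

```python
import math


def log_abf(beta, se, prior_sd):
    """Approximate log Bayes factor for one variant (normal prior on the effect)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0-H4 for two traits over the same variants.

    labf1 and labf2 hold per-variant log Bayes factors in the same variant
    order; at least 2 variants are required. Prior values are assumptions.
    """
    n = len(labf1)
    # H4: the same variant is causal for both traits.
    same = logsumexp([a + b for a, b in zip(labf1, labf2)])
    # H3: different variants are causal for the two traits.
    diff = logsumexp([labf1[i] + labf2[j]
                      for i in range(n) for j in range(n) if i != j])
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + logsumexp(labf1),
        "H2": math.log(p2) + logsumexp(labf2),
        "H3": math.log(p1) + math.log(p2) + diff,
        "H4": math.log(p12) + same,
    }
    total = logsumexp(list(log_h.values()))
    return {h: math.exp(v - total) for h, v in log_h.items()}
```

As a usage example, passing log Bayes factors in which both traits peak at the same variant yields a posterior concentrated on H4, whereas peaks at different variants shift the posterior toward H3; the 80% cutoff described above would then be applied to the returned `"H4"` value.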

Druggability evaluation
We searched the targets and drug information using DrugBank (22) and ChEMBL (23) to evaluate whether the identified metabolites could serve as potential therapeutic targets. DrugBank and ChEMBL prioritized the potential druggable targets by integrating information from text mining, gene function, drug-gene interactions, and expert curation. The information and the development process of drugs that targeted identified metabolites were documented.

Multivariable Mendelian randomization and mediation analyses
Furthermore, to uncover modifiable risk factors that can modulate CRC-related metabolites, we first employed univariate Mendelian randomization analysis to systematically evaluate the relationships of modifiable risk factors with identified metabolites and CRC risk, respectively. A Benjamini-Hochberg false discovery rate-adjusted P < .05 was set as the significance level. We then performed multivariable Mendelian randomization to test whether metabolites mediated the effect of modifiable factors on CRC. The method details are described in Supplementary Methods (available online). Finally, the mediated proportion was calculated using the formula (total effect - direct effect)/total effect. All statistical analyses were conducted in R, version 4.1.0 (R Foundation for Statistical Computing, Vienna, Austria).
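The two calculations named in this paragraph can be written out as a short Python sketch: the Benjamini-Hochberg adjustment of a set of P values and the mediated proportion, where the total effect comes from univariable Mendelian randomization and the direct effect from the multivariable model adjusted for the metabolite. The function names and the numbers in the usage note are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def mediated_proportion(total_effect, direct_effect):
    """(total effect - direct effect) / total effect, on the log-odds scale."""
    if total_effect == 0:
        raise ValueError("total effect must be nonzero")
    return (total_effect - direct_effect) / total_effect
```

For example, a hypothetical total effect of 0.20 that shrinks to a direct effect of 0.05 after adjustment for the metabolite corresponds to a mediated proportion of 0.75, that is, 75% of the effect passing through the metabolite.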

Results
Metabolome-wide Mendelian randomization analysis identified 33 metabolites for CRC
The F statistics for all instruments of metabolites were above 10, indicating adequate instrument strength (Supplementary Tables 4 and 5, available online). Metabolome-wide Mendelian randomization identified that 102 plasma metabolites and 25 urinary metabolites were associated with CRC risk with nominal significance (P < .05) (Supplementary Tables 6 and 7, available online). Among them, genetically predicted ethylmalonate and methylsuccinate levels in both plasma and urine were positively associated with CRC, while levels of X-19141 and X-12707 in both plasma and urine were inversely associated with CRC (Supplementary Figure 1, available online). After Bonferroni correction, a total of 30 plasma metabolites (P < 7.68 × 10⁻⁵) (Table 1, Figure 2) and 4 urinary metabolites (P < 2.39 × 10⁻⁴) (Table 1; Supplementary Figure 2, available online) were statistically significantly associated with CRC. Genetically predicted levels of 22 plasma metabolites and 2 urinary metabolites were positively associated with CRC risk, while the other 8 plasma metabolites and 2 urinary metabolites were inversely associated with CRC. For metabolites that had 2 sources and survived Bonferroni correction in at least 1 source, Mendelian randomization results of both plasma and urine sources and power are shown in Supplementary Table 8 (available online). No evidence of heterogeneity or pleiotropy was observed (P heterogeneity > .05, P pleiotropy > .05) (Supplementary Table 9, available online). For 6 metabolite pairs, covering a total of 19 metabolites, that had partly overlapping instruments, multivariable Mendelian randomization prioritized 6 metabolites that had a more dominant effect on CRC risk (P < .008) (Supplementary Table 10, available online). In stratification analysis by ethnicity, most of these metabolite-CRC associations were still statistically significant in White and Asian populations with consistent effect direction (P < .05) and no statistically significant heterogeneity by ethnicity (Supplementary Table 11, available online).
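Two of the quantities in this paragraph are simple to reproduce. The sketch below, in Python, shows a commonly used single-variant approximation of the F statistic and the Bonferroni threshold as alpha divided by the number of tests. The F statistic formula is an assumption, because the text does not state how the authors computed it, and the function names and instrument values are hypothetical. Dividing .05 by the 651 plasma metabolites tested reproduces the plasma threshold quoted above; the urinary threshold is taken as reported.

```python
def f_statistic(beta, se):
    """Single-variant approximation: squared ratio of effect to standard error."""
    return (beta / se) ** 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Plasma threshold: 0.05 / 651 tested metabolites, about 7.68e-05.
plasma_threshold = bonferroni_threshold(0.05, 651)

# Hypothetical instrument with effect 0.2 and standard error 0.05.
strong_enough = f_statistic(0.2, 0.05) > 10
```

An instrument is conventionally regarded as sufficiently strong when its F statistic exceeds 10, which is the criterion referred to in the paragraph above.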
In the replication stage, 27 plasma metabolites and 4 urinary metabolites were successfully validated in the FinnGen dataset (P < .05) (Table 1). In the meta-analysis of discovery and replication datasets, 29 plasma metabolites and 4 urinary metabolites displayed statistically significant associations with CRC risk, which could be classified into 13 subcategories (Table 1).
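Combining the discovery and replication estimates can be sketched as a random-effects meta-analysis in Python. The version below uses the DerSimonian-Laird estimator of between-study variance, which is an assumption: the text specifies a random-effects model but not the estimator. Inputs are log odds ratios with their standard errors; the function name and the example values in the usage note are hypothetical.

```python
import math


def random_effects_meta(estimates, standard_errors):
    """DerSimonian-Laird random-effects pooling of study-level estimates.

    Returns (pooled estimate, pooled standard error, tau squared).
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures dispersion of the estimates around the fixed mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2
```

When the discovery and replication estimates agree, the between-study variance is estimated as 0 and the result reduces to the fixed-effect inverse-variance weighted average; for example, `random_effects_meta([0.1, 0.1], [0.02, 0.04])` returns a pooled estimate of 0.1 dominated by the more precise study.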

Metabolites were supported by colocalization evidence
A total of 24 metabolites (23 plasma and 1 urinary) were supported by strong evidence of colocalization, with posterior probability for H4 greater than 80% across different window sizes, suggesting high probability for a shared causal genetic variant for metabolites and CRC risk (Table 1; Supplementary Table 12

Nine metabolites as interventional targets by modifiable factors
In univariable Mendelian randomization analysis of modifiable factors with 32 metabolites that have convincing evidence (tiers 1 and 2), we found that a total of 13 modifiable risk factors (2 dietary factors, 2 gut microbial taxa, 5 lifestyle factors, 4 obesity-related factors) were associated with 9 CRC-related metabolites (false discovery rate < 0.05) (Figure 3; Supplementary Table 14, available online), and 8 of 13 modifiable factors were also associated with CRC risk (false discovery rate < 0.05) (Supplementary Table 15, available online).

Metabolites partially mediate the effect of modifiable factors on CRC
Figure 4 shows the modifiable factor-metabolites-CRC pairs with mediating effects. Of 8 modifiable factor-metabolites-CRC pairs, 7 pairs (except for milk intake) had full summary data and were further evaluated using multivariable Mendelian randomization.
The associations of phylum Actinobacteria, class Actinobacteria, body mass index (BMI), waist to hip ratio, fasting insulin, and smoking initiation with CRC risk were attenuated in the multivariable Mendelian randomization analyses, with adjustment for metabolites (Figure 4), whereas the association of leisure television watching with CRC became stronger (Supplementary Figure 3, available online). Among them, genetically predicted levels of plasma galactonate and urinary lactose mediated 95% of the effect of phylum Actinobacteria on CRC and 76% of the effect of class Actinobacteria on CRC (Figure 4).

Discussion
In this study, we performed a comprehensive investigation on associations of plasma and urinary metabolites with CRC risk.
The discovery metabolome-wide Mendelian randomization analysis identified 30 plasma and 4 urinary metabolites associated with CRC, and most of them showed cross-ethnicity effect consistencies. The replication Mendelian randomization validated 30 candidate metabolites, and 24 metabolites were supported by colocalization evidence. Collectively, 22, 10, and 2 metabolites were classified into the most convincing evidence (tier 1), convincing evidence (tier 2), and low evidence (tier 3) groups, respectively. Druggability evaluation prioritized 2 CRC-related metabolites (ie, sphingomyelin, lactose) that could be modified by drug interventions; additionally, 9 of these metabolites could be modulated by modifiable risk factors. Multivariable Mendelian randomization analyses indicated that the effects of modifiable factors (ie, Actinobacteria, smoking initiation, BMI, waist to hip ratio, and fasting insulin) on CRC were partially mediated by these identified metabolites.
Our findings indicated a potentially important role of long-chain polyunsaturated fatty acids (n3 and n6) (adrenate [22:4n6], arachidonate [20:4n6], stearidonate [18:4n3], EPA, n3 DPA, n6 DPA, nisinate [24:6n3], docosatrienoate [22:3n6]*) in CRC liability. The observed relationships of adrenate (22:4n6), arachidonate (20:4n6), and stearidonate (18:4n3) with CRC are consistent with previous findings (24)(25)(26), but current evidence on the effects of EPA and DPA on CRC is conflicting. Previous Mendelian randomization studies and our findings indicated that high plasma levels of EPA, n3 DPA, and n6 DPA were positively associated with CRC (25,27). Observational studies showed null associations between blood DPA and EPA levels and CRC and even an inverse association between their dietary intake and CRC risk (27). Given that dietary assessments are imprecise and observational studies are susceptible to confounding, further intervention studies are needed to explain these inconsistent findings. We additionally found positive associations of nisinate (24:6n3) and docosatrienoate (22:3n6)* with CRC. In rodents, nisinate (24:6n3) is both a product of and a precursor to docosahexaenoic acid in the n-3 PUFA biosynthetic pathway, and docosahexaenoic acid has been reported to influence invasion in CRC cells (28).
We observed detrimental effects of plasma mannose and urinary lactose and the protective effect of plasma galactonate on CRC. Consistently, Long et al. (9) found that patients with CRC had higher levels of mannose than controls. A randomized controlled trial indicated that intake of lactose-rich foods increased the risk of diarrhea in patients with CRC (31). Galactonate is a metabolic breakdown product of galactose, a monosaccharide that together with glucose forms lactose. Intestinal galactose shows a protective effect against colon cancer through binding lectins and inhibiting mucosal proliferation, and the lower level of galactose leads to the pathogenetic process of CRC (32). γ-Glutamylthreonine is a dipeptide composed of γ-glutamate and threonine. Both our study and a previous Mendelian randomization study reported a positive association between blood γ-glutamylthreonine and CRC (12).
The identified metabolites could be modulated by either pharmacological intervention or modifiable factors. Specifically, 2 metabolites (sphingomyelin, lactose) could be modulated by drugs used to treat acid sphingomyelinase deficiency or irritable bowel syndrome. Avoiding excessive consumption of food products containing lactose could also be effective. Nine of the identified CRC-related metabolites were observed to be affected by modifiable factors. The identified metabolites partially mediated the effect of Actinobacteria, BMI, waist to hip ratio, fasting insulin, and smoking initiation on CRC. The positive association between Actinobacteria and CRC was found to be mediated by higher levels of urinary lactose and decreased levels of plasma galactonate. Similarly, a nested case-control study reported that compared with controls, patients with CRC had more abundant oral Actinobacteria (33). Other studies also showed that Actinobacteriota was 1 of the dominant colonic mucosal microbiota in patients with CRC (34) and was abundant even at the colorectal adenoma stage (35). Because these results are derived from retrospective studies, however, they may be due to reverse causality. The association of Actinobacteria with the lactase gene (LCT) has previously been documented, suggesting an interaction of Actinobacteria with the gut and lactose metabolism (36). Consumption of lactose may increase its availability to colonic bacteria (such as Actinobacteria and Negativibacillus) that compete for it as an energy source, especially in adults with lactase deficiency (36). An abundance of Actinobacteria may lead to metabolic disorders of lactose and galactonate and could contribute to an increased CRC risk. Plasma mannose was found to partially mediate the relationships of smoking initiation, BMI, and waist to hip ratio with CRC. Similarly, Long et al. (9) found a statistically significant joint effect of smoking with mannose and a statistically significant interaction between BMI and mannose in modifying CRC risk. We found that fasting insulin was positively associated with CRC and that the association was partially mediated by plasma 1-lignoceroyl-GPC (24:0) and 1-linoleoyl-2-linolenoyl-GPC (18:2/18:3)*. Gut microbiota, obesity, insulin resistance, and smoking have been linked to the etiology of CRC by abundant evidence from observational studies (37). We expanded the causal evidence and elucidated the potential etiologic metabolic pathways of these modifiable factors with CRC.
The current study has several strengths. First, to our knowledge, this study is the first to comprehensively evaluate the causal associations of metabolites from plasma and urine with CRC, which helps provide new insights into the etiology and potential therapeutic targets for CRC. Second, the present study employed a Mendelian randomization design and colocalization based on well-designed GWASs with large sample sizes, which enhanced the statistical power and reduced the risk of confounding bias and reverse causation. Additionally, we assessed the mediating role of CRC-related metabolites between modifiable factors and CRC, which provided new insights into the etiologic pathways of modifiable environmental factors with CRC. Several limitations of this study should be acknowledged as well. First, the strict significance threshold of Bonferroni correction in the discovery Mendelian randomization may have filtered out some important metabolites, although the retained findings may be less prone to false-positive errors. Second, instrumental variables for gut microbiota in the multivariable Mendelian randomization stage were selected at a more lenient threshold (P < 5 × 10⁻⁶), because the original microbiome GWAS of the MiBioGen consortium and other Mendelian randomization studies indicated that selecting associated SNVs with lenient P value thresholds gave the greatest explained variance for microbial features (38,39). The F statistics of all instrumental variables used in the current study were above 10, indicating a low risk of weak instrument bias. Third, participants in the urine metabolite GWAS had reduced kidney function and may not be representative of the general population, although genetic effects on most urine metabolite concentrations have been reported to be similar between individuals with reduced kidney function and the general population (19). Further studies based on the general population are required. Fourth, the mediation effects of modifiable factor-metabolite-CRC pairs were discovered mainly by statistical analysis. Further experimental research is needed to verify these findings and elucidate the underlying biological mechanism.
This study identified key metabolites with a potential causal association with CRC risk and elucidated the metabolic mediators of the effect of modifiable risk factors on CRC. Our findings provide new insights into the etiology and potential therapeutic targets for CRC and the etiologic pathways of modifiable environmental factors with CRC. Further interventional studies are needed to evaluate whether the concentrations of these metabolites could be modified through drug intervention or lifestyle changes and ultimately reduce CRC risk.

Figure 2. Volcano plot showing results from plasma metabolome-wide Mendelian randomization in the discovery stage. The metabolome-wide Mendelian randomization was performed based on summary statistics of genetic associations of a large-scale plasma metabolomics study and CRC meta-GWAS summary data to test associations of 651 plasma metabolites with CRC risk. Metabolites that survived after Bonferroni correction are labeled. CRC = colorectal cancer.

Table 1. Summary results from Mendelian randomization, meta-analysis, and colocalization for metabolome-wide Mendelian randomization identified metabolites
