David M Hein, Weiye Deng, MaryLena Bleile, Syed Ali Kazmi, Brooke Rhead, Francisco M De La Vega, Amy L Jones, Radhika Kainthla, Wen Jiang, Brandi Cantarel, Nina N Sanford, Racial and Ethnic Differences in Genomic Profiling of Early Onset Colorectal Cancer, JNCI: Journal of the National Cancer Institute, Volume 114, Issue 5, May 2022, Pages 775–778, https://doi.org/10.1093/jnci/djac014
Abstract
The incidence and mortality of early onset colorectal cancer (EOCRC) are rising, and outcomes appear to differ by race and ethnicity. We aimed to assess differences in the mutational landscape and gene expression of EOCRC across racial and ethnic groups (non-Hispanic Asian, non-Hispanic Black, non-Hispanic White, White Hispanic) using data from the American Association for Cancer Research Project GENIE (version 10.2) and University of Texas Southwestern, the latter enriched in Hispanic patients. All statistical tests were 2-sided. Of 1752 EOCRC patients, non-Hispanic Black patients had higher rates of KRAS mutations (60.9%; P = .001, q = 0.015), and non-Hispanic White and non-Hispanic Black patients had higher rates of APC mutations (77.1% and 76.6%, respectively; P = .001, q = 0.015), by the Fisher exact test with Benjamini-Hochberg correction. Using the R packages DESeq2 and clusterProfiler, we found that White Hispanic patients had increased expression of genes involved in oxidative phosphorylation (P < .001, q = 0.025). Genomic profiling has the potential to identify novel diagnostics and inform individualized treatment options to address the currently limited prognosis of EOCRC.
Over the last 3 decades, there has been an unexplained increase in the incidence and mortality of colorectal cancer (CRC) in individuals younger than 50 years, termed early onset CRC (EOCRC) (1). Although EOCRC tends to have different clinical features from CRC diagnosed in individuals aged 50 years or older (older adults), including later stage at presentation and more left-sided disease, studies comparing molecular features of CRC in younger vs older patients have shown similar genomic landscapes (2,3).
Within EOCRC, there are disparities in incidence and outcome by race and ethnicity. In particular, Black patients have worse survival (4), and the incidence of EOCRC in Hispanic patients appears to be rising disproportionately (5). Within the general CRC population, next-generation sequencing (NGS) technologies have yielded predictive and prognostic information for some patients. To date, no studies have examined somatic mutational differences in EOCRC by race and ethnicity, in part because minority patients are underrepresented in large-scale genomic databases (6).
Here, we combined data on patients with EOCRC from a multi-institutional clinical NGS database, the American Association for Cancer Research (AACR) Project GENIE (7), along with our institutional cohort enriched in Hispanic patients, and assessed whether oncogenic drivers of EOCRC differed by racial and ethnic groups. Furthermore, we used RNA sequencing (RNAseq) data to examine gene expression profiles in EOCRC by race and ethnicity.
Our cohort comprised patients aged 18-49 years diagnosed with colon or rectal adenocarcinoma. From the AACR GENIE database (version 10.2), we included patients sequenced on panels that cover more than 200 genes for somatic mutations, with full exon coverage, and that report copy number alterations: DFCI-Oncopanel 1, 2, 3, 3.1; MSK-IMPACT 341, 410, 448; and VICC-01-T5A, T7. If a patient had more than 1 sample in the GENIE database (n = 103), only the first sample was included. We also included our Hispanic-enriched cohort from the University of Texas Southwestern (UTSW; Simmons Comprehensive Cancer Center and Parkland Health and Hospital System), sequenced using the Tempus xT 648-gene panel (8). Patients were classified into the following racial and ethnic groups: non-Hispanic Asian, non-Hispanic Black, non-Hispanic White, and White Hispanic. For patients in the AACR GENIE database, race and ethnicity were reported by each institution and self-reported by patients at hospital intake. For standardization, the reported race and ethnicity were mapped to the North American Association of Central Cancer Registries Data Dictionary item numbers 160 (race) and 190 (ethnicity) (7). For the UTSW cohort, race and ethnicity were self-reported at hospital intake and physician-verified on the Tempus intake form. We also obtained genetic ancestry for the UTSW Tempus-sequenced cohort (Supplementary Table 1, available online).
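The two selection rules above (one sample per patient, and four mutually exclusive racial and ethnic groups) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record fields, the `sequence_order` key used to define the "first" sample, and the plain-text race labels are all hypothetical, and GENIE's clinical file layout and the NAACCR code mapping are not reproduced. The sketch also assumes that patients falling outside the four groups (for example, Black Hispanic patients) are excluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical record layout; field names are illustrative, not GENIE's schema.
    patient_id: str
    sample_id: str
    sequence_order: int  # assumed key that defines which sample is "first"
    age: int
    race: str            # hypothetical labels: "Asian", "Black", "White", ...
    hispanic: bool       # True if the patient reported Hispanic ethnicity


def study_group(race: str, hispanic: bool) -> Optional[str]:
    """Map race plus ethnicity to one of the four study groups, else None."""
    if not hispanic and race in ("Asian", "Black", "White"):
        return f"non-Hispanic {race}"
    if hispanic and race == "White":
        return "White Hispanic"
    return None  # assumption: other combinations are excluded


def build_cohort(samples: list[Sample]) -> dict[str, str]:
    """Return {patient_id: study group}, keeping one sample per patient aged 18-49."""
    first: dict[str, Sample] = {}
    for s in sorted(samples, key=lambda s: (s.patient_id, s.sequence_order)):
        first.setdefault(s.patient_id, s)  # keep only the earliest sample
    cohort: dict[str, str] = {}
    for pid, s in first.items():
        group = study_group(s.race, s.hispanic)
        if 18 <= s.age <= 49 and group is not None:
            cohort[pid] = group
    return cohort


# Toy records only (not study data).
records = [
    Sample("P1", "S1", 1, 45, "White", False),
    Sample("P1", "S2", 2, 46, "White", False),  # later sample, dropped
    Sample("P2", "S3", 1, 38, "White", True),
    Sample("P3", "S4", 1, 52, "Black", False),  # outside 18-49, dropped
]
print(build_cohort(records))
# {'P1': 'non-Hispanic White', 'P2': 'White Hispanic'}
```

Sorting by patient and then by the ordering key makes "first sample" explicit; if the real criterion were collection date or sample identifier, only that key would change.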
We used the Fisher exact test to assess enrichment, by race and ethnicity, of mutations in 22 genes with clinical or potential clinical implications in CRC: KRAS, NRAS, BRAF, APC, TP53, BRCA1, BRCA2, EGFR, SMAD4, ERBB2, PIK3CA, PTEN, CDH1, MUTYH, NTRK1, NTRK2, NTRK3, FBXW7, MLH1, PMS2, MSH6, and MSH2 (9). Using the same statistical method, we also compared the frequency of copy number gains in ERBB2 and EGFR and copy number loss in SMAD4 (10). The Benjamini-Hochberg procedure was used to correct for multiple tests, with a q value less than 0.05 considered statistically significant. All tests were 2-sided. Statistical tests were completed in R (version 4.1.0).
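The testing scheme above can be sketched in Python as a two-sided Fisher exact test followed by Benjamini-Hochberg adjustment. Two simplifications are assumptions of this sketch, not the authors' design: the study compared four groups at once in R, whereas this version handles only a 2 x 2 table (for example, one group vs all others, mutated vs not mutated), and the inputs below are toy numbers rather than study data.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    margins that is no more likely than the observed table. Integer weights
    keep the "no more likely" comparison exact.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x: int) -> int:
        # Unnormalised probability of the table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return tail / comb(row1 + row2, col1)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # also caps every q value at 1
    # Walk from the largest p value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy inputs only (not study data).
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```

In this sketch each gene would contribute one p value, and a gene is flagged when its q value falls below 0.05, matching the threshold stated above.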
We used the elbow, silhouette, and gap statistic methods to determine the optimal number of clusters of transcriptional profiles, which was 3 (11-13). We first clustered RNAseq samples into 3 groups using k-means, then performed differential expression analysis using DESeq2 (version 1.32.0) and pathway analysis using clusterProfiler (version 4.0.0) with WikiPathways database version 20210610 (14-18). Additionally, we tested for enrichment of race and ethnicity by cluster using the Fisher exact test. See the Supplementary Methods (available online) for additional information. The study was approved by the University of Texas Southwestern institutional review board.
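Choosing the number of clusters and then running k-means can be sketched as follows. This is a minimal illustration under assumptions: it uses only the silhouette criterion (the study also used the elbow and gap statistic methods), a deterministic farthest-first seeding instead of random restarts, and toy 2-dimensional points in place of the 41-sample expression matrix.

```python
import math

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iters: int = 100) -> list[int]:
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: list[int] = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette width; points in singleton clusters contribute 0."""
    scores = []
    for i, p in enumerate(points):
        dists: dict[int, list[float]] = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in dists.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: list[Point], k_range: range) -> int:
    """Pick the k with the highest mean silhouette width."""
    return max(k_range, key=lambda k: silhouette(points, kmeans(points, k)))


# Three well-separated toy blobs (not study data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1),
       (0, 10), (0, 11), (1, 10), (1, 11)]
print(choose_k(pts, range(2, 6)))  # 3
```

On real expression data, the points would be per-sample vectors (for example, normalised expression of selected genes or leading principal components), and k-means is typically repeated from several random starts.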
Among 1752 patients in our EOCRC cohort, 137 (7.8%) were non-Hispanic Asian, 128 (7.3%) were non-Hispanic Black, 1382 (78.9%) were non-Hispanic White, and 105 (6.0%) were White Hispanic (Supplementary Table 2, available online). Among the 55 UTSW patients, 3 (5.5%) were non-Hispanic Asian, 8 (14.5%) were non-Hispanic Black, 12 (21.8%) were non-Hispanic White, and 32 (58.2%) were White Hispanic; agreement between self-reported ethnicity and ethnicity imputed from genetic ancestry was high (94.5%) (Supplementary Table 1, available online). Six patients in the UTSW cohort had microsatellite instability-high tumors. The 10 most frequently mutated genes were TP53, APC, KRAS, PIK3CA, FBXW7, SMAD4, TCF7L2, KMT2D, ARID1A, and SOX9, with mutation frequencies ranging from 76.2% (TP53) to 11.4% (SOX9) (Supplementary Figure 1, available online). When comparing racial and ethnic groups, non-Hispanic Black patients had higher rates of KRAS mutations (60.9% vs 43.3%, 41.9%, and 40.1% in non-Hispanic White, White Hispanic, and non-Hispanic Asian patients, respectively; P = .001, q = 0.015), and non-Hispanic White and non-Hispanic Black patients had higher rates of APC mutations (77.1% non-Hispanic White, 76.6% non-Hispanic Black; P = .001, q = 0.015) (Figure 1). Mutation frequencies in several other genes, including NTRK2 and NTRK3 and the DNA repair genes BRCA1, BRCA2, MSH2, MSH6, PMS2, and MLH1, were numerically higher among White Hispanic patients, although the differences were not statistically significant. There was no statistically significant enrichment among racial and ethnic groups in any of the copy number alterations assessed (Supplementary Table 3, available online).

Figure 1. Mutation frequency of 22 genes with clinical or potential clinical implications in colorectal cancer. Values are colored by the rank of the percentage of early onset colorectal cancer patients with a mutation (darkest box = racial and ethnic group with the highest mutation rate per gene).
Among 41 UTSW patients with primary tumor RNAseq data (Supplementary Table 4, available online), whose whole-transcriptome expression profiles were clustered into 3 groups, 1 cluster was statistically significantly enriched in White Hispanic patients (Figure 2, A; Supplementary Table 5, available online). Notably, this cluster had increased expression of genes involved in oxidative phosphorylation (P < .001, q = 0.025) (Figure 2, B).
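The text reports a pathway-level P value and q value but does not say whether clusterProfiler was run as over-representation analysis (ORA) or as gene set enrichment analysis. Purely as an illustration of one common approach, the hypergeometric upper-tail test that underlies ORA can be sketched as follows; the gene counts in the example are hypothetical. Across many pathways, such P values would then be adjusted (for example, with the Benjamini-Hochberg procedure) to obtain q values.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, pathway: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) for the number of pathway genes among `selected` genes
    drawn without replacement from `universe` genes, `pathway` of which are
    annotated to the pathway."""
    upper = min(pathway, selected)
    hits = sum(
        comb(pathway, x) * comb(universe - pathway, selected - x)
        for x in range(overlap, upper + 1)
    )
    return hits / comb(universe, selected)


# Hypothetical counts: 20 genes measured, 5 in the pathway,
# 4 differentially expressed, 3 of which are pathway genes.
print(round(hypergeom_enrichment_p(20, 5, 4, 3), 3))  # 0.032
```

A small P value means the differentially expressed genes contain more pathway members than random draws from the measured genes would usually produce.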

Figure 2. RNA sequencing analysis. A) Principal component analysis of University of Texas Southwestern RNAseq expression data, with k-means cluster assignment shown in color. One point in cluster 2 (PC1 = 138, PC2 = -43) is not shown. B) Heatmap of 10 genes in the WikiPathways oxidative phosphorylation pathway upregulated in cluster 3. Color shows the z score of log counts per million; individual patients are sorted into clusters along the x-axis.
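The heatmap scale in panel B (a per-gene z score of log counts per million) can be sketched as follows. The pseudocount of 1 and the base-2 logarithm are assumptions of this sketch; the text does not state them, and tools such as edgeR handle the pseudocount differently. Each gene is standardised across patients with the n - 1 standard deviation, the same convention R's `scale()` uses.

```python
import math
from statistics import mean, stdev


def log_cpm(counts: list[list[int]], prior: float = 1.0) -> list[list[float]]:
    """log2(CPM + prior) for a genes x samples count matrix.

    `prior` is an assumed pseudocount that keeps zero counts finite.
    """
    lib_sizes = [sum(col) for col in zip(*counts)]  # total counts per sample
    return [
        [math.log2(c / lib * 1e6 + prior) for c, lib in zip(row, lib_sizes)]
        for row in counts
    ]


def zscore_rows(matrix: list[list[float]]) -> list[list[float]]:
    """Standardise each gene (row) across samples; constant rows become 0."""
    out = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


# Toy counts: 2 genes x 2 samples (not study data).
z = zscore_rows(log_cpm([[100, 300], [900, 700]]))
print([[round(v, 4) for v in row] for row in z])
# [[-0.7071, 0.7071], [0.7071, -0.7071]]
```

Standardising per gene puts genes with very different absolute expression on one color scale, so the heatmap shows where each gene is relatively high or low across patients.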
Our study demonstrated racial and ethnic differences in rates of KRAS and APC mutations in EOCRC. The increased rate of KRAS mutations among non-Hispanic Black patients with EOCRC is consistent with the literature in older adults and may contribute to their worse prognosis (19). Higher proportions of non-Hispanic White and non-Hispanic Black patients had APC mutations; in contrast, prior literature has shown lower rates of APC mutations among Black patients with EOCRC (20). White Hispanic patients had a higher frequency of mutations across several clinically actionable genes, including DNA repair genes, although the differences did not reach statistical significance. This may reflect the limited sample size, and our data suggest that more research is needed in this population with EOCRC.
We also identified differences in gene expression by race and ethnicity and, in particular, enrichment of the oxidative phosphorylation pathway in a cluster predominantly represented by White Hispanic patients. Recent studies have shown that oxidative phosphorylation is upregulated across certain cancers such as Hodgkin lymphoma, breast, and colorectal cancer (21). Emerging in vitro and in vivo studies have demonstrated efficacy of drugs including metformin and atovaquone in inhibiting oxidative phosphorylation to slow cancer cell metabolism (21). Therefore, targeting this pathway could represent a therapeutic strategy for select EOCRC patients. Given these potentially unique findings in the Hispanic patient population, there may be a role for earlier and targeted prevention and education efforts.
Limitations of our study include limited clinical data, including on staging and disease outcomes. Hispanic patients are a heterogeneous group with diverse ancestry, so combining them into a single racial and ethnic group is an oversimplification. There may also be differences between Hispanic patients born in the United States and immigrants, which could be the subject of future investigations. Furthermore, our RNAseq analyses were limited by small patient numbers and were univariable (race and ethnicity). Our analyses were also based on self-reported race and ethnicity; however, we obtained genetic ancestry for the UTSW Tempus cohort to genetically impute race and ethnicity labels, and agreement was high. Nevertheless, our study is the first to report on genomic differences in EOCRC by race and ethnicity and is among very few genomic studies distinguishing White Hispanic patients as a unique group.
Overall, our findings suggest that underlying biological factors, along with well-documented social determinants of health, may contribute to the disparities by race and ethnicity in the incidence and clinical outcomes of EOCRC, particularly among young Hispanic patients. Our results suggest the need for biomarker-driven diagnostic tools and therapies for the growing number of patients diagnosed with EOCRC.
Funding
Dedman Family Scholar in Clinical Care (NNS)
Notes
Role of the funder: The funder had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; or the decision to submit the manuscript for publication.
Disclosures: The authors have no disclosures.
Author contributions: DH: conceptualization, methodology, formal analysis, data curation, writing—original draft, writing—reviewing and editing. WD: conceptualization, methodology, formal analysis, writing—reviewing and editing. MB: methodology, formal analysis, writing—reviewing and editing. SAK: conceptualization, data curation, writing—reviewing and editing. BR: data curation, methodology, writing—reviewing and editing. FMDLV: data curation, methodology, writing—reviewing and editing. ALJ: methodology, data curation, writing—reviewing and editing. RK: data curation, writing—reviewing and editing. WJ: conceptualization, methodology, writing—reviewing and editing. BC: conceptualization, methodology, formal analysis, writing—reviewing and editing, supervision. NNS: conceptualization, methodology, resources, writing—original draft, writing—reviewing and editing, supervision, funding acquisition
Prior presentations: Oral symposium presentation at annual AACR meeting (April 12, 2021).
Acknowledgements: The authors would like to acknowledge the American Association for Cancer Research (AACR) and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. The authors would also like to acknowledge Michael F. Berger, PhD, from Memorial Sloan Kettering Cancer Center for his collaboration on this manuscript.
Data Availability
Data from AACR Project GENIE is available at https://www.aacr.org/professionals/research/aacr-project-genie/. Additional data for this study will be shared on reasonable request to the corresponding author.
References
AACR Project GENIE Consortium.