-
PDF
- Split View
-
Views
-
Cite
Cite
Mourad Wagdy Ali, Jianhong Chen, Li Yan, Xiaoyu Wang, James Y Dai, Thomas L Vaughan, Graham Casey, Matthew F Buas, A risk variant for Barrett’s esophagus and esophageal adenocarcinoma at chr8p23.1 affects enhancer activity and implicates multiple gene targets, Human Molecular Genetics, Volume 31, Issue 23, 1 December 2022, Pages 3975–3986, https://doi.org/10.1093/hmg/ddac141
- Share Icon Share
Abstract
Nineteen genetic susceptibility loci for esophageal adenocarcinoma (EAC) and its precursor Barrett’s esophagus (BE) have been identified through genome-wide association studies (GWAS). Clinical translation of such discoveries, however, has been hindered by the slow pace of discovery of functional/causal variants and gene targets at these loci. We previously developed a systematic informatics pipeline to prioritize candidate functional variants using functional potential scores, applied the pipeline to select high-scoring BE/EAC risk loci and validated a functional variant at chr19p13.11 (rs10423674). Here, we selected two additional prioritized loci for experimental interrogation: chr3p13/rs1522552 and chr8p23.1/rs55896564. Candidate enhancer regions encompassing these variants were evaluated using luciferase reporter assays in two EAC cell lines. One of the two regions tested exhibited allele-specific enhancer activity – 8p23.1/rs55896564. CRISPR-mediated deletion of the putative enhancer in EAC cell lines correlated with reduced expression of three candidate gene targets: B lymphocyte kinase (BLK), nei like DNA glycosylase 2 (NEIL2) and cathepsin B (CTSB). Expression quantitative trait locus (eQTL) mapping in normal esophagus and stomach revealed strong associations between the BE/EAC risk allele at rs55896564 (G) and lower expression of CTSB, a protease gene implicated in epithelial wound repair. These results further support the utility of functional potential scores for GWAS variant prioritization, and provide the first experimental evidence of a functional variant and risk enhancer at the 8p23.1 GWAS locus. Identification of CTSB, BLK and NEIL2 as candidate gene targets suggests that altered expression of these genes may underlie the genetic risk association at 8p23.1 with BE/EAC.
Introduction
Esophageal adenocarcinoma (EAC) is the predominant subtype of esophageal cancer in Western nations and accounts for more than 10 000 deaths annually in the U.S. (1,2). Incidence has increased six- to eight-fold over the past five decades (3) and median survival remains less than one year (4). EAC arises from an epithelial precursor lesion, Barrett’s esophagus (BE), which typically develops in a setting of chronic inflammation associated with recurrent acid reflux. BE is characterized by replacement of the normal squamous epithelium in the lower esophagus with an intestinalized columnar metaplasia (5,6). Several strong epidemiologic risk factors have been defined for BE/EAC, including gastroesophageal reflux disease (GERD), visceral obesity and smoking (7–12). A significant etiologic role for inherited genetic variants has emerged from studies over the past decade (13).
Genome-wide association studies (GWAS) conducted by the Barrett’s and Esophageal Adenocarcinoma Consortium (BEACON) and other collaborating international consortia have led to the discovery of nearly 20 germline susceptibility loci for BE/EAC (14–18). Common inherited genetic variation is estimated to account for 25–35% of overall disease risk (19). There is considerable overlap between risk variants for BE and EAC, with all but one of the known risk loci common to both traits (15). Gene–environment (G × E) interaction scans have identified two genetic variants which appear to modify the degree of risk associated with reflux symptoms (14,20), and targeted post-GWAS analyses have implicated additional candidate risk genes and pathways (21–23).
As with most GWAS, little progress has been made in identifying the functional variants, gene targets or causal mechanisms which underlie these risk associations. Moving from association to biology is complicated by the fact that GWAS index variants are rarely functional, and usually in linkage disequilibrium (LD) with tens or hundreds of other SNPs, most of which lack obvious biological function. Increasing evidence suggests that most risk-associated functional/causal SNPs act through regulatory effects leading to alterations in gene expression (24–28).
To accelerate post-GWAS functional studies, we previously developed a flexible informatics pipeline for assigning ‘functional potential scores’ to candidate risk variants, based on consolidation of multiple complementary annotations derived from tissue-specific chromatin profiles, regulatory/ Expression quantitative trait locus (eQTL) relationships and sequence-based features. Using these scores to guide prioritization of candidate risk variants for subsequent experimental follow-up, we recently characterized two high-scoring BE/EAC susceptibility regions (29). At chr19p13.11, we identified a putative risk enhancer and functional variant (rs10423674) modulating its activity. Deletion of this enhancer correlated with reduced expression of two candidate gene targets, CREB-regulated transcription coactivator 1 (CRTC1) and cartilage oligomeric matrix protein (COMP). In this report, we selected two additional high-scoring BE/EAC risk loci for similar interrogation. Using luciferase reporter enhancer activity assays in EAC cell lines, we identified a putative enhancer region at chr8p23.1 and functional risk variant (rs55896564) that modulates its activity. Using CRISPR genome editing and gene expression profiling in EAC cell lines, we identified candidate gene targets of the putative enhancer. Genotype-expression correlations (eQTL) and differential expression analyses in normal human tissues and/or Barrett’s metaplasia were examined to obtain further support for candidate risk genes.
Results
Selection of candidate functional variants at prioritized risk loci
Using our recently published informatics scoring pipeline for estimating the relative functional potential of candidate risk variants (29), we prioritized two BE/EAC GWAS susceptibility regions for further study: chr3p13/FOXP1 and chr8p23.1/BLK1 (Supplementary Material, Table S1). At each locus, we examined candidate functional variants with high functional potential scores (FPS). By jointly considering FPS with r2 and genomic distance to the index variant(s) (Supplementary Material, Fig. S1), we selected one highly ranked variant from each locus for subsequent experimental testing—chr3p13/rs1522552 and chr8p23.1/rs55896564. At the chr8p23.1 locus, our selection between two high-priority variants was further guided by the presence of enhancer marks spanning rs55896564 in upper GI-tract tissues. By convention, variant alleles associated with increased risk of BE/EAC are referred to as the ‘risk alleles’: rs1522552 ‘A’ (versus ‘G’) and rs55896564 ‘G’ (versus A).
Candidate enhancer region characterization
Candidate enhancer regions (CERs) at each locus were defined as minimal genomic segments (∼0.5–1 kb) spanning the prioritized candidate functional/causal variant (Fig. 1). Because the selected variant at chr8p23.1 (rs55896564) is located between two additional candidate variants in immediate proximity and in strong LD, the CER defined at this locus encompassed three variants (CER2/rs17153498/rs55896564/rs56051089). The designated CERs were cloned separately into luciferase enhancer activity vectors (24,25), and enhancer activities of different alleles were independently assessed with measurements of luminescence following transfection of constructs into the EAC cell lines OE19 and OE33. Chr8p23.1/CER2 demonstrated enhancer activity in both cell lines, in both orientations. CER2 sequences bearing the risk allele (G) at rs55896564 (haplotypes AGT and GGC) exhibited lower enhancer activity than CER2 bearing the reference allele (haplotype AAT), in both cell lines, in the forward orientation (P < 5 × 10−16) (Fig. 2). These results indicate that rs55896564 genotype modulates enhancer activity in these assays. Chr3p13/CER1/rs1522552 did not show enhancer activity in either orientation, in either cell line.
![Genome Browser plots for selected GWAS risk loci. (A) chr3p13/FOXP1, (B) chr8p23.1/LINC00208/BLK). AVS, associated variant set; red: index variants, blue r2 > 0.80. Enh, predicted enhancer regions based on H3-K27ac ChIP-seq profiles in specific primary tissues: gastric (dark green), stomach smooth muscle (light green) (Hsniz 2013). Candidate enhancer regions, CER; CER1: 3:70906815-70 907 701 (rs1522552); CER2: 8:11446898-11 447 336 (rs17153498/rs55896564/56051089) [http://genome.ucsc.edu]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/hmg/31/23/10.1093_hmg_ddac141/1/m_ddac141f1.jpeg?Expires=1748303996&Signature=xfWQQUv1GJtw6cM1T1WOm~5N-PHFkjDEphbqul-Di5WgLegoqvm1V8NaQfy6clt9o8xfMW33iQ4iTCBzjidzOebBy2gjkRRSMxNXht0RrpHcMDMIukehH4WTy5V0OXaG3sg~u9Uy9LPcIDitvxB2wVcg9HPrXTF6pH3RWNI5j9TYNR0-PiIJbcsRPlPwR-K17QAOQiUHPpcHyLa0Wy32FmQzYNZSpBxg4--QN-neEuBVICAePHUkFm2TJwdfNyZCl~ZAHOHqYqIYH-2ohw5Uq0EZuHxeWS90p8YgbGqT1DBPG2N2OVXkGHlWv13ztFmgMAzkOS5CRWCL-49cnu42vg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Genome Browser plots for selected GWAS risk loci. (A) chr3p13/FOXP1, (B) chr8p23.1/LINC00208/BLK). AVS, associated variant set; red: index variants, blue r2 > 0.80. Enh, predicted enhancer regions based on H3-K27ac ChIP-seq profiles in specific primary tissues: gastric (dark green), stomach smooth muscle (light green) (Hsniz 2013). Candidate enhancer regions, CER; CER1: 3:70906815-70 907 701 (rs1522552); CER2: 8:11446898-11 447 336 (rs17153498/rs55896564/56051089) [http://genome.ucsc.edu]

Luciferase reporter enhancer activity assays. The candidate enhancer region at chr8p23.1 (CER2/rs17153498/rs55896564/rs56051089) was assessed for allele-specific activity, alongside positive and negative control fragments. Three independent experiments were performed in triplicate in two EAC cell lines: (A) OE19 and (B) OE33.
To determine if transcription factor binding has been described in the vicinity of the chr8p23.1/rs55896564 putative enhancer, we consulted the ReMap2022 compendium of ChIP-seq profiles (30). Focusing initially on normal esophageal tissues and EAC cell lines, we identified two transcription factors with evidence of local recruitment (Fig. 3A): KLF5 and CTCF. Interestingly, the reported peak of the KLF5 ChIP-seq signal, detected in the ESO-26 EAC cell line, was located only 4 bp away from rs55896564. Using the R package motifbreakR (31), we next screened for predicted DNA-binding motifs for KLF-family factors and CTCF, overlapping rs55896564, and evaluated the predicted impact of this variant on the strength of such motifs. At the selected stringency level (P = 0.005), we identified motifs for multiple KLF-family proteins, but not for CTCF. Motifs for five KLF factors were scored as weaker matches to consensus using the risk allele (G) at rs55896564, versus the reference allele (A) (Fig. 3B), with the largest score differential (−0.14) observed for the KLF4 motif (Fig. 3C). The KLF14 motif score was modestly increased by the risk allele. In an expanded screen for additional potential regulators at CER2/rs55896564, we identified several other transcription factors with nearby ChIP-seq signal peaks in non-esophageal cell types, and overlapping DNA-binding motifs altered by the variant: ZEB1, RUNX2 and RUNX1 (Supplementary Material, Fig. S2).
![Assessment of protein binding and DNA sequence motifs at CER2/rs55896564. (A) Human transcription factor ChIP-seq profiles in upper-GI-tract tissues and EAC cell lines. Two proteins showed evidence of local binding: KLF5 (ESO-26 cell line) and CTCF (esophagus, stomach, GE junction). Peak maximum indicated by vertical bar in binding region [http://genome.ucsc.edu]. (B) Predicted impact of rs55896564 on motif scores of overlapping KLF DNA-binding motifs identified via motifbreakR; motif scores computed using input sequences bearing the A or G allele are listed [pctA, pctG]. (C) Motif logo for KLF4; the position of rs55896564 indicated by dotted box.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/hmg/31/23/10.1093_hmg_ddac141/1/m_ddac141f3.jpeg?Expires=1748303996&Signature=ipGspi3cAPXA7cBS44mTdQ2Eu-ob5D3XdSb00zn2NT3R9vTUWKUTr6OYuWj3~9-zlokRBFijqqmum1aviDmdPTJmt6ZbtbCphTSHM2NzyolXSwc3pGPLknyqZqXyoTsKSr9wR~QNXH3CB5u8ngbGYdkSD4bf4S68FbPO0H6LNGWc7YnjWPuB2G6HBhPfmwAHfW8Laau7sjypsZbTkLs-2zLYrRyHG~uPnV8Qb2CXT-YESlAkfHYxvEt0b2WYdxxhhTxxd4d~3txMpPnbTRqtqq5nZ3h4B12542ZXq~9UHRzgdA6RGznW0j3x1uyNjpuElb2xOiOxeVSLNjUM36KdnA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Assessment of protein binding and DNA sequence motifs at CER2/rs55896564. (A) Human transcription factor ChIP-seq profiles in upper-GI-tract tissues and EAC cell lines. Two proteins showed evidence of local binding: KLF5 (ESO-26 cell line) and CTCF (esophagus, stomach, GE junction). Peak maximum indicated by vertical bar in binding region [http://genome.ucsc.edu]. (B) Predicted impact of rs55896564 on motif scores of overlapping KLF DNA-binding motifs identified via motifbreakR; motif scores computed using input sequences bearing the A or G allele are listed [pctA, pctG]. (C) Motif logo for KLF4; the position of rs55896564 indicated by dotted box.
Gene expression alterations associated with enhancer disruption
A number of genes mapping to chr8p23.1 have been implicated in EAC (32–35). We hypothesized that disruption of the putative enhancer at 8p23.1 (CER2/rs55896564) may affect the expression of multiple genes in this region. To test this hypothesis, we first used CRISPR-Cas9 genome editing to delete the ~500 bp region containing CER2 in EAC cell lines. Deletion efficiency in OE19 and OE33 cells was assessed by PCR, using harvested DNA and primers designed to amplify an approximately 2 kb genomic region across the putative enhancer. Mock-treated cells yielded a single 2 kb band, corresponding to an unedited genomic fragment, while CRISPR-Cas9-transfected cells yielded two bands: the unedited 2 kb fragment and the edited 1.5 kb fragment. Our results indicate that a large proportion of cells (though not all) in both cell lines were successfully edited (Fig. 4).

CRISPR-Cas9 genome editing of putative enhancer at chr8p23.1. DNA gel electrophoresis showing genome editing of the chr8p23.1 candidate enhancer region containing variants rs17153498, rs55896564 and rs56051089. 1.5 kb band demonstrates targeted deletion of the putative enhancer in OE19 and OE33 cells. gRNA (−): cells electroporated with mock conditions; gRNA (+): cells electroporated with cas9 vector and guide RNA target vectors.
We next quantified the expression of several genes located within ~250 kb of rs55896564 (Fig. 5A) using TaqMan qPCR expression assays in CRISPR genome-edited cells, and compared expression to mock CRISPR-edited cells. We observed statistically significant reductions in the expression levels of three genes: B lymphocyte kinase (BLK), nei like DNA glycosylase 2 (NEIL2) and cathepsin B (CTSB); expression of a fourth gene, GATA binding protein 4 (GATA4), was not significantly altered (Fig. 5B). We did not observe detectable expression of a nearby non-coding gene, long intergenic non-protein coding RNA 208 (LINC00208), in the EAC cell lines examined.

Gene expression changes following CRISPR-Cas9 deletion of putative enhancer at 8p23.1. (A) UCSC Genome Browser plot of ~500 kb genomic region centered on CER2 (highlighted). (B) CER2 was targeted for deletion in OE19 and OE33 cells using CRISPR-Cas9 technology. Pools of transfected cells were analyzed using TaqMan gene expression assays for BLK, GATA4, NEIL2 and CTSB, in triplicate, in three independent experiments. NC: mock electroporated parental cells; Cas9 + gRNA: cells electroporated with Cas9 vector and guide RNA target vectors. *P < 0.05; **P < 0.01; ***P < 0.001; and ****P < 0.0001.
eQTL analysis
To assess whether rs55896564 genotype correlates with altered expression levels of candidate gene targets in normal human tissues of the upper GI tract, we analyzed publicly available eQTL data from the GTEx V8 repository (36). Strong eQTL associations (P < 5 × 10−5) were noted for rs55896564 and CTSB in three of the four tissues examined: esophagus muscularis, gastroesophageal junction and stomach (Table 1). In each case, the risk allele (G) at rs55896564 correlated with lower CTSB expression (slope < 0). A weaker statistical association (P = 0.02) was also observed for BLK expression in esophagus mucosa; the risk allele (G) similarly tracked with lower transcript levels. No statistically significant eQTLs (P < 0.05) were identified for rs55896564 and GATA4 or NEIL2. Given that BE likely arises from cell populations at the GE junction/gastric cardia (37–39), it is noteworthy to observe strong eQTLs for CTSB in normal GE junction and stomach (P < 5 × 10−5).
Genotype-expression correlations for rs55896564 and candidate gene targets. Cis-eQTLs for rs55896564 and BLK, GATA4, NEIL2 or CTSB in upper GI-tract tissues (GTEx v8: EUR). GEJ, gastroesophageal junction. rs55896564 reference allele (A), alternate allele (G)
. | BLK . | . | GATA4 . | . | NEIL2 . | . | CTSB . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|
Tissue . | P . | Slope . | . | P . | Slope . | . | P . | Slope . | . | P . | Slope . |
Esophagus_Muscularis | . | . | 0.87 | −0.01 | 0.88 | 0.01 | 8.9 × 10 −8 | −0.15 | |||
Esophagus_Mucosa | 0.02 | −0.09 | . | . | 0.13 | −0.06 | 0.17 | 0.05 | |||
Esophagus_GEJ | . | . | 0.46 | 0.03 | 0.66 | 0.02 | 3.3 × 10 −5 | −0.14 | |||
Stomach | 0.08 | 0.08 | 0.051 | 0.08 | 0.12 | 0.06 | 5.2 × 10 −6 | −0.17 |
. | BLK . | . | GATA4 . | . | NEIL2 . | . | CTSB . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|
Tissue . | P . | Slope . | . | P . | Slope . | . | P . | Slope . | . | P . | Slope . |
Esophagus_Muscularis | . | . | 0.87 | −0.01 | 0.88 | 0.01 | 8.9 × 10 −8 | −0.15 | |||
Esophagus_Mucosa | 0.02 | −0.09 | . | . | 0.13 | −0.06 | 0.17 | 0.05 | |||
Esophagus_GEJ | . | . | 0.46 | 0.03 | 0.66 | 0.02 | 3.3 × 10 −5 | −0.14 | |||
Stomach | 0.08 | 0.08 | 0.051 | 0.08 | 0.12 | 0.06 | 5.2 × 10 −6 | −0.17 |
Genotype-expression correlations for rs55896564 and candidate gene targets. Cis-eQTLs for rs55896564 and BLK, GATA4, NEIL2 or CTSB in upper GI-tract tissues (GTEx v8: EUR). GEJ, gastroesophageal junction. rs55896564 reference allele (A), alternate allele (G)
. | BLK . | . | GATA4 . | . | NEIL2 . | . | CTSB . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|
Tissue . | P . | Slope . | . | P . | Slope . | . | P . | Slope . | . | P . | Slope . |
Esophagus_Muscularis | . | . | 0.87 | −0.01 | 0.88 | 0.01 | 8.9 × 10 −8 | −0.15 | |||
Esophagus_Mucosa | 0.02 | −0.09 | . | . | 0.13 | −0.06 | 0.17 | 0.05 | |||
Esophagus_GEJ | . | . | 0.46 | 0.03 | 0.66 | 0.02 | 3.3 × 10 −5 | −0.14 | |||
Stomach | 0.08 | 0.08 | 0.051 | 0.08 | 0.12 | 0.06 | 5.2 × 10 −6 | −0.17 |
. | BLK . | . | GATA4 . | . | NEIL2 . | . | CTSB . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|
Tissue . | P . | Slope . | . | P . | Slope . | . | P . | Slope . | . | P . | Slope . |
Esophagus_Muscularis | . | . | 0.87 | −0.01 | 0.88 | 0.01 | 8.9 × 10 −8 | −0.15 | |||
Esophagus_Mucosa | 0.02 | −0.09 | . | . | 0.13 | −0.06 | 0.17 | 0.05 | |||
Esophagus_GEJ | . | . | 0.46 | 0.03 | 0.66 | 0.02 | 3.3 × 10 −5 | −0.14 | |||
Stomach | 0.08 | 0.08 | 0.051 | 0.08 | 0.12 | 0.06 | 5.2 × 10 −6 | −0.17 |
Expression profiles of candidate gene targets in adult GI tract tissues and BE
We next sought to characterize the expression patterns of candidate gene targets across upper GI-tract tissues. Using GTEx RNA-seq profiles, we examined median expression levels of selected genes in normal esophagus (mucosa and muscularis), gastroesophageal junction and stomach (Supplementary Material, Fig. S3A). CTSB exhibited the highest expression levels overall; expression in stomach (<100 TPM) was notably lower than in esophageal and junctional tissues (~200–250 TPM). By contrast, GATA4 expression was highest in stomach (~30 TPM), and was only weakly detected in junctional tissues (~3 TPM), with signals in esophagus under the limit of detection (<0.1 TPM). NEIL2 was expressed at modest levels across the four tissues (~8–14 TPM), and BLK expression was either very low (0.7 TPM) or under the limit of detection (<0.1 TPM).
Using expression microarray data from the Gene Expression Omnibus (GEO) repository (40), we analyzed the relative expression levels of these genes in Barrett’s metaplasia versus normal esophagus (Supplementary Material, Fig. S3B). Statistically significant differences in expression were noted for GATA4 and CTSB in both datasets, with elevated GATA4 and reduced CTSB expression in BE. By contrast, BLK and NEIL2 expression levels were not significantly altered in BE.
Discussion
This study represents one of the first reported experimental efforts to bridge GWAS associations to biology in BE/EAC. Despite the success of GWAS in discovering genetic risk loci for BE/EAC over the past decade, almost all such loci remain uncharacterized (15). Identifying the functional/causal variants driving such association signals, and the risk genes and biological processes affected, remains a significant bottleneck in the path toward ultimate clinical translation. Given the labor-intensive and costly nature of experimental follow-up studies, we recently developed a systematic informatics pipeline to improve the efficiency of identifying and prioritizing variants with high estimated functional potential, via assignment of functional potential scores (FPS) (29). Building on our prior characterization of two prioritized BE/EAC risk loci, we selected two additional high-scoring loci for experimental interrogation in this study. Using luciferase reporter assays in EAC cell lines, we identified rs55896564 as a functional risk variant that modulates the activity of a putative enhancer region at chr8p23.1. Based on ChIP-seq profiles and in silico motif analysis, we defined KLF-family transcription factors as candidate trans-acting regulators. Using CRISPR genome editing and expression profiling, we identified BLK, NEIL2 and CTSB as candidate gene targets of this risk enhancer. eQTL mapping in normal esophageal/gastric tissues provided further support for CTSB. Our work provides new insight into the molecular and biological basis of genetic risk for BE/EAC, and further highlights our generalizable framework for accelerating post-GWAS functional analyses.
We have now experimentally evaluated a total of four independent BE/EAC risk loci that were prioritized and selected using our informatics scoring system (29). Three of the four candidate enhancer regions tested exhibited enhancer activity in luciferase reporter assays, and two of these three regions exhibited allele-specific activity: chr8p23.1/rs55896564 and chr19p13.11/rs10423674. While our total sample size remains small, a ‘success rate’ of 50% (2/4) is considerably higher than that achieved over many years of our prior work on colorectal cancer and glioma (24–28), providing additional support for the utility of using FPS to inform candidate variant prioritization. We believe that the tailored consolidation of multiple classes of genomic/epigenomic/regulatory annotations, into variant-level summary scores of estimated functional potential, represents a promising general approach for rational prioritization of a small subset of variants from large pools of candidates. Our FPS incorporates tissue-specific enhancer histone marks, chromatin accessibility and eQTL associations, among other features, which can be custom weighted toward specific tissues most relevant to the disease of interest.
At the chr8p23.1 risk locus, variants rs10108511 and rs2409797 were identified as lead index variants in prior BE/EAC GWAS (15). The risk estimates obtained for BE were very similar to the estimates for EAC, suggesting that this locus may primarily affect the risk of BE development, rather than BE progression to cancer, consistent with most BE/EAC GWAS loci identified to date (15–18). The associated variant set we assembled at the chr8p23.1 locus encompassed a total of 130 variants distributed across a ~ 721 kb genomic region. Our selection of rs55896564 for functional testing was guided by joint consideration of FPS, r2, and genomic distance. In contrast to the index variants, located ~12 kb upstream, rs55896564 (r2 = 0.92) was marked by notable annotation features predictive of an enhancer region in GI-tract tissues and/or EAC—chromatin accessibility (DNAseI hypersensitivity/ATAC-seq signal), active histone marks (H3K27ac) and transcription factor occupancy (41–44).
Our initial experimental support for the functional significance of rs55896564, derived from luciferase reporter enhancer activity assays, was bolstered by two complementary findings. First, published ChIP-seq profiles of the ESO-26 EAC cell line revealed binding of a key transcription factor—KLF5—in the immediate vicinity of rs55896564 (30). Second, DNA-binding motif analysis identified KLF motifs overlapping rs55896564, with the risk allele (G) resulting in a weaker predicted sequence match for most such motifs. Of interest, the chr8p23.1/CER2 fragment bearing the rs55896564 risk allele (G) also exhibited lower enhancer activity in our luciferase assays (Fig. 2). KLF5 is a member of the Krüppel-like factor family of transcription factors and was recently recognized as a master regulator in EAC (45). Multiple KLF factors including KLF5 are known to play key roles in epithelial homeostasis, wound healing and carcinogenesis in the GI tract (46,47). Future studies will be required to confirm the disruption of KLF5 binding at this site.
Previous studies have implicated chr8p22–23 in the biology of EAC. Several reports identified somatic chromosomal amplifications of this region in a subset of EAC tumors (32–35), and described overexpression of multiple constituent genes including GATA4, CTSB, NEIL2, FDFT1 and MTMR9 (33). These studies did not examine germline genetic variation at 8p23.1, however, or address whether expression alterations of such genes in normal tissues were linked to the development of BE or EAC. Our current study identified three candidate gene targets of a putative risk enhancer at 8p23.1: BLK, NEIL2, CTSB (Supplementary Material, Table S2). Further support for CTSB as a potential BE/EAC susceptibility gene emerged from strong eQTL signals in normal esophageal/gastric tissues. Notably, lower expression of CTSB was observed in individuals bearing the rs55896564 risk allele (G) for BE and EAC. Cathepsin B is a lysosomal cysteine protease expressed in both intracellular and secreted forms (48). While much of the literature on CTSB has focused on its elevated expression in various malignancies including EAC, and its role in cancer progression and metastasis via extracellular matrix remodeling (49,50), CTSB has also been implicated in normal epithelial homeostasis (51). Inhibition of CTSB impaired epithelial wound healing, limiting the migration of keratinocytes necessary for tissue regeneration (52). Given that BE is considered an abnormal (yet adaptive) response to chronic epithelial injury associated with acid reflux into the squamous-lined lower esophagus (53), the normal epithelial wound healing response represents a biologically plausible target of genetic modulators of BE susceptibility.
GTEx eQTL data suggested a correlation between rs55896564 genotype and BLK expression in esophageal mucosa (P = 0.02), but BLK transcript levels were extremely low in esophagus and stomach, and eQTL could not be assessed in GE junction. BLK encodes a non-receptor protein tyrosine kinase expressed predominantly in immune cell types and implicated in B-cell development and hematologic malignancies (54–56). Our expanded eQTL scan revealed a strong association (P < 5 × 10−8) of the rs55896564 risk allele (G) with lower BLK expression in whole blood. We also found that the genomic region surrounding rs55896564 was marked by open chromatin in multiple immune cell types (Supplementary Material, Fig. S4), consistent with regulatory function. Other genetic variants linked to reduced expression of BLK are associated with auto-immune diseases, and alterations in B-cell maturation and antibody production (57–59). These findings suggest that lower BLK expression in circulating and/or mucosa-resident immune cells, and resulting modulation of systemic and/or local inflammation (53), may partially underlie the genetic susceptibility signal for BE/EAC at chr8p23.1.
While eQTLs were not identified for rs55896564 and NEIL2 in upper GI tract tissues, it remains possible that such associations would manifest only in certain subpopulations of cells (versus bulk tissue), and/or only under specific conditions. NEIL2 encodes a DNA glycosylase enzyme that functions in the base excision DNA repair (BER) pathway (60,61). BER plays an important role in repairing oxidative DNA lesions, which have been linked to both acid reflux (62) and cigarette smoke (63), two established risk factors for BE/EAC. We also note that prior studies have revealed an important role for the transcription factor GATA4 in foregut development and columnar epithelial fate specification (64,65); GATA4 mRNA is elevated in BE (Supplementary Material, Fig. S3B) relative to squamous epithelium (66). While our current data do not support a link between CER2/rs55896564 and GATA4 expression changes, context- and cell-type-dependent effects cannot be ruled out.
Inherited genetics likely acts through multiple core pathways to influence BE/EAC risk. First, genetic variation could alter the propensity to develop intermediate conditions linked to altered susceptibility, such as GERD or visceral obesity. Second, inherited variants could affect inflammatory responses and/or critical buffering systems that respond to epithelial injury and DNA damage, key contributors to BE/EAC pathogenesis. As noted above, each of the three candidate risk genes we identified at chr8p23.1—CTSB, BLK, NEIL2—have been implicated in such processes. Third, genetics may influence the intrinsic characteristics or abundance of the BE/EAC ‘cell of origin’. The identity of this originating cell remains actively investigated, with candidates including various epithelial subpopulations such as undifferentiated gastric cardia cells (39), transitional basal epithelial cells (TBECs) (38), reserve embryonic cells (RECs) (37) and esophageal submucosal glands (67). Molecular profiling of these cells, and resulting maps of chromatin states and regulatory associations (68), will continue to improve our understanding of how genetic variation may influence specific cell pools that give rise to BE.
This study has certain limitations. Additional functional variants may underlie the BE/EAC risk association at 8p23.1, and could act via alternate mechanisms such as modulation of microRNA binding or alternative splicing. In our CRISPR experiments, we have not determined the impact of rs55896564 directly, but of the enhancer containing that variant. It will be important in future studies to conduct additional analyses in primary cell cultures and organoid model systems. We cannot rule out additional gene targets of CER2, particularly at long range. Our inspection of transcription factor binding in proximity to rs55896564 was restricted to a small fraction of the total pool of known regulators, due to limited availability of ChIP-seq data, particularly from esophageal/gastric tissues. Nevertheless, we believe that we provide strong supportive evidence for the identification of rs55896564 as a functional variant relevant to bc/EAC risk.
In conclusion, this study represents a significant early step forward in the functional evaluation of inherited genetic risk loci associated with BE/EAC. Our results identify a functional risk variant and putative regulatory enhancer at the chr8p23.1 risk locus, and define downstream gene targets (CTSB, NEIL2, BLK) and potential candidate protein regulators (KLF5) of this enhancer in EAC cell lines. eQTL findings from normal human tissues further support CTSB and BLK as candidate risk genes at chr8p23.1, providing clues that genetically mediated modulation of epithelial wound healing and inflammation may influence BE/EAC risk. Future studies will contribute to characterizing regulatory effects of CER2/rs55896564 in specific cell types critical to BE/EAC etiology, and probing biological consequences of altered target gene expression in physiologic model systems. Our work underscores the utility of functional potential scores to focus experimental follow-up on prioritized variants and improve the efficiency of post-GWAS functional studies.
Materials and Methods
Candidate risk variant prioritization
The informatics pipeline we developed for assigning functional potential scores (FPS) to candidate functional/causal variants has been described in detail (29). These scores were used to prioritize a subset of BE/EAC GWAS risk regions for experimental testing, based on the assessment of the maximum FPS obtained, by region, for any candidate risk variant. Two top-scoring regions were selected for analysis in this report. Within each region, the final selection of a candidate enhancer region spanning one or more high-scoring candidate functional risk variants was guided by joint consideration of FPS, r2, and genomic distance to the index variant(s), with visualization of individual annotation features via customized UCSC Genome Browser plots.
Cell culture
OE19 and OE33 EAC cell lines were obtained from ECACC (European Collection of Authenticated Cell Cultures) via MilliporeSigma (St. Louis, MO). Routine Mycoplasma testing of the cell lines revealed no contamination. Stocks of the cell lines were frozen at low passage numbers. OE19 and OE33 cells were grown in DMEM (Thermo Fisher) supplemented with 5% Fetal Bovine Serum (Thermo Fisher), and 1% Penicillin/Streptomycin, and grown at 37|${}^{{}^{\circ}}$|C in 5% CO2.
Plasmids and luciferase assays
DNA fragments containing alternate alleles of the candidate variants/haplotypes were PCR amplified from normal human genomic DNA and subcloned into Sac I and Xho I restriction enzyme sites (in both orientations) upstream of a TK (thymidine kinase) minimal promoter-driven firefly luciferase vector (27) using CloneAmp HiFi PCR Premix and the In-Fusion HD cloning kit (Takara). Plasmid clones were sequenced by Sanger sequencing (Genewiz) to confirm the presence of candidate variants and the absence of any PCR amplification-induced mutations. For enhancer assays, OE19 and OE33 cells (1 × 105 cells/well) were seeded into 96-well plates. Cells were co-transfected with reporter plasmids and constitutively active pNL1.1.TK [Nluc/TK] Vector (Promega) using Lipofectamine 2000 Reagent (Thermo Fisher) according to the manufacturer’s instructions. After 48 h, cells were assayed for luciferase activity using the Dual-Luciferase Reporter Assay System (Promega) and Synergy H1 Microplate Reader (BioTek), according to the manufacturers’ instructions. Negative and positive controls were included and enhancer activity was quantified as described previously (29).
CRISPR-Cas9 genome editing
Upstream and downstream CRISPR guide RNAs (gRNAs) were designed flanking the chr8p23.1 candidate enhancer region using CRISPRscan (69) and synthesized by Synthego (Menlo Park, CA) (guide sequences; gRNA1: 5’ GTCATAAATCTCTGGCACTG 3′, gRNA2: 5’ GACGTGGATCGGTGATGTAC 3′). The chemical modifications 2’-O-Methyl at three first and last bases and 3′ phosphorothioate bonds between first three and last two bases were introduced into the gRNAs in order to provide superior editing in the cell lines used (Synthego) (guide sequences following chemical modifications; gRNA1 5’ GUCAUAAAUCUCUGGCACUG 3′, gRNA2 5’ GACGUGGAUCGGUGAUGUAC 3′). The Cas9 2NLS Nuclease was purchased from Synthego. OE19 and OE33 cells were electroporated with gRNA and Cas9 (1:3 ratio) using the Neon Transfection System (Thermo Fisher). Electroporated cells were allowed to grow for 48 hours prior to DNA and RNA harvesting. Genomic DNA was purified using the QIAamp DNA Mini Kit (Qiagen) and enhancer deletion was confirmed with PCR amplifications (forward primer 5’ TGGCAATGTCATTCCTTTCA 3′, reverse primer 5’ ATGACCACTTGGGCTGTTTC 3′).
Quantitative real-time PCR
RNA was isolated using Trizol reagent (Thermo Fisher) and cDNA was synthesized from 2 μg of total RNA using the High Capacity Reverse Transcriptase cDNA kit (Thermo Fisher). Quantitative real-time polymerase chain reaction (qPCR) was performed on the QuantStudio 5 (Thermo Fisher) using the Superscript III kit (Thermo Fisher) and TaqMan assays for selected genes mapping within ~250 kb upstream or downstream of rs55896564: BLK (Hs01017458_m1), LINC00208 (Hs00326507_m1), GATA4 (Hs00171403_m1), NEIL2 (Hs00979610_g1), CTSB (Hs00947433_m1). GUSB (Hs00939627_m1) served as an internal control. Data analysis was performed using GraphPad Prism (version 8.3.0, GraphPad Software, La Jolla, CA, USA). Reactions were normalized using the control gene GUSB, and calculations were performed according to the 2-ddCT method. Fold change in expression was determined from three independent experimental repeats, each performed in triplicate. The statistical significance of expression differences was assessed using an analysis of variance, with Bonferroni correction for multiple hypothesis testing. *P < 0.05; **P < 0.01; ***P < 0.001; and ****P < 0.0001 indicate levels of significance.
Expression quantitative trait locus mapping
eQTL data for rs55896564 in European-ancestry (EUR) individuals were obtained from the Genotype-Tissue Expression (GTEx) Project V8 repository (36). Summary statistics were downloaded from Google cloud and converted from Parquet format to plain text using Python.
ChIP-seq and gene expression datasets
The ReMap2022 repository was queried for transcription factor ChIP-seq profiles derived in upper GI-tract primary tissues and EAC cell lines (30). Transcription factor recruitment to chr8p23.1/CER2 was assessed by visual inspection of ChIP-seq peaks using the UCSC Genome Browser. All ReMap2022 hg19 binding tracks were downloaded to enable filtering based on the proximity of the signal peak maximum to the genetic variant of interest. RNA-seq data from normal tissues in the upper GI tract were obtained from GTEx V8 (36). Expression levels of selected genes in each tissue were evaluated by plotting transcript per million (TPM) median values. RNA microarray datasets derived from Barrett’s tissue and normal esophagus were obtained from two independent studies in the Gene Expression Omnibus (GEO) repository: GSE26886 (70) and GSE34619 (66). Differential expression of selected genes by tissue was assessed using the GEO2R analysis tool (40).
DNA-binding motif analysis
The R package MotifbreakR was used to identify predicted DNA-binding motifs overlapping a single nucleotide variant of interest, and to estimate the impact of the variant on the predicted strength of each motif (31). All human motifs included in MotifDb and MotifbreakR_motif were considered at the outset. The threshold criterion for motif identification was a motif P-value of 0.005 using either the reference or alternate allele DNA sequence. Motif scores, normalized to a scale of 0 to 1 (pct), reflect the strength of match between an input sequence and consensus motif. When multiple motif definitions were captured for the same DNA-binding protein, the motif predicted to be most altered by the variant of interest was retained; a protein was excluded from further consideration if the predicted direction of impact of the variant on motif score was inconsistent across multiple motif definitions identified. Motif logo schematics for selected motifs predicted to be most impacted by the variant were generated using the plotMB function.
Author’s contribution
Conception and design: M.F.B., G.C., M.W.A. Data processing/extraction and bioinformatics: J.H.C., M.F.B., L.Y. Laboratory assays: M.W.A. Analysis and/or interpretation of data: M.W.A., M.F.B, L.Y., J.H.C., G.C., X.W., J.Y.D., T.L.V. Drafting of the manuscript: M.W.A., M.F.B., G.C. Study supervision: M.F.B., G.C. All authors approved of the final draft submitted.
Conflict of Interest statement. The authors declare that there are no conflicts of interests.
Funding
National Institute of Diabetes and Digestive and Kidney Disorders (R01DK128615), and by startup and/or discretionary funds awarded to M.F.B by the Roswell Park Comprehensive Cancer Center and to G.C. by the University of Virginia; National Cancer Institute (P30CA016056 to the Roswell Park Comprehensive Cancer Center; R01CA136725 to T.L.V to support the BEACON GWAS). Additional funding sources for individual studies included in the BE/EAC GWAS meta-analysis have been acknowledged previously (Gharahkhani et al., 2016).