Genome-wide association studies of colorectal cancer (CRC) have identified a number of common variants associated with modest risk, including rs3802842 at chromosome 11q23.1. Several genes map to this region but rs3802842 does not map to any known transcribed or regulatory sequences. We reasoned, therefore, that rs3802842 is not the functional single-nucleotide polymorphism (SNP), but is in linkage disequilibrium (LD) with a functional SNP(s). We performed ChIP-seq for histone modifications in SW480 and HCT-116 CRC cells, and incorporated ChIP-seq and DNase I hypersensitivity data available through ENCODE within a 137-kb genomic region containing rs3802842 on 11q23.1. We identified SNP rs10891246 in LD with rs3802842 that mapped within a bidirectional promoter region of genes C11orf92 and C11orf93. Following mutagenesis to the risk allele, the promoter demonstrated lower levels of reporter gene expression. A second SNP rs7130173 was identified in LD with rs3802842 that mapped to a candidate enhancer region, which showed strong unidirectional activity in both HCT-116 and SW480 CRC cells. The risk allele of rs7130173 demonstrated reduced enhancer activity compared with the common allele, and reduced nuclear protein binding affinity in electromobility shift assays compared with the common allele suggesting differential transcription factor (TF) binding. SNPs rs10891246 and rs7130173 are on the same haplotype, and expression quantitative trait loci (eQTL) analyses of neighboring genes implicate C11orf53, C11orf92 and C11orf93 as candidate target genes. These data imply that rs10891246 and rs7130173 are functional SNPs mapping to 11q23.1 and that C11orf53, C11orf92 and C11orf93 represent novel candidate target genes involved in CRC etiology.
Genome-wide association studies (GWAS) of colorectal cancer (CRC) have led to the identification of risk variants mapping to 20 chromosomal regions including 1q41, 3q26.2, 6p21, 8q23.3, 8q24.21, 10p14, 11q13.4, 11q23.1, 12q13.12, 14q22.2, 15q13.3, 16q22.1, 18q21.1, 19q13.11, 20p12.3, 20q13.33 and Xp22.2 in Caucasian populations (P-value ≤ 5 × 10−8) (1–12). Most of the variants map to non-coding regions of the genome and the majority of them are not expected to be functional due to the fact that the commercial SNP arrays used in most GWAS use tagSNPs to represent large areas of the genome. Only limited effort has been directed towards identifying and characterizing the functional variants related to GWAS implicated regions. Identifying functional variants remains a major challenge as fine mapping approaches often provide only limited information due to the extent of linkage disequilibrium (LD) across these regions.
One example of a risk variant that maps to a non-coding region is the rs6983267 CRC risk variant mapping to 8q24.21. Identified through several GWAS, rs6983267 maps to an enhancer region that is believed to interact with MYC (13–17). However, few CRC-associated tagSNPs are expected to be functional, and there is growing evidence that risk associated SNPs are likely to be in LD with functional SNPs that map within nearby genes, regulatory regions such as enhancers and promoters or long non-coding RNAs (lncRNAs) (16,18). This implies that the identification of functional SNPs could be facilitated through incorporation of chromatin status such as histone modification and/or DNase I hypersensitivity sites into post-GWAS analyses, and that this may represent a powerful approach to identify tissue-specific regulatory elements (19–21).
The common SNP rs3802842 mapping to chromosome 11q23.1 is associated with CRC risk in populations of European descent (CEU) (4). Within the LD block (r2 > 0.2) defined by rs3802842 lie POU2AF1, three putative genes, C11orf53, C11orf92 and C11orf93, and one microRNA MIR4491. There are no known lncRNAs or other transcription products within this region. TagSNP rs3802842 does not map to a coding region or a putative regulatory region, and none of the SNPs in LD (r2 > 0.2) with rs3802842 map to exons. We hypothesized, therefore, that rs3802842 is in LD with a nearby functional SNP(s) within a genetic regulatory element. To test this hypothesis, we generated chromatin immunoprecipitation-sequencing (ChIP-seq) data of histone modifications for SW480 and HCT-116 CRC cells and interrogated ENCODE ChIP-seq and DNase I hypersensitivity data from CRC cell lines to identify putative promoters and enhancers within this region (22).
Using this approach, we identified two potentially functional SNPs, rs10891246 and rs7130173 that are in LD with the CRC-associated SNP rs3802842. In CRC cell lines, rs10891246 mapped to a H3K4me3 peak suggesting it fell within an active promoter of genes C11orf92 and C11orf93, whereas rs7130173 mapped to both H3K4me1 and DNase I hypersensitivity peaks suggesting the presence of an enhancer. We found that a genomic fragment containing the SNP rs10891246 demonstrated bidirectional promoter activity in luciferase-based cellular assays, and the risk allele (A) resulted in lower reporter gene expression activity than the major allele (G). Furthermore, a 1 kb genomic fragment containing SNP rs7130173 showed enhancer activity, and the risk (A) allele led to reduced enhancer activity compared with the common (C) allele. Additionally, an oligonucleotide containing the rs7130173 risk (A) allele showed reduced protein binding affinity compared with the common (C) allele in electromobility shift assays suggesting differential TF binding. We also showed that the rs7130173 risk (A) variant correlated with reduced expression of three predicted open-reading frame genes, C11orf53, C11orf92 and C11orf93, in normal human colon tissues. These data implicate rs7130173 and rs10891246 as functional variants mapping to the 11q23.1 region and C11orf53, C11orf92 and C11orf93 as novel candidate genes involved in CRC etiology.
To investigate the functional basis of the chromosome 11q23.1 association with CRC etiology, we began by identifying potential functional variants using the LD between tagSNP rs3802842 and the other known SNPs within 1 Mb (Fig. 1). There are 89 SNPs in LD with rs3802842 with an r2 ≥ 0.2 defining a 137-kb region of chromosome 11 from coordinates 111,119,694 to 111,256,668 (1000 Genomes Project release June 2011; CEU population, hg19) (23,24). Of those, 20 SNPs are in very high LD with rs3802842 with an r2 ≥ 0.8. None of these variants mapped to coding exons. Additionally, there are no known lncRNAs in this region (25–27). When we applied FunciSNP (21), an R/Bioconductor tool which systematically identifies correlated SNPs coinciding with chromatin features, to the ENCODE data available for the region and ChIP-seq data we generated (discussed below), we found a short list of SNPs, including rs7130173 and rs10891246 discussed below, residing in potential biofeatures. Thus, our attention turned to variants that may affect genetic regulatory elements.
One potentially active promoter region and several candidate enhancer regions were identified using high-resolution genome wide ChIP-seq profiles for histone modifications including H3K4me1, H3K4me3 and H3K9/14ac (Figure 1A and Supplementary Material, Fig. S1), which we generated for the SW480 and HCT-116 CRC cell lines. DNase I hypersensitive (DNase I)-seq data were also available in the UCSC genome browser for HCT-116 and Caco-2 CRC cells through the ENCODE consortium (22,28–30) and a single DNase I hypersensitivity site in HCT-116 mapped within a candidate enhancer fragment (Fig. 1A). These data were overlaid with SNPs in strong LD with the tagSNP to determine which elements contained potential functional variants. Putative regulatory elements lacking SNPs in LD (r2 ≥ 0.2) with the tagSNP were not tested for activity.
Promoter activity determination
The only promoter-correlated histone peak (H3K4me3) in CRC cell line SW480 corresponded to the C11orf92 and C11orf93 bidirectional promoter region near their transcriptional start sites (Fig. 1A). The area encompassing the peak (Ch11 : 111,169,359–111,171,279, hg19) was cloned into a luciferase assay vector and activity was tested in both SW480 and HCT-116 cells. This region showed bidirectional promoter activity in both cell lines (Fig. 2A). The cloned H3K4me3 peak contained two SNPs in LD with the tagSNP: rs10891246 (r2 = 0.967, D′ = 1) and rs7105857 (r2 = 0.826, D′ = 1). Two constructs were mutated to the minor allele using site-directed mutagenesis and promoter activities were measured (Fig. 2B). Only the rs10891246 SNP demonstrated a statistically significant reduction in bidirectional promoter activity when changed from the major allele G to minor allele A. SNPs rs7122375 (r2 = 1, D′ = 1 with the tagSNP) and rs11213823 (r2 = 0.878, D′ = 1 with the tagSNP), included within the cloned fragment but outside of the promoter peak, had no allele-specific effects on promoter activity (Supplementary Material, Fig. S2A).
Enhancer activity determination
The presence of overlapping intronic H3K4me1 and DNase I hypersensitivity peaks suggested the presence of an enhancer, which contained two SNPs in LD with the tagSNP (Fig. 1A). To investigate potential enhancer activity of candidate enhancer regions, several DNA fragments centered over H3K4me1 chromatin marks were PCR amplified from DNA isolated from a normal human lymphoblastoid cell line and cloned in both directions upstream of a luciferase reporter gene driven by the thymidine kinase (TK) minimal promoter (13). Enhancer activities of these fragments were determined following transient transfection in HCT-116 and SW480 CRC cells (Supplementary Material, Fig. S2B). Only one region showed enhancer activity, chr11:111,152,290–111,154,290, and this was seen in both CRC cell lines (Fig. 3A). The enhancer activity was unidirectional and was only observed in the forward orientation. This enhancer region contained two SNPs (rs1987128, r2 = 0.77 and rs7130173, r2 = 0.96) correlated with the risk variant (Fig. 3A). Notably, the rs7130173 SNP mapped within the DNase I hypersensitive site seen in HCT-116 CRC cells (Fig. 1).
To refine the region responsible for the observed enhancer activity, the 2 kb fragment was cloned as two smaller overlapping fragments ∼1 kb each. One of the fragments contained rs1987128, the other contained rs7130173. Both fragments were evaluated for enhancer activity as described above. Enhancer activity (once again unidirectional) was only seen with the fragment containing the rs7130173 variant (Fig. 3A). No enhancer activity was seen in either direction with the fragment carrying the rs1987128 SNP, excluding that SNP as a candidate functional SNP. Given that rs7130173 mapped within a DNase I hypersensitive site, we tested a smaller 300 bp region encompassing the DNase 1 peak for enhancer activity and unidirectional enhancer activity was again observed. This enhancer element displays some tissue specificity as neither the 1 kb-B nor the 300 bp fragment exhibited enhancer activity in the prostate cancer cell line PC-3 (Supplementary Material, Fig. S3).
To determine whether the rs7130173 allele modulated enhancer activity, the rs7130173 SNP was changed from the common (C) to the risk (A) allele by site-directed mutagenesis and enhancer activity was re-assessed (Fig. 3B). Following mutagenesis, the 1 kb fragment containing the risk (A) allele exhibited an enhancer activity statistically lower than that seen for the common (C) allele (Fig. 3B). We obtained the same result when the 2 kb fragment was also mutagenized (data not shown). Surprisingly, when the risk allele was tested in the context of the 300 bp fragment, enhancer activity increased compared with the major allele. This result was seen in both SW480 and HCT-116 cells. This result is not entirely without precedent, as a regulatory element, the enhancer region and its bound TFs by definition interact with other factors bound to additional regulatory elements corresponding to the target genes. The effect of the 700 bp of neighboring sequence upon the action of the smaller enhancer element containing rs7130173 in an allele-specific manner may be key to understanding the CRC mechanism at 11q23.1. As the 2 and 1 kb fragments represent a larger portion of the true genomic context, we postulate that they are more representative of the effect of the SNP in situ, and therefore it would be expected that the minor allele of rs7130173 would correlate with a reduction in gene expression of the target gene(s). To test this hypothesis, we looked for eQTLs with rs7130173 among nearby genes.
Four genes map within 100 kb of rs7130173 (and within an r2 ≥ 0.2 with the tagSNP rs3802842): POU2AF1 and three putative open reading frame genes, C11orf53, C11orf92 and C11orf93 (Fig. 4). To determine the relationship between these genes and the candidate functional variant rs7130173, we examined their gene expression in 308 pathologically normal human colon tissue samples obtained by colonoscopy through the Aspirin/Folate Polyp Prevention Study (31–33). We show a statistically significant correlation between reduced expression of C11orf53 (P = 1.8 × 10−7), C11orf92 (P = 1.2 × 10−4) and C11orf93 (P = 2.6 × 10−9) and the risk (A) allele, whereas expression of POU2AF1 was not statistically different (P = 0.52). The two other genes expressed in colon tissues within 400 kb of rs7130173, LAYN and SIK2 were also investigated but expression of these genes did not show any significant relationship with either allele of rs7130173. The promoter SNP rs10891246 and enhancer SNP rs7130173 are in complete LD with one another (r2 = 1, D′ = 1), and thus the eQTL results are essentially equivalent for both SNPs. When the eQTL analysis was performed for the same genes and the tagSNP rs3802842, we found a similar result to rs7130173 with slightly higher (i.e., less significant) P-values [C11orf53 P = 2.4 × 10−7, C11orf92 P = 1.2 × 10−3, C11orf93 P = 1.4 × 10−8, POU2AF1 P = 0.90].
Allele-specific electromobility shift
Considering our observed allele-specific luciferase assay activity (Fig. 3) and the eQTL results in normal colon tissues (Fig. 4), we predicted that rs7130173 was more likely to be a functional SNP than the tagSNP rs3802842. Further, we hypothesized that differential enhancer activity with the risk allele at rs7130173 is due to changes in binding of TFs to the sequence containing and surrounding the SNP. To test this hypothesis, we used electromobility shift assay (EMSA) to compare the migration patterns of 41mers labeled with near-infrared fluorescent dyes representing both alleles following incubation with nuclear protein extracts from colon cancer cell lines SW480 and HCT-116. The ability to label each allele's probe with a different color allows the direct comparison of binding products to the individual alleles in the same binding reaction and gel lane.
Figure 5A shows the merged color image with the C allele probe in red and the A (risk) allele probe in green; areas with equal amounts of each probe appear yellow. Figure 5B and C shows the individual probe emission channels (700 or 800 nm) of the same gel as black and white images (for reproduction clarity). Lanes 1 and 2 contain the probes alone in the absence of nuclear proteins. In Lanes 3 and 5, we see several distinct protein/DNA bands containing the C allele probe. These bands are faint or absent with the A allele probe (Lanes 4 and 5). To determine if these complexes are specific to the sequence surrounding rs7130173, unlabeled competitor DNA oligonucleotides were added to the reactions in Lanes 6 and 7. The bands marked with arrows are lost upon incubation with unlabeled competitor DNA, and are thus likely specific to the probe sequence. These bands may represent several proteins that bind to the sequence independently, the components of a multiprotein complex, or degradation products of a single protein or protein complex that binds the probe. Both alleles, in 200-fold excess, effectively compete with the C allele for binding factors. Experiments with varying ratios of labeled C and A allele probes show that the binding affinity of the C allele is roughly four times greater than the A allele (data not shown).
Lanes 3 through 7 show the binding of the rs7130173 allele probes to nuclear proteins from colon cancer cell line SW480. Recall that this cell line exhibited histone marks for an active enhancer element containing rs7130173 in our ChIP-seq experiments (Fig. 1). Lanes 8 through 12 show the same set of binding experiments using nuclear extracts prepared from the HCT-116 cell line which did not exhibit strong histone marks for an active enhancer. It is notable that several of the protein/DNA bands appear different when comparing the two cell line extracts. The prominent, specific band in SW480 is less distinct in HCT-116, while there are additional bands in HCT-116 near the top of the lane absent in SW480. The physiological relevance of these cell line differences remains to be elucidated.
Candidate Transcription Factors
Determining the TF responsible for the differential enhancer activity is of great interest to understanding the cancer risk mechanism at chromosome 11q23. Using the Biobase Match tool built upon the TRANSFAC matrix of TF motifs, a list of candidate TFs was generated (Table 1) (34,35). This table catalogues all TFs predicted to have affinity for the region surrounding rs7130173 with the C allele present which show a reduction in binding score if the A allele is present instead. TFs with no change or increase in binding score with the A allele were not included on this list. Future experimental efforts will be needed to determine which of these candidates is responsible for the effects observed in our studies, but several: p53, HIC1, PPARα/γ, ZF5, Spz1 and AP-2rep, are intriguing possibilities due to their known links to carcinogenesis.
|Sequence||Transcription factor(s)||C allele||Difference with A allele|
|ccaccgagccTGCCCca||LXR, PXR, CAR, COUP, RAR||0.821||0.716||−0.147||−0.064|
|agcctgCCCCAccaagggaaa||Muscle initiator sequence-20||1.000||0.805||−0.004||−0.002|
|Sequence||Transcription factor(s)||C allele||Difference with A allele|
|ccaccgagccTGCCCca||LXR, PXR, CAR, COUP, RAR||0.821||0.716||−0.147||−0.064|
|agcctgCCCCAccaagggaaa||Muscle initiator sequence-20||1.000||0.805||−0.004||−0.002|
Search parameters used for Biobase match program were Matrix library: TRANSFAC MATRIX TABLE, Release 2013.1, Profile: vertebrate_non_redundant.prf, only high-quality matrices, minimize false negatives. Candidate TF list was determined using the C allele ±20 bp. SNP position is underlined in binding sequence. TFs with no change or increase in core and matrix score upon search using the A allele were ignored. Reduction in core and matrix scores with the A allele are listed in column Group 2.
Identification of functional variants mapping to GWAS regions has proven to be a significant challenge as several hypotheses must be tested for every GWAS locus, requiring a number of complex molecular, genetic and bioinformatics approaches (36). Until now, no candidate functional SNP had been identified at 11q23.1. No non-synonymous SNPs in high LD with the tagSNP have been found in neighboring genes (3,4). In this comprehensive study, we describe the identification and preliminary characterization of two candidate functional SNPs, rs10891246 and rs7130173, mapping to the 11q23.1 CRC GWAS region. These two SNPs modulate gene expression in cell-based assays in an allele specific manner, with rs10891246 mapping within a promoter region and rs7130173 in an enhancer region. This is manifest in normal human colon tissues as eQTL for three genes, C11orf53, C11orf92 and C11orf93. Using bicolor EMSA, we further showed that protein binding is markedly different for the sequences containing the two alleles of rs7130173, with binding affinity of the C allele roughly four times greater than the A allele. This level of reduction in affinity for the risk allele is consistent with the moderate effect the risk allele has on enhancer activity as seen in the luciferase assays. However, a 4-fold change in TF binding affinity may have significant consequences for gene expression in cells and colon tissues. In fact, this was demonstrated for the 8q24 MYC enhancer containing rs6983267, the SNP associated with more human cancers, including CRC and prostate cancers, than any other GWAS identified SNP. In Myc-335−/− mice lacking the enhancer element, expression of MYC in colon crypts was only modestly reduced, but when this is mutation is made in a APCmin mouse, there were profound effects on the number of polyps formed (37,38).
There are four candidate genes that map to the risk region C11orf53, C11orf92, C11orf93 and POU2AF1, with rs7130173 mapping within intron 2 of C11orf53, and rs10891246 mapping within the bidirectional promoter between C11orf92 and C11orf93. POU2AF1, also known as BOB.1 or OBF1, was an attractive candidate for the CRC risk mechanism as it is a TF known to regulate TCF12, a TF linked to colon cancer metastasis and invasion (39,40). However, we did not see a relationship between our SNP alleles and the level of POU2AF1 in normal colon tissue samples. The three open reading frame genes represent better candidate targets as the eQTL analysis revealed a statistically significant correlation between the rs7130173 risk allele and reduced expression of C11orf53, C11orf92 and C11orf93.
Data available through the Cancer Genome Atlas reveal a low frequency (<5% of cases) of overexpression for genes C11orf53, C11orf92 and POU2AF1 in the ∼193 CRC tumors for which RNA-seq and other data are available. In addition, a very low frequency (<1%) of CRC tumors showed somatic mutations in C11orf53, C11orf92, C11orf93 or POU2AF1. There have been no published reports implicating any of the genes in cancer other than a single study that found C11orf92 up-regulated in ovarian cancers, and the clinical relevance of these uncharacterized gene products remains unclear. Immunohistochemical staining for C11orf92 shows the protein in extracellular granules in normal mucosa and at the periphery of colon cancer cells, with the level of protein staining and genotype correlation consistent with our eQTL result (41). Additional functional analyses will be required to determine the role of these candidate genes in CRC development.
While none of the candidate genes have been implicated strongly in CRC, chromosome 11q23–q24 has been reported frequently deleted in CRCs and other tumor types (42). Two published studies report however that allelic imbalance across this region does not correlate with the rs3802842 risk allele (4,43) suggesting any allelic imbalance seen in tumors is independent of the risk effect of this locus.
The 11q23.1 tagSNP rs3802842 (along with the risk variant from 8q23.3) has been implicated in increased risk and earlier onset of CRC in Lynch syndrome females in a dose-dependent manner (44–46). Lynch syndrome is the most common inherited form of CRC and is caused by an inherited mutation in one of the DNA mismatch repair (MMR) genes. The individual risk within an MMR-deficient CRC family varies widely among family members and this variability may in part be accounted for by the risk variant on 11q23.1. Interestingly, the HCT-116 CRC cell line, which is MMR deficient, does not show histone marks for an active promoter around rs10891246 nor an active enhancer encompassing rs7130173 in our ChIP-seq data. The histone biofeature marks that sparked our interest in the rs1089126/rs7130173 haplotype in CRC risk were found using the MMR proficient CRC cell line SW480. SW480 carries a truncation in the APC gene, the gene commonly associated with autosomal dominant familial adenomatous polyposis (47). It also appeared that the proteins that bound the major allele of rs7130173 in our EMSA assay were in higher abundance in SW480. Further studies will be needed to determine the relationship between these variants, the MMR pathway, and the expression of C11orf53, C11orf92 and C11orf93, not only in CRC cells lines but also in colon stem cells, during colon development or in normal colon crypts. Our eQTL data imply that one or more of the genes C11orf53, C11orf92 and C11orf93 functions as tumor suppressors since lower levels of expression correlated with the CRC risk alleles.
Characterizing the enhancer at rs7130173 led to the interesting observation that the SNP alleles correlate with differential luciferase activity depending on the length of the fragment assayed in CRC cell lines. Our eQTL result strongly implies that the more physiologically relevant construct is the 1 kb fragment with 700 bp of additional sequence upstream of the enhancer element containing rs7130173. Some of the functional significance of the allele specific activity appears to lie in an interaction between proteins bound to additional upstream regulatory elements in the 700 bp region. The chromatin landscape of the entire region is of interest and further studies with chromatin capture technologies are needed to untangle the effects of each DNA–protein interaction. It is important to note that downstream of rs7130173 (within 100 bp) is a CCCTC-binding factor, CTCF and Rad21/Cohesin binding site in HCT-116 and other cell lines (48–52). Once regarded as only an insulator element, it is now known that CTCF and cohesion can also drive gene expression by linking enhancer elements to their gene targets (53,54). It is therefore possible that the enhancer element encompassing rs7130173 also regulates another gene(s) in trans at quite a linear distance.
In this study, ChIP-seq and DNase I hypersensitivity peak chromatin biofeatures were used to identify an active promoter whose activity could be modulated by SNP rs10891246 and several candidate enhancer regions including one whose activity could be modulated by the SNP rs7130173. Instead of identifying a single causal SNP, we found a haplotype with two SNPs having complementary, or perhaps even synergistic, effects on gene expression. These two SNPs are part of a haplotype including four additional SNPs (rs3087967, rs4477469, rs10789822 and rs7103178, r2 = 1, D′ = 1, MAF = 0.22, CEU population) in strong LD with the tagSNP rs3802842. Our data suggest that two of these SNPs are functional and contribute to the risk for this region. We did not identify any functional effects for the remaining four SNPs within this haplotype. SNPs rs3087967, rs4477469 and rs10789822 map to intergenic regions lacking histone modifications in our ChIP-seq studies, but rs7103178 lies within the 3′ UTR of C11orf92.
The use of biofeature information such as ChIP-seq data to identify candidate enhancers is similar to the approach used previously by members of this team to identify enhancers within the 8q24 region (13) and represents a powerful tool to identify candidate functional SNPs that map to regulatory regions. We have performed a comprehensive analysis of the 11q23.1 region associated with CRC: this region is not highly complex as it has relatively few ChIP-seq peaks corresponding to histone modifications or open chromatin and lacks putative long non-coding RNAs. Applying the same strategy to other CRC GWAS regions represents a powerful systematic and comprehensive approach to understanding CRC risk. This comprehensive evaluation may be quite challenging, however, for those regions with larger LD structure, multiple regions of open and active chromatin, large numbers of genes and lncRNAs. Use of bioinformatics tools, such as FunciSNP, will help facilitate identification of high priority candidates, but it should be noted that comprehensive experimental strategies are still required to elucidate biochemical and genetic mechanisms unique to each region in order to develop the predictive diagnostics and potential therapies that will demonstrate the importance of these studies. This case also highlights the importance of agnostic GWAS approaches in cancer research as this region of chromosome 11q23.1 was formerly overlooked and largely uncharacterized, but is now clearly relevant to understanding colorectal cancer in large populations.
MATERIALS AND METHODS
HCT-116 and SW480 CRC cell lines and the PC-3 prostate cancer cell line were obtained from the American Type Culture Collection (Manassas, VA, USA). HCT-116 and SW480 cells were grown in McCoy's 5A (Mediatech) supplemented with 10% fetal bovine serum (Omega Scientific, Inc.), and 1% penicillin/streptomycin, and incubated at 37°C and 5% CO2. PC-3 cells were grown in DMEM (Mediatech) supplemented with 10% fetal bovine serum (Omega Scientific, Inc.), and 1% penicillin/streptomycin, and incubated at 37°C and 5% CO2.
ChIP was performed as previously described (55) using SW480 and HCT-116 CRC cells. Quantitative real-time PCR was performed using SYBR green (Bio-Rad) and Platinum Taq DNA Polymerase (Life Technologies) using an iQ5 Real Time System (Bio-Rad) and the following cycling conditions: 95°C for 5 min, followed by 40 cycles of 95°C for 30 s and 60°C for 30 s. To calculate the enrichment of each immunoprecipitation at a target PCR amplicon a signal threshold was selected and a cycle threshold (Ct) determined. Fold Enrichment for signal was then determined using the formula: fold enrichment = 2((Ct Ip) − (Ct Input)). The antibodies used were: H3K4me1 (#ab8895, Abcam), H3K4me3 (#04-745, Millipore), Acetylated H3 (#06-599, Millipore), Rabbit IgG control (#ab46540, Abcam) and Histone H3 (#ab1791 Abcam).
ChIP DNA along with ChIP input DNA were prepared as described above, and high-throughput sequencing of the fragments was performed using the Illumina GAII platform. Enrichment for known target sequences was verified by SBYR green quantitative real-time PCR before ChIP and input DNA were sequenced. Libraries for ChIP-seq were prepared, and single-end sequencing of 40 cycles was performed following protocols recommended by Illumina. For SW480, four libraries were sequenced to generate from 18–65 million tags each after standard Illumina quality (PF) filtering. Sequence tags were aligned to UCSC genome assembly hg19 using MAQ (PMID 18714091), and those that were unambiguously mapped to a single position in the genome (Mapping Quality score greater than or equal to 30) were retained for downstream analysis. All sequencing lanes for each library were merged using SAMTOOLS (PMID 19505943), and potential PCR duplicates (tags mapping to identical genomic positions) were removed. On average, 70% of all sequencing tags were unambiguously mappable and passed duplicate filtering. In order to create genomic coverage (wiggle) tracks, each tag was extended by half the median fragment size (200 bp) relative to the mapped strand, and this estimate of the fragment midpoint was used to calculate total tag coverage along the genome.
Plasmids and luciferase assays
DNA fragments corresponding to candidate promoter and enhancer regions were PCR amplified using genomic DNA from a normal lymphoblastoid cell line. Fragments were amplified and cloned using CloneAmp HiFi PCR Premix and the In-Fusion HD cloning kit (Clontech). For the promoter assay, the candidate region Chr11: 111,169,359-111,171,279 was cloned into plasmid pGL4.10 (Promega) at the Kpn1 site. A region of Ch10 with no open chromatin peaks or proximal genes, Chr10: 8,690,709–8,690,917 served as the negative control, while vector gGL4.13 served as the positive control. For the enhancer assays, PCR fragments (2 kb: chr11: 111,152,290–111,154,290; 1 kb-A: chr11: 111,152,290–111,153,217; 1 kb-B: chr11: 111,153,185–111,154,290; 300 bp: chr11: 111,153,991–111,154,290) were subcloned into the Sac II restriction enzyme site upstream of a TK minimal promoter-firefly-luciferase vector in both directions. PCR fragments were sequenced in both directions using Sanger sequencing to confirm the presence of the candidate variants and the absence of PCR amplification-induced mutations (Genewiz). More than three independent constructs in both directions were evaluated. For both promoter and enhancer assays, HCT-116 and SW480 cells (1.25 × 104 cells/well) were seeded into 96-well plates. Cells were co-transfected with reporter plasmids and constitutively active pRL-TK Renilla luciferase plasmid (Promega) using Lipofectamine 2000 Reagent (Life Technologies) according to the manufacturer's instructions. After 24 h, cells were harvested and extracts were assayed for luciferase activity using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer's instructions, and measured using a Dynex MLX Microtiter Plate Luminometer or a Tecan Infinite F200Pro Microplate Reader. The ratio of luminescence from the experimental sample to the control reporter was calculated for each sample, and defined as the relative luciferase activity. The data are presented as mean ± SD of at least three independent transfection experiments each conducted in triplicate. To assess allele-specific effects, specific SNP alleles were generated by mutagenesis using the QuikChange site-directed mutagenesis kit (Agilent Technologies). Plasmids were sequenced and transfected into the cells as above. At least six-independent clones of each construct were generated, confirmed by Sanger sequencing in both directions and tested for activity as above. The mutagenesis data are presented as mean fold change ± SD of at least three independent transfection experiments each conducted in triplicate. Two-sided P-values between alleles were calculated using the Student's t-test.
DNA and RNA isolation
RNA and DNA were extracted from 320 fresh frozen tissue biopsies of normal colorectal mucosa that were obtained from surveillance colonoscopy as part of the Aspirin/Folate Polyp Prevention Study (31–33). The tissue was homogenized using the Precellys Minilys bead mill (Bertin Technologies) and total RNA was extracted using the mirVana kit (Life Technologies). Genomic DNA was extracted using the MELT kit (Life Technologies). The Aspirin/Folate Polyp Prevention Study was approved by ethics committees at the participating institutions and all subjects provided written informed consent.
The genotypes of rs7130173, rs3802842, rs7105857 and rs10891246 were determined using Taqman SNP genotyping assays (Life Technologies) and the Type-it Fast SNP Probe PCR Kit (Qiagen). Assays were read using an Applied Biosystems 7900HT Real Time Instrument and analyzed with the manufacturer's software, SDS2.3 (Life Technologies).
Following quantitation, 308 samples had sufficient RNA to proceed to cDNA synthesis using 250 ng of total RNA with the High Capacity RNA-to-cDNA kit (Life Technologies). cDNA samples were then preamplified with TaqMan Preamp Master Mix and 96 TaqMan Gene Expression Assays (Life Technologies), and loaded on a microfluidics chamber for real-time PCR analysis (96.96 Dynamic Array and BioMark HD system, Fluidigm). Relative quantity (RQ) of expression of C11orf53 (Hs00736614_m1), C11orf92 (FLJ45803) (Hs04186919_m1), C11orf93 (Hs00416978_m1) and POU2AF1 (Hs01573371_m1) was measured using the comparative CT method (ΔΔCT). Expression was normalized using beta-glucuronidase.
The effect of each extra minor allele (0, 1 or 2) on RQs was evaluated utilizing linear regression and assuming an additive genetic model. RQs were log-transformed prior to analysis to ensure the normal distribution. Test for significance was obtained from a likelihood ratio test and a two-sided P-value of < 0.05 was considered statistically significant. Our findings remained statistically significant even after applying overly conservative Bonferroni correction for multiple testing (P < 4.2 × 10−3). Levels of log-RQs were plotted as a function of genotypes using a box plot—dot plot overlay. All data were analyzed with R 12.13.1 using SNPassoc package (56).
Electrophoretic mobility shift assay
Near-infrared dye (Li-Cor Bioscience)-labeled EMSA oligonucleotide probes spanning the SNP rs7130173 were synthesized by Integrated DNA Technologies and annealed in 1× TE (5′-IRDye 700-GTGTGAGCCACCGAGCCTGCCCCACCAAGGGAAACTTTATG-3′ to 5′-IRDye 700-CATAAAGTTTCCCTTGGTGGGGCAGGCTCGGTGGCTCACAC-3′, and 5′-IRDye 800-GTGTGAGCCACCGAGCCTGCACCACCAAGGGAAACTTTATG-3′ to 5′-IRDye 800-CATAAAGTTTCCCTTGGTGGTGCAGGCTCGGTGGCTCACAC-3) Nuclear extracts from HCT116 and SW480 CRC cell lines were prepared using the NE-PER nuclear and cytoplasmic extraction kit (Pierce, Thermo Scientific). Binding reactions containing 10× binding buffer (100 mm Tris, 500 mm KCl, 10 mm DTT, pH 7.5), 1 μg poly(dI•dC), 2.5 mm DTT/0.25% Tween 20 and 5 μg nuclear extract were preincubated with 20 pmol unlabeled competitor DNA at room temperature for 10 min (5′-GAGCCACCGAGCCTGCCCCACCAAGGG-3′, 5′-GAGCCACCGAGCCT GCACCACCAAGGG-3′, Integrated DNA Technologies). IRDye-labeled oligos were added (50 fmol per oligo) and reactions were incubated in darkness for 20 min at room temperature prior to addition of 10× Orange loading dye (Li-Cor Biosciences). Complexes were resolved on a 6% native polyacrylamide gel run at 200 V for 90 min at 4°C in 0.5× TBE and imaged in the glass plates using a Li-Cor Odyssey Imager.
Analysis of TF binding sites
Forty-one base pair DNA sequences centered on rs7130173, C allele, were scanned for TF binding motifs using the BioBase match tool (BioBase Biological Databases). TRANSFAC MATRIX TABLE, Release 2013.1, profile: vertebrate_non_redundant.prf, only high-quality matrices, minimize false negatives. This analysis was repeated for the rs7130173 A allele. Core and matrix scores were compared between alleles and binding factors with no change in binding scores or increases with the A allele were removed from the candidate TF list.
This work was supported by the National Institutes of Health (R01 CA143237 to G.C.; U19 CA148107 to G.C. and G.A.C.; R01 CA059005 to J.B.). The scientific development and funding of this project was in part supported by the USC Norris Comprehensive Cancer Center Support Grant (NCI P30 CA014089) and by the CORECT consortium on behalf of the Genetic Associations and Mechanisms in Oncology (GAME-ON) network. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.
We thank Dr John Baron, collaborators and participants of the Aspirin/Folate Polyp Prevention Study for the normal colon tissues used in this study.
Conflict of Interest statement. None declared.
- colorectal cancer
- cell line
- deoxyribonuclease i
- genes, reporter
- linkage disequilibrium
- nuclear proteins
- single nucleotide polymorphism
- promoter regions (genetics)
- transcription factor
- candidate disease gene
- genome-wide association study
- enhancer of transcription
- quantitative trait loci