The mechanisms of genome-wide target gene regulation by TCF7L2 in liver cells

In the liver Wnt-signaling contributes to the metabolic fate of hepatocytes, but the precise role of the TCF7L2 in this process is unknown. We employed a temporal RNA-Seq approach to examine gene expression 3–96 h following Tcf7l2 silencing in rat hepatoma cells, and combined this with ChIP-Seq to investigate mechanisms of target gene regulation by TCF7L2. Silencing Tcf7l2 led to a time-dependent appearance of 406 differentially expressed genes (DEGs), including key regulators of cellular growth and differentiation, and amino acid, lipid and glucose metabolism. Direct regulation of 149 DEGs was suggested by strong proximal TCF7L2 binding (peak proximity score > 10) and early mRNA expression changes (≤18 h). Indirect gene regulation by TCF7L2 likely occurred via alternate transcription factors, including Hnf4a, Foxo1, Cited2, Myc and Lef1, which were differentially expressed following Tcf7l2 knock-down. Tcf7l2-silencing enhanced the expression and chromatin occupancy of HNF4α, and co-siRNA experiments revealed that HNF4α was required for the regulation of a subset of metabolic genes by TCF7L2, particularly those involved in lipid and amino-acid metabolism. Our findings suggest TCF7L2 is an important regulator of the hepatic phenotype, and highlight novel mechanisms of gene regulation by TCF7L2 that involve interplay between multiple hepatic transcriptional pathways.


ChIP-Seq sample preparation
ChIP-Seq was performed on scramble and siTcf7l2 treated cells in duplicate (for TCF7L2) and each sequenced sample represented a pool of 2 independent immunoprecipitations, each consisting of between 10 and 20 µg of chromatin. Chromatin was prepared from approximately 2.5 x 10 7 siTcf7l2 or scramble electroporated cells and sheared with MNase using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling Technology). Chromatin was fragmented such that the majority of the chromatin was between 150 bp and 900 bp and each aliquot was incubated with either 5 µl of the anti-TCF7L2 antibody or 1 µl of the anti-HNF4α antibody. A sample consisting of 2% of the total input chromatin was removed from each and served as the sequencing control (see data analysis for more details). Following an overnight incubation with antibody, the immunoprecipitated material was purified as described by the manufacturer Adapter ligation was carried out using stock Illumina adapters and Kapa T4 ligase at 30°C for 10 minutes, and this was followed by a single DNA clean-up using homemade SPRI beads. A minimal PCR amplification of the adapted material was performed prior to size selection to convert y-shaped adapters to double-stranded DNA for more reliable electrophoresis and size selection. This PCR consisted of 5 cycles with the Illumina PCR primer cocktail and Kapa HiFi HotStart Polymerase ReadyMix (cat # KK2601). Following a Qiagen Minelute purification, the adapted material was size selected on a 2% agarose gel before a final PCR amplification of between 8 -12 cycles and SPRI bead clean-up. Final libraries were checked for size using Bioanalyzer HS DNA or DNA 1000 chips and enrichment was confirmed in the final product using Axin2, Lef1 (for TCF7L2) and Apoc3 (for HNF4α) loci using realtime PCR.

Defining the H4IIE-specific transcriptome
To account for the relatively incomplete nature of the rn4 reference transcriptome (compared to human and mouse genomes), we first defined the H4IIE-specific transcriptome from a large pool of 213.4 million RNA-seq reads originating from a separate experiment (24 h treatment of TCF7L2 or scramble siRNA, 3 biological replicates each). Because the rn4 genome has a short gap within the Tcf7l2 gene that includes part of an exon -rendering RNA-seq read alignment at this loci ineffectual -it was partially filled using rat EST sequences, resulting in updating the reference exon start location from chr1:262,210,923 to 262,210,894. A corresponding fix was also introduced into the UCSC reference transcriptome for that Tcf7l2 exon. The pooled reads were mapped, using Tophat version 2.0.4 (with underlying Bowtie version 0.12.8) (1,2), against UCSC known transcripts for rn4, supplemented with additional transcripts from Ensembl, and allowing for the identification of novel exon-exon junctions for the reference genome (rn4).
Essential Tophat command arguments were: --library-type fr-unstranded --prefiltermultihits --no-coverage-search -G <UCSC+Ensembl known transcripts>. Transcript mappings were further processed using Cufflinks package version 2.0.2 (3) with essential command arguments for Cufflinks: --frag-bias-correct --multi-read-correct --upper-quartilenorm -g <UCSC+Ensembl known transcripts> --library-type fr-unstranded, and for Cuffcompare: -r <UCSC+Ensembl known transcripts> -R -d 20 -V <.gff output of Cufflinks>. The resulting initial transcriptome was cleaned by removing all transcripts that were identified as strandless (in Cuffcompare class codes u, o, x, and, if also single-exon, i), likely artifacts (Cuffcompare class codes e, p, r, s and c), or fully silent (FPKM=0) in this large pooled read set with roughly 6 times more reads than in the subsequent time course RNA-seq samples. The transcripts that did not represent any known rn4 gene were then analyzed against mouse (mm10), human (hg19), as well as an updated rat reference transcript sequences using discontiguous megablast with stringent cutoffs to provide official gene symbols additional gene identification. The resulting H4IIE-specific transcriptome that contains 22125 transcripts representing 15768 genes, out of which 2112 (9.5%) transcripts representing 861 (5.5%) genes remained without an official gene symbol and is made available as part of the GEO accession GSE53862.

Defining the consensus peak locations
The main motivation behind the consensus summit calculation method (4) is to answer to the difficult question of which ChIP-seq peaks represent "the same" binding sites in different samples for the same transcription factor. To better distinguish nearby binding sites from each other, the method resorts to, instead of the peaks themselves that typically are a few hundred bp wide, the punctate peak summits, i.e. the single nucleotide positions within each peak where the ChIP-seq signal is at maximum. The main processing steps include 1) collecting all peak summit locations from all ChIP-seq samples to be compared, 2) collapsing that whole summit location data set into a single vector of summit locations per chromosome, 3) inspecting the local density of the summit locations to determine, with suitable cutoffs and other parameters (essentially, triangular density with bandwidth of 20 bp and therein, local maxima detection with span of 100 bp), which individual summits likely represent the same binding site, thereby 4) forming the set of consensus summit locations that accurately capture all the binding events that were identified by the original ChIP-seq peak calling in any of the included samples. Subsequent processing assigns each consensus summit location in each sample the desired numerical values, e.g. FE (with the help of bedgraph pileup files saved during the original peak calling and extracting the scores using the MACS2 bdgcmp program (https://github.com/taoliu/MACS/)), regardless of whether a peak was originally identified at that location or not, thereby enabling easy and reliable comparisons across the entire dataset. Finally, for uses were wider, peak-like regions are required or beneficial, the consensus summit locations were extended by ±100 bp but without allowing overlaps between adjacent summits less than 200 bp apart. At best this method is when there are more than 2 samples to compare; our motivation to use it also here was to be able to compare peak FE values across the entire peak set, including those locations were only one sample had an original peak.

Integration of DEGs, ChIP-seq peaks and binding motifs
'Peak proximity scores' were calculated, using the algorithm introduced by Ouyang et al. (5), on our TCF7L2 and HNF4α ChIP-seq data in conjunction with the RNA-seq-identified DEGs over the siTcf7l2treatment time-course. Briefly, for each peak-gene pair (the gene represented by its TSS), a geometrically decaying distance was multiplied by the peak FE, and these individual scores summed for that gene (TSS). In the calculations for the geometrically decaying distance, we defined the critical parameter 'd0' as 50,000 bp in order to restrict, in practice, any contribution from a peak to the gene-wise score to up to about ± 200 kb. Given the tolerable increase in computation time, we nonetheless performed the calculations on all peaks that were up to 1 Mb from the gene TSS.
The known motif analysis using HOMER version 4.3 was performed using a set of known motifs supplied with HOMER that was first reduced to remove similar motifs using a similarity cutoff 0.80 (6).