Nuclear receptors (NRs) regulate gene expression by binding specific DNA sequences consisting of AG[G/T]TCA or AGAACA half site motifs in a variety of configurations. However, those motifs/configurations alone do not adequately explain the diversity of NR function in vivo . Here, a systematic examination of DNA binding specificity by protein-binding microarrays (PBMs) of three closely related human NRs—HNF4α, retinoid X receptor alpha (RXRα) and COUPTF2—reveals an HNF4-specific binding motif (H4-SBM), xxxxCAAAGTCCA, as well as a previously unrecognized polarity in the classical DR1 motif (AGGTCAxAGGTCA) for HNF4α, RXRα and COUPTF2 homodimers. ChIP-seq data indicate that the H4-SBM is uniquely bound by HNF4α but not 10 other NRs in vivo , while NRs PXR, FXRα, Rev-Erbα appear to bind adjacent to H4-SBMs. HNF4-specific DNA recognition and transactivation are mediated by residues Asp69 and Arg76 in the DNA-binding domain; this combination of amino acids is unique to HNF4 among all human NRs. Expression profiling and ChIP data predict ∼100 new human HNF4α target genes with an H4-SBM site, including several Co-enzyme A-related genes and genes with links to disease. These results provide important new insights into NR DNA binding.
Nuclear receptors (NRs) are ligand-dependent transcription factors (TFs) that play important roles in nearly every aspect of human physiology ( 1 ). They are linked to human disease, are popular drug targets and play a major role in regulating the expression of genes responsible for drug metabolism ( 2–4 ). Therefore, proper elucidation of their target genes has important clinical ramifications. NRs regulate gene expression primarily by first binding specific DNA response elements in the regulatory regions of target genes. The basic rules for NR DNA binding were established early on ( 5–8 ): distinct half site motifs for steroid (AGAACA) and non-steroid (AGGTCA) receptors, arranged as direct repeats (DR, →→), inverted repeats (IR, →←), everted repeats (ER, ←→) or non-repeats (nRs), with variable spacing between the repeats. This dogma, the ‘DR rule’, successfully drove the identification of individual NR target genes in the pre-genomic era. However, there remains considerable overlap in the apparent binding specificity of many NRs, which could be due to the high degree of conservation among the NR DNA-binding domains (DBDs) and/or to the fact that NR DNA binding specificity has not been analyzed in a systematic, global fashion.
The magnitude of the complexity and intricacy of DNA binding specificity is illustrated by the fact that for a 13-nt-long motif, such as a direct repeat of AGGTCA with a spacing of 1 nt (DR1, AGGTCAxAGGTCA), there are 4^13/2 (∼34 million) different potential DNA sequences. If a given NR binds just 0.01% of those sites, for a specificity of 1 in 10 000, that still yields ∼3400 unique sequences, all of which could have just a minor variation on the DR1 consensus. The question then arises as to how many of those different sequences will bind a given NR and whether there are motifs specific to different NRs.
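The arithmetic behind these figures can be checked with a short sketch. It assumes that the division by two reflects counting a sequence and its reverse complement as one double-stranded site; the text does not state this explicitly. The variable names are illustrative only.

```python
# Size of the 13-nt sequence space, treating a sequence and its reverse
# complement as the same double-stranded site (an assumption about the "/2").
MOTIF_LENGTH = 13

# An odd-length sequence can never equal its own reverse complement,
# so the halving is exact.
total_sites = 4 ** MOTIF_LENGTH // 2

bound_fraction = 1 / 10_000          # 0.01%, i.e. a specificity of 1 in 10 000
bound_sites = total_sites * bound_fraction

print(total_sites)         # 33554432
print(round(bound_sites))  # 3355
```

The exact values (about 33.6 million sites and about 3355 bound sequences) agree with the rounded figures of ∼34 million and ∼3400 quoted above.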
There are three well-characterized subfamilies of receptors that bind DR1s as homodimers—HNF4α (NR2A1), COUPTF2 (NR2F2) and RXRα (NR2B1). They are all very highly conserved, closely related and regulate a variety of metabolic genes in common tissues such as liver, kidney and intestine ( 9 , 10 ). HNF4α is a constitutive activator that binds an endogenous ligand (linoleic acid) ( 11 ). It is considered to be a master regulator of liver-specific gene expression, including genes involved in intermediary metabolism as well as xenobiotic and drug metabolism ( 12–15 ). HNF4α is linked to several human diseases including diabetes, hemophilia, hepatitis, atherosclerosis and inflammatory bowel disease ( 16 , 17 ).
RXRα is also expressed primarily in the liver, as well as kidney, gut, muscle and skin ( 10 ). While it is best known as a heterodimeric partner of other NRs such as PPAR (NR1C), RAR (NR1B), FXR (NR1H), PXR (NR1I2), TR (NR1A) and VDR (NR1I1) ( 7 , 18 ), it binds directly to the synthetic ligand 9-cis retinoic acid and has been shown to activate transcription in the absence of an ectopically expressed partner ( 19–23 ). COUPTF2 is also present in the liver and, like RXRα, is fairly ubiquitously expressed. However, it acts primarily as a repressor of transcription ( 24 , 25 ); it remains an orphan receptor in that its endogenous ligand has not yet been identified, although it has been shown to bind and respond to retinoids ( 26 ). Like HNF4α, COUPTF2 binds DNA well as a homodimer but, unlike HNF4α, it can also heterodimerize with RXRα ( 21 , 25 , 27 ). It was recognized early on that HNF4α, RXRα and COUPTF2 all share common binding sites (that roughly resemble DR1s) in the promoters of certain genes and consequently compete for regulation of those genes ( 28–30 ). However, it was also noted that there are a couple of binding sites in other genes that were not bound by all three NRs ( 28 , 31 , 32 ). While these results suggested the existence of NR-specific binding motifs, common versus unique binding features were never identified and the extent of the overlap remained obscure.
New genome-scale technologies, both in vivo and in vitro, now allow us to address the issue of NR binding specificity in an appropriately global fashion ( 33 ). For in vivo binding, chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) enables robust and accurate identification of TF binding regions, but the resolution is not sufficient to identify the exact site to which TFs bind; this must be done by statistical inference ( 34 ). For in vitro binding, protein-binding microarrays (PBMs) can assay the binding of TFs to tens of thousands of DNA probes in a high-throughput fashion ( 35 ). PBM data are extremely useful for mining ChIP-seq data to identify the precise location of TF binding and to predict new target genes by cross-referencing. We recently applied a modified PBM approach to HNF4α and identified hundreds of new direct targets of HNF4α by combining the PBM results with genome-wide location and expression profiling data ( 13 ).
Here, we apply the PBM approach to the problem of distinguishing the binding specificity of HNF4α, RXRα and COUPTF2. We show that RXRα homodimers bind DNA very well in the PBM and that they have a specificity nearly identical to COUPTF2 homodimers, with a clear preference for the 3′ half site of a DR1 motif. The PBMs also identified an HNF4α-specific binding motif (H4-SBM) (CAAAGTCCA) that was verified in vivo in ChIP-seq data. Finally, we determine the amino acid residues in the HNF4α DBD (Asp69 and Arg76) responsible for the unique binding specificity of HNF4α and identify ∼100 new human target genes that contain the HNF4-specific motif. These results have important implications not just for HNF4α, RXRα and COUPTF2, but for the entire NR superfamily as well.
MATERIALS AND METHODS
PBM design, assay and data processing
Two different custom arrays in 8 × 15 k format ordered from Agilent Technologies (Santa Clara, CA, USA) were used to test binding specificity (PBM2) ( 13 ) and half site preferences (PBM6.1) (see Supplementary Figure S2 for design and Supplementary Table S4 for complete sequences of PBM6.1). Each of the ∼3000 sequences was spotted four (PBM6.1) to five (PBM2) times in each of the eight grids. The PBM assay was performed as previously described ( 13 ) with minor modifications: 1.6 µM Cy5 dUTPs (Enzo Life Sciences) was used to label the double-stranded DNA on the slide; ∼0.8–1.2 µg of each full length NR in crude nuclear extracts from transfected Cos-7 cells was applied to the PBM; and bound NRs were detected with primary antibodies for human HNF4α, RXRα, COUPTF2 and RARα from R&D Systems (Catalogue # PP-H1415, PP-K8508, PP-H7147 and PP-H1920-00, respectively) at a 1:100 dilution overnight at room temperature, and then secondary antibody (Dαm-Dylight-Cy3, Jackson ImmunoResearch) at a 1:50 dilution for 1 h. After three washes in PBS-0.1% Tween-20, the slide was dried and scanned at 633 (Cy5) and 543 nm (Cy3). Data extraction and normalization were performed as previously described ( 13 ). The binding threshold of each NR was set to two SD from the mean or the quantile 0.95 of the random controls, whichever was higher. DNA motifs (represented using position weight matrices, PWM) were generated using SeqLogo ( 36 ) and Weblogo v2.8.2 ( 37 ). Hypergeometric tests were performed using phyper in R. While the HNF4α PBM2 was repeated, our previous raw data for the HNF4α PBM2 ( 13 ) were used for analysis since the signal intensities were closer to those of RXRα and COUPTF2.
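The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "two SD from the mean" means two sample standard deviations above the mean of the random-control signals, and it uses the default quantile definition of Python's `statistics` module. The function names `binding_threshold` and `bound_probes` are hypothetical.

```python
import statistics


def binding_threshold(control_signals):
    """Return the larger of (mean + 2 SD) and the 0.95 quantile of the controls."""
    mean = statistics.mean(control_signals)
    sd = statistics.stdev(control_signals)  # sample SD (an assumption)
    # n=20 yields 19 cut points; the last one (index 18) is the 0.95 quantile.
    q95 = statistics.quantiles(control_signals, n=20)[18]
    return max(mean + 2 * sd, q95)


def bound_probes(probe_signals, control_signals):
    """Names of probes whose normalized signal exceeds the binding threshold."""
    cutoff = binding_threshold(control_signals)
    return [name for name, signal in probe_signals.items() if signal > cutoff]
```

Taking the maximum of the two cutoffs keeps the threshold conservative both when the control distribution is roughly symmetric (where mean + 2 SD dominates) and when it is heavily skewed (where the quantile dominates).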
Motif mining of ChIP-seq peaks
DNA sequences in ChIP-seq peak regions were extracted by Cisgenome and analyzed for relative enrichment level (fold enrichment) compared to matched control regions as described in ( 38 ). The length of each control region was determined by the average length of all peaks in a data set, and the total number of control regions was five times the number of ChIP-seq peaks. If multiple data sets were available for an NR, the average was used as the final score and the SD was given ( Supplementary Table S1 ). The empirical P value of the enrichment score for each motif was determined using the enrichment distribution of 1000 6- or 9-nt random genomic sequences. Coordinates of peak centers were either given in the preprocessed BED or WIG files or obtained by reprocessing the data (in the case of ERRβ and Rev-Erbα). Density plots were generated in R; variances between plots were compared using a two-sided F -test. Motif mining was performed by Gibbs Motif Sampler ( 39 ) implemented in Cisgenome, using 3000 iterations and the third-order Markov Chain model. The best AGGTCA-like motif was selected from the 10 candidates with the highest scores.
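The enrichment calculation can be illustrated with a small sketch. The exact scoring is defined in ( 38 ) and is not reproduced here; the version below simply assumes that fold enrichment is the ratio of per-base motif frequencies (both strands) in peaks versus control regions, and that the empirical P value is the fraction of random k-mers scoring at least as high, with a +1 correction added as a further assumption. All function names are hypothetical, and sequences are assumed to be upper-case A/C/G/T.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(motif, regions):
    """Count motif occurrences (overlaps allowed) on both strands of each region."""
    targets = {motif, reverse_complement(motif)}  # a set, so palindromes count once
    total = 0
    for region in regions:
        for target in targets:
            start = region.find(target)
            while start != -1:
                total += 1
                start = region.find(target, start + 1)
    return total


def fold_enrichment(motif, peaks, controls):
    """Ratio of per-base motif frequency in peaks to that in control regions."""
    peak_hits = count_sites(motif, peaks)
    control_hits = count_sites(motif, controls)
    peak_len = sum(len(r) for r in peaks)
    control_len = sum(len(r) for r in controls)
    if control_hits == 0:
        return float("inf") if peak_hits else float("nan")
    return (peak_hits * control_len) / (control_hits * peak_len)


def empirical_p(observed, null_scores):
    """Fraction of null enrichment scores that reach the observed score."""
    exceed = sum(score >= observed for score in null_scores)
    return (exceed + 1) / (len(null_scores) + 1)
```

In this sketch `null_scores` would hold the fold enrichment of each of the 1000 random genomic 6- or 9-mers, computed with the same `fold_enrichment` function.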
Identification of HNF4-specific target genes
DNA sequences from −10 to + 10 kb of each gene in the human genome (hg19) were downloaded from the Ensembl genome browser ( http://uswest.ensembl.org/index.html ) using the BioPerl API. The occurrences of HNF4-specific binding sequences, obtained from either the PBM data or support vector machine (SVM) prediction (see details in Supplementary Data ), were identified by Seqmap ( 40 ) using exact match. A total of 3000 13-mers were selected at random from the human genome. The difference between two frequency distributions was calculated using the Student’s t -test in R. The online search engine for HNF4-specific binding sequences was developed using Perl and cgi, and is hosted by the Bioinformatics Core of the Institute for Integrative Genome Biology at the University of California at Riverside at http://nrmotif.ucr.edu/NRBSScan/H4SBM.htm . HNF4α ChIP-seq peaks from HepG2 cells ( 41 ) were examined and only those peaks containing at least one HNF4-specific binding site were selected as HNF4-specific peaks. If a gene had at least one HNF4-specific peak within 10 kb of its transcription start site (TSS), the gene was considered an HNF4-specific candidate target. Only candidate genes that were down-regulated in the HNF4 RNAi assay (in HepG2 cells) ( 13 ) were chosen as final targets. Gene Ontology (GO) analysis was performed using DAVID ( 42 ).
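The target-selection logic in this section can be summarized in a short sketch. It is an illustration under stated assumptions rather than the original Perl/R workflow: a peak is taken to be "within 10 kb" of a TSS if it overlaps the window from TSS − 10 kb to TSS + 10 kb, and the `Peak` and `Gene` records, their fields and the function `final_targets` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int
    has_h4sbm: bool      # peak contains at least one HNF4-specific site


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int
    downregulated: bool  # down-regulated in the HNF4 RNAi assay


def final_targets(genes, peaks, window=10_000):
    """Genes with an HNF4-specific peak near the TSS that are also down-regulated."""
    targets = []
    for gene in genes:
        lo, hi = gene.tss - window, gene.tss + window
        has_specific_peak = any(
            p.has_h4sbm and p.chrom == gene.chrom and p.start <= hi and p.end >= lo
            for p in peaks
        )
        if has_specific_peak and gene.downregulated:
            targets.append(gene.name)
    return targets
```

Both filters must pass: a gene with a nearby HNF4-specific peak that is not down-regulated remains a candidate only, and a down-regulated gene without such a peak is not considered.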
RESULTS
PBMs reveal a polarity in the DR1 motif for RXRα, COUPTF2 and HNF4α
Custom PBMs were used to examine the DNA binding specificity of human HNF4α2 (referred to as HNF4α), RXRα and COUPTF2. They bound 1371, 1285 and 1530 unique DNA sequences, respectively, although there was considerable overlap, especially between RXRα and COUPTF2 ( Figure 1 A). Notably, while RXRα in the absence of a heterodimeric partner typically does not bind DNA in shift gels ( 22 , 23 , 27 ), it bound very well in the PBM in both the presence and the absence of its ligand 9-cis retinoic acid; control experiments verified that the amount of endogenous NRs such as RARα was too low to be detected in the PBMs (see Supplementary Figure S1 for details and Supplementary Table S4 for a list of all sequences). There are reports of RXRα activating transcription or binding DNA in the absence of added NR partners ( 19–23 , 43–45 ), and recent ChIP-seq experiments reveal considerable RXR binding in the absence of a PPAR partner ( 46 ), suggesting that RXR homodimers bind DNA in vivo as well as in vitro.
All bound sequences were ranked according to their normalized signal intensities (binding scores) and DNA motifs were generated for strong, medium and weak binders [binding affinity is linearly correlated with the binding signal from PBMs ( 47 )]. Motifs of the top 10% strongest binders showed that both the 5′ and 3′ half sites of the DR1 motif contain essential positions for binding, reinforcing the notion that these three NRs bind as homodimers on the PBM ( Figure 1 B). We also noted that the strong binders for RXRα and COUPTF2 had more conserved positions in the 3′ than in the 5′ half site, suggesting that the 3′ half site may be more critical for DNA binding. Since the 3′ half site is farther from the glass slide in the PBM, it was possible that the observed polarity was an artifact of the PBM design. Therefore, we designed a second PBM in which 30 strong binders of COUPTF2 with AGGTCA half sites on the 3′ side were compared with 30 weak binders with AGGTCA on the 5′ side; a third group of probes had weak AGGTCA-like motifs in both half sites. The reverse complement (RC) of each probe was also examined ( Supplementary Figure S2A ). If COUPTF2 DNA binding is affected by the distance from the slide, then the RC probes of corresponding strong binders will become weak binders, and vice versa ( Figure 1 C). We found that the strongest binders for both the original and the RC probes were the same sequences with a clear preference for AGGTCA in the 3′ half site, even when it was close to the slide; conversely, the weakest binders had the AGGTCA motif in the 5′ half site regardless of the distance to the slide ( Figure 1 D). Similar results were observed for RXRα and HNF4α ( Supplementary Figure S2B ).
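The logic of the reverse-complement control can be made concrete with a small sketch. The probe sequence below is a hypothetical 13-mer chosen for illustration (it is not taken from PBM6.1), and `half_site_offset` is a hypothetical helper. The point is that reverse-complementing a probe leaves the double-stranded motif unchanged but moves the consensus half site to the opposite end of the probe as written, and therefore to a different distance from the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def half_site_offset(probe, half_site="AGGTCA"):
    """Locate the half site within the probe as written, on either strand.

    Returns (offset, strand), or None if the half site is absent.
    """
    offset = probe.find(half_site)
    if offset != -1:
        return offset, "+"
    offset = probe.find(reverse_complement(half_site))
    if offset != -1:
        return offset, "-"
    return None


probe = "GGGGCAAAGGTCA"            # hypothetical 13-mer: AGGTCA in the 3' half site
rc_probe = reverse_complement(probe)

print(rc_probe)                    # TGACCTTTGCCCC
print(half_site_offset(probe))     # (7, '+')
print(half_site_offset(rc_probe))  # (0, '-')
```

If binding depended only on distance from the slide, the two probes would behave differently; if it depends on the motif's intrinsic 5′/3′ polarity, they should bind equally well, which is the outcome reported above.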
We also examined the effect of the distance of the entire 13-mer from the slide as well as the free end of the probe and found that the strongest binding by all three NRs resulted when the 13-mer is 27 nt from the slide and 5 nt from the free end of the probe ( Supplementary Figure S2 C and S2D). All together, these results suggest that in vitro RXRα, COUPTF2 and HNF4α prefer the 3′ half site of the DR1 motif.
PBMs identify an HNF4α-specific binding motif
The DNA binding specificity of HNF4α, RXRα and COUPTF2 was further analyzed using scatter plots ( Figure 2 A). RXRα and COUPTF2 shared very similar binding specificities with an R² of 0.84 for all bound sequences. In contrast, HNF4α showed a different profile with >200 DNA sequences bound exclusively by HNF4α. The consensus motif of HNF4-specific binders had a subtle yet consistent change at position 10 (p10) and p11, where GT was replaced by TC. Together with seven other conserved positions, it defined the HNF4-specific binding motif (H4-SBM) xxxxCAAAG TC CA (x refers to any nucleotide; the nucleotides that differ from the canonical AGGTCA are underlined) ( Figure 2 B). Hypergeometric plots using all the binders indicate that TC at p10–p11 is disfavored by both RXRα and COUPTF2 ( Figure 2 C). Interestingly, the probes that were bound by RXRα and COUPTF2 but not HNF4α strongly resembled a canonical DR1 motif ( Figure 2 B).
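A hypergeometric test of the kind used for these plots can be sketched as below. The exact setup behind Figure 2 C is not spelled out in the text, so the mapping is an assumption: the population is all bound probes, the "successes" are probes carrying a given dinucleotide at p10–p11, the draws are the probes bound by one NR, and k is the number of those that carry the dinucleotide. The function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def enrichment_p(k, population, successes, draws):
    """Upper tail P(X >= k): is the dinucleotide favored?"""
    lower = max(k, 0, draws - (population - successes))
    upper = min(successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(lower, upper + 1))


def depletion_p(k, population, successes, draws):
    """Lower tail P(X <= k): is the dinucleotide disfavored?"""
    lower = max(0, draws - (population - successes))
    upper = min(k, successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(lower, upper + 1))
```

For comparison with R, `phyper(k - 1, successes, population - successes, draws, lower.tail = FALSE)` returns the same upper-tail probability as `enrichment_p(k, population, successes, draws)`.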
We next determined that the 81 PBM probes bearing the H4-SBM were in fact HNF4-specific binders; they all bound HNF4α and, with two exceptions, none bound RXRα or COUPTF2 ( Figure 2 D, left panel). The two exceptions (AGGTCAAAG TC CA and GGGTCAAAG TC CA) bound RXRα and COUPTF2 weakly; they are hybrids of the canonical DR1 and the HNF4-specific half site. In contrast, the probes containing AG TC CA only in the 5′ half site were not completely specific for HNF4α ( Figure 2 D, middle panel) while those with the canonical AGGTCA in the 3′ half site preferred COUPTF2 and RXRα ( Figure 2 D, right panel). Finally, a similar analysis of 16 H4-SBM-like motifs with all possible permutations at p10 and p11 confirmed that the H4-SBM xxxxCAAAG TC CA is the only motif recognized by HNF4α specifically ( Supplementary Figure S3 ).
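The 16 variants referred to here follow directly from the four possible nucleotides at each of p10 and p11. A minimal sketch of how they can be enumerated, with a hypothetical helper name:

```python
from itertools import product


def p10_p11_variants(prefix="AG", suffix="CA"):
    """All 16 sequences that differ only at the two positions between prefix and suffix."""
    return [prefix + a + b + suffix for a, b in product("ACGT", repeat=2)]


half_sites = p10_p11_variants()                              # 6-mers covering p8-p13
full_motifs = p10_p11_variants(prefix="CAAAG", suffix="CA")  # 9-mers covering p5-p13
```

The list of 6-mers contains both the canonical half site AGGTCA (GT at p10–p11) and the HNF4-specific half site AGTCCA (TC at p10–p11), so the same enumeration serves for the PBM comparison and for the half site enrichment analyses in ChIP-seq peaks.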
H4-SBM is bound by HNF4α but not RXRα in vivo
Since the DNA in the PBM does not necessarily have exactly the same conformation as DNA wrapped in nucleosomes, we determined whether the H4-SBM bound by HNF4α in vitro is also bound in vivo by analyzing 26 publicly available genome-wide occupancy profiles for 14 NRs ( Supplementary Table S1 ). Since data from the same tissue/cell type were not available for all the NRs and since we are concerned with binding motifs and not specific target genes, we combined and compared ChIP-seq data for a given NR across different conditions/cell types for greater statistical power. First, we compared HNF4α ChIP’ed from mouse liver ( 48 ) and RXRα ChIP’ed from mouse 3T3-L1 (pre-adipocyte) cells ( 49 ). In 12 097 peaks unique to HNF4α, the motif mined by Gibbs Motif Sampler ( 39 ) was found to be very similar to the H4-SBM, whereas the motif in 4285 peaks unique to RXRα was more similar to the canonical DR1 ( Figure 3 A). Interestingly, there were 736 ChIP peaks for HNF4α in liver and RXRα in pre-adipocytes that overlapped; the motif mined from those peaks had minor variations from both the H4-SBM and the canonical DR1 but was nearly identical for RXRα and HNF4α.
We next compared the enrichment of the 16 different half site sequences with all possible permutations at p10 and p11 in ChIP-seq peaks of RXRα and HNF4α. The H4-specific half site sequence AG TC CA had a 2-fold higher enrichment level compared to control regions ( P = 0.001) in the HNF4α peaks but not the RXRα peaks ( Figure 3 B). In contrast, the canonical AGGTCA half site showed high and comparable enrichment in both the HNF4α and RXRα peaks. These results suggest that the H4-specific half site is preferred by HNF4α but not RXRα in vivo .
Since ChIP-seq peaks typically cover a region much larger than the size of a TF binding site, it is frequently assumed that the binding site of a protein is close to the peak center (peak summit). Therefore, we compared the distance from the peak center to the canonical and HNF4-specific half site in the ChIP peaks. In both the HNF4α and RXRα peaks, the canonical half site AGGTCA was significantly enriched around the peak center compared to random sequences ( P < 0.001); adding 1 nt at a time to the 5′ side of the motif significantly increased the enrichment ( Figure 3 C, left panels). In contrast, when the HNF4-specific half site was analyzed, the enrichment was observed in HNF4α peaks but not the RXRα peaks (right panels), confirming that the H4-SBM (CAAAG TC CA) is bound in vivo by HNF4α but not RXRα.
Finally, in order to determine whether there was any evidence of half site polarity in vivo , the enrichment levels of 9-mer sequences with the canonical (AGGTCA) and the HNF4-specific (AG TC CA) half site on both the 5′ and 3′ side were compared in the HNF4α and RXRα ChIP-seq peaks. HNF4α had a clear preference for the 3′ side for both half site sequences ( Figure 3 D, left panel, blue versus red bars). While a similar result was observed for the canonical half site in the RXRα peaks, the HNF4-specific motif was not enriched in either position for RXRα (right panel). These results further confirm that AG TC CA is not bound by RXRα in vivo and that the polarity for the half site preference observed in the PBM for RXRα and HNF4α is also observed in vivo .
H4-SBM is exclusive to HNF4α in vivo
To determine whether the H4-SBM is recognized by any other NR aside from HNF4α, we analyzed ChIP-seq data from 12 additional NRs that are known to be expressed in one or more of the same tissues as HNF4α—PPARγ (NR1C3) ( 46 , 49 , 50 ), PPARδ (NR1C2) ( 51 ), VDR (NR1I1) ( 52 ), FXRα (NR1H4) ( 53 , 54 ), PXR (NR1I2) ( 55 ), ERRβ (NR3B2) ( 56 ), RARα (NR1B1) ( 57 ), ERα (NR3A1) ( 58 ), LXRβ (NR1H2) ( 59 ), LRH-1 (NR5A2) ( 60 ), Rev-Erbα (NR1D1) ( 61 ). The glucocorticoid receptor (GR, NR3C1) ( 46 , 62 , 63 ), which prefers a different half site AGAACA ( 64 , 65 ), served as a negative control ( Supplementary Figure S4A ). Motifs of consensus sequences derived from the 12 NR ChIP-seq peaks did not reveal any binding to the H4-SBM ( Supplementary Figure S4B ). When 6-mer half site sequences were examined, the H4-specific half site (AG TC CA) was found to be enriched in PXR peaks but not in peaks for the remaining 11 NRs ( Figure 4 A and Supplementary Figure S5 ). In contrast, there was an enrichment of the canonical half site AGGTCA in 8 of the 12 NRs ( Supplementary Figure S5 ), which served as a positive control. We also examined the half site sequences as a function of distance from the peak center. There was an enrichment in AGGTCA at the peak center for all tested NRs except GR. The HNF4-specific half site, however, was only enriched in peak centers for HNF4α and three other NRs: Rev-Erbα, FXRα and PXR ( Supplementary Figure S6 ).
Since searching with the 6-mer (AG TC CA) does not provide any information about its position within a 13-nt motif (i.e. 5′ or 3′ side), the search was repeated with the 9-mer H4-SBM (CAAAG TC CA). We found a 13.5-fold enrichment in HNF4α peaks and a considerable enrichment (>3-fold) in FXRα, PXR and Rev-Erbα peaks; the remaining NRs continued to show no enrichment ( Figure 4 B). Furthermore, the ChIP-seq peaks for FXRα, PXR and Rev-Erbα showed considerable overlap with HNF4α peaks ( Figure 4 C and Supplementary Table S5 ). Since ChIP cannot distinguish between direct and indirect binding, it was possible that FXRα, PXR and Rev-Erbα were binding to the H4-SBM through HNF4α. To address this, we compared the enrichment level of the HNF4-specific half site (AG TC CA) and the full H4-SBM (CAAAG TC CA) in the peaks of each of the three NRs that overlapped with the HNF4α peaks to those that did not overlap. If a particular sequence is a direct binding target of an NR, then similar enrichment levels are expected in both overlapping (sector C) and non-overlapping peaks (sector A) ( Figure 4 D, left). On the other hand, if the NR requires HNF4α to bind the sequence, then the overlapping peaks should have a higher enrichment level than the non-overlapping peaks ( Figure 4 D, right). The results show that both AG TC CA and CAAAG TC CA are enriched in the overlapping peaks but not the non-overlapping peaks for all three NRs ( Figure 4 E). The fold enrichment for H4-SBM was in fact 3- to 6-fold higher in the overlapping peaks compared to the non-overlapping peaks. Furthermore, the HNF4-specific half site was the only one of the 16 candidate 6-mers that showed such an enrichment for these NRs ( Supplementary Figure S7C ). A similar analysis with the peaks unique to HNF4α (sector B) verified the methodology ( Supplementary Figures S7B and S7D ).
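The partition of an NR's peaks into those that overlap HNF4α peaks (sector C) and those that do not (sector A) can be sketched as follows. Peaks are represented here as hypothetical `(chrom, start, end)` tuples with half-open coordinates, and any shared base counts as an overlap; the criterion actually used for Figure 4 C is not stated in the text.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_overlap(nr_peaks, hnf4_peaks):
    """Split NR peaks into (overlapping, non_overlapping) relative to HNF4α peaks."""
    overlapping, non_overlapping = [], []
    for peak in nr_peaks:
        if any(overlaps(peak, other) for other in hnf4_peaks):
            overlapping.append(peak)
        else:
            non_overlapping.append(peak)
    return overlapping, non_overlapping
```

Motif enrichment is then computed separately for the two groups: comparable enrichment in both suggests direct binding, whereas enrichment only in the overlapping group suggests that the motif is present because HNF4α is bound there. The pairwise comparison is quadratic in the number of peaks and is meant only to show the logic; a sorted sweep or interval tree would be preferable for genome-scale data.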
These results suggest that while PXR, FXRα and Rev-Erbα are localized to regions in the genome that contain the HNF4-specific motif, they may do so only when HNF4α is also present in the same region. While we cannot definitively rule out direct interactions between these NRs and HNF4α, AGGTCA-like motifs were found in about half of the overlapping peaks that contain the H4-SBM ( Supplementary Figure S7E–S7H ). This suggests that PXR, FXRα and Rev-Erbα may bind canonical AGGTCA-containing motifs cooperatively with HNF4α in regions that contain the H4-SBM. For example, on the Cyp7a1 promoter there are overlapping ChIP-seq peaks for HNF4α, PXR and Rev-Erbα but the only H4-SBM is at the center of the HNF4α peak ( Supplementary Figure S7I ).
HNF4-specific DNA recognition is mediated by two residues in the DBD, Asp69 and Arg76
To determine the molecular basis of HNF4-specific DNA recognition, protein–DNA interactions were analyzed in the DBD structures of HNF4α and RXRα ( 66 , 67 ). Among all the residues making contacts with the DNA, only four residues differ between RXRα and HNF4α. In HNF4α, two of these residues, Asp69 and Arg76, lie in the first DNA recognition helix of the DBD and interact with side chains of nucleotides at p3 and p4 in both half sites of the DR1 motif ( Figure 5 A). These two residues are completely conserved in HNF4 genes across all species from human down to Trichoplax, except for Caenorhabditis elegans which contains ∼260 HNF4-like genes ( Supplementary Figure S8A ). In both RXRα and COUPTF2, the residues at the positions equivalent to Asp69 and Arg76 are Glu and Lys, respectively ( Figure 5 B).
To examine the effect of Asp69 and Arg76 on HNF4α binding specificity, single and double point mutations D69E, R76K and D69E/R76K were introduced into the HNF4α DBD to convert it into an RXR/COUPTF2-like DBD. In the PBM, the D69E mutant selectively abolished binding to the H4-SBM while R76K seemed to decrease binding in a non-selective fashion; interestingly, the double mutant D69E/R76K yielded a profile nearly identical to that of D69E ( Supplementary Figure S8B ), suggesting that both Asp69 and Arg76 are necessary for optimal discrimination between H4-SBM and other sites. Importantly, the HNF4α D69E/R76K double mutant also altered the binding profile to more closely resemble that of RXRα/COUPTF2; none of the HNF4-specific binders were bound by the double mutant ( Figure 5 C and D). While some common binders of HNF4α and RXRα/COUPTF2 were also affected by the mutations ( Supplementary Figure S8C and S8D ), most of them bear the AG TT CA motif in the 3′ half site that is also preferred by HNF4α but not RXRα in vivo ( Figure 3 B). Finally, HNF4α and HNF4γ are the only human NRs with an aspartic acid at residue 69 and an arginine at residue 76 [ Figure 5 B and ( 7 )], suggesting that the H4-SBM may be truly specific to HNF4 in the entire NR superfamily.
HNF4α activates gene expression using HNF4-specific binding sites
To determine whether the HNF4α binding site preference results in a functional outcome (i.e. gene expression), luciferase reporter assays were performed using a known HNF4α/RXRα response element from the human APOA1 promoter. As predicted, HNF4α activated gene expression using both the wild-type response element (WT-RE) AGGGCAgGGGTCA and an HNF4-specific mutant (MUT-RE) AG TC CAgGG TC CA; in contrast RXRα activated expression only from the WT-RE ( Figure 6 A). In addition, COUPTF2, a repressor, competed with HNF4α on the WT-RE but not the MUT-RE ( Figure 6 B). These results verify that HNF4α is capable of activating gene expression using the H4-SBM, while RXRα and COUPTF2 cannot functionally compete with HNF4α for the H4-SBM response element.
The effect of the D69E mutation was also examined in the luciferase assay using an endogenous promoter from a known HNF4α target gene, APOC2 ( 31 ). WT HNF4α successfully activated the expression of the WT promoter containing an AGGCCAaAG TC CT motif whereas the D69E mutant failed to do so ( Figure 6 C). In contrast, both the WT and D69E mutant activated gene expression when the H4-specific response element was mutated to a more canonical DR1 element AGGCCAaAG GT CT ( Figure 6 C, reverse complement sequences are shown in the figure to maintain the orientation in the genome). This result confirms that Asp69 is responsible for the HNF4-specific DNA binding.
Functionality of HNF4-specific binding sequences in vivo
In order to determine whether the H4-SBM is functional under physiological conditions, we trained an SVM model on ∼200 verified H4-SBM sites from the PBM to predict all 13-nt sequences that can be bound specifically by HNF4α (variations in the less conserved positions of the consensus H4-SBM, CAAAG TC CA, such as p6-p8 and p13 can be tolerated, Figure 1 B). This method generated ∼3000 high confidence, predicted HNF4-specific binding sequences. (See Supplementary Table S3 for a list of all H4-SBM sequences and http://nrmotif.ucr.edu/NRBSScan/H4SBM.htm for a web-based, HNF4-specific binding sequence search tool.)
The predicted H4-SBM sites were first validated by comparing their position in ChIP-seq peaks to the verified H4-SBM sites and the AG TC CA half site ( Figure 7 A). The predicted and verified H4-SBM sites were then examined across all protein-coding genes in the human genome relative to the TSS (+1), −10 to +10 kb. (HNF4 binding sites are symmetrically distributed on both sides of the TSS, Supplementary Figure S9 ). The results show comparable occurrence frequencies for the predicted and verified sites ( P > 0.05) that were significantly higher than those of random sequences ( P < 0.001) ( Figure 7 B). Finally, we used the predicted H4-SBM sites to search HNF4α ChIP-seq and expression profiling data to identify potential HNF4α target genes. A total of 730 genes in the human liver cell line HepG2 were found to have at least one predicted H4-SBM site in an HNF4α ChIP-seq peak within −10 to + 10 kb of +1 ( Figure 7 C). Among these genes, 137 were down-regulated by HNF4α RNA interference (RNAi) in HepG2 cells ( 13 ), suggesting that these genes are direct targets of HNF4α. GO analysis showed that the H4-specific genes were enriched in a variety of metabolic processes (e.g. lipid, carbohydrate, xenobiotic/drug metabolic processes; homeostasis; transport), typical of classical HNF4α targets ( Supplementary Table S3 ). There were also genes in some of the new categories of HNF4α targets that we identified in our previous study ( 13 ), such as immune response, signal transduction, apoptosis and cell structure ( Figure 7 D). That study, however, did not identify HNF4-specific motifs or targets. All told, this analysis identified ∼100 new predicted, direct HNF4α target genes that have an H4-SBM in a ChIP peak and are down-regulated by HNF4α RNAi.
Interestingly, several genes associated with acyl Co-enzyme A metabolism (acyl-CoA synthases, thioesterases, ligases and a co-factor for a desaturase) were identified as putative HNF4α targets with an H4-SBM site ( ACSM2B , EHHADH , ACOT2 , ACSF2 , SLC27A2 , CYB5A , AGXT ). Others have shown that HNF4α binds not only acyl Co-A binding protein but also fatty acyl thioesters of Co-enzyme A that appear to act as modulators of HNF4α function ( 68–70 ). Acetyl Co-A is well known for donating an acetyl group in acetylation reactions of lipids and proteins. HNF4α itself has been shown to be acetylated in its hinge region ( 71 ), as have many of the enzymes encoded by HNF4α target genes, e.g. PEPCK ( PCK1 ), a classical HNF4α target ( 72 ) that catalyzes the rate limiting step in gluconeogenesis, and the new H4-SBM target reported here, EHHADH , which encodes enoyl–coenzyme A hydratase/3-hydroxyacyl–coenzyme A dehydrogenase that catalyzes two steps in fatty acid oxidation ( 73 ). While it is not only possible but very likely that these H4-SBM genes are also regulated by other TFs, including other NRs via other sites in the promoters, these results nonetheless reinforce the notion that there is an important relationship between HNF4α and processes involving Co-enzyme A. Finally, we note that several of the newly identified H4-SBM containing HNF4α target genes are linked to human diseases ( LIPA , EHHADH , AGXT , PIPOX , HGD , CYB5A , PDZK1 , F11 ), expanding the clinical relevance of HNF4α. (See Supplementary Table S3 for a complete list of H4-SBM target genes from both a −10 to + 10 kb and a −2 to +1 kb analysis.)
DISCUSSION
The dogma in the NR field is that NRs recognize DNA targets based on one of two motif modules, AGGTCA (non-steroid receptors plus ER) and AGAACA (all other steroid receptors) ( Figure 8 A). While there are additional rules for spacing and orientation [as well as some variations on the half site motif, such as AGTTCA ( 7 )] that distinguish the different receptors, those rules do not adequately explain the diversity of NR function in vivo . We addressed this issue by comparing the binding specificity of three highly related NRs—HNF4α, RXRα and COUPTF2—on ∼3000 different variations of a canonical DR1 (AGGTCAxAGGTCA) using PBMs and full length receptors in crude nuclear extracts. We identify a new motif module for HNF4α, AG TC CA and provide additional insight into NR DNA binding ( Figure 8 A–D).
NR DNA binding polarity
In contrast to most gel shift results, including our own ( 27 ), we observed excellent binding of RXRα in the PBM in the absence of an ectopically expressed partner. The RXRα binding was nearly identical to that of COUPTF2, and both receptors exhibited a polarity in binding with a preference for the 3′ half site of the DR1 ( Figure 8 B and C). To our knowledge, this is the first report to examine polarity in DNA binding using PBMs, and the first report of polarity for NR homodimers. Since RXRα homodimers can activate transcription in the presence of 9-cis retinoic acid whereas COUPTF2 generally represses transcription, the nearly identical binding of the two receptors suggests that the competition already observed between these two NRs on a few genes may in fact be a much broader phenomenon.
While binding polarities of NR homodimers have not been reported previously, the concept in heterodimers is not new. For example, RXR binds the 5′ half site in DR3, DR4 and DR5 motifs as heterodimers with VDR, TR and RAR, respectively ( 74–76 ). This polarity was also noted in crystal and solution structures of RXR heterodimers with RAR and VDR ( 77 , 78 ). However, on DR1 motifs the polarity of the RXR heterodimers is reversed, with RXR in heterodimers with RAR and PPAR occupying the 3′ half site ( 78–82 ). Since our results now show that the RXRα homodimer also prefers the 3′ half site in DR1-like motifs, this suggests that there could be an exchange between the 5′ RXRα monomer in the homodimer and PPAR or RAR monomers ( Figure 8 C). The net result would be a replacement of the RXRα homodimer with a heterodimeric complex, perhaps while the 3′ RXRα monomer remains bound to the DNA. This would suggest a new paradigm for RXRα dimer exchange and a potential new role for RXRα homodimers as placeholders for RXRα heterodimers on DR1-like motifs, but this model remains to be tested experimentally. A thorough analysis of RXR heterodimer DNA binding specificity also needs to be done.
An NR-specific DNA binding motif
Comparison of the HNF4α PBM results to those of RXRα and COUPTF2 allowed us to identify in vitro an HNF4-specific binding motif (xxxxCAAAGTCCA) that we had not identified previously when we analyzed HNF4α alone ( 13 ). In vivo analysis showed that ∼8% and 42% of HNF4α ChIP-seq peaks contain CAAAGTCCA and AGTCCA, respectively (motif variations not considered). Comparable results were observed for CAAAGGTCA and AGGTCA ( Supplementary Figure S10 ), suggesting a similar importance of AGGTCA- and AGTCCA-based motifs for HNF4α binding. The H4-SBM was not bound by RXRα or COUPTF2 in the PBM, nor by eight other NRs in ChIP-seq data ( Figure 8 B). However, three NRs (FXRα, PXR and Rev-Erbα) were associated with the H4-SBM in vivo, most likely via their own binding sites, not the H4-SBM ( Figure 8 D). While there are a few reports of these NRs regulating the same target genes as HNF4α via their own binding sites ( 83–87 ), our results suggest that this may be a much broader phenomenon. The considerable overlap between PXR, FXRα, Rev-Erbα and HNF4α peaks in ChIP-seq data also demonstrates the complexity of NR-mediated regulation and the difficulty of precisely identifying binding sites for a given TF in ChIP-seq peaks. This complexity is increased by protein–protein interactions between HNF4α and other NRs such as PXR and FXR and by competition for co-regulators ( 12 ). Our analysis shows how this problem can be overcome, at least in part, with a better understanding of binding specificities generated by the PBMs.
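The peak-level motif frequencies quoted above can be illustrated with a minimal sketch of exact-match motif counting on both strands. This is not the pipeline used in the study: the function names and the toy sequences are hypothetical and chosen only for illustration, and, as in the text, motif variations are not considered.

```python
# Minimal sketch: fraction of sequences containing an exact motif on either strand.
# All names and the toy sequences below are hypothetical, not data from the study.

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def contains_motif(seq: str, motif: str) -> bool:
    """True if the exact motif occurs on the forward or the reverse strand."""
    seq = seq.upper()
    motif = motif.upper()
    return motif in seq or reverse_complement(motif) in seq


def fraction_with_motif(sequences: list[str], motif: str) -> float:
    """Fraction of sequences with at least one exact motif occurrence."""
    if not sequences:
        return 0.0
    hits = sum(contains_motif(seq, motif) for seq in sequences)
    return hits / len(sequences)


# Invented toy "peak" sequences, for illustration only.
toy_peaks = [
    "GGCTCAAAGTCCATTG",   # CAAAGTCCA on the forward strand
    "TTTGGACTTTGAAC",     # CAAAGTCCA on the reverse strand
    "CCAGGTCAAAGGTCAGG",  # classical DR1-like element, no AGTCCA
    "ACGTACGTACGT",       # neither motif
]

for motif in ("CAAAGTCCA", "AGTCCA", "CAAAGGTCA", "AGGTCA"):
    print(motif, fraction_with_motif(toy_peaks, motif))
```

Exact string matching keeps the sketch short. A real analysis would more likely score peaks against a position weight matrix, so that motif variants are also captured.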
While this is the first identification and genome-wide analysis of an HNF4-specific binding motif, there were previous reports using classical methods suggesting that such a motif might exist. A survey of the literature identified 28 response elements that had been examined for responsiveness to HNF4α, RXRα and/or COUPTF2. Six of those elements carry the H4-SBM TC instead of the DR1-like GT at p10 and p11, and all six were responsive to HNF4α but not RXRα or COUPTF2. In contrast, the remaining 22 response elements did not have a TC at p10-11 and were responsive to RXRα or COUPTF2 ( Supplementary Table S2 ). However, since these findings were generated by different groups over a period of years, an HNF4-specific motif was never identified.
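The p10-p11 distinction used in this literature survey can be written as a short classification sketch. It assumes that each response element has already been aligned to the 13-nt DR1 frame (AGGTCAxAGGTCA, positions p1 to p13) and is given on the strand shown in the text. The function name and the category labels are hypothetical.

```python
# Minimal sketch: classify an aligned 13-nt element by its dinucleotide at p10-p11.
# Assumes the element is already aligned to the DR1 frame and oriented as in the text.


def classify_p10_p11(element: str) -> str:
    """Label a 13-nt element as 'H4-SBM-like' (TC), 'DR1-like' (GT) or 'other'."""
    element = element.upper()
    if len(element) != 13:
        raise ValueError("expected a 13-nt element aligned to the DR1 frame")
    dinucleotide = element[9:11]  # p10 and p11 in 1-based numbering
    if dinucleotide == "TC":
        return "H4-SBM-like"
    if dinucleotide == "GT":
        return "DR1-like"
    return "other"


print(classify_p10_p11("AGGTCAAAGGTCA"))  # canonical DR1 with an A spacer
print(classify_p10_p11("GGGGCAAAGTCCA"))  # xxxxCAAAGTCCA with arbitrary p1-p4
```

In practice the alignment and strand orientation of each published element would have to be established first, and the sketch does not attempt that step.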
In addition to the H4-SBM, we also identified a pair of residues in the HNF4 DBD that is responsible for H4-SBM binding (Asp69 and Arg76) ( Figure 8 A). Asp69, which has the greatest effect on HNF4-specific binding, is in the P box, which was shown previously to be responsible for the different half site of GR and related receptors (AGAACA) ( 5 , 88 ). Arg76 is in the helix that contacts the DNA but, to our knowledge, has not been previously associated with DNA binding specificity. HNF4 is the only human NR with the combination of Asp69 and Arg76, and all HNF4 DBDs, except some of the HNF4-like genes in C. elegans, have these residues, suggesting that the H4-SBM may be truly unique to HNF4. There is only one other NR that has an Asp at position 69, TLX (NR2E1), but it has a Lys at position 76 instead of an Arg ( 7 ). Since the R76K mutant of HNF4 lost most of its DNA binding activity, we assume that TLX would not be able to recapitulate the HNF4-specific binding, although that remains to be determined.
Finally, it is of interest to note that HNF4, RXR and COUPTF are among the most ancient of all the NRs ( 9 ). Since the biologically least complex metazoans currently in existence all have at least two NR genes (one HNF4-like and one RXR/COUPTF-like), it has been proposed that the NR family evolved from a now extinct early metazoan that contained a single NR gene, which was most similar to HNF4 ( 89 ). Intriguingly, the HNF4 DBDs of primitive metazoans all have the Asp69-Arg76 pair of mammalian HNF4.
In conclusion, our results highlight a complexity of NR DNA binding specificity that was previously underappreciated. They also demonstrate the usefulness of the PBM approach for defining that complexity more accurately and thereby identifying NR target genes in vivo more precisely. It will be of interest to compare the binding specificity of other NRs in a similar fashion.
Supplementary Data are available at NAR Online: Supplementary Tables 1–5, Supplementary Figures 1–10, Supplementary Materials and Methods, and Supplementary References [ 16 , 27 , 30 , 32 , 39 , 41 , 48 , 53 , 54 , 66 , 67 , 90–91 ].
National Institutes of Health (MH087397 to F.M.S. and T.J.) and PhRMA Foundation (to E.B.). Funding for open access charge: National Institutes of Health (MH087397 to F.M.S. and T.J.).
Conflict of interest statement. None declared.
We thank Dr T. Girke (UC Riverside) for bioinformatics input, M. Tsai (Baylor College of Medicine) for Flag.COUPTF2 and J.M. Kurie (M.D. Anderson Cancer Center) for pCDNA6-His-hRARalpha.