Androgen and glucocorticoid receptor direct distinct transcriptional programs by receptor-specific and shared DNA binding sites

Abstract The glucocorticoid (GR) and androgen (AR) receptors execute unique functions in vivo, yet have nearly identical DNA binding specificities. To identify mechanisms that facilitate functional diversification among these transcription factor paralogs, we studied them in an equivalent cellular context. Analysis of chromatin and sequence suggest that divergent binding, and corresponding gene regulation, are driven by different abilities of AR and GR to interact with relatively inaccessible chromatin. Divergent genomic binding patterns can also be the result of subtle differences in DNA binding preference between AR and GR. Furthermore, the sequence composition of large regions (>10 kb) surrounding selectively occupied binding sites differs significantly, indicating a role for the sequence environment in guiding AR and GR to distinct binding sites. The comparison of binding sites that are shared shows that the specificity paradox can also be resolved by differences in the events that occur downstream of receptor binding. Specifically, shared binding sites display receptor-specific enhancer activity, cofactor recruitment and changes in histone modifications. Genomic deletion of shared binding sites demonstrates their contribution to directing receptor-specific gene regulation. Together, these data suggest that differences in genomic occupancy as well as divergence in the events that occur downstream of receptor binding direct functional diversification among transcription factor paralogs.


INTRODUCTION
The interplay between transcription factors (TFs), genomic DNA binding sites and the chromatin context in which recognition sequences are embedded plays a pivotal role in specifying where, when, and at which level genes are expressed. Differences in the DNA binding specificity among TFs can guide them to distinct genomic loci and allow TFs to carry out unique functions by regulating unique sets of target genes (1). However, even related TFs with very similar DNA-binding domains--and consequently overlapping DNA sequence preferences--perform non-redundant functions (2). Mechanistically, unique functions can derive from cell type-specific expression of related TFs (3). However, functional diversification is also observed for paralogs that are expressed in the same cell type yet direct divergent genome-wide occupancy and gene regulation (4,5). One explanation for this is that subtle differences in the intrinsic DNA binding specificity among paralogs in vitro contributes to their differential binding in vivo. This was shown for paralogous TFs from different families with divergent binding mainly occurring at medium-and low-affinity binding sites (6,7). The differences in sequence preference can occur within the core binding site but also in the regions flanking the motif (8,9). Another explanation for functional diversification among paralogs was offered in a study of HOX proteins which showed that paralogs acquire novel and distinct DNA preferences when they pair with another TF (10). Of note, differential genome-wide occupancy can also be a consequence of different abilities to interact with relatively inaccessible chromatin as shown for Hox paralogs and for members of the Pou famility of TFs (11)(12)(13)(14). Finally, specificity could also be derived from events that occur downstream of binding when protein sequence differences between paralogs influence their activity. In this scenario, genomic binding sites that are shared, would selectively allow one of the TF paralogs to regulate the expression of target genes. Mechanistically, TF-specific activity from shared binding sites can be due to selective recruitment of cofactors, either coactivators or corepressors, that modulate transcriptional output (15). However, the degree to which shared binding sites actually contribute to directing TF-specific gene regulation and the underlying mechanisms remain largely unexplored.
Steroid receptors are a family of ligand-dependent TFs and provide an attractive model to study functional diversification among paralogs given that their activity can be switched on or off by the addition or removal of their ligand. This on/off switch facilitates the relatively straightforward identification of changes in the cell, e.g. in gene expression, genome-wide binding and the chromatin landscape, that occur upon their activation. Two members of the steroid receptor family are the androgen receptor (AR) and the glucocorticoid receptor (GR). Despite their nearly identical DNA binding interface, the physiological roles of the hormones that activate AR (androgen) and GR (glucocorticoids), are different (16). Androgens are male sex steroids and are, among other things, involved in development and the maintenance of reproductive organs. Glucocorticoids were named after their role in glucose metabolism as they promote gluconeogenesis in the liver and inhibit glucose uptake by skeletal muscle by antagonizing insulin but serve many additional functions. The different functions of AR and GR also translate into differences in their therapeutic use. Glucocorticoids are widely used to treat chronic inflammatory conditions with long-term use associated with muscle and bone wasting as side-effects (17). In contrast, androgens have anabolic properties and can be used to treat glucocorticoid-induced osteoporosis showing that AR and GR can have antagonizing effects (18). However, in certain castration-resistant prostate cancers, GR appears capable of taking over the role of AR as a driver of cancer progression, indicating the function of both receptors can also overlap depending on the context (19).
In line with both shared and divergent functions of AR and GR, their genome-wide binding patterns partially overlap in prostate cancel cells (4,19). Accordingly, the genes regulated by AR and GR show some overlap but also diverge with each receptor also regulating a unique set of genes (4,19). Shared genomic binding sites are expected given that AR and GR have nearly identical DNA binding interfaces and consensus sequence recognition motifs (7). The mechanisms that direct differential binding and gene regulation by AR and GR are less clear. In vitro studies indicate that the intrinsic DNA binding preference varies between AR and GR with differences both within the core motif and in the sequences directly flanking it (7). In addition, AR is able to selectively bind response elements that diverge from the consensus motif which consists of inverted hexameric half-sites separated by a 3 bp spacer. Selective AR binding sequences have a more degenerate second halfsite, which often resembles a direct rather than an inverted repeat (16). Together, these studies demonstrate that divergent DNA-binding preferences contribute to the differential binding observed in vivo, however this does not explain all differential binding observed in vivo (4,7). Moreover, selective binding does not explain how genes with nearby binding sites that are occupied by both AR and GR can be regulated in a receptor-specific manner (19).
Here, we set out to study the mechanisms that endow AR and GR with unique functions by examining human osteosarcoma cell lines that express either GR or AR. Specifically, we studied the role of chromatin and sequence in directing AR and GR to distinct genomic loci. Our results indicate that differences in binding can be explained by distinct DNA sequence preferences both within the motif and in the sequence composition of regions of several kb that surround sites that are selectively occupied by one of the receptors. In addition, we find indications that divergent binding is generated by receptor-specific abilities to interact with 'inaccessible' chromatin. Finally, we compared receptor-induced changes that occur downstream of binding and find that binding sites that are shared by both receptors can nevertheless direct receptor-specific changes in histone modifications, enhancer activity and gene regulation. Together, our results indicate that divergence in gene regulation by AR and GR is driven by both difference in genomic binding and in the events that occur downstream of binding.

Experimental
Plasmids. The AR expression construct was modified from the pGFP-AR plasmid (20) by replacing a NheI-BspTI fragment containing EGFP and 510 bp of human AR and replacing it with a PCR-amplified fragment with primer-introduced ATG, NheI and BspTI sites. Individual STARR reporter constructs were generated by digesting the human STARR-seq vector (21) with SalI and AgeI and subsequent insertion of fragments of interest by In-Fusion HD cloning (TaKaRa). Fragments of interest: positive control region (near IP6K3 gene, hg19: chr6:33 698 504-33 698 853), AQP3 enhancer (hg19: chr9:33 437 258-33 437 811) and the AQP3 enhancer with AR binding site mutations (AQP3-Deleted and AQP3-AGA → TGT) were ordered as a gBlock (IDT) or GeneStrand (Eurofins) (see Supplementary Table S1 for the sequence of regions).
Cell lines, transient transfections. U2OS cells stably transfected with rat GR␣ (U2OS-GR) (24) were grown in DMEM supplemented with 5% FBS. U2OS cells stably transfected with human AR (U2OS-AR) were generated as follows: U2OS cells (ATCC HTB-96) were transfected with 30 ng of the AR expression construct using Lipofectamine and Plus Reagent (Invitrogen). The next day, ∼10 000 cells were transferred to a 15 cm dish and single-cell derived clonal lines stably transfected with the AR expression construct were selected using G418 (800 g/ml). For both cell lines the site of integration is unknown and expression of both AR and GR is driven by the CMV promoter.
Transient transfections to test of gene regulation patterns observed in the clonal U2OS-AR and U2OS-GR cell lines could be recapitulated were done as follows: On day 1, 400 000 U2OS cells were transfected with 400ng or 800 ng of either empty pcDNA3, the AR-expression construct or pcDNA3-rat GR␣ using the 4D-Nucleofector (Lonza) in 20 l nucleocuvette strips using program CM-104. Cells were transferred to two wells of a 24-well plate, incubated overnight before adding either 100 nM Dex or 0.1% ethanol as vehicle control (for GR) or either 5 nM R1881 or 0.1% dmso as vehicle control (for AR). After 24 h, mRNA was isolated using the RNeasy kit from Qiagen, reverse transcribed using the PrimeScript One Step Kit (Takara) and analyzed by qPCR using primers as listed in Supplementary Table S3.
To test the activity of individual STARR reporters, 1 million U2OS cells were transfected using the Nucleofector 2b with 2 g Plasmid DNA using kit V (Lonza) according the manufacturer's instructions.

RNA-seq.
For U2OS-GR, the RNA-seq data was from previous studies (26,27), ArrayExpress accession number E-MTAB-6738. U2OS-GR cells were treated for 4h with either 1M Dex or 0.1% ethanol as vehicle control. For U2OS-AR, cells were treated for 24 h with either 5 nM R1881 or 0.1% dmso as vehicle control. mRNA was isolated from 1.2 million cells using the RNeasy kit from Qiagen and oligo d(T) beads. Sequencing Libraries were prepared for paired end Illumina sequencing using the TruSeq RNA library Prep Kit (Illumina). ArrayExpress accession number: E-MTAB-9622.
RNA isolation, cDNA synthesis and qPCR analysis. RNA was isolated using a RNeasy Mini Kit (Qiagen). For the analysis of cells transfected with STARR reporters, total RNA was reverse-transcribed with the PrimeScript One Step Kit (Takara) using gene-specific primers for GFP (CAAACTCATCAATGTATCTTATCATG) and RPL19 (GAGGCCAGTATGTACAGACAAAGTGG) which was used for data normalization. qPCR and data analysis were done as described (28). For all other experiments, cDNA synthesis was performed with Random Primer and Oligo dT primer provided by the PrimeScript One Step Kit. Primers for qPCR are listed in Supplementary Table S3.
ChIP assays for subsequent analysis by qPCR were done as described (28) with the treatment times and hormone concentrations as described above. For Med1 and EP300 ChIPs, cells were cross-linked with 1% formaldehyde for 10 min instead of 3 min as used for all other targets. Hormone treatment was 4 h for both AR and GR. Primers for qPCR are listed in Supplementary Table S3. ATAC-seq and ATAC-qPCR. ATAC-seq data for GR (1 M Dex, 1.5 h or 0.1% EtOH as vehicle control) is from a previous study (30), ArrayExpress accession number E-MTAB-7746. For AR (5 nM R1881 or 0.1% dmso as a vehicle control, 4 h), ATAC-seq was done as described (30). ArrayExpress accession number for ATAC-seq data: E-MTAB-9606.
For the analysis of ATAC experiments by qPCR, DNA fragments were amplified using p5 and p7 primers, twosided size selected using AMPure XP beads (Beckman Coulter) and analyzed by qPCR using the primers listed in Supplementary Table S3.

FAIRE-STARR-seq
Library construction. Accessible chromatin regions were isolated from U2OS-GR cells treated for 1.5 h with 1 M Dex, using the FAIRE method (31). In short, cells were fixed with 1% formaldehyde, DNA was isolated from the aqueous phase, cross-linking was reversed, and the DNA was purified. The resulting library of DNA fragments was ligated to Illumina adapters (NEB #E7335) using the NEBNext Ultra DNA Library kit (NEB #E7370) according to manufacturer's instructions, except that the final PCR amplification step was omitted. Instead, six PCR reactions with 2 l adapter-ligated DNA were preformed using NEBNext Q5 Hot Start HiFi PCR Master Mix and primers with 15nt extension for subsequent In-Fusion cloning (fw: TAGAGCATGCACCGGACA CTCTTTCCCTACACGACGCTCTTCCGATCT & rev: GGCCGAATTCGTCGAGTGACTGGAGTTCAGAC GTGTGCTCTTCCGATCT) (21). The resulting DNA fragments were cloned into the linearized STARR-seq vector (Addgene #71509, AgeI-SalI digested) using the In-Fusion HD kit (Takara), followed by transformations into MegaX DH10B T1R electrocompetent cells (ThermoFisher Scientific). A total of 20 In-Fusion HD reactions and transformations were performed, pooled and the plasmid library was extracted using a Plasmid Plus Mega Kit (Qiagen). Transfection of U2OS-AR or U2OS-GR cells. Five million cells were transfected with 5 g of DNA FAIRE-STARR library plasmid using the Amaxa Nucleofector kit V (Lonza). For each condition, four transfections were pooled and seeded into two 15 cm dishes with medium containing PKR (C16, Sigma; cat# I9785-5MG) and TBK1/IKK inhibitors (BX-795, Sigma; cat# SML0694-5MG) at a final concentration of 0.5 M per inhibitor to suppress the type I interferon response (32). U2OS-GR cells were treated with 1 M Dex or 0.1% EtOH as a vehicle control. U2OS-AR cells were treated with 5 nM R1881 or 0.1% DMSO as vehicle control. After 14 h, cells were harvested by trypsinization, snap frozen and stored at -80 • C.
FAIRE-STARR-seq library preparation. RNA isolation, reverse transcription and amplification of cDNA for subsequent Illumina 50 bp paired-end sequencing were essentially done as described (21) except that a modified primer (5 -CAAGCAGAAGACGGCATACGAG ATnnnnnnnnGTGACTGGAGTTCAGACGTGTGCT CTTCCGATCT -3 ) was used during the reverse transcription step to introduce unique molecular identifiers (UMIs). 50 ng cDNA/reaction was used as a template in every 50 l PCR reaction using Kapa Hifi hotstart ready mix (Roche). PCR products were purified with AMPure XP beads (Beckman Coulter; beads/reaction ratio = 1) and submitted for paired-end Illumina sequencing. ArrayExpress accession number for STARR-seq data: E-MTAB-9614.

Genome editing using CRISPR/Cas9
Clonal lines with CRISPR/Cas9-deleted GR-bound regulatory regions near the GILZ gene (GILZ ΔGBS1-4, ΔGBS A-G) were from a previous study (27). To generate single-cell-derived clonal lines lacking an AR-bound region upstream of the AQP3 gene, 1 million U2OS-AR cells were transfected with 1.2 g each of the PX459-AQP3 214 and PX459-AQP3 216 plasmids using the Amaxa V Kit (Lonza). Cells were plated and the next day we selected for transfected cells by refeeding cells with medium containing puromycin (10 g/ml). Twenty-four hours later, surviving cells were trypsinized and seeded to isolate singlecell-derived clonal lines. To genotype single cell-derived clonal lines, genomic DNA was isolated using the Blood and Tissue kit (Qiagen), the targeted region was amplified by PCR using primers spanning the target region (Supplementary Table S4) and deletion of the region was analyzed by agarose gel electrophoresis and Sanger sequencing. The residual presence of wt alleles was analyzed using primers spanning the breakpoints at both ends of the targeted locus (Supplementary Table S4).

RIME
After hormone deprivation, the U2OS-GR and U2OS-AR cell lines were treated with either 1 M Dex or 5 nM R1881 for 4h, respectively. Next, RIME experiments were performed and analyzed as previously described (33). The following antibodies were used: anti-GR (12041, Cell Signalling Technology), anti-AR (06-680, Merck), or antirabbit IgG (sc-2027, Santa Cruz Biotechnology) as control. Tryptic digestion of bead-bound proteins was performed as described previously (34). LC-MS/MS analysis of the tryptic digests was performed on an Orbitrap Fusion Tribrid mass spectrometer equipped with a Proxeon nLC1000 system (Thermo Scientific) using the same settings, with the exception that the samples were eluted from the analytical column in a 90-min linear gradient. Raw data were analyzed by Proteome Discoverer (PD) (version 2.3.0.523, Thermo Scientific) using standard settings. MS/MS data were searched against the Swissprot database (released 2018 06) using Mascot (version 2.6.1, Matrix Science, UK) with Homo sapiens as taxonomy filter (20,381 entries). The maximum allowed precursor mass tolerance was 50 ppm and 0,6 Da for fragment ion masses. Trypsin was chosen as cleavage specificity allowing two missed cleavages. Carbamidomethylation (C) was set as a fixed modification, while oxidation (M) and deamidation (NQ) were used as variable modifications. False discovery rates for peptide and protein identification were set to 1% and as additional filter Mascot peptide ion score >20 or Sequest HT XCorr >1 was set. The PD output file containing the abundances was loaded into Perseus (version 1.6.1.3) LFQ intensities were log 2 -transformed and the proteins were filtered for at least 66% valid values. Missing values were replaced by imputation based on the standard settings of Perseus, i.e. a normal distribution using a width of 0.3 and a downshift of 1.8. Differentially enriched proteins were called using a t-test. Data corresponding to RIME experiments was plotted using ggplot2 package in R. The geneset enrichment analysis was performed using GSEA-R (https://github.com/GSEA-MSigDB/GSEA R) (35,36).

Overlap gene regulation by AR and GR (Venn diagram).
Differentially expressed genes (DEGs) between the hormone-treated samples and vehicle condition samples were identified both for AR (24 h post hormone treatment versus control) and GR (4 h post hormone treatment versus control) using the LfcShrink function in DESeq2 R package (version 1.24.0, (38)). Genes with an adjusted P-value < 0.05 and log 2 (fold change) >1.5 were designated as significant upregulated genes. As a result, 777 and 364 upregulated genes (187 shared) were identified for GR and AR respectively.
Shared and receptor-specific peaks were obtained with BEDtools intersect. GR-specific peaks at regions of low accessibility were obtained by sorting the ATAC-seq signal (vehicle control treatment) of U2OS-GR cells in descending order at all GR-specific peaks (±250 bp around the peak center) using computeMatrix from deepTools and subsequently extracting the bottom 17 125. Peak calling was performed on ATAC-seq data (vehicle control treatment) in U2OS-AR cells and on ATAC-seq data (vehicle control treatment) in U2OS-GR cells using MACS2 v2.1.2 (42) (options: -broad -broad-cutoff 0.05) and resulting peaks were removed from the GR-specific peaks with low accessibility, to ensure the list represented inaccessible sites.

Heatmaps and mean signal plots at AR and GR peaks.
Heatmaps and their respective mean signal plots (±2 kb around the peak center) of shared and receptorspecific peaks were generated with computeMatrix (options: reference-point) and plotHeatmap from deepTools v3.4.1 (41), using bigWig files as input which had been generated with deepTools bamCoverage (options: -normalizeUsing RPKM).

Intersect binding and gene regulation
Gene categories. Receptor-specific and non-regulated genes were defined as follows: receptor-specific: adjusted P-value <0.05 and log 2 (fold change) >1.5 for either AR or GR; Shared: adjusted P-value <0.05 and log 2 (fold change) >1.5 for both AR and GR; non-regulated: adjusted P-value <0.5 and 0.5 > log 2 (fold change) > 0.
Assigning ChIP-seq peaks to genes. For each gene, we scanned a 60-kb genomic window centered on transcriptional start sites (TSSs) for overlap with each peak file (ARspecific peaks, GR-specific peaks, shared peaks between AR and GR) using bedtools window. The size of the window to assign peaks to genes was guided by the observation that the correlation between transcriptional regulation by either GR or AR and peaks decreases with increasing distance from the TSS. The 60 kb window yielded a reasonable number of genes while the presence of a peak in this window still increases the chance that a nearby gene is regulated substantially when compared to genes that lack a peak in this window. TSS annotations of genes (hg19) were obtained from the NCBI RefSeq database using the SeqMiner package (46). To make sure that each peak is only assigned to one of the gene categories, peaks assigned to more than one gene category were assigned to the nearest gene using the package ChIPpeakAnno in R (47). Genes lacking binding sites in the 60 kb window are labelled as 'no peaks'. The distribution of binding events for each gene category are shown as stacked bar graphs. We used the same approach to link GR-specific peaks in regions of low chromatin accessibility to the different gene categories.
Statistical tests were performed using the Fisher Exact test on 2 × 2 contingency tables comparing the number of genes in each differential gene category that have specific binding events to the number of non-regulated genes that have corresponding binding events.
FAIRE-STARR and other features at shared peaks near different gene categories. FAIRE-STARR-seq (window ±250 bp), H3K27ac, AR/GR ChIP-seq or ATAC-seq mean signal (window ±2 kb around the peak center) was plotted for shared peaks of the different gene categories (for categorization see: Intersect binding and gene regulation) using deepTools v3.4.1 (41) computeMatrix (options: reference-point) and plotProfile, using BigWigs files as input.

Motif analysis and GC content of AR and GR-specific binding sites, AR binding sites of the AQP3 enhancer.
Motif enrichment analysis at AR and GR binding sites was performed with AME (48) of the MEME suite v5.1.1 (49). All AR-specific peaks as well as GR-specific peaks (either 6593 peaks randomly sampled or the 6593 peaks with the highest chromatin accessibility as sorted by ATAC-seq signal (vehicle treatment) of U2OS-GR) were used as input. The peak sequences (±250 bp around the peak center) were scanned for the JASPAR 2018 CORE Vertebrates Clustering motifs (1) including the AR-specific DR3 motif (Supplementary Figure S4A). Control sequences were either set to be the shuffled input sequences or peak sequences (±250 bp around the center) of the other hormone receptor. For the heatmap representation, motif hits were included if the E value was <10 -30 for either AR or GR. Alternatively, the top 7 motif logos are shown.
GC content profiles were generated at all AR-specific and all GR-specific peaks as well as at high accessibility ARand GR-specific peaks. The high accessibility regions were obtained by sorting the ATAC-seq signal (vehicle treatment) at all AR-or GR-specific peaks (±250 bp around the peak center) of the respective cell line in descending order and extracting the top 3296 peaks. Next, the hg19 reference genome was binned into 50 bp bins with the BEDtools v2.27.1 (43) makewindows function. For each bin, the GC content was obtained using BEDtools nuc and the resulting bedgraph file was converted into bigWig format with bedGraphToBigWig (50). The GC content profiles ±5 kb around peak centers were generated using the deepTools v3.4.1 (41) commands computeMatrix (options: referencepoint) and plotProfile.
For GC content plots in VCaP and LNCaP-1F5 cells, processed AR (100 nM dihydrotestosterone, 2h) and GR (100 nM Dex, 2 h) ChIP-seq peaks from a previous study (4) were downloaded from GEO (GSM980657, GSM980658, GSM980660, GSM980662, GSM980664). For AR peaks in VCaP cells, peaks from both replicates were intersected using BEDtools intersect v2.27.1 (-u) (43) and overlapping peaks were extracted. ENCODE blacklisted regions for hg19 (44) were removed from all peaks. Shared and receptor-specific peaks for each cell line were obtained with BEDtools intersect. GC content profiles were plotted for all AR-and GR-specific regions in each cell line. Statistical tests were performed using Mann-Whitney-U test comparing GR and AR-specifically occupied regions. To identify motif matches in the AQP3 enhancer sequence [hg19: chr9:33437258-33437811], it was scanned for the AR (JAS-PAR ID MA0007.1-3) and GR (JASPAR ID MA0113.1-3) motif using the JASPAR CORE database (51). A total of six putative AR and four GR sites were found with a relative profile score threshold 80%. The top three AR MA0007.1-3 motif hits are shown ( Figure 7B).
Heatmaps shared, AR-specific, non-regulated genes, GRspecific RIME interacting genes (mediator and chromatin categories). Using the function normTransform in DE-Seq2 (38), the un-normalized gene counts were transformed into log transformed normalized gene counts for heatmap visualization of genes from previously obtained differential gene categories. To check the gene expression of AR-specific, GR-specific upregulated genes after hormone treatment in AR and GR, genes were sorted by log fold change and the top 50 AR-specific, GR-specific and shared target genes between AR and GR with highest fold change were plotted using the pheatmap and ggplot2 packages. For the non-regulated gene category, RNA-seq heatmaps of 50 randomly selected genes were plotted. Similarly, heatmaps for GR-specific RIME interaction partners were generated.

Transcriptional Regulation and Genomic Binding by GR and AR
To study how TF paralogs can diverge functionally despite having nearly identical DNA binding domains, we used the same parental cell line to generate cells that either express GR (24) or AR ( Figure 1A, Supplementary Figure S1B). Characterization of these cell lines, using whole-cell [ 3 H] steroid binding assays, showed that AR levels were about 3 times lower than for GR ( Figure 1B). Further, robust regulation of the FKBP5 and other target genes was observed 4 h after hormone treatment of the U2OS-GR line. In contrast, regulation of FKBP5 and other genes required a markedly longer hormone treatment for the U2OS-AR line ( Figure 1C). Given the slower kinetics of gene regulation for AR, we decided to generate and compare RNA-seq data for U2OS-GR cells treated for 4 h and U2OS-AR cells treated for 24 h. For upregulated genes, we identified three classes of genes: GR-specific genes (590) AR-specific genes (177) and genes regulated by both AR and GR (187 genes), ( Figure 1D, Supplementary Figure S1A). Importantly, for each of the three classes of genes the RNA-seq data showed similar basal expression levels for individual genes in the U2OS-AR and U2OS-GR cell lines (Supplementary Figure  S1A). Basal levels were also alike for a set of non-regulated genes we plotted (Supplementary Figure S1A) indicating that the cell lines we analyzed have comparable transcrip-tomes prior to hormone treatment. Furthermore, we could recapitulate the pattern of gene regulation observed in our cell lines when we transiently transfected the parental U2OS line with expression constructs for AR or GR (Supplementary Figure S1C-E) and analyzed a shared gene (FKBP5) and two genes each that are either GR-specific (GILZ and IGFBP1) or AR-specific (AQP3 and SIGLEC14). This suggests that the receptor-specific regulation observed for these genes is caused by differences between the two steroid receptors, however we cannot rule out that some of the differences might be due to clonal differences between the two cell lines examined. The number of repressed genes was much smaller for AR (17) than for GR (230) with little overlap (three genes) between the two gene sets.
Differential patterns of genomic occupancy for AR and GR likely play a role in directing receptor-specific gene regulation (4,7). To compare the cistromes between AR and GR, we generated and analyzed ChIP-seq data for both hormone receptors. We called peaks and created three peak categories: shared peaks (peak called in each of the two replicates for both AR and GR), AR-specific peaks (peak called in both AR replicates, but not for GR) and GRspecific peaks (peak called in both GR replicates, but not in AR). Consistent with what we observed in terms of gene regulation, a substantial fraction of AR binding peaks overlaps with GR-occupied loci (Figure 2A). In addition, we find a category of AR-specific peaks and a large number of binding sites that are occupied in a GR-specific manner (Figure 2A, B).
Next, we assessed whether differential occupancy contributes to the receptor-specific transcriptional regulation we observed. Given the small number of repressed genes for AR and the ambiguous link between GR binding and transcriptional repression (54), we decided to focus our analysis on upregulated genes. Genes were categorized as either nonregulated by either AR or GR, shared between AR and GR, GR-specific or AR-specific (Supplementary Figure S1A). For each gene within a category, we scanned a 60 kb window centered on the transcriptional start site for the presence of either a peak shared by AR and GR, a GR-specific peak, an AR-specific peak or the absence of a peak. As expected, we found that a larger fraction of non-regulated genes contains no peaks in this window than genes in the other categories, indicating that nearby receptor binding correlates with gene activation ( Figure 2D). Furthermore, AR-specific binding is enriched near AR-specific genes whereas GRspecific binding is enriched near GR-specific genes as well as shared target genes ( Figure 2C, D). Together, these data indicate that receptor-specific binding is a driver of receptorspecific gene activation. However, for the majority of genes that are activated specifically by either AR or GR, we do not observe receptor-specific binding suggesting that receptorspecific regulation might also be governed by events downstream of binding.

Role of chromatin in shaping genome-wide receptor binding
To investigate the role of chromatin accessibility in shaping the genome-wide binding of AR and GR, we performed ATAC-seq (55) and ChIP-seq experiments for a panel of histone modifications for vehicle-treated cells to capture the chromatin landscape the receptors encounter when activated by hormone. Next, we intersected the ATAC-seq data with the three peak categories (shared; AR-specific; GR-specific binding). The most striking difference between the peak categories is that GR-specific peaks are, on average, markedly less accessible prior to hormone treatment than either AR-specific or shared peaks in both the U2OS-GR and the U2OS-AR cell line ( Figure 3A). We also performed ChIP-seq experiments for a panel of histone modifications that are associated with either closed or open chromatin (44). Consistent with preferential GR-specific binding at relatively inaccessible loci ( Figure 3A), the levels of histone modifications associated with closed chromatin (H3K27me3 and H3K9me3) are higher for the GRspecific peaks than for either shared or AR-specific peaks (Supplementary Figure S2A, B). Conversely, the levels of histone marks associated with open chromatin (H3K4me1, H3K27ac, (44)) are lower for the GR-specific peaks (Figure 3B, Supplementary Figure S2C). Notably, chromatin features under basal conditions show a similar pattern in the cell lines and suggest that the differences in receptor binding observed are not a consequence of striking differences in the chromatin landscape between the U2OS-AR and U2OS-GR cell lines. Under basal conditions, the GR-specific peaks have low ATAC-seq and H3K27ac signal and relatively high levels of histone modifications associated with closed chromatin when compared to shared and AR-specific peaks in both cell lines. Accordingly, analysis of multiple biological replicates by qPCR indicates that ATAC (Supplementary Figure S3A) and H3K27ac levels (Supplementary Figure S5B) under basal conditions are similar for the U2OS-AR and U2OS-GR cell lines. Together, these results argue that a different propensity to bind relatively inaccessible chromatin plays a role in directing receptor-specific binding.
In our set-up, GR levels are about three times higher than AR levels ( Figure 1B). Therefore, we wanted to test if GRspecific binding at relatively inaccessible chromatin is a simple consequence of higher GR levels, or an intrinsic property that distinguishes GR from AR. To test if GR can still bind when reduced levels of hormone-occupied receptor are present, we assayed GR occupancy at hormone concentrations below the reported K D s of GR for dexamethasone (∼3-5 nM, (56,57)). We first confirmed GR-specific binding at four low-accessibility loci when a saturating amount of hormone was used (Supplementary Figure S3A). At lower hormone concentrations (0.5 and 1 nM), GR occupancy was reduced but still detectable at each of the loci examined (Supplementary Figure S3A). Previous studies have shown that GR binding induces increased chromatin accessibility (58). Consistent with GR-specific binding, we observed an increase in chromatin accessibility upon hormone treatment at these 4 low-accessibility loci for GR but not for AR (Supplementary Figure S3B). The increase was smaller, but still observable, at sub-saturating hormone concentrations (1 nM) arguing that binding and opening is not a simple consequence of higher receptor levels for GR than for AR.
Next, we investigated if GR binding at regions of low chromatin accessibility might contribute to GR-dependent gene regulation. Therefore, we filtered GR-specific peaks for those mapping to relatively inaccessible chromatin ('Pioneering GR-peaks') and intersected them with the different categories of regulated genes. In line with a role in regulating gene expression, we found that these low-accessibility GR peaks are enriched near genes regulated by GR ( Figure  3C). Together, these results argue that a different propensity to bind inaccessible chromatin plays a role in directing receptor-specific binding and gene regulation.

Role of sequence in shaping genome-wide receptor binding
Differences in DNA-binding specificity between related TFs can induce differential binding and gene regulation (59). To study the role of sequence composition in directing AR and GR to different genomic loci, we used AME (Analysis of Motif Enrichment, (48)) to scan the clustered JASPAR CORE vertebrates motif collection (60) supplemented with a direct repeat AR/GR consensus motif (DR3) which is reported to be AR-specific (61). Similar to a previous study comparing the motif composition of AR-and GR-specific sites (4), we found that enriched sequence motifs largely overlap between AR-and GR-specific sites with some differences (Supplementary Figure S4C). For example, the DR3 motif was more enriched for AR-specific binding sites whereas the canonical inverted AR/GR consensus motif was more enriched for GR-specific sites. The most striking difference was for the AP-1 motif which was the most enriched motif for AR-specific sites with little enrichment for GR-specific sites. Given the role of AP-1 in maintaining open chromatin (62), the lack of motif enrichment for GR-specific sites could reflect preferential binding at inaccessible chromatin. To remove chromatin accessibility as a potential confounding factor, we repeated the motif analysis using GR-specific sites at open chromatin regions which show a similar level of ATAC-seq signal as the ARspecific binding sites ( Figure 4A). Analysis of selective receptor binding in regions with similar chromatin accessibility showed motif hits that resembled the results for all binding sites except, for example, that the GR-specific sites in open chromatin were also enriched for AP-1 ( Figure 4A).
Next we repeated the AME analysis however this time using AR-specific peaks as background for GR-specific peaks (either all peaks or only GR peaks in accessible chromatin) and vise-versa to identify motifs that are selectively enriched. Interestingly, the top AR-specific motifs were mostly AT-rich whereas the top motifs for GR-specific peaks were often high in GC content suggesting that the sequence composition of selectively occupied regions is different ( Figure  4B). Accordingly, when we scanned the GC content of AR and GR-specific genomic regions, we found a higher GC content for GR occupied regions than for regions occupied by AR ( Figure 4C). This difference was most pronounced when we compared receptor-specific peaks in regions of accessible chromatin ( Figure 4C). This finding is in line with a recent study showing that AR binding is distinguished from GR by a preference for poly(A) sequences directly flanking the consensus binding site (7). Surprisingly, in contrast to the in vitro study we find that the difference extends far beyond the core motif and its direct flanks (>10 kb) arguing that the sequence composition of a large region surrounding the binding site might play a role in directing receptorselective recruitment.

Receptor-specific consequences of DNA binding
To compare the events that occur downstream of AR and GR binding, we assayed the effect of hormone treatment on several chromatin features. First, we compared changes in chromatin accessibility by analyzing ATAC-seq signal after hormone treatment. In agreement with published data (63,64), both AR and GR induce an increase in chromatin accessibility at occupied loci ( Figure 3A). For GR, accessibility increased at shared and GR-specific sites but not at AR-specific regions. For AR, the increase in accessibility was most pronounced at shared and AR-specific peaks however, a slight increase was also observable for the GRspecific sites indicating that some AR binding also occurs at some of these loci.
Next, we analyzed the effects of hormone treatment on histone modifications. Specifically, we analyzed H3K27ac as a marker for active enhancers and indicator of enhancer activity as well as H3K4me1 as a marker of active and primed enhancers (65,66). Consistent with increased chromatin accessibility at occupied loci, we found that GR activation by hormone, and to a lesser degree for AR, induced nucleosome shifts for H3K4me1 and H3K27ac resulting in reduced signal at the center of the receptor-occupied locus relative to the regions directly flanking it ( Figure 3B, Supplementary Figure S2C). In line with expectation, GR activation resulted in a marked increase in H3K27ac levels at GR-occupied loci ( Figure 3B). In contrast, H3K27ac levels barely changed in response to AR activation at ARoccupied loci ( Figure 3B). To test if the receptor-specific changes in H3K27ac levels are restricted to the time-point assayed, we tested several loci occupied by both AR and GR by ChIP-qPCR. Specifically, we chose four loci located near genes that are regulated by both receptors (Supplementary Figure S5A) and assayed H3K27ac levels after both 4 and 24 h of hormone treatment. Consistent with our genomewide analysis, a marked increase in H3K27ac levels was observed 4 h after GR activation with even higher levels after 24 h (Supplementary Figure S5B). For AR, H3K27ac levels did not change after 4h of hormone treatment whereas relatively small increases were observable after 24 h. Together, these results argue that both AR and GR induce chromatin remodeling and increased chromatin accessibility upon genomic binding. However, the consequences downstream of AR and GR binding diverge when H3K27ac levels are examined with robust and rapid increases for GR whereas AR activation only induces modest changes in H3K27ac levels that occur with slower kinetics.
Although several studies indicate that H3K27ac levels are indicative of enhancer activity (65,66) some studies argue that H3K27 acetylation is dispensable for enhancer activity (67). For a quantitative comparison of the ability of clusters at all AR-specific and highaccessibility GR-specific peaks (±250 pb around the peak center). Shuffled input sequences were used as background for the motif enrichment analysis. Motifs were included if the E value was <10 -30 for either AR or GR. (B) Top: Top 7 enriched differential motif clusters and corresponding E value at all AR-specific peaks when compared to an equal number of high-accessibility GR-specific peaks (±250 pb around the peak center). Bottom: Top 7 enriched differential motif clusters at high accessibility GR-specific peaks when compared to all AR-specific peaks (C) Mean GC content at all AR-specific peaks (AR all), all GR-specific peaks (GR all) and peaks in regions of accessible chromatin (AR high access or GR high access) (±5 kb around the peak center). Based on Mann-Whitney U test, GC content is significantly higher for GR-specific peaks (P-value < 2.2e-16).
receptor-occupied regions to enhance transcription, we performed STARR-seq (Self-Transcribing Active Regulatory Region sequencing) for both AR and GR with two modifications ( Figure 5A). First, to limit the number of putative enhancers we focused on genomic regions isolated by FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements) from dexamethasone-treated U2OS-GR cells to include regions that gain accessibility upon GR activation. Second, we added unique molecular identifiers (UMIs) during the reverse transcription stage to facilitate quantitative measurements of enhancer activity for each fragment. To quantify enhancer activity, we transfected U2OS-AR and U2OS-GR cells with the FAIRE-reporter library and assayed regulatory activity for vehicle-treated cells and for cells treated overnight with the corresponding hormone. In-tersection of the STARR-seq data with the different groups of receptor binding sites, showed that enhancer activity increased upon dexamethasone treatment for GR-occupied regions whereas the enhancer activity of the group of ARspecific peaks did not change ( Figure 5B). For AR, we observed increased enhancer activity for the shared and ARspecific sites upon R1881 treatment. In addition, AR activates enhancers of the GR-specific group of binding sites indicating that AR can bind these regions when they are taken out of their endogenous chromatinized genomic context. Quantitatively, at the time point examined, the overall level of activation is higher for GR than for AR. However, despite the very modest induction of H3K27 acetylation for AR, enhancer activation is also observed for AR. Moreover, an exemplary enhancer near the AR-specific AQP3 gene shows activation by AR but not by GR ( Figure  7A, Supplementary Figure S5E) in the STARR-seq assay whereas H3K27ac changes upon binding are much more pronounced for GR indicating that enhancer activation and H3K27 acetylation can be uncoupled.
Notably, both regulated-and non-regulated genes harbor proximal receptor binding sites ( Figure 2D) arguing that events downstream of binding play a role in specifying if a nearby gene is regulated or not. To test if the enhancer activity of shared peaks correlates with the regulation of nearby genes, we compared shared peaks that are located near genes that are either non-regulated, shared targets of AR and GR or receptor-specifically regulated ( Figure 5C). Consistent with a role of of enhancer activity in directing changes in gene expression of nearby genes, we find that for both AR and GR the STARR-seq activity of shared peaks near regulated genes is higher than for shared peaks that are located near non-regulated genes ( Figure 5C). Furthermore, for GR enhancer activity after hormone treatment is highest for shared peaks near shared and GR-specific genes with lower activity at shared peaks near AR-specific genes and non-regulated genes ( Figure 5C). Similarly, enhancer activity for AR is highest for shared peaks near shared genes followed by AR-specific genes, GR-specific and finally non-regulated genes. Further arguing for a role for events downstream of binding in directing specificity, we find that H3K27ac levels ( Figure 5D) and ATAC-seq signal (Supplementary Figure S5C) at shared peaks after hormone treatment correlate with receptor-specific regulation. The ChIP-seq signal for AR and GR at shared peaks is also a bit higher at peaks near regulated genes, indicating that receptor levels likely contribute to the differences in enhancer activity (Supplementary Figure S5D).
Together, these results show that enhancer activity of receptor-occupied regions correlates with transcriptional activation of genes. Comparison between AR and GR shows that activity is highest near shared genes. For GR, peaks near GR-specific genes are more active than those lo-cated near AR-specific genes. This order is reversed for AR, arguing that receptor-specific enhancer activity at shared binding sites plays a role in directing receptor-specific regulation of nearby genes.

Shared binding sites contribute to receptor-specific target gene regulation
The presence of shared binding sites near receptor-specific target genes prompted us to explore experimentally if shared binding sites contribute to the regulation of the nearby gene. Specifically, we picked the GR-specific target gene GILZ and the AR-specific gene AQP3 and confirmed the receptor-specific regulation by qPCR ( Figure 6A). For GILZ, each of the GR-occupied peaks in an ∼100 kb window centered on the regulated promoters is also occupied by AR ( Figure 6B). We previously showed that simultaneous deletion of the promoter-proximal peaks and a region downstream of GILZ containing multiple peaks resulted in a virtual loss of GR-dependent regulation (27). Reexamination of the clonal lines confirmed that the proximal enhancer and distal GR binding sites in the downstream region are required for GR-dependent regulation of GILZ. For the control gene FKBP5, a GR target gene located on another chromosome, increased expression is still observed upon addition of dexamethasone to activate GR ( Figure  6C). However, activation of FKBP5 is weaker in the edited single-cell derived clonal lines, a phenomenon we also observed for unedited single-cell derived clonal lines in a previous study (27) indicating that the selection process blunts the GR-dependent activation of this gene.
Thus, despite occupancy for both AR and GR at each of the peaks required for GR-dependent regulation, AR fails to regulate GILZ. To assess whether differential enhancer activity could explain receptor-specific transcriptional regulation, we compared the STARR-seq signal at shared peaks for hormone-treated cells (Supplementary Figure S6A). For several shared peaks of the GILZ gene, the STARR-seq activity upon hormone treatment appears higher for GR than for AR indicating that the level of enhancer activity at shared binding sites might play a role in directing GRspecific regulation.
For the AR-specific AQP3 gene, two prominent shared peaks occupied by both AR and GR are located in a region ∼5-10 kb downstream of the transcriptional start site (Figure 6D). To test if these peaks contribute to AR-dependent regulation, we used CRISPR/Cas9 (68) and a pair of sgR-NAs to remove a ∼4 kb genomic fragment containing both peaks ( Figure 6D, Supplementary Figure S6B). Analysis of the resulting clonal lines showed that AR no longer regulates the AQP3 gene when these peaks are deleted ( Figure  6D) whereas the AR target gene FKBP5, which is located on another chromosome, is still regulated by AR. Interestingly, the peak closest to the AQP3 gene ('AQP3 enhancer') shows an increase in STARR-seq signal in U2OS-AR cells upon hormone treatment whereas no such increase is observed for GR ( Figure 7A). To characterize the AQP3 enhancer, we constructed a reporter containing a 554 bp region centered on the peak. Testing the activity of this reporter confirmed that the AQP3 enhancer is activated in an AR-specific manner whereas an enhancer near the IP6K3 gene was regulated by both AR and GR ( Figure 7A). To test the influence of sequence in directing the observed receptor-specific regulation of the AQP3 enhancer, we scanned the sequence for occurrences of the AR consensus motif and deleted the top 3 matches (Jaspar MA0007.2, Figure 7B). Each of these three sites contains one well-defined half-site with a more degenerate second half site. Simultaneous deletion of all three motif matches by mutating key positions resulted in a loss of hormone-dependent activation for AR showing that these binding sites are required for regulation ( Figure 7B). Next, we mutated each of the top 3 matches to resemble the GR consensus motif with a well-defined second half-site (Figure 7B, AGA → TGT). This mutated AQP3 reporter could now be activated by both AR and GR, indicating a role of these sequences in directing AR-specific regulation of the reporter.
Together, these results argue that bound regions that are shared between AR and GR play a role in directing receptor-specific regulation.

RIME uncovers different interactomes for AR and GR
AR and GR have very similar DNA binding domains whereas the rest of the protein, especially the N-terminal domain, is much less conserved. As a consequence, AR and GR might interact with different transcriptional coregulators. This in turn could contribute to receptor-specific regulation, e.g. when cofactors that address a rate limiting step in the transcriptional activation of a gene are recruited in a receptor-specific manner. To compare the proteins that interact with GR and AR, we performed RIME (rapid immunoprecipitation mass spectrometry of endogenous proteins). This method combines formaldehyde fixation of intact cells to stabilize protein complexes with immunoprecipitation of a protein of interest and finally mass spectrometry for protein identification (69). We performed RIME for hormone-treated U2OS-GR and U2OS-AR cells and identified 105 significantly enriched proteins for GR and 173 for AR (Supplementary files S1 data and S2 data, Figure 8A). As expected, many of the significantly enriched proteins for GR (59%) were also significantly enriched for AR and enriched gene sets include transcription coactivators, nuclear receptor-coactivators and chromatin remodelers ( Figure 8B). Furthermore, several of the identified proteins are previously validated interactors of GR and AR (e.g. ARIDA1, NCOA1, NCOA3, EP300, CREBBP, NCOR1, NCOR2, TRIM28) (70,71).
Next, we compared the RIME data between AR and GR to identify proteins that interact in a receptor-specific manner ( Figure 8C, supplementary file S3 data). For AR, enriched proteins are linked to RNA splicing and processing whereas for GR, geneset enrichment analysis revealed the mediator complex as the top hit and also included a category of genes linked to acetyltransferase activity (Supplementary Figure S7A, supplementary file S3 data). Importantly, enriched proteins in these categories showed comparable expression levels in our RNA-seq data for hormone-treated U2OS-AR and U2OS-GR cells indicating that their enrichment is not a simple consequence of hormone-induced changes in gene expression (Supplementary Figure S7B, C). Analysis of the RIME signal indi-Nucleic Acids Research, 2021, Vol. 49, No. 7 3869  cated that each of the mediator subunits is enriched for GR whereas for AR only a subset is enriched with overall lower signal ( Figure 8D). Notably, GR-specific interactions with mediator subunits have also been reported in other studies (72,73) indicating that the selectivity of this interaction is not restricted to the cell line we examined. To test if the GRspecific association in our RIME experiments translates into GR-specific recruitment, we performed ChIP experiments for one of the mediator complex members, MED1. Analysis of several GR-occupied loci showed robust MED1 recruitment upon dexamethasone treatment ( Figure 8E). In contrast, no obvious MED1 recruitment was observed for these AR-occupied loci upon R1881 treatment (Figure 8E). Given the striking difference in induced H3K27 acetylation between AR and GR ( Figure 3B), we were surprised to see that the enzymes responsible for the majority of H3K27 acetylation, CREBBP and EP300 (74), were significantly enriched for both receptors ( Figure 8A, D, Supplementary file S1 data and S2 data). Furthermore, ChIP experiments targeting EP300 showed that it is recruited to several receptor-occupied loci by both AR and GR ( Figure  8F). Thus, the difference in H3K27 acetylation does not appear to be a simple consequence of selective EP300 recruitment by GR but could be due to selective recruitment of other proteins that influence H3K27ac levels or modulation of the activity of EP300/CREBBP or other histone acetyl transferases that target H3K27.
Taken together, our proteomic data suggests that GR and AR recruit distinct sets of transcriptional co-regulator proteins which may contribute to receptor-specific gene regulation.

DISCUSSION
Despite their similar modes of action, the physiological consequences of androgen and glucocorticoid signaling diverge. This specificity can be a consequence of tissue specific expression of the receptor (16). However, even in cells that express both AR and GR, the target genes only partially overlap (4,19). Here, we present evidence that receptor-specific The enhancer with AR-specific activity is highlighted. Cells for ChIP-seq were treated as for Figure 2A. Treatment for FAIRE STARRseq as described for Figure 5B. Right: Transcriptional activity of STARR-seq reporters containing either the APQ3 enhancer or an enhancer near the IP6K3 gene that is activated by both AR and GR. Relative mRNA levels ± SD are shown for cells treated overnight with either vehicle or with 5 nM R1881 (U2OS-AR cells) or 1 M Dex (U2OS-GR cells). Averages ± SD are shown (n = 3). (B) Left: Top 3 AR motif (JASPAR MA0007.1-3) matches of the AQP3 enhancer region. Positions highlighted in bold were changed to ATA to delete each of the three motif matches (Deleted). AGA sequence was mutated to TGT to create motifs resembling the canonical GR consensus motif (AGA → TGT). Right: Transcriptional activity of STARR-seq reporters as indicated for GR and AR. Cells were treated as for (a). Averages ± SD are shown (n = 3). gene regulation can be facilitated by different mechanisms. First, differences in DNA binding site preference and distinct abilities to bind 'closed' chromatin can direct divergent genomic binding patterns and target gene regulation. Second, events downstream of binding facilitate receptorspecific target gene regulation from genomic binding sites that are occupied by both AR and GR. Notably, for several genes examined, including AQP3 and GILZ, we find that receptor-specific regulation is observed at more than one hormone concentration and time-point (Supplementary Figure S8). This suggests that the receptor-specific regulation we report is not simply driven by the different ligand dose used or specific to the time point examined. Nonetheless, we cannot rule out that some of the differences in binding and regulation that we observe might be due to differences in the kinetics of gene activation by both receptors or specific to the ligand concentration used. Moreover, the parental U2OS cell line expresses low levels of endogenous GR, which may have primed certain genes for activation and contribute to the GR-specific regulation of some genes. The differential gene regulation by AR and GR can be explained, in part, by divergence in DNA binding specificity. In line with previous studies showing AR-specific binding in vitro, the DR3 motif was more enriched at AR-specific binding sites (61). However, when we examined ChIP-exo for both AR and GR, we found a footprint for both AR and GR indicating that despite the more pronounced enrichment of this motif for AR ( Figure 4A), both receptors might bind to DR3 sequences in vivo ( Supplementary Figure S9). Another difference we observed was that the canonical AR/GR consensus motif was more enriched for GR than for AR suggesting that AR might be able to bind to more degenerate sequences something that has also been described by others (75). In addition, our findings corroborate a recent study showing that AR binding is distinguished from GR by a preference for poly(A) sequences directly flanking the consensus binding site (7). However, whereas this in vitro study shows that this preference is restricted to the 3 to 4 base pairs immediately flanking the motif, our in vivo binding studies reveal a more global dif- ference in sequence composition between regions that are selectively occupied ( Figure 4C). GR-specific regions are more GC-rich that AR-specific regions and this difference is even more pronounced when we control for chromatin accessibility which correlates with sequence composition (76). Moreover, when we analyzed published ChIP-seq data for AR and GR in LNCaP and VCap cells respectively (4), we again found that GR-specific regions have higher GC content than their AR-specific counterpart (Supplementary Figure S4D). Interestingly, earlier work also reported global differences in sequence composition that extend far beyond the core binding site when comparing TFs across diverse families (77). This suggests that the local environment may help direct TFs from different TF families to distinct genomic loci by yet unknown mechanisms. Our finding that the sequence environment differs between paralogous TFs argues that the sequence environment might also play a role in directing TFs with very similar sequence preferences to distinct binding sites. This might not just apply to AR and GR, but could also play a role in directing paralogous TFs from the homeodomain family to distinct loci given that the global motif environment differs between paralogs (77).
Genome-wide TF occupancy is influenced by chromatin environment, with the majority of TF binding occurring at accessible DNA regions (6,63,78,79). Here, we report distinct abilities to bind 'closed' chromatin for AR and GR as a potential mechanism that directs receptor-specific binding and gene regulation. Moreover, the shape of the signal for H3K9me3 and H3K27me3 at GR-specific peaks suggests that the sequence motif is embedded in a nucleosomal context (Supplementary Figure S2). Thus, similar to so-called pioneering factors (80) and consistent with previous studies, GR appears to interact with nucleosomal motifs (81)(82)(83). Different abilities to interact with relatively inaccessible chromatin among paralogous TF has also been reported by others. For example, whereas Oct4 can bind 'closed' chromatin (14), its homolog from the Pou family of TFs, Brn2, does not (13). Similarly, the ability to interact with 'closed' chromatin differs between members of the Hox family of TFs (11,12). One explanation for the paralogspecific binding to 'closed' chromatin could be differences in protein expression levels. In support of this hypothesis, a larger fraction of GR peaks maps to 'closed' chromatin at saturating hormone concentrations than at hormone concentrations below the K D at which only a fraction of GR is hormone-occupied (84). Similarly, we find that GR binding at 'closed' loci is weaker at hormone concentrations below the K D (Supplementary Figure S3). However, even at low hormone concentrations GR binding at these 'closed' loci is observable. Accordingly, chromatin accessibility at these loci increases when cells were treated with low hormone concentrations (Supplementary Figure S3). Together, these findings indicate that GR binding at 'closed' chromatin might not simply be explained by higher expression levels for GR than for AR. A further indication that the lack of AR binding at GR-specific peaks is a consequence of the chromatin context in which they are embedded comes from our STARR-seq experiments. An advantage of the ectopic STARR-seq reporter assay is its ability to assess the enhancer activities of DNA sequences that are silenced endogenously at the chromatin level (21). For GR, increased STARR activity upon hormone treatment is seen for GRspecific peaks, whereas no such increase is observable for AR-specific peaks ( Figure 5B) consistent with the receptorspecific occupancy we observed. This is different for AR, which shows an increase in STARR activity upon hormone treatment for AR-specific peaks ( Figure 5B). However, an even stronger activation is seen for the GR-specific peaks arguing that AR is capable of binding and activating transcription from these GR-specific regions when they are removed from their endogenous chromatin context. Thus, a complementary explanation for the paralog-specific binding could be a receptor-specific intrinsic ability to interact with 'closed' chromatin. This could be a consequence of receptor-specific interactions with coregulators, e.g. chromatin remodelers, that facilitate the binding and or opening of chromatin by GR to induce a feed-forward loop to establish robustly occupancy at these loci. These receptorspecific cofactors might interact with less conserved parts of the receptors, for example the N-terminal part of the receptor which shows little conservation between AR and GR (85).
Numerous studies have shown that differential genomic targeting is a mechanism that can generate functional diversification among paralogous TFs. However, paradoxically when comparing the genome-wide binding patterns of paralogous TFs, a large fraction of peaks typically overlaps. For example, when comparing AR and GR, receptorspecific binding only explains receptor-specific regulation for a subset of genes with many receptor-specific target genes only harboring nearby binding sites that are shared ( Figure 2D, (19)). Here, we present evidence that shared binding sites also play a role in directing functional diversification among paralogous TFs. This raises the question: How can genomic loci that are occupied by both AR and GR direct receptor-specific transcription responses? Based on our studies, the explanation is that the downstream con-sequences of binding differ between AR and GR in several ways. For instance, even though both AR and GR induce robust changes in chromatin accessibility, GR binding induces robust changes in H3K27ac levels whereas for AR the increase is much more subtle ( Figure 3B). This is even true for enhancers near genes that are AR-specific and show AR-specific activity in the STARR-seq assay ( Figure  7A, Supplementary Figure S5E) and argues that increases in chromatin accessibility, enhancer activation, gene regulation and H3K27 acetylation can be uncoupled. Shared binding sites show a receptor-specific potential to activate transcription in reporter assays, and based on our genome editing studies for the AQP3 and GILZ genes ( Figures 5C  and 6). The different downstream consequences of binding could in turn be a result of receptor-specific interactions with cofactors, e.g. the mediator complex, as observed in our RIME assays. Notably, cofactors display specificity for distinct types of core promoters indicating that 'compatibilities' between cofactors and promoters can influence if a recruited cofactor influences gene expression or not (86). Moreover, depending on the target gene examined, different cofactors are required for GR-dependent regulation (87)(88)(89). For instance, a previous study (Chen et al. 2006) using the U2OS-GR cell line, showed that GRdependent activation of IGFBP1 (a GR-specific target gene) was blunted upon siRNA-mediated knockdown of Med14, whereas GILZ activation was unaffected by the knockdown of either Med1 or Med14. Thus, the GR-specific interaction with the mediator complex we found in our study might play a role in facilitating the regulation of some GRspecific target genes, whereas other mechanisms are in play at other GR-specific target genes. This indicates that for individual genes, different cofactors address the rate-limiting step of the Pol 2 transcription cycle for gene activation which could either be limited by the level of PolII recruitment, initiation of elongation. In this scenario, receptorspecific gene activation from shared binding sites would occur when different cofactors are recruited by AR and GR.
In summary, our study suggests that both divergence in genomic occupancy and diversity in the events that occur downstream of binding contribute to functional diversification among TF paralogs (Supplementary Figure S10).