Genomic transcription factor binding site selection is edited by the chromatin remodeling factor CHD4

Abstract
Biologically precise enhancer licensing by lineage-determining transcription factors enables activation of transcripts appropriate to biological demand and prevents deleterious gene activation. This essential process is challenged by the millions of matches to most transcription factor binding motifs present in many eukaryotic genomes, leading to questions about how transcription factors achieve the exquisite specificity required. The importance of chromatin remodeling factors to enhancer activation is highlighted by their frequent mutation in developmental disorders and in cancer. Here, we determine the roles of CHD4 in enhancer licensing and maintenance in breast cancer cells and during cellular reprogramming. In unchallenged basal breast cancer cells, CHD4 modulates chromatin accessibility. Its depletion leads to redistribution of transcription factors to previously unoccupied sites. During cellular reprogramming induced by the pioneer factor GATA3, CHD4 activity is necessary to prevent inappropriate chromatin opening. Mechanistically, CHD4 promotes nucleosome positioning over GATA3 binding motifs to compete with transcription factor–DNA interaction. We propose that CHD4 acts as a chromatin proof-reading enzyme that prevents unnecessary gene expression by editing the chromatin binding activities of transcription factors.


Introduction
During the cell fate transitions integral to development or transcription factor-dependent cellular reprogramming, lineage-determining transcription factors (TFs) must contend with chromatin to nucleate active enhancers. Some transcription factors, such as pioneer factors, use their intrinsic ability to bind nucleosomal DNA in closed chromatin as a first step in inducing chromatin opening (1,2). Establishment of an active enhancer is hypothesized to be a multi-step process involving alterations to local chromatin by chromatin remodeling enzymes, histone replacement, and editing of histone and DNA modifications (3,4). All downstream processes flow from the initial event: recognition of a DNA sequence by a sequence-specific DNA-binding transcription factor. In most cases, TFs are present on the order of thousands to tens of thousands of molecules per cell (5,6). By contrast, many transcription factor motifs are present on the order of millions of copies per genome (7-9). How TFs are directed to activate enhancers in all the correct locations, and only the correct locations, is critical, as both failure to activate appropriate sets of genes and inappropriate gene activation are typically deleterious to the biological program. Chromatin accessibility is a strong indicator of selective transcription factor binding (10,11). However, pioneer factors are capable of binding to inaccessible chromatin (1). Therefore, other features of the chromatin context must be involved in selective enhancer formation. Cooperative binding by multiple TFs, partial motif recognition, nucleosome positioning, and chromatin remodeling factors are thought to be involved in this process (3,12-14). BRG1 or the BAF complex has been shown to be involved in the establishment of de novo open chromatin regions in various cellular contexts (15-20). However, the mechanisms underlying specific gene targeting and regulation in different genomic contexts are still not fully understood.
CHD4 (Chromodomain Helicase DNA Binding Protein 4) is a catalytic core component of the NuRD (Nucleosome Remodeling and Deacetylase) chromatin remodeling complex and is known to regulate gene expression and DNA damage responses (21,22). CHD4 is involved in multiple developmental processes, including neural and cardiac development (23-25). As with other chromatin remodeling factors, genetic and epigenetic data from cancer patients have revealed frequent alterations in the CHD4 gene, suggesting key roles for CHD4 in tumorigenesis and tumor progression (21,26-29). In breast cancer in particular, chromosomal amplification and mRNA up-regulation of CHD4 have been observed, and higher CHD4 expression is associated with poorer outcomes in patients with triple-negative (or basal) breast cancer (30,31). CHD4 knockdown has been shown to inhibit growth of MDA-MB-231 basal breast cancer cells in a mouse xenograft model (31). Because the NuRD complex contains histone deacetylases such as HDAC1 and HDAC2, it has been thought to act as a gene silencer (32-35). However, CHD4 is predominantly enriched at promoters and open chromatin regions, and its function appears to be cell context-specific (36-39). It is also largely unknown how this gene-silencing remodeling complex contributes to cellular reprogramming mediated by pioneer factors (3).
In this study, we measured the impacts of CHD4 depletion in MDA-MB-231 basal breast cancer cells. Characterization of chromatin features by multi-omics methodologies revealed differential impacts of CHD4 knockdown in steady-state cells versus during cellular reprogramming. CHD4 knockdown in steady-state MDA-MB-231 cells tended to increase chromatin accessibility at promoter-distal regions, and AP1 family transcription factors were redistributed in the absence of CHD4. During the GATA3-induced mesenchymal-to-epithelial transition (MET) reprogramming process (40), CHD4 maintains chromatin architecture mainly at intergenic regions. In the absence of CHD4, chromatin accessibility increased at closed chromatin, especially at GATA3 binding peaks, and this abnormal chromatin opening led to increased expression of genes unrelated to MET. High-resolution nucleosome mapping suggested that CHD4 prevents MET-unrelated gene expression by promoting nucleosome formation over GATA3 binding sites. These results demonstrate that CHD4, and by extension the NuRD complex, monitors transcription factor/chromatin interactions and modulates gene regulation by transcription factors.

Cell line and cell culture
MDA-MB-231 and T47D cells were originally obtained from ATCC. Both cell lines were cultured in high-glucose DMEM with 10% FBS (Thermo Fisher Scientific or R&D Systems). A doxycycline-inducible GATA3 expression system in MDA-MB-231 cells was developed by lentiviral transduction. The Ty1-tagged GATA3 gene was inserted into the pINDUCER20 vector, and lentivirus was generated in 293T cells using the second-generation lentiviral packaging plasmid system. pINDUCER20 was a gift from Guang Hu (NIEHS/NIH). psPAX2 and pMD2.G were gifts from Didier Trono (Addgene plasmids #12260 and #12259). After antibiotic selection with G418, cell colonies were collected, and GATA3 expression levels were assessed by western blot. A cell clone with low basal GATA3 expression (without DOX treatment) was used in this study.
The pGIPZ vectors and lentiviruses encoding CHD4 shRNA and control shRNA were provided by the NIEHS Epigenomics Core and Viral Vector Core. Cells (400 000) were plated on 6 cm dishes and infected with these shRNA lentiviruses at 24 and 32 h after plating. After overnight incubation, the medium was replaced with 4 ml of fresh medium, and the cells were incubated for a further 2 days. GATA3 expression was induced by adding DOX (final concentration 1 μg/ml) in fresh medium. Twelve hours after induction, cells were harvested and resuspended in PBS.

ChIP-seq
ChIP-seq libraries were prepared as previously described (41). Briefly, DOX-treated cells were fixed at 12 h with 1% formaldehyde. Fixed cells were treated with hypotonic buffer containing 10 mM HEPES-NaOH pH 7.9, 10 mM KCl, 1.5 mM MgCl2, 340 mM sucrose, 10% glycerol, 0.5% Triton X-100 and protease inhibitor cocktail (Thermo Fisher Scientific). Chromatin was fragmented by sonication with a Covaris S220 in lysis buffer containing 20 mM Tris-HCl pH 8.0, 2 mM EDTA, 0.5 mM EGTA, 0.5 mM PMSF, 5 mM sodium butyrate, 0.1% SDS and protease inhibitor cocktail. Each antibody (2.5 μg) was added to each chromatin solution (1 million cells/reaction). After overnight incubation, protein A/G mixed Dynabeads were added, and the samples were rotated for 2 h. Eluted DNA was reverse-crosslinked at 65 °C for 4 h, incubated with proteinase K for 1 h and purified with AMPure XP (Beckman Coulter). For CHD4 ChIP, 4 units of micrococcal nuclease (MNase) were added to the cell lysates, followed by a 3 min incubation at 37 °C before sonication.
ChIP-seq libraries were generated with the NEXTflex Rapid DNA-seq kit (PerkinElmer) and sequenced on a NextSeq 500 (Illumina, paired-end) at the NIEHS Epigenomics Core Facility. Data were processed with the same protocol as ATAC-seq. Mapped read pairs were converted to single fragments, which were used to generate genome coverage tracks on the UCSC Genome Browser and for metaplot analyses. GATA3 ChIP-seq peaks were defined by HOMER v4.1 with default parameters (42).
CHD4 ChIP-seq peaks were defined by PeaKDEck (43) using the following parameters: -sig 0.0001 -bin 300 -back 3000 -npBack 2500000. For correlation and differential analyses, read counts in the reference peak sets were processed with the SARTools pipeline (44) using edgeR (45).
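The conversion of mapped read pairs into single fragments, used in this section for coverage tracks and metaplots, can be sketched in Python. This is a minimal illustration, not the pipeline used here: it assumes each properly paired alignment has already been reduced to a hypothetical `(chrom, leftmost_start, template_length)` tuple in 0-based, half-open coordinates (the template length being the SAM TLEN value of either mate), and `pairs_to_fragments` and `fragment_coverage` are names introduced only for this example.

```python
def pairs_to_fragments(pairs):
    """Convert (chrom, leftmost_start, template_length) tuples into fragment intervals.

    Coordinates are 0-based and half-open; template_length may be negative
    (as for one mate in SAM), so its absolute value is used.
    """
    fragments = []
    for chrom, start, tlen in pairs:
        size = abs(tlen)
        if size == 0:
            continue  # insert size undefined (e.g. mates on different chromosomes)
        fragments.append((chrom, start, start + size))
    return fragments


def fragment_coverage(fragments, chrom, region_start, region_end):
    """Per-base fragment coverage over [region_start, region_end) on one chromosome."""
    coverage = [0] * (region_end - region_start)
    for frag_chrom, start, end in fragments:
        if frag_chrom != chrom:
            continue
        # Count every base spanned by the fragment, including the unsequenced insert.
        for pos in range(max(start, region_start), min(end, region_end)):
            coverage[pos - region_start] += 1
    return coverage
```

Counting whole fragments rather than individual reads means each pair contributes once and the gap between mates is filled in; production tools would stream sorted alignments rather than loop base by base.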

ATAC-seq
ATAC-seq libraries were prepared as previously described (41,46). Cells (25 000 in 25 μl) were transferred to new tubes, and nuclei were isolated with CSK buffer (10 mM PIPES pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 0.1% Triton X-100). Nuclei were treated with 2.5 μl of Tn5 transposase (Illumina) in the standard tagmentation reaction buffer (25 μl). DNA fragments were amplified with a total of 8 PCR cycles, and the libraries were sequenced on a NextSeq 500 at the NIEHS Epigenomics Core Facility. Raw sequence reads were filtered for a mean base quality score > 20. Adapter sequences were removed with Trim Galore! (Babraham Institute). Processed reads were mapped to the hg19 genome using Bowtie 0.12.8 (47), and uniquely mapped, non-duplicate reads were used for subsequent analysis.
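The two read-level filters in this protocol (mean base quality > 20 and removal of duplicates) can be sketched as below. Assumptions: Phred+33 quality encoding, reads supplied as hypothetical `(name, sequence, quality)` tuples, and duplicates defined as fragments sharing chromosome, start and end; the tools and duplicate criteria actually used in the study may differ.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_by_mean_quality(records, min_mean_quality=20):
    """Keep (name, sequence, quality) records whose mean base quality exceeds the cutoff."""
    return [rec for rec in records if mean_phred(rec[2]) > min_mean_quality]


def drop_fragment_duplicates(fragments):
    """Keep the first fragment per (chrom, start, end); later copies count as duplicates."""
    seen = set()
    unique = []
    for frag in fragments:
        key = frag[:3]
        if key not in seen:
            seen.add(key)
            unique.append(frag)
    return unique
```

The cutoff is strict (> 20), so a read whose mean quality is exactly 20 is discarded.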
Peaks were classified according to the following criteria. Newly accessible peaks are defined as GATA3 peaks that show (i) a > 2-fold increase in ATAC-seq signal at the GATA3 peak (±200 bp from the peak center) compared with the control (time 0 h) condition and (ii) > 30 normalized reads in the peak flanking region (±1 kb).
Constitutively accessible peaks are defined as GATA3 peaks that show > 30 normalized reads in the flanking region (±1 kb). Constitutively inaccessible peaks are defined as GATA3 peaks that show < 30 normalized reads in the flanking region.
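The three classification rules above can be written as a small decision function. This is a sketch under stated assumptions: the text does not say at which time point the 30-read flanking threshold applies for each class, so here newly accessible peaks use the post-induction flank, constitutively accessible peaks must exceed 30 reads at both time points, constitutively inaccessible peaks must stay below 30 at both, and the rules are tested in that order. The function name, argument names and pseudocount are hypothetical.

```python
def classify_gata3_peak(center_0h, center_12h, flank_0h, flank_12h,
                        fold_cutoff=2.0, read_cutoff=30.0, pseudocount=1.0):
    """Assign a GATA3 peak to one of three accessibility classes (hypothetical helper).

    center_*: normalized ATAC-seq reads within +/-200 bp of the peak center.
    flank_*:  normalized ATAC-seq reads within +/-1 kb of the peak center.
    The time point used for each flanking threshold, the rule order and the
    pseudocount (which avoids division by zero) are assumptions.
    """
    fold_change = (center_12h + pseudocount) / (center_0h + pseudocount)
    if fold_change > fold_cutoff and flank_12h > read_cutoff:
        return "newly accessible"
    if flank_0h > read_cutoff and flank_12h > read_cutoff:
        return "constitutively accessible"
    if flank_0h < read_cutoff and flank_12h < read_cutoff:
        return "constitutively inaccessible"
    return "unclassified"  # e.g. peaks exactly at, or straddling, the read cutoff
```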

Capture MNase-seq
Capture MNase-seq was performed as previously described (48). Biotinylated RNA probes were designed and purchased through the SureSelect Custom DNA Target Enrichment Probes system with the following probe design specifications (tiling density: 2×, masking: least stringent, boosting: balanced). The target regions are listed in Supplementary Table S1. Nucleosomal fragments were prepared by digesting nuclei with micrococcal nuclease (MNase). Sequencing libraries were prepared with the NEXTflex Rapid DNA-Seq kit (PerkinElmer). Libraries from different conditions were pooled, and 750 ng of DNA was used for nucleosomal DNA fragment enrichment at a subset of GATA3 peaks with the SureSelect Enrichment kit. After fragment enrichment by RNA probe hybridization and streptavidin pull-down, captured DNA was amplified by PCR (12 cycles) and sequenced on a NextSeq 500 at the NIEHS Epigenomics Core Facility.
Capture MNase-seq data were processed with the same protocol as ATAC-seq, except that duplicate reads were retained. To generate heatmaps, only the midpoints of mononucleosomal fragments (120-170 bp) were collected. The midpoint frequency (taken as the dyad frequency) was normalized to the data from time 0.
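The dyad-frequency calculation can be sketched as follows: keep only mononucleosome-sized fragments (120-170 bp), take each fragment midpoint as a dyad, and divide the resulting frequency by the time 0 frequency at the same position. Scaling each library by its total dyad count and adding a pseudocount are assumptions made here so the ratio is defined everywhere; the input format and function names are hypothetical.

```python
from collections import Counter


def dyad_counts(fragments, min_len=120, max_len=170):
    """Count midpoints (dyads) of mononucleosome-sized fragments.

    fragments: iterable of (start, end) pairs, 0-based half-open, in a
    coordinate frame centered on a capture target (hypothetical input format).
    """
    counts = Counter()
    for start, end in fragments:
        if min_len <= end - start <= max_len:
            counts[(start + end) // 2] += 1
    return counts


def normalized_dyad_profile(sample_fragments, time0_fragments, positions, pseudocount=1.0):
    """Library-scaled dyad frequency at each position divided by the time 0 value.

    Per-library scaling and the pseudocount are illustrative assumptions that
    keep the ratio defined where time 0 has no dyads.
    """
    sample = dyad_counts(sample_fragments)
    baseline = dyad_counts(time0_fragments)
    sample_total = sum(sample.values()) or 1
    baseline_total = sum(baseline.values()) or 1
    profile = []
    for pos in positions:
        sample_freq = (sample[pos] + pseudocount) / sample_total
        baseline_freq = (baseline[pos] + pseudocount) / baseline_total
        profile.append(sample_freq / baseline_freq)
    return profile
```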

Migration assay
Control shRNA- or CHD4 shRNA-expressing MDA-MB-231 cells were prepared with the lentiviruses as described above. GATA3-induced cells were cultured for at least 2 days. Four million cells were seeded in each 6-well plate and grown overnight. The confluent cells were scratched with a 1 ml micropipette tip. After scratching, the cells were washed with 2 ml of pre-warmed serum-free medium, and the same volume of serum-free medium was added to each well. Cell images were taken at multiple time points (0, 8, 16, 24 and 33 h) with an Olympus IX71 microscope, and ImageJ (version 2.1.0/1.53c) was used to quantify wound areas.
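Wound areas measured in ImageJ are often expressed relative to the area at 0 h; the helpers below are a hypothetical sketch of that bookkeeping, assuming one area value per time point for a single well. The study does not state how areas were normalized, so the percentage scale here is an assumption.

```python
def relative_wound_area(areas_by_time):
    """Express each wound area as a percentage of the area at the earliest time point.

    areas_by_time: dict mapping hours -> measured wound area (e.g. pixels).
    """
    times = sorted(areas_by_time)
    baseline = areas_by_time[times[0]]
    if baseline <= 0:
        raise ValueError("the wound area at the first time point must be positive")
    return {t: 100.0 * areas_by_time[t] / baseline for t in times}


def percent_closure(areas_by_time):
    """Percentage of the initial wound that has closed at each time point."""
    return {t: 100.0 - rel for t, rel in relative_wound_area(areas_by_time).items()}
```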

Statistical analysis
Adjusted P-values in the differential peak and gene expression analyses were calculated with the Benjamini-Hochberg method (Figures 1C, 2B, 4A, 5C and D; Supplementary Figure S2A, B). For the cell migration assay shown in Figure 6D, a t-test was used to calculate P-values. The Mann-Whitney test was used for box plot comparisons (Supplementary Figure S2E-G).
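For reference, the Benjamini-Hochberg adjustment named above can be implemented in a few lines of plain Python. This is a generic sketch of the standard step-up procedure (sort the P-values, scale each by m/rank, then enforce monotonicity from the largest P-value down and cap at 1), not the code path inside edgeR or SARTools.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    # Step up from the largest P-value so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```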

CHD4 depletion mediates altered chromatin accessibility and gene expression
To explore the range of chromatin features regulated by CHD4 in somatic cells, we depleted it in MDA-MB-231 basal breast cancer cells using shRNA (Supplementary Figure S1A) and then analyzed genome-wide chromatin accessibility using ATAC-seq (46,54). CHD4 has been shown to regulate tumor growth in MDA-MB-231 cells (31). The genomic locations of most ATAC-seq peaks were unchanged, whereas approximately one-third of detected peaks were either lost or gained following CHD4 depletion (Figure 1A, B). Comparisons at the level of individual biological replicates indicated that loss of CHD4, rather than inherent variability, drives this outcome (Supplementary Figure S1B, C).
In addition to the loss or gain of individual peaks, we performed edgeR differential peak analysis to assess whether individual loci showed changes in the level of accessibility, with or without a change in location. Somewhat paradoxically, most altered ATAC-seq peaks exhibited increased transposition following CHD4 depletion (Figure 1C, D). We further confirmed that these changes are specific to CHD4 depletion rather than to technical or biological variation (Supplementary Figure S1D-F). Loci with increased accessibility were largely confined to peaks located more than 1 kb from an annotated transcription start site (TSS), and this distribution differed from random (P = 0.00001, chi-square test) (Figure 1E, Supplementary Figure S1G). Peaks with decreased accessibility were associated with TSSs at a frequency similar to that of a random peak set (Figure 1E). These results suggest that CHD4 may have a different impact on chromatin architecture at promoters than at distal regulatory elements.
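The comparison of TSS-distance distributions can be illustrated with a Pearson chi-square test on a 2 × 2 table of peak counts (TSS-proximal, ≤1 kb, versus distal, >1 kb) for the CHD4-sensitive peaks and a random peak set. The table layout, the absence of a continuity correction and the helper names are assumptions, since the text reports only that a chi-square test was used. With one degree of freedom the P-value equals `erfc(sqrt(statistic / 2))`, so only the standard library is needed.

```python
import math


def tally_tss_distances(distances, cutoff=1000):
    """Count peaks that are TSS-proximal (<= cutoff bp) and distal (> cutoff bp)."""
    proximal = sum(1 for d in distances if abs(d) <= cutoff)
    return [proximal, len(distances) - proximal]


def chi_square_2x2(table):
    """Pearson chi-square test without continuity correction for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= statistic) = erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

A table would be built as `[tally_tss_distances(sensitive_distances), tally_tss_distances(random_distances)]`, where both distance lists are hypothetical inputs.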
We asked whether the alteration in peak intensity reflected changes in the binding behavior of individual transcription factors by using HOMER (42) to assess the enrichment of known transcription factor binding motifs at these loci. Surprisingly, the motifs enriched in peaks with increased accessibility overlapped considerably with those enriched in peaks with decreased accessibility. Motifs for the AP1, ETS and RUNX families were present in both enriched motif sets, with an AP1 motif being the most enriched in both (Figure 1F, G). To further assess AP1 motif enrichment, we performed HOMER motif enrichment analysis on the gained peaks (12 868 ATAC-seq peaks observed only in CHD4 KD cells) and lost peaks (12 735 ATAC-seq peaks observed only in control cells) defined by the peak overlap analysis shown in Figure 1B. Similarly, motifs for AP1 family members were significantly enriched at both gained and lost peaks (Supplementary Figure S1H, I). This outcome was unexpected, as it suggested that loss of CHD4 leads to changes in binding site selection by transcription factors.
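Defining gained and lost peaks (present only in CHD4 KD or only in control cells) amounts to an interval-overlap comparison between two peak sets. The sketch below assumes peaks are hypothetical `(chrom, start, end)` tuples in 0-based, half-open coordinates and treats any shared base as an overlap; the overlap criterion actually used for Figure 1B is not specified in the text.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_overlap(control_peaks, knockdown_peaks):
    """Return (shared, lost, gained) peak lists.

    shared: control peaks overlapping any knockdown peak
    lost:   control peaks with no knockdown counterpart
    gained: knockdown peaks with no control counterpart
    All-vs-all comparison is fine for a sketch; genome-scale peak sets would
    normally be handled with sorted sweeps or an interval tree.
    """
    shared, lost = [], []
    for peak in control_peaks:
        if any(overlaps(peak, other) for other in knockdown_peaks):
            shared.append(peak)
        else:
            lost.append(peak)
    gained = [peak for peak in knockdown_peaks
              if not any(overlaps(peak, other) for other in control_peaks)]
    return shared, lost, gained
```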
To assess whether individual transcription factors were in fact relocalized in the genome, we performed ChIP-seq for the AP1 family members JUNB, FRA1 and ATF3 with and without CHD4 depletion. All tested AP1 family members showed thousands of differentially bound sites upon CHD4 depletion (Figure 2A). For JUNB, 2190 increased and 3301 decreased peaks were observed (Figure 2B, C). Slightly smaller numbers of differentially bound peaks were observed for FRA1 and ATF3 (Supplementary Figure S2A, B). Metaplot analysis at ATAC-seq differential peaks suggested differential impacts of CHD4 depletion on JUNB, FRA1 and ATF3 binding (Figure 2C, Supplementary Figure S2C). JUNB binding was higher in CHD4 KD cells at both increased and unchanged ATAC-seq peaks, whereas ATF3 binding was lower in these cells at both decreased and unchanged peaks. Only subtle changes were observed in the FRA1 ChIP-seq data. These results suggest that although AP1 family members may not be the primary factors modulating chromatin accessibility upon CHD4 depletion, CHD4 still regulates the chromatin binding activities of AP1 family transcription factors.
To ask whether these alterations in chromatin were associated with CHD4 binding, we performed ChIP-seq for CHD4 in MDA-MB-231 cells. Although CHD4 ChIP-seq showed a lower signal-to-noise ratio than ATAC-seq, the data were of sufficient quality for 16 780 peaks to be detected by PeaKDEck peak calling (43). CHD4 peaks were frequently observed at open chromatin regions (Figure 2D), and approximately 80% of CHD4 peaks (13 419 of 16 780) overlapped with ATAC-seq peaks in MDA-MB-231 cells (Figure 2E). Both the frequency of overlap between CHD4 and ATAC-seq peaks and the ATAC-seq signal at CHD4 peaks were significantly higher than in randomly selected genomic regions (Supplementary Figure S2E, F).
Differential chromatin accessibility and redistribution of multiple AP1 family members upon CHD4 depletion suggested that CHD4 is important for target site selection by transcription factors. AP-1 proteins have high affinity for the palindromic sequence 5′-TGA(G/C)TCA-3′, but transcription factors are also known to bind DNA nonspecifically or at non-consensus sites. Because AP1 motifs are enriched at both increased and decreased ATAC-seq peaks, it is possible that CHD4 regulates the sequence-specific binding activities (motif sampling) of transcription factors. Our peak-centered analyses in Figure 2 may have excluded weaker binding events. To account for all potential binding events, including weaker ones, we performed metaplot analyses of JUNB and FRA1 at all potential binding sites identified by sequence match (Figure 3A). In the HOMER database, we found > 1 million potential AP-1 binding sites. We first tested whether consensus motif binding by JUNB and FRA1 could be detected outside of ChIP-seq peaks. To this end, we excluded the observed ChIP-seq peaks to minimize bias in the ChIP-seq data resulting from motif-independent binding or indirect binding caused by cell fixation (Figure 3A). Both JUNB and FRA1 exhibited significantly stronger enrichment at loci containing the consensus motif than at randomly selected genomic regions (Figure 3B, C, Supplementary Figure S2F), suggesting that this analysis captures motif sampling by JUNB and FRA1. When CHD4 was depleted, JUNB and FRA1 binding signals were significantly decreased at the consensus motif sites (Figure 3D, E, Supplementary Figure S2G). These results suggest that CHD4 modulates the motif sampling activities of JUNB and FRA1, with its depletion potentially permitting a more promiscuous sampling process.
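Identifying candidate AP-1 sites by sequence match can be sketched as a regular-expression scan for the consensus TGA(G/C)TCA, followed by removal of sites that fall inside ChIP-seq peaks. Note the substitution: the analysis above used the HOMER motif database, which scores position weight matrices, whereas these hypothetical helpers find exact consensus matches only. Because TGACTCA and TGAGTCA are each other's reverse complements, the degenerate pattern maps onto itself and a single-strand scan finds every match.

```python
import re

# Exact AP-1 consensus TGA(C/G)TCA; as a set, the pattern is its own reverse
# complement, so scanning one strand also covers the other.
AP1_CONSENSUS = re.compile("TGA[CG]TCA")


def find_ap1_sites(chrom, sequence):
    """Return (chrom, start, end) for each exact consensus match, 0-based half-open."""
    return [(chrom, m.start(), m.end()) for m in AP1_CONSENSUS.finditer(sequence.upper())]


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sites_outside_peaks(sites, peaks):
    """Drop motif sites that overlap any ChIP-seq peak (all-vs-all, for illustration)."""
    return [site for site in sites if not any(overlaps(site, peak) for peak in peaks)]
```

Upper-casing the sequence keeps soft-masked (lowercase) repeats in the scan; whether repeats were included in the original analysis is not stated.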
To examine the consequences of CHD4 depletion for gene expression, we performed RNA-seq. Significantly altered transcripts were identified using FDR < 0.05 and |fold change| > 1.5. In steady-state MDA-MB-231 cells, 1880 genes were upregulated and 1071 genes were downregulated (Figure 4A). Changes in steady-state transcript levels were thus also skewed toward gene activation, although not to the same extent as alterations in transposition. Integration of ATAC-seq data and gene expression indicated that genes associated with increased transposition had higher steady-state transcript levels following CHD4 depletion, whereas genes with decreased accessibility had lower transcript levels (Figure 4B). To further assess whether the observed changes are associated with CHD4 chromatin binding, we analyzed the overlap between differentially expressed genes (DEGs) and the peaks identified in CHD4 ChIP-seq data. For this analysis, we selected the gene closest to each CHD4 peak within a 100 kb range (Supplementary Table S3). Of the 1880 upregulated genes, 506 overlapped with CHD4-associated genes; similarly, 302 of the 1071 downregulated genes were associated with CHD4. These findings suggest a potential direct regulatory influence of CHD4 on approximately 27-28% of the DEGs (506/1880 and 302/1071). Gene ontology analysis suggested that altered transcript levels were enriched at genes involved in plasma membrane processes and interaction with the extracellular space (Figure 4C). Somewhat surprisingly, upregulated transcripts were associated with tissues other than breast (Figure 4D), suggesting that depletion of CHD4 leads to loss of cell-type specificity in the transcriptional program. Direct inspection of the RNA-seq data revealed that the expression levels of multiple transcription factors, including AP1 family members, were altered upon CHD4 depletion (Figure 4E). Therefore, the observed changes in AP1 family member expression could contribute to the differential chromatin binding of those proteins shown in Figures 2 and 3.
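The DEG thresholds and the nearest-gene assignment described above can be sketched as follows. Assumptions: fold changes are linear ratios (knockdown/control), so |fold change| > 1.5 is read as a ratio above 1.5 or below 1/1.5; distance is measured from the peak midpoint to each TSS; and the input layouts and function names are hypothetical.

```python
def call_degs(results, fdr_cutoff=0.05, fold_cutoff=1.5):
    """Split genes into up- and downregulated sets.

    results: dict mapping gene -> (fold_change, fdr); fold_change is a linear
    knockdown/control ratio (hypothetical input layout).
    """
    up, down = set(), set()
    for gene, (fold_change, fdr) in results.items():
        if fdr >= fdr_cutoff:
            continue  # not significant at the chosen FDR
        if fold_change > fold_cutoff:
            up.add(gene)
        elif fold_change < 1.0 / fold_cutoff:
            down.add(gene)
    return up, down


def nearest_gene(peak, tss_list, max_distance=100_000):
    """Gene whose TSS is closest to the peak midpoint, or None if none is within range."""
    chrom, start, end = peak
    midpoint = (start + end) // 2
    best_gene, best_distance = None, None
    for gene, gene_chrom, tss in tss_list:
        if gene_chrom != chrom:
            continue
        distance = abs(tss - midpoint)
        if distance <= max_distance and (best_distance is None or distance < best_distance):
            best_gene, best_distance = gene, distance
    return best_gene


def chd4_associated_degs(results, chd4_peaks, tss_list):
    """Intersect up- and downregulated genes with genes assigned to CHD4 peaks."""
    up, down = call_degs(results)
    chd4_genes = {nearest_gene(peak, tss_list) for peak in chd4_peaks} - {None}
    return up & chd4_genes, down & chd4_genes
```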

CHD4 antagonizes enhancer formation by GATA3
We observed alterations in chromatin accessibility and transcription factor binding upon CHD4 depletion. However, we also observed changes in the expression of transcription factor family members, potentially complicating analysis and interpretation of the data. We therefore moved to a more defined system with a temporal component: transcription factor-dependent cellular reprogramming. We established a doxycycline-inducible GATA3 expression system in MDA-MB-231 mesenchymal breast cancer cells, in which GATA3 expression has been shown to induce mesenchymal-to-epithelial transition (MET) (40,55). We selected a cell clone that shows minimal GATA3 expression in the absence of doxycycline (hereafter DOX) but expresses biologically relevant amounts of GATA3 protein upon DOX treatment. In this clone, GATA3 protein was detected within 3 h after addition of DOX to the medium and reached saturation by 12 h (Supplementary Figure S3A). Stable GATA3 expression was observed for at least 48 h after DOX induction. We collected ATAC-seq, CHD4 and GATA3 ChIP-seq, and RNA-seq data 12 h after DOX treatment to characterize GATA3- and CHD4-dependent changes in chromatin architecture and gene expression. By comparing transposase accessibility at GATA3 peaks before and after GATA3 induction, we identified three predominant types of loci: loci that transition from inaccessible to accessible (newly accessible), loci where GATA3 binds within transposase-accessible chromatin (constitutively accessible), and loci where GATA3 binds to inaccessible chromatin that remains inaccessible following GATA3 expression (constitutively inaccessible) (Figure 5A, B, green curves). CHD4 was clearly recruited to GATA3 binding sites (Supplementary Figure S3B, C). Although CHD4 binding increased across all three peak groups, the overlap between CHD4 and GATA3 at newly accessible and constitutively accessible GATA3 peaks was substantial (Supplementary Figure S3B). RNA-seq analysis revealed that 926 genes were altered in steady-state abundance following GATA3 expression, with slightly more genes being activated (524) than decreased (402) (Figure 5C). Consistent with previous reports (40), transcripts with altered levels were enriched in categories involved in mesenchymal-to-epithelial transition (Supplementary Figure S3D). To further confirm the overlap between GATA3 and CHD4 in breast cancer cells, we performed CHD4 ChIP-seq in T47D cells, in which both CHD4 and GATA3 are endogenously expressed (Supplementary Figure S3E). Among the 34 380 GATA3 peaks that we had previously defined (41,56), CHD4 signals were observed at 16 225 (47%), confirming the frequent overlap between CHD4 and GATA3 in luminal breast cancer cells.
When CHD4 was depleted in this system, we observed striking alterations in several features. Of loci exhibiting significant alterations in ATAC-seq sensitivity, 82% (4548 of 5558) showed increased accessibility (Figure 5D), similar to control MDA-MB-231 cells (Figure 1C). Unlike in control cells, where increased accessibility was overwhelmingly at distal elements and decreased accessibility was somewhat more balanced, changes in transposition in the reprogramming system were overwhelmingly located distant from annotated TSSs (Figure 5E). Globally, altered ATAC-seq loci were enriched in AP1 and RUNX motifs, with GATA motifs enriched at loci that gain accessibility (Figure 5F). When we focused the analysis on loci with GATA3 peaks, we found that CHD4 depletion led to a substantial increase in accessibility at newly accessible peaks, where GATA3 binding leads to enhancer licensing (Figure 5A) (40). Surprisingly, peaks where GATA3 fails to induce chromatin opening in the presence of CHD4 frequently displayed increased accessibility in its absence (Figure 5A), suggesting that CHD4 acts to oppose the chromatin-opening ability of GATA3.
RNA-seq performed following CHD4 knockdown revealed MET-unrelated gene expression at loci linked to constitutively inaccessible sites; exemplar genes are shown in Figure 6A and Supplementary Figure S4. At such loci, GATA3 binding is not associated with transposase accessibility in the presence of CHD4; in its absence, accessibility becomes evident along with increased transcript levels. Gene ontology analysis revealed that upregulated genes associated with constitutively inaccessible peaks were significantly enriched for tissue-specific genes related to GATA3 function in cellular contexts other than breast (Figure 6B). For instance, GATA3 is known to be important for trophoblast differentiation and placental development (57,58), and placenta-related genes such as VSTM5, NDNF and HAPLN1 were upregulated in CHD4 knockdown cells (Figure 5A, Supplementary Figure S4).
CHD4 depletion also affected the biological outcome of GATA3-mediated cell reprogramming, mesenchymal-to-epithelial transition. MET-related gene expression was enhanced in CHD4 knockdown cells (Figure 6C), consistent with the increased ATAC-seq signal at newly accessible peaks after CHD4 knockdown. Cell migration assays indicated that GATA3 expression in CHD4 knockdown cells still produced MET phenotypes at the cellular level, but the degree of cell migration was modestly affected by CHD4 knockdown (Figure 6D). These results suggest that CHD4 acts to constrain the ability of GATA3 to bind its motif and elicit changes in gene expression and cellular phenotype.

CHD4 promotes nucleosome formation over transcription factor motifs
We previously reported that nucleosome remodeling patterns are associated with the chromatin-opening activities of GATA3 (40,48). At loci that become accessible following GATA3 induction (newly accessible sites), nucleosome depletion was observed at the center of GATA3 binding peaks, whereas nucleosome repositioning and accumulation were observed at constitutively inaccessible GATA3 binding peaks. To understand the impact of CHD4 depletion on nucleosome remodeling following GATA3 induction, we performed MNase-seq to map nucleosomes during GATA3-mediated cellular reprogramming. Newly accessible peaks exhibited the characteristic pattern of MNase-resistant, positioned nucleosomes flanking a moderately resistant region centered over the transcription factor binding motif (Figure 7A, left panel). When CHD4 was depleted, this pattern remained unchanged, whereas the amplitude of the nucleosome peaks flanking the GATA3 binding site increased. This result suggests that a larger number of alleles within the sampled population have productively bound GATA3, creating a phased array of nucleosomes. To gain further insight into GATA3-induced nucleosome remodeling, we turned to a higher-resolution technique, capture MNase-seq, which provides deep mapping of nucleosome positions at individual GATA3 binding loci (Supplementary Figure S5A). In this technique, custom-designed biotinylated RNA probes are used to enrich defined loci. We designed the RNA probes against ∼2700 genomic intervals, which contain ∼1800 GATA3 binding sites and ∼940 negative control loci where GATA3 fails to accumulate. Mononucleosomal fragment midpoint frequency (dyad frequency) was calculated to monitor nucleosome remodeling during GATA3-mediated reprogramming. In the absence of CHD4, we observed a substantial increase in nucleosomal dyads flanking the GATA motif with no change in the final position of the nucleosomes (Figure 7B, Supplementary Figure S5B). These results were consistent with the conventional MNase-seq data and suggest that, in the absence of CHD4, more alleles within the sampled population exhibit productive GATA3 binding and create a phased nucleosomal array flanking the binding site. This remodeling is not accompanied by a notable increase in the width of the nucleosome-depleted region.
At constitutively inaccessible GATA3 binding sites, we observed a completely different outcome. In conventional MNase-seq, these GATA3-bound loci do not exhibit the phased nucleosomes flanking the binding site that are evident at newly accessible peaks. However, upon CHD4 depletion, we observed a clear pattern of phased nucleosomes flanking the GATA3 binding site that resembles the pattern observed at newly accessible sites (Figure 7C). Capture MNase-seq confirmed this observation, showing movement of nucleosomal dyads away from positions where the GATA3 motif lies within the confines of the nucleosome to positions distant from the GATA motif (Figure 7D, Supplementary Figure S5C). At these loci, CHD4 alters the outcome of the GATA3 interaction with the chromatin fiber. The data suggest that loss of CHD4 shifts the outcome from GATA3 bound to the surface of a nucleosome to GATA3-mediated nucleosome eviction from the binding site and establishment of a phased array of flanking nucleosomes.

Discussion
TFs face multiple challenges in establishing new gene regulatory networks during development, in response to physiological or environmental signals, and during in vitro cellular reprogramming. These proteins must find appropriate recognition motifs within the genome, and they must also contend with physical barriers, including chromatin (59-61). In eukaryotes with large genomes, including humans, TFs with short recognition motifs must find the correct loci, and only the correct loci, among potentially millions of matches to their binding motif. In theory, a degenerate hexameric binding motif such as the WGATAR consensus for GATA3 (62) should occur about every 500 bp when both strands are counted. Based on the human reference genome sequence (hg19), more than 7 million loci contain the GATA3 consensus motif (8). In most cases, TFs are present on the order of thousands to tens of thousands of molecules per nucleus, meaning there are roughly 100-fold more potential binding sites than TFs (5-7). Indeed, the number of GATA3 consensus motifs identified in the GATA3 ChIP-seq data from MDA-MB-231 cells (40) is approximately 52 000, which represents < 1% of the potential binding motifs.
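The motif-frequency estimate above follows from counting how many bases each IUPAC code allows: in WGATAR, W and R each accept two of the four bases and the other four positions accept one, giving a per-strand match probability of (2/4)(1/4)^4(2/4) = 1/1024, or one match every ~512 bp when both strands are counted. The sketch below reproduces this arithmetic under the stated assumption of equal, independent base frequencies; the helper names are hypothetical.

```python
# Number of bases each IUPAC nucleotide code accepts.
IUPAC_SIZES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "S": 2, "W": 2,
               "K": 2, "M": 2, "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}


def match_probability(motif):
    """Chance that a random position matches the motif on one strand (equal base frequencies)."""
    p = 1.0
    for code in motif.upper():
        p *= IUPAC_SIZES[code] / 4
    return p


def expected_spacing(motif, both_strands=True):
    """Average distance in bp between motif matches in uniform random sequence."""
    p = match_probability(motif)
    if both_strands:
        # Counting both strands doubles the rate; this double-counts motifs that
        # are their own reverse complement, which WGATAR is not.
        p *= 2
    return 1 / p


def expected_matches(motif, genome_length, both_strands=True):
    """Expected number of matches in random sequence of the given length."""
    return genome_length / expected_spacing(motif, both_strands)
```

For a genome of roughly 3 × 10^9 bp this predicts about 5.9 million WGATAR matches; the hg19 count of more than 7 million cited above is higher, plausibly because human sequence is AT-rich and WGATAR is an AT-rich motif.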
Chromatin represents a first-line barrier to inappropriate binding of TFs, and it presents a barrier in multiple ways. A subset of TFs cannot read DNA sequences and bind productively to DNA wrapped around a histone octamer (1,60). Within the context of a nucleosome, the rotational and translational phasing of DNA necessarily obscures some chemical information where it closely approximates the histone octamer surface, making it unavailable to sequence readers (12,60). Biochemical and structural data indicate that the location of binding motifs near the center versus near the periphery of a nucleosome has a strong influence on binding and stability (48,60,63,64). Higher-order structural features of chromatin, including linker histones, assembly into heterochromatin, or partitioning into low-contact-frequency nuclear compartments, are likely to further restrict the sequence space available to be searched by TFs (61,65). Clearly, other factors must contribute to narrowing the search space and increasing the probability that cellular signals will result in activation (or repression) of the correct subset of genes (19).
CHD4 is important for maintenance of chromatin architecture and is known to suppress gene expression during tissue development, cell differentiation and cell reprogramming (25,32,33,66,67). However, genome-wide mapping frequently localizes CHD4 to active gene promoters and enhancers (36,39,68,69), which obscures the mechanisms by which CHD4 regulates gene expression. In this study, we investigated the roles of CHD4 in steady-state basal breast cancer cells and during mesenchymal-to-epithelial transition (MET). In the steady state, CHD4 depletion resulted primarily in increased chromatin accessibility, leading to abnormal gene expression unrelated to breast cancer cell identity. Loci with decreased chromatin accessibility following CHD4 depletion were enriched at promoters, whereas regions with increased accessibility were enriched at intergenic regions. Motif analysis at differential peaks suggested that CHD4 modulates chromatin binding of multiple AP1 family members. Indeed, when CHD4 was depleted, at least three AP1 family proteins, JUNB, ATF3 and FRA1, were redistributed. This finding is consistent with the previous observation in mouse embryonic stem cells that Mbd3 restoration led to altered chromatin binding by pluripotency-associated transcription factors (70).
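The increased/decreased/unchanged peak groups discussed above come from a thresholding step whose cutoffs are given in the figure legends (edgeR, FDR < 0.01 and |fold change| > 2, i.e. |log2 fold change| > 1). The sketch below illustrates only that final classification, assuming per-peak log2 fold changes and FDRs have already been computed; the `PeakResult` record and the example rows are hypothetical, and edgeR's own statistical model is not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class PeakResult:
    peak_id: str
    log2_fc: float  # log2(CHD4 KD / control), e.g. as reported by edgeR
    fdr: float


def classify(result, fdr_cutoff=0.01, fc_cutoff=2.0):
    """Label a peak 'increased', 'decreased' or 'unchanged' in CHD4 KD vs control."""
    if result.fdr < fdr_cutoff and abs(result.log2_fc) > math.log2(fc_cutoff):
        return "increased" if result.log2_fc > 0 else "decreased"
    return "unchanged"


def tally(results, **cutoffs):
    """Count peaks in each group."""
    counts = {"increased": 0, "decreased": 0, "unchanged": 0}
    for r in results:
        counts[classify(r, **cutoffs)] += 1
    return counts


# Hypothetical rows; these are not values from the study.
rows = [
    PeakResult("peak_1", 2.3, 1e-6),    # strongly opened after knockdown
    PeakResult("peak_2", -1.4, 0.002),  # closed after knockdown
    PeakResult("peak_3", 1.8, 0.2),     # large change but not significant
    PeakResult("peak_4", 0.5, 1e-4),    # significant but below the 2-fold cutoff
]
print(tally(rows))  # {'increased': 1, 'decreased': 1, 'unchanged': 2}
```

Requiring both a significance cutoff and an effect-size cutoff keeps small but statistically significant shifts (like `peak_4`) in the unchanged group.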
During GATA3-mediated MET cell reprogramming, CHD4 knockdown again resulted largely in chromatin opening. At both newly accessible and constitutively inaccessible GATA3-bound loci, we observed evidence for alterations in local nucleosome positioning, consistent with active roles for chromatin remodelers. Further investigation revealed that depletion of the chromatin remodeling enzyme CHD4, a core subunit of the NuRD complex, dramatically altered this outcome, leading to new accessibility at previously inaccessible loci and increased accessibility to transposition at accessible loci. This abnormal chromatin opening, not observed in the presence of CHD4, was associated with altered gene expression and affected cell fate transition at the cellular level. We speculate that our observations reflect a general property attributable to CHD4 and, by extension, to the NuRD complex. NuRD is found with high frequency at open chromatin and enhancers (36,39,68,69), where it is integral to the process of enhancer decommissioning during development (23,32,68) and reprogramming (66,67). We propose that NuRD acts, in part, to antagonize transcription factor-driven increases in chromatin accessibility, regardless of the ultimate outcome. It seems plausible that local translational motion of nucleosomes relative to transcription factor binding motifs creates architectural obstacles to motif recognition. At loci that become accessible, co-binding of multiple TFs or recruitment of other chromatin modification/remodeling enzymes generates a competition between factors promoting and opposing DNA accessibility that ultimately reaches a dynamic equilibrium in which some alleles within the population are accessible to structural probes. At loci that fail to become accessible, failure to recruit activating cofactors leads to a very different type of equilibrium, one in which the translational position of nucleosomes relative to the GATA motif permits GATA3 binding to nucleosomal DNA in the absence of detectable accessibility. In this manner, the choice of binding sites within the genome can be 'proof-read' by chromatin remodelers simply by enforcing a highly dynamic state (71,72). In principle, such chromatin dynamics would promote two critical outcomes: they would provide assurance against inappropriate transcriptional activation following transcription factor binding at incorrect sites, and they would provide an opportunity for rapid decommissioning of enhancers to enable progression to new patterns of gene expression consistent with cellular needs. It is still unclear how chromatin remodeling factors distinguish 'appropriate' from 'inappropriate' enhancers in steady-state cells or during cell reprogramming. A limitation of our study is the potential for indirect effects arising from the loss of CHD4, especially given the complex nature of CHD4/NuRD, which includes multiple components such as the histone deacetylases HDAC1/2. Further studies are necessary to understand the fundamental mechanisms underpinning CHD4 action, including its recruitment to specific regulatory regions.
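The dynamic equilibrium proposed above can be illustrated with a toy two-state model. This is an illustration only, not a model fitted or tested in this study: each allele switches from closed to open at a hypothetical rate `k_open` (standing in for TF co-binding and cofactor recruitment) and back at a hypothetical rate `k_close` (standing in for remodeler activity such as CHD4). At steady state the open fraction is k_open / (k_open + k_close), so faster remodeler-driven closing lowers the accessible fraction of the population without preventing transient opening of individual alleles.

```python
def steady_state_open_fraction(k_open, k_close):
    """Equilibrium fraction of alleles open in a closed <-> open two-state model."""
    return k_open / (k_open + k_close)


def simulate_open_fraction(k_open, k_close, t_end=50.0, dt=0.01, open0=0.0):
    """Forward-Euler integration of d(open)/dt = k_open * (1 - open) - k_close * open."""
    open_frac = open0
    for _ in range(int(t_end / dt)):
        open_frac += dt * (k_open * (1 - open_frac) - k_close * open_frac)
    return open_frac


# Hypothetical rates in arbitrary units; only their ratio matters at equilibrium.
for label, k_close in [("fast closing (remodeler active)", 0.8),
                       ("slow closing (remodeler depleted)", 0.2)]:
    analytic = steady_state_open_fraction(0.2, k_close)
    simulated = simulate_open_fraction(0.2, k_close)
    print(f"{label}: analytic {analytic:.2f}, simulated {simulated:.2f}")
```

The simulation starting from a fully closed population converges to the analytic value, which shows that the equilibrium is set by the balance of rates rather than by the starting state.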

Figure 1. CHD4 modulates chromatin accessibility at promoters in MDA-MB-231 cells. (A) Genome browser tracks showing ATAC-seq signals in control or CHD4 knockdown (KD) cells. Representative unchanged, increased, and decreased regions were selected. (B) Venn diagram showing the ATAC-seq peak overlap between control and CHD4 KD cells. (C) edgeR differential ATAC-seq peak analysis in CHD4 KD cells. FDR < 0.01 and |fold change| > 2 are applied to define differential peaks. Heatmap showing ATAC-seq signals at increased, decreased, or unchanged ATAC-seq peaks. (D) Metaplot showing normalized ATAC-seq reads/peak at differential peaks. (E) Pie chart showing the frequency of TSS and non-TSS peaks in each peak group. (F, G) HOMER de novo motif analysis. Decreased (F) or increased (G) ATAC-seq peaks are used as input.

Figure 2. AP1 family transcription factors are redistributed following CHD4 depletion. (A) Genome browser tracks showing JUNB, ATF3, and FRA1 ChIP-seq data. Differential peaks between control and CHD4 KD data are highlighted in yellow. ATAC-seq data are also shown as a reference for open chromatin regions. (B) Scatter plot showing increased (red), decreased (blue), and unchanged (grey) JUNB ChIP-seq peaks. FDR < 0.01 and |fold change| > 2 are applied to define differential peaks by edgeR. (C) Metaplots showing normalized ChIP-seq reads/peak at ATAC-seq differential peaks. JUNB (top), ATF3 (middle), and FRA1 (bottom) ChIP-seq signals in control (blue) or CHD4 KD cells (green) are plotted. (D) Genome browser tracks showing the frequent overlap between CHD4 and ATAC-seq peaks. CHD4 ChIP-seq was performed in control MDA-MB-231 cells. (E) Venn diagram showing the overlap between ATAC-seq peaks and CHD4 peaks.

Figure 3. Genome-wide motif sampling of AP1 family proteins is affected by CHD4 knockdown. (A) Scheme for defining potential binding target loci. HOMER AP1 or FRA1 motif-containing sites are obtained from the HOMER database. For the downstream analysis, each consensus motif locus was extended to 200 bp, and overlaps among the consensus motif loci and with the observed JUNB or FRA1 ChIP-seq peaks were removed. The retained loci were used for detecting motif sampling activities of JUNB and FRA1. (B) Metaplot showing JUNB ChIP-seq signals at AP1 consensus motif (blue) or randomly selected genomic (red) regions. (C) Metaplot showing FRA1 ChIP-seq signals at FRA1 consensus motif (blue) or randomly selected genomic (red) regions. (D, E) Metaplots showing JUNB (D) or FRA1 (E) ChIP-seq signals at consensus motif regions in control (blue) or CHD4 KD (green) cells.
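The locus-selection scheme in Figure 3A can be sketched with plain interval arithmetic. This is one reading of the legend under stated assumptions: intervals are BED-style (0-based, half-open), each motif hit is re-centred in a 200 bp window, motif windows that overlap one another are discarded rather than merged, and windows overlapping any observed JUNB or FRA1 ChIP-seq peak are removed. All function names and coordinates are hypothetical; in practice an interval toolkit such as bedtools would usually perform these steps.

```python
from bisect import bisect_left
from collections import defaultdict

# Intervals are (chrom, start, end), 0-based half-open as in BED files.


def extend(sites, width=200):
    """Replace each motif hit with a fixed-width window centred on the motif."""
    windows = []
    for chrom, start, end in sites:
        left = max(0, (start + end) // 2 - width // 2)
        windows.append((chrom, left, left + width))
    return windows


def by_chrom(intervals):
    """Group intervals into chrom -> sorted list of (start, end)."""
    groups = defaultdict(list)
    for chrom, start, end in intervals:
        groups[chrom].append((start, end))
    for spans in groups.values():
        spans.sort()
    return groups


def drop_mutual_overlaps(loci):
    """Keep only loci that overlap no other locus; overlapping clusters are discarded."""
    kept = []
    for chrom, spans in by_chrom(loci).items():
        cluster, cluster_end = [], float("-inf")
        for start, end in spans + [(float("inf"), float("inf"))]:  # sentinel flushes last cluster
            if start < cluster_end:
                cluster.append((start, end))
                cluster_end = max(cluster_end, end)
                continue
            if len(cluster) == 1:
                kept.append((chrom, *cluster[0]))
            cluster, cluster_end = [(start, end)], end
    return kept


def merge(spans):
    """Merge sorted (start, end) spans into disjoint spans with increasing ends."""
    merged = []
    for start, end in spans:
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def drop_peak_overlaps(loci, peaks):
    """Remove loci that overlap any observed ChIP-seq peak."""
    merged = {chrom: merge(spans) for chrom, spans in by_chrom(peaks).items()}
    starts = {chrom: [s for s, _ in spans] for chrom, spans in merged.items()}
    kept = []
    for chrom, start, end in loci:
        i = bisect_left(starts.get(chrom, []), end)  # peaks before index i start before locus end
        if not (i > 0 and merged[chrom][i - 1][1] > start):
            kept.append((chrom, start, end))
    return kept


# Hypothetical motif hits and ChIP-seq peaks.
motif_hits = [("chr1", 1000, 1007), ("chr1", 1050, 1057), ("chr1", 5000, 5007),
              ("chr1", 9000, 9007), ("chr2", 300, 307)]
chip_peaks = [("chr1", 9050, 9400)]
background = drop_peak_overlaps(drop_mutual_overlaps(extend(motif_hits)), chip_peaks)
print(background)  # [('chr1', 4903, 5103), ('chr2', 203, 403)]
```

Discarding overlapping motif windows (rather than merging them) keeps every retained locus centred on a single motif, which simplifies motif-centred metaplots.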

Figure 4. CHD4 knockdown results in aberrant gene expression unrelated to the breast cancer program. (A) Volcano plot showing differential gene expression upon CHD4 depletion. FDR < 0.05 and |fold change| > 1.5 were applied to define differentially expressed genes. (B) Expression of genes associated with ATAC-seq differential peaks. Box plots show log2 fold changes for each peak group. Increased, decreased, and randomly selected unchanged peaks are assigned to the closest genes. (C) Pathway enrichment analysis. The most significantly altered genes (fold change ± 2) were used for GO term Cellular Component (CC) analysis by DAVID (51,52). The top 10 pathways are indicated. (D) Functional annotation of up-regulated genes in CHD4 KD cells. DAVID tissue expression annotation was performed using up-regulated genes. (E) Gene expression of AP1 family proteins. ATAC-seq differential peak-associated genes. Heatmap shows log2 fold changes in CHD4 KD cells compared to the control shRNA condition.
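Figure 4B assigns each peak to its closest gene before comparing RNA-seq fold changes between peak groups. A minimal nearest-TSS assignment is sketched below under assumptions not stated in the legend: distance is measured from the peak centre to each TSS, ties go to the lower-coordinate TSS, and a median stands in for the box plot. Gene names, coordinates and fold changes are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import median


def build_tss_index(genes):
    """genes: (name, chrom, tss). Returns chrom -> (sorted TSS positions, matching names)."""
    by_chrom = defaultdict(list)
    for name, chrom, tss in genes:
        by_chrom[chrom].append((tss, name))
    return {
        chrom: ([pos for pos, _ in sorted(e)], [name for _, name in sorted(e)])
        for chrom, e in by_chrom.items()
    }


def nearest_gene(peak, tss_index):
    """Gene whose TSS is closest to the peak centre (ties go to the lower coordinate)."""
    chrom, start, end = peak
    positions, names = tss_index.get(chrom, ([], []))
    if not positions:
        return None
    centre = (start + end) // 2
    i = bisect_left(positions, centre)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - centre))
    return names[best]


def median_fold_change(peak_groups, tss_index, gene_log2fc):
    """Median RNA-seq log2 fold change of the nearest genes, per peak group."""
    summary = {}
    for group, peaks in peak_groups.items():
        genes = (nearest_gene(p, tss_index) for p in peaks)
        values = [gene_log2fc[g] for g in genes if g in gene_log2fc]
        summary[group] = median(values) if values else None
    return summary


# Hypothetical genes, peaks and fold changes.
index = build_tss_index([("GENE_A", "chr1", 1_000), ("GENE_B", "chr1", 50_000),
                         ("GENE_C", "chr2", 10_000)])
groups = {"increased": [("chr1", 40_000, 40_400)],
          "decreased": [("chr2", 9_000, 9_200), ("chr1", 2_000, 2_200)]}
log2fc = {"GENE_A": -0.5, "GENE_B": 1.5, "GENE_C": -1.0}
print(median_fold_change(groups, index, log2fc))  # {'increased': 1.5, 'decreased': -0.75}
```

Binary search over sorted TSS positions keeps each lookup logarithmic, which matters when tens of thousands of peaks are assigned.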

Figure 5. CHD4 knockdown leads to abnormal chromatin opening. (A, B) Metaplots showing ATAC-seq signals in control shRNA or CHD4 shRNA transduced cells. Averaged ATAC-seq signals before (b) or after (a) GATA3 expression are plotted for each peak group. (C) Volcano plot showing differential gene expression 12 h after GATA3 expression. Up- and down-regulated genes (FDR < 0.05, |log2(fold change)| > 0.5) are highlighted in red and blue, respectively. (D) ATAC-seq differential peak analysis upon CHD4 knockdown in MET. RNAs were collected 12 h after GATA3 expression. FDR < 0.01 and |fold change| > 2 are applied to define differential peaks. Heatmap shows ATAC-seq signal intensity in control or CHD4 KD cells at increased, decreased, and randomly selected unchanged peaks. (E) Pie chart showing peak annotation defined by HOMER. Increased or decreased ATAC-seq peaks are classified into 8 peak categories. (F) HOMER de novo motif analysis. Decreased (top) or increased (bottom) ATAC-seq peaks upon CHD4 KD in the MET condition are used as input.

Figure 6. CHD4 regulates enhancer activities during MET cell reprogramming. (A) An example of aberrant gene expression at a constitutively inaccessible GATA3 peak. Genome browser tracks show an example of up-regulated genes associated with the constitutively inaccessible GATA3 peaks. ATAC-seq and RNA-seq were performed 12 h after GATA3 induction. The de novo open chromatin site upon CHD4 KD is highlighted in yellow. VSTM5 is known to be expressed in brain and placenta. (B) Functional annotation of genes associated with constitutively inaccessible peaks. DAVID tissue expression annotation was performed using the up-regulated genes that are associated (±100 kb) with the constitutively inaccessible GATA3 peaks. (C) Heatmap showing expression levels of MET-associated genes. Fold changes were calculated from DESeq2 gene counts in GATA3-expressing (12 h after DOX treatment) control or CHD4 knockdown cells compared to the DOX-minus condition. (D) Bar graphs showing relative wound closure in the wound healing assay. The wound healing assay was performed in control or CHD4 knockdown cells before and after GATA3 expression (16 h). Average values are shown with SDs (N = 18; 3 biological replicates × 6 technical replicates).
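Figure 6B restricts the DAVID input to up-regulated genes associated (±100 kb) with constitutively inaccessible GATA3 peaks. A sketch of that windowed association follows, assuming the window is measured from the peak centre to each gene's TSS (the legend does not specify the reference points). All names and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def index_tss(genes):
    """genes: (name, chrom, tss). Returns chrom -> (sorted TSS positions, matching names)."""
    by_chrom = defaultdict(list)
    for name, chrom, tss in genes:
        by_chrom[chrom].append((tss, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([pos for pos, _ in entries], [name for _, name in entries])
    return index


def genes_near_peaks(peaks, tss_index, window=100_000):
    """Names of genes whose TSS lies within +/- window bp of any peak centre."""
    hits = set()
    for chrom, start, end in peaks:
        positions, names = tss_index.get(chrom, ([], []))
        centre = (start + end) // 2
        lo = bisect_left(positions, centre - window)
        hi = bisect_right(positions, centre + window)
        hits.update(names[lo:hi])
    return hits


# Hypothetical constitutively inaccessible GATA3 peak and nearby genes.
tss_index = index_tss([
    ("GENE_X", "chr3", 450_000),
    ("GENE_Y", "chr3", 700_000),
    ("GENE_Z", "chr3", 590_000),
])
nearby = genes_near_peaks([("chr3", 500_000, 500_300)], tss_index)
up_regulated = {"GENE_Y", "GENE_Z"}  # hypothetical differential expression call
print(sorted(nearby & up_regulated))  # ['GENE_Z']
```

Unlike nearest-gene assignment, a fixed window can link one peak to several genes, which suits an enrichment analysis over a gene set.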

Figure 7. High-resolution nucleosome mapping reveals aberrant nucleosome remodeling induced by CHD4 knockdown. (A) Metaplot showing averaged dyad frequency at newly accessible peaks in control or CHD4 knockdown cells. Conventional MNase-seq was performed in control or CHD4 knockdown cells. (B) Capture MNase-seq results at the 750 selected newly accessible peaks. Heatmap shows dyad frequency at newly accessible sites relative to time 0 h. Ellipses indicate the most enriched nucleosome positions at the GATA3 motif flanking region. (C) Metaplot showing averaged dyad frequency at constitutively inaccessible peaks in control or CHD4 knockdown cells. Nucleosome fragments were collected by conventional MNase-seq. (D) Capture MNase-seq data at the 750 selected constitutively inaccessible peaks. Heatmap shows dyad frequency at constitutively inaccessible sites relative to time 0 h. Ellipses indicate the most enriched nucleosome positions in control (top) or CHD4 knockdown (bottom) cells.
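The dyad-frequency metaplots in Figure 7 summarize MNase-seq fragments by their midpoints, a common proxy for the nucleosome dyad in paired-end data. The sketch below aggregates fragment midpoints around strand-oriented motif centres. The 140-200 bp mononucleosome size filter, the flank width and the per-site normalization are illustrative assumptions, not parameters reported in this legend, and the fragment coordinates are hypothetical.

```python
from collections import Counter, defaultdict


def dyad_profile(fragments, motif_centres, flank=300, min_len=140, max_len=200):
    """Average dyad counts per site at each offset (-flank..+flank) from motif centres.

    fragments: (chrom, start, end) per read pair, 0-based half-open.
    motif_centres: (chrom, position, strand) with strand "+" or "-".
    """
    # Midpoints of mononucleosome-sized fragments approximate dyad positions.
    dyads = defaultdict(Counter)
    for chrom, start, end in fragments:
        if min_len <= end - start <= max_len:
            dyads[chrom][(start + end) // 2] += 1

    profile = [0.0] * (2 * flank + 1)
    for chrom, centre, strand in motif_centres:
        counts = dyads.get(chrom, Counter())
        for offset in range(-flank, flank + 1):
            # Flip minus-strand sites so positive offsets are downstream in motif orientation.
            oriented = offset if strand == "+" else -offset
            profile[oriented + flank] += counts[centre + offset]
    n_sites = max(len(motif_centres), 1)
    return [total / n_sites for total in profile]


# Hypothetical fragments around one motif centred at chr1:1000.
fragments = [("chr1", 900, 1047), ("chr1", 980, 1127), ("chr1", 500, 520)]
profile = dyad_profile(fragments, [("chr1", 1000, "+")], flank=100)
offsets = [i - 100 for i, v in enumerate(profile) if v > 0]
print(offsets)  # [-27, 53]
```

The 20 bp fragment is excluded by the size filter, so only the two nucleosome-sized fragments contribute dyads; comparing such profiles between control and knockdown (or between time points) highlights shifts in nucleosome position relative to the motif.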