Epigenetic program and transcription factor circuitry of dendritic cell development

Dendritic cells (DC) are professional antigen presenting cells that develop from hematopoietic stem cells through successive steps of lineage commitment and differentiation. Multipotent progenitors (MPP) are committed to DC restricted common DC progenitors (CDP), which differentiate into specific DC subsets, classical DC (cDC) and plasmacytoid DC (pDC). To determine epigenetic states and regulatory circuitries during DC differentiation, we measured consecutive changes of genome-wide gene expression, histone modification and transcription factor occupancy during the sequel MPP-CDP-cDC/pDC. Specific histone marks in CDP reveal a DC-primed epigenetic signature, which is maintained and reinforced during DC differentiation. Epigenetic marks and transcription factor PU.1 occupancy increasingly coincide upon DC differentiation. By integrating PU.1 occupancy and gene expression we devised a transcription factor regulatory circuitry for DC commitment and subset specification. The circuitry provides the transcription factor hierarchy that drives the sequel MPP-CDP-cDC/pDC, including Irf4, Irf8, Tcf4, Spib and Stat factors. The circuitry also includes feedback loops inferred for individual or multiple factors, which stabilize distinct stages of DC development and DC subsets. In summary, here we describe the basic regulatory circuitry of transcription factors that drives DC development.

* To whom correspondence should be addressed. Tel: +49-241-8080760; Fax: +49-241-8082008; Email: martin.zenke@rwth-aachen.de. † These authors contributed equally to this work.  were excluded from further analysis. A peak was considered differential if it produced a significant fold change and p value < 0.01 after Benjamini-Hochberg multiple test correction (2). De novo motif detection for PU.1 was performed with MEME-ChIP (3) by providing 200bp regions around the summits of differential peaks. The motif with highest number of binding sites was reported.

Identification of PU.1 co-binding transcription factors
First, differentially expressed genes upon DC commitment (MPP versus CDP) and specification (cDC versus pDC) were detected described as above (Supplementary Figure   S5A). Second, transcription factor motifs were collected from Jaspar (4), Uniprobe (5) and Homer (6) (Supplementary Figure S5B). AICS Irf8 motif was obtained by applying the tool MEME-ChIP with default parameters (3) to Irf8 ChIP-seq peaks (7). The Irf8 ISRE and IEACS motifs were obtained from the sequences provided in the literature (8). The motifs of transcription factors with low gene expression or low variation upon DC development were excluded from further analysis.
Next, peaks were assigned to genes if they were in the proximal promoter (1kb upstream of the TSS) and in the gene body (Supplementary Figure 5C). To detect distal peaks, we also associated peaks when they were 50kb around the TSS and there was no other gene in between the TSS and the peak. Binding site detection was then performed within PU.1 differential peaks close to differentially expressed genes on the same cell type. All differential peaks were corrected to have uniform size, i.e., 250bp +/-the peak summit. Motif search was based on Biopython (9), utilizing the distribution of the information content of each motif to define a bit score threshold on the basis of a false discovery rate (FDR) test (Supplementary Figure S5D). We used the FDR value of 0.1 for all binding sites.
We selected random genomic regions with replacement from the mouse genome (mm9) and devised a method to make random regions to have the same proportion of CG content and mappability characteristic as DNA regions inside PU.1 peaks. For that, the whole genome was split into bins of 1000 bps. Then, the proportions of CpGs (ratio between the number of CpGs and the sum of the number of Cs and Gs) and unmappable regions (number of base pairs that overlap with regions that occur four or more times in the genome) were evaluated for each bin. Finally, the filtered random data set is built by randomly selecting bins with up to 1% difference of CpG and unmappable regions proportion from the regions of differential PU.1 binding. The unmappable regions were obtained upon processing the alignability data set (50bp window) from the mappability track of the ENCODE repository and blacklisted genomic regions (10). The final number of regions equals 10 times the number of regions in the largest differential PU.1 peaks data set.
Finally, we employed a one-tailed Fisher's exact test to measure if the proportion of differential PU.1 peaks close to differentially expressed genes with at least one transcription factor binding site is higher than the proportion of binding sites in random regions. The test was repeated for all motifs and cell-specific differential peaks. Final p values were corrected using the Benjamini-Hochberg method (2). The corrected p values (or enrichment scores) were visualized in heat map format ( Figure 4A; Supplementary Figure S5E). The transcription factors with p value < 0.05 were predicted as PU.1 co-binding partners.

Construction of DC regulatory networks
The PU.1 co-binding transcription factors identified above and the key DC regulators i.e., enriched transcription factors in red; non-enriched transcription factors in gray; selected key genes in white. Networks were generated by Cytoscape (19).

Detection of H3K4me1 footprints
Transcription factor binding sites are likely to occur between two regions with high levels of active histone marks (20,21), referred to as footprints. We employed a modified version of

Dynamics of gene expression during DC development in vivo in FACS sorted progenitors and DC subsets.
Heat map representation of gene expression (mRNA) in MPP, MDP, CDP, cDC and pDC obtained by FACS sorting from mice using the same 3194 differentially expressed genes as in Figure 1A. MPP, MDP and CDP, 2, 3 and 3 replicates, respectively; cDC, a panel of CD8 + , CD8 -CD4 + , CD8 -CD4 -CD11b + subsets in 5, 4 and 5 replicates, respectively; pDC in 5 replicates (GSE15907; (22)) .Red, high expression; blue, low expression.      We identify differential peaks (DP; i.e. PU.1 peaks) close to differentially expressed (DE) genes of (A) in the same cell type, i.e. cDC differential peaks close to cDC differential genes. (D) Motif search from (B) within identified peak regions of (C). (E) Enrichment analysis of transcription factor motifs within differential peaks (DP) of (C) indicates transcription factors co-binding with PU.1 in a cell type-specific manner. (F) Construction of cell type-specific transcription factor regulatory networks by connecting enriched transcription factors to its putative targets. For example, Ap1 has an edge to Bcl6 because: Bcl6 is up-regulated in cDC, Bcl6 has a Ap1 binding site on a cDC differential peak close to its promoter and Ap1 binding is enriched within cDC differential peaks.    Stat3/Stat5 Figure S7