Enhancer networks revealed by correlated DNAse hypersensitivity states of enhancers

Mammalian gene expression is often regulated by distal enhancers. However, little is known about higher order functional organization of enhancers. Using ∼100 K P300-bound regions as candidate enhancers, we investigated their correlated activity across 72 cell types based on DNAse hypersensitivity. We found widespread correlated activity between enhancers, which decreases with increasing inter-enhancer genomic distance. We found that correlated enhancers tend to share common transcription factor (TF) binding motifs, and several chromatin modification enzymes preferentially interact with these TFs. Presence of shared motifs in enhancer pairs can predict correlated activity with 73% accuracy. Also, genes near correlated enhancers exhibit correlated expression and share common function. Correlated enhancers tend to be spatially proximal. Interestingly, weak enhancers tend to correlate with significantly greater numbers of other enhancers relative to strong enhancers. Furthermore, strong/weak enhancers preferentially correlate with strong/weak enhancers, respectively. We constructed enhancer networks based on shared motif and correlated activity and show significant functional enrichment in their putative target gene clusters. Overall, our analyses show extensive correlated activity among enhancers and reveal clusters of enhancers whose activities are coordinately regulated by multiple potential mechanisms involving shared TF binding, chromatin modifying enzymes and 3D chromatin structure, which ultimately co-regulate functionally linked genes.


Figure 2. Fraction of significantly correlated enhancer pairs decreases monotonically with increasing distance between the enhancers, even when (a) an FDR test conducted on a common pooled background is used (bin-wise fractions reflect post-test partitioning of enhancer pairs based on genomic distance), and (b) a background of trans-chromosomal pairs is used.
This cluster includes 12 strong (blue ticks) and 54 weak enhancers (red ticks) as annotated by ChromHMM. DHS (black ticks) in 5 representative cell types are shown for all enhancers. The figure clearly illustrates the correlated activity of these enhancers across the cell types. This cluster, which was constructed without regard to motif co-occurrence, in fact also broadly shares 2 motifs (magenta ticks).

Supplemental Tables
Supplementary Table 1. 73 cell types were sorted into 37 clusters. One cell type from each cluster (first in row) was used as the representative for the cluster. See text for how the representative was selected.

Supplementary Table 2. Gene Ontology (GO) annotation terms for the clusters of target genes corresponding to the correlated enhancer clustering with the highest ratio of enrichment terms between itself and a background gene cluster. Listed are GO terms, separated by target gene cluster, with adjusted p-values < 0.0005 that are supported by three or more genes in the cluster. 7 of 52 clusters were enriched for at least one term that met this highly stringent standard. There were 149 separate instances of enrichment. This enhancer cluster was identified using the following parameters: minimum mean mutual information = 0.2, minimum cluster size = 20, minimum percent occupancy for most enriched motif = 0.0. Background clusters are matched for chromosome, number of enhancers and signature of inter-enhancer distances, but otherwise consist of random enhancers. GO enrichment analysis was performed with R's GOstats package. Adjusted p-value = 0.05 * p-value / q-value.

Cluster size: 65 genes. Figure 6 (see legend in Figure 6 for more information).

In result section 5, we sought evidence of co-regulation by comparing the frequency of co-occurrence for each motif in correlated enhancer pairs to its expected frequency. To make these tests more conservative, instances of co-occurrence in a pair were only counted when there was at least one tissue in which both pair members were active and the cognate TF was expressed. Motifs for which binding TF information was not available, or that bind to TFs encoded by two or more genes, were not considered.

Approximately one-half (509) of the 981 motifs qualified. TFs were considered expressed in a given tissue if the normalized tag count density exceeded 0, where 0 was chosen due to the lack of any discontinuity in the distribution of tag count densities. (Based on this criterion, on average < 30% of TFs are expressed in each tissue.) Under these conditions, a total of 67 motifs co-occurred significantly more often than expected (FDR 5%, based on p-values from Fisher Exact Test) and were present in at least 20 pairs, compared to zero motifs that co-occurred more often than expected in uncorrelated pairs. Of the 52 motifs previously found to co-occur among all 981 motifs, 20 were among the set of 67, in spite of the reduced test set of motifs. When expression thresholds higher than 0 were used, similar, if smaller, sets of significant motifs resulted (while still no motifs in random pairs significantly co-occurred).
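The per-motif test described above can be sketched as a one-sided Fisher Exact Test on a 2x2 table (correlated vs. uncorrelated pairs, motif co-occurring vs. not), followed by a multiple-testing correction. This is a minimal illustration only: the text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function names and table layout are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = correlated / uncorrelated pairs,
    columns = motif co-occurs / does not co-occur.
    Returns P(X >= a) with all table margins held fixed (hypergeometric tail).
    """
    row1 = a + b        # number of correlated pairs
    col1 = a + c        # number of pairs in which the motif co-occurs
    n = a + b + c + d   # all pairs
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the source only says 'FDR 5%'; BH is one common choice.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A motif would then be called significant when its adjusted p-value is at most 0.05 and, following the text, it co-occurs in at least 20 pairs.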

Correlated enhancer clusters share common regulatory motifs
We extended the pair-wise motif co-occurrence analyses to clusters of correlated enhancers. Disjoint clusters with at least 10 enhancers were greedily identified such that the mean mutual information (MI) for all pairs within the cluster was at least 0.2 (other thresholds do not change the conclusion). Each TRANSFAC motif was assessed for enrichment in each cluster relative to other clusters based on a Fisher Exact Test, and significance was corrected for multiple testing. At an FDR threshold of 5%, for the 415 clusters, there were 44 instances of cluster-specific enrichment. In contrast, for a background set of 415 clusters of randomly chosen enhancers (mean pairwise MI within a cluster << 0.1), sampled to match the total motif occupancy, mean GC content, and cluster size of the foreground, there were only 2 instances of cluster-specific enrichment.
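The cluster criterion above (mean pairwise MI of at least 0.2) can be illustrated with a short sketch. It assumes each enhancer's activity is encoded as a discrete (here binary) DHS state per cell type and that MI is measured in bits; the exact encoding, the MI estimator, and the greedy search itself are not given in this section, so the code below only evaluates the criterion for a candidate cluster. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xv, yv), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px * py)
        mi += p_xy * log2(count * n / (px[xv] * py[yv]))
    return mi


def mean_pairwise_mi(profiles):
    """Mean MI over all pairs of activity profiles in a candidate cluster."""
    pairs = list(combinations(profiles, 2))
    return sum(mutual_information(a, b) for a, b in pairs) / len(pairs)


def meets_cluster_criterion(profiles, min_size=10, min_mean_mi=0.2):
    """Check the size and mean-MI thresholds quoted in the text."""
    return len(profiles) >= min_size and mean_pairwise_mi(profiles) >= min_mean_mi
```

A greedy procedure could, for example, add an enhancer to a growing cluster only while this mean stays at or above the threshold, although the source does not describe the procedure in that detail.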

Correlated enhancer pairs are potentially co-regulated
Co-regulated enhancers tend to share common motifs (Berman, 2004). To investigate whether the enhancer pairs with correlated activity are potentially co-regulated, we next tested whether correlated enhancers share significantly greater numbers of motifs than expected. We quantified motif overlap between the two enhancers using the Jaccard index, defined as the ratio of the sizes of the intersection and the union of the two motif sets. Separately for each distance bin, we compared Jaccard index values for the highly correlated enhancer pairs with those for pairs in the background using a Wilcoxon rank-sum test. The foreground and the background enhancer pairs were selected as for Result section 5 above. We found that in every distance bin the foreground pairs have a significantly greater fraction of shared motifs, with p-values ranging from 1.6e-04 to 6.1e-33 (Table 3a). The result remained highly significant when we repeated the analysis at the level of motif clusters instead of individual motifs (see M&M). As expected, the difference between foreground and background is amplified when only the 52 significantly co-occurring motifs (section 5) were used to calculate the Jaccard index (Table 3b). These results suggest that enhancer pairs highly correlated in their chromatin state share multiple motifs and are likely to be co-regulated.
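As a small worked illustration of the Jaccard index defined above, the following sketch computes motif overlap for one enhancer pair. The motif names are placeholders, and returning 0 for two empty motif sets is a convention assumed here, not something stated in the text.

```python
def jaccard_index(motifs_a, motifs_b):
    """Jaccard index: |intersection| / |union| of two motif sets."""
    a, b = set(motifs_a), set(motifs_b)
    union = a | b
    if not union:
        # Assumed convention: no motifs in either enhancer means no overlap.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical motif sets for two enhancers: 2 shared motifs out of 4 distinct.
enhancer_1 = {"SP1", "AP1", "GATA1"}
enhancer_2 = {"AP1", "GATA1", "NFKB"}
overlap = jaccard_index(enhancer_1, enhancer_2)
```

The per-bin comparison would then apply a rank-sum test to the two lists of Jaccard values (foreground and background pairs) within each distance bin.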

Presence of shared motifs is predictive of enhancer DHS correlation
We assessed, using machine learning, whether the presence of common motifs can predict correlated activity of a pair of enhancers. For each enhancer pair we assigned one attribute per motif. The value of the attribute was set to 1 if both enhancers had a motif instance and 0 otherwise. We then trained and tested a support vector machine (SVM) to discriminate between the foreground (FDR 0.01% was used for computational tractability) and the background enhancer pairs, using 10-fold cross validation. When using all 981 motifs as attributes, the SVM achieved an overall average classification accuracy of 73%.
Importantly, there was very little reduction in performance (to 70%) when the model used only the 52 significantly co-occurring motifs (section 5). However, when we used 52 random motifs, the SVM accuracy was reduced to 55%, not much greater than the random expectation of 50%. This result suggests that shared occurrence of a specific set of motifs is predictive of correlated enhancer activity.
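The attribute encoding described above (one binary attribute per motif, set to 1 only when both enhancers carry an instance) can be sketched as follows. This shows the feature construction only; the SVM implementation, kernel and other settings are not specified in this section, and the function and motif names are hypothetical.

```python
def pair_features(motifs_a, motifs_b, motif_vocabulary):
    """Binary attribute vector for an enhancer pair.

    Entry i is 1 if motif_vocabulary[i] occurs in both enhancers, else 0.
    """
    a, b = set(motifs_a), set(motifs_b)
    return [1 if (m in a and m in b) else 0 for m in motif_vocabulary]


def build_dataset(foreground_pairs, background_pairs, motif_vocabulary):
    """Stack feature vectors with labels: 1 = foreground, 0 = background."""
    rows, labels = [], []
    for label, pairs in ((1, foreground_pairs), (0, background_pairs)):
        for motifs_a, motifs_b in pairs:
            rows.append(pair_features(motifs_a, motifs_b, motif_vocabulary))
            labels.append(label)
    return rows, labels
```

The resulting rows and labels could be passed to any SVM implementation and evaluated with 10-fold cross-validation; restricting `motif_vocabulary` to the 52 co-occurring motifs, or to 52 random ones, would reproduce the comparison described in the text.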

Interactions between enhancer motifs and chromatin modification enzymes
To further probe the potential involvement of chromatin modification enzymes (CMEs) in regulating correlated enhancer activities, we assessed CMEs for their preferential interactions with the 52 motifs (Table 2). In contrast, there was no CME that preferentially interacted with nonsignificant TFs. This result is especially interesting given that, overall, the 146 significant TFs do not interact with CMEs any more than the other 2227 TFs.

Targets of correlated enhancer clusters have correlated expression and shared function
Next we extended our analyses in section 9 to clusters of correlated enhancers. We identified clusters of five or more enhancers that were mutually correlated (various thresholds from 0.2 to 0.5 were used) and enriched for at least one of the previously identified significantly enriched motif clusters. For each enhancer cluster, a control cluster was created from non-correlated enhancers that mirrored the former's size and genomic footprint (i.e., intra-cluster genomic distances). As was true for correlated enhancer pairs, putative targets of correlated clusters (i.e., the set of genes nearest to each enhancer) were more highly correlated in their normalized RNA-seq transcript counts than were those of background clusters. For each triplet of thresholds for (i) minimum cluster size (5-20), (ii) minimum pairwise MI (0.2-0.5) within a cluster, and (iii) minimum fraction of cluster members (0.7-0.8) harboring the most enriched meta-motif, the genes targeted by enhancers in clusters had higher Spearman correlation of transcription levels than those of the matching set of background enhancer clusters. For each parameter triplet, we compared the foreground and background for mean pair-wise correlation of expression within clusters. For the entire range of parameters, mean expression correlation within foreground clusters was consistently greater than that within the corresponding background clusters. Due to the variability in cluster counts for different parameters, p-values ranged from 0.02 to 4.1e-15 (Wilcoxon rank-sum test). These results suggest that gene targets of correlated enhancer clusters with shared motifs are co-expressed and presumably co-regulated.
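The per-cluster statistic compared above, the mean pair-wise Spearman correlation of target gene expression, can be sketched as follows. The sketch assumes each gene is represented by a list of normalized expression values across the same ordered set of tissues, handles ties with average ranks, and does not guard against constant profiles (which would give a zero variance). All names are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(expression_by_gene):
    """Mean Spearman correlation over all gene pairs in one cluster."""
    pairs = list(combinations(expression_by_gene, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)
```

Computing this value for every foreground cluster and every matched background cluster yields the two samples that a rank-sum test would then compare.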
Next we assessed enrichment of GO biological processes amongst the targets of an enhancer cluster using R's GOstats package. Enhancer clusters also revealed consistently greater GO functional enrichment than the background clusters. Across 10 parameter settings, the ratio of enriched GO terms (at FDR 0.01) per foreground cluster to enriched GO terms per background cluster ranged from 1.3-fold to 4.8-fold. On average, there was almost 3-fold higher GO term enrichment in the foreground (19.1 terms per cluster). When the FDR threshold was set to ~0 (i.e., p < 1e-8), there was 5-fold higher enrichment, on average, in the foreground (7.5 terms per cluster). As an example, for the parameter setting with the greatest fold enrichment of GO terms, the enriched terms are shown, separated by cluster, in Supplementary Table 3. These terms are consistently revealed across all parameter settings.
Together, the GO enrichment and gene expression results illustrate that co-expression of genes with shared function is coordinately regulated across tissues by enhancers that share motifs and are epigenetically correlated across the same tissues.