Zhan Zhou, Jingqi Zhou, Zhixi Su, Xun Gu, Asymmetric Evolution of Human Transcription Factor Regulatory Networks, Molecular Biology and Evolution, Volume 31, Issue 8, August 2014, Pages 2149–2155, https://doi.org/10.1093/molbev/msu163
Abstract
Changes in cis or trans regulatory regions are the major driving forces that underlie the evolution of gene expression. Transcription factors (TFs) are the main trans factors involved in transcriptional regulation. Here, we studied the divergence of upstream and downstream regulatory networks between duplicate TFs in light of the Encyclopedia of DNA Elements project. We found that the divergence of upstream regulatory networks was generally smaller than the divergence of downstream regulatory networks. Further analysis showed that the downstream regulatory circuits of duplicate TFs evolve faster in the early stage than in the late stage after gene duplication. Upstream regulatory circuits are generally more conserved than downstream regulatory circuits in the early stage and in small TF families. Our results indicate asymmetric evolution of upstream and downstream regulatory circuits between duplicate TFs, suggesting that after gene duplication, human TF families tend to evolve asymmetrically between coding regions and promoter regions.
Sequence-specific transcription factors (TFs) are proteins that can bind to specific DNA sequences (Latchman 1997). In all living organisms, TFs are the main trans factors involved in transcriptional regulation, and these factors can function either positively or negatively; therefore, TFs play central roles in development, stimuli responses (Spitz and Furlong 2012), and evolutionary innovation (Wray et al. 2003). In the hierarchy of gene regulation, TFs control transcription rates to regulate the expression of target genes, and TFs themselves are regulated by other TFs. These cross-regulatory interactions among TFs form a number of core transcriptional regulatory networks that are cell specific (Neph, Stergachis, et al. 2012), which adds to the complexity of gene regulatory networks in living organisms.
The complexity of gene regulatory networks increases with an increase in the numbers of TFs in the genome (van Nimwegen 2003). In the human genome, there are as many as 1,700–1,900 tentative TF coding genes (Vaquerizas et al. 2009), which can be classified into nine superclasses based on their DNA-binding domains (Wingender et al. 2013). The large number of TFs in the human genome has been generated by the continuous expansion of gene families over the course of evolution (Gu et al. 2002; Dehal and Boore 2005; Magadum et al. 2013). Gene duplication is a major mechanism for the generation of new genes. Duplicate genes can accumulate mutations in both coding and regulatory regions after a gene duplication event and thereby acquire divergent expression and protein functions. The expansion and divergence of TF gene families may play a key role in the evolution of gene regulatory networks (Wray et al. 2003; Teichmann and Babu 2004; Zhang et al. 2004; Tuch et al. 2008; Sellerio et al. 2009; Lang et al. 2010).
A problem in understanding human regulatory evolution is the lack of information about DNA-binding targets for each TF in different cell types. The Encyclopedia of DNA Elements (ENCODE) project (Dunham et al. 2012) has made it possible to explore TF regulatory networks in the human genome. Recently, DNase I footprinting and chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) methods have been used to detect the DNA-binding regions of human TFs on a genome-wide level (Gerstein et al. 2012; Neph, Vierstra, et al. 2012; Wang et al. 2012). The DNase I footprint data provide cross-regulatory information for 475 sequence-specific TFs across 41 cell and tissue types (Neph, Stergachis, et al. 2012). The ChIP-seq data provide DNA-binding information for 119 human TFs across five cell lines (Gerstein et al. 2012). These studies provide valuable data resources for further investigating the evolution of TF regulatory networks.
In this study, we compared the evolutionary distance between upstream and downstream regulatory networks of duplicate TFs by employing the cross-regulation data set of 475 human TFs in 41 cell and tissue types (Neph, Stergachis, et al. 2012). We first analyzed the divergence of regulatory networks between members of the upstream stimulatory factor (USF) gene family, which belongs to the basic helix-loop-helix leucine zipper (bHLH-zip) superfamily (Moriuchi et al. 1999). We then identified 1,304 duplicate TF pairs among the 475 TFs and calculated the evolutionary distance to define the divergence of regulatory networks between duplicate genes. Our results indicate that human TFs tend to evolve asymmetrically between upstream and downstream regulatory networks after gene duplication.
Divergence of the USF Gene Family
We first used the USF gene family to study the divergence between duplicate TFs. The USF gene family belongs to the bHLH-zip superfamily and negatively regulates cellular transformation and proliferation by primarily binding to the promoter regions of target genes (Moriuchi et al. 1999). There are two members of the USF gene family in the human genome, USF1 and USF2 (Luo and Sawadogo 1996), which shared their most recent common ancestor early in vertebrate evolution (supplementary fig. S1, Supplementary Material online). At the sequence level, USF1 and USF2 proteins share the conserved C-terminal bHLH-zip region but have highly divergent N-terminal transcriptional activation domains (Luo and Sawadogo 1996). We used the software DIVERGE 3.0 to analyze the functional divergence of USF1 and USF2 (Gu and Vander Velden 2002; Gu et al. 2013). There are two basic types of functional divergence that can occur between duplicate genes. Type I functional divergence represents amino acid configurations that are conserved in one gene cluster but highly variable in the other, which results in different evolutionary rates between the duplicate genes. In contrast, type II functional divergence represents amino acid configurations that are conserved in both gene clusters but have different amino acid properties (e.g., hydrophobicity and charge) (Gu 2001; Wang and Gu 2001). We found a significant type I functional divergence between USF1 and USF2 (θI = 0.532 ± 0.099, z-score test, P < 2.5 × 10^−10) (Gu 1999). In contrast, we found a nonsignificant type II functional divergence for these genes (supplementary fig. S2, Supplementary Material online) (Gu 2006), which indicates that USF1 and USF2 may have different evolutionary rates at the sequence level.
We further investigated the cross-regulatory circuits of the USF gene family with other TFs by using the DNase I footprint data from 41 different tissue and cell types (Neph, Stergachis, et al. 2012; Neph, Vierstra, et al. 2012). In each cell type, the number of TFs that regulate USF1 ranges from 34 to 68 out of a total of 475 TFs, whereas only 25–53 TFs regulate USF2 in each cell type (supplementary table S1, Supplementary Material online). We found that USF1 has more upstream TFs than USF2, and all the TFs that regulate USF2 also bind to the promoter region of USF1 in all cell types. This finding shows asymmetric evolution of cis regulatory elements between USF1 and USF2 (fig. 1a). In contrast, the downstream target genes show a significant divergence between the USF1 and USF2 regulatory circuits (fig. 1b). We used the Czekanowski–Dice distance to describe the divergence of the regulatory circuits between duplicate TFs; this distance ranges from 0 to 1 (i.e., from the most conserved [D = 0] to the most divergent [D = 1]) between duplicate genes (Martin et al. 2004; Zou et al. 2012). From footprint data across the 41 cell types, the mean Czekanowski–Dice distance between USF1 and USF2 in the upstream regulatory circuits (D-upmean = 0.133 ± 0.004) was significantly smaller than that of the downstream regulatory circuits (D-downmean = 0.707 ± 0.017) (Wilcoxon signed rank test, P = 4.5 × 10^−13). We further examined the proximal targets of USF1 and USF2 using ChIP-seq data from five cell lines (Gerstein et al. 2012). USF1 and USF2 have 449 and 268 proximal targets, respectively, and share 208 proximal targets (fig. 1c). The mean distances of the downstream regulatory networks are also larger than those of the upstream regulatory circuits (Wilcoxon rank sum test, P = 1.6 × 10^−4) (fig. 1d).
These results show that the downstream regulatory circuit is more divergent than the upstream regulatory circuit between USF1 and USF2, which indicates a significant functional divergence between these genes since the gene duplication event occurred.
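As a rough arithmetic check on how the distance behaves, the Czekanowski–Dice distance can be rewritten in terms of set sizes alone. The block below is an illustrative sketch, not a value reported in the study: it applies the formula to the pooled proximal-target counts quoted above (449, 268, and 208 shared), whereas the distances in figure 1d appear to be means over cell types or cell lines. The symbols T1 and T2 (the target sets of USF1 and USF2) and R1 and R2 (their regulator sets) are introduced here only for illustration.

```latex
D = \frac{|T_1 \,\triangle\, T_2|}{|T_1 \cup T_2| + |T_1 \cap T_2|}
  = \frac{|T_1| + |T_2| - 2\,|T_1 \cap T_2|}{|T_1| + |T_2|}

% Pooled ChIP-seq counts from the text: |T_1| = 449, |T_2| = 268, |T_1 \cap T_2| = 208
D = \frac{449 + 268 - 2 \cdot 208}{449 + 268} = \frac{301}{717} \approx 0.42

% Nested case (every regulator of USF2 also regulates USF1, so R_2 \subseteq R_1):
D = \frac{|R_1| - |R_2|}{|R_1| + |R_2|}
```

The nested form suggests why the upstream distance stays small: when one regulator set is contained in the other, the distance depends only on the difference in set sizes relative to their sum.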

Gene regulatory networks of the USF gene family. Upstream (a) and downstream (b) regulatory networks of the USF gene family among 475 TFs according to footprint data across 41 human cell types and tissues; (c) downstream regulatory network with proximal targets of the USF gene family according to ChIP-seq data from five main human cell lines; and (d) mean Czekanowski–Dice distances of upstream and downstream regulatory networks between USF1 and USF2 from (a), (b), and (c). P values were determined according to the Wilcoxon signed rank test and Wilcoxon rank sum test.
Divergence of Regulatory Circuits After the Duplication of TFs
Because the upstream regulatory network is more conserved than the downstream regulatory network between USF1 and USF2, we tested whether this pattern is common in human TF gene families. There are 475 TFs with DNase I footprint data across 41 diverse cell and tissue types. According to the Ensembl database (Release 71) (Kinsella et al. 2011; Flicek et al. 2013), 393 out of the 475 TFs form 1,304 duplicate gene pairs and can be clustered into 81 gene families. The most recent ancestors of these TFs range from Opisthokonta to Mammalia. To estimate the regulatory divergence between duplicate TFs, we calculated the Czekanowski–Dice distance for each duplicate gene pair in upstream and downstream regulatory circuits across 41 cell types. The mean upstream distance (D-up) and the mean downstream distance (D-down) of each duplicate TF pair in 41 cell types are plotted in figure 2a. After excluding 128 out of the total 1,304 (9.8%) duplicate TF pairs, which lack necessary information in either upstream or downstream regulatory circuits, we found that many more duplicate TF pairs have a small Czekanowski–Dice distance (D < 0.5) in upstream regulatory circuits (255, 19.6%) than in downstream regulatory circuits (16, 1.2%). Next, we examined the mean D-up and mean D-down of the 81 TF families (fig. 2b): 35 (43%) TF families had a small upstream distance (D-upmean < 0.5), whereas only one TF family (the Sp family) had a small downstream distance (D-downmean < 0.5).
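The per-pair averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary layout (cell type mapped to a set of TF names), and the rule of skipping a cell type when both sets are empty are all hypothetical choices, since the text does not specify how pairs lacking information were handled.

```python
from statistics import mean
from typing import Dict, Optional, Set


def dice(a: Set[str], b: Set[str]) -> Optional[float]:
    """Czekanowski-Dice distance between two sets; None if both are empty."""
    denominator = len(a | b) + len(a & b)
    if denominator == 0:
        return None  # distance is undefined without any regulators/targets
    return len(a ^ b) / denominator


def mean_distance(net1: Dict[str, Set[str]],
                  net2: Dict[str, Set[str]]) -> Optional[float]:
    """Average the distance over cell types available for both TFs.

    Returns None when no cell type yields a defined distance
    (an assumed stand-in for "lacking necessary information").
    """
    values = []
    for cell_type in net1.keys() & net2.keys():
        d = dice(net1[cell_type], net2[cell_type])
        if d is not None:
            values.append(d)
    return mean(values) if values else None


# Toy upstream networks for two hypothetical paralogs in two cell types.
up_a = {"cellX": {"TF1", "TF2", "TF3"}, "cellY": {"TF1", "TF2"}}
up_b = {"cellX": {"TF1", "TF2"}, "cellY": {"TF1", "TF4"}}

d_up = mean_distance(up_a, up_b)
is_small = d_up is not None and d_up < 0.5  # the D < 0.5 threshold from the text
print(d_up, is_small)
```

Returning `None` rather than 0 or 1 for undefined cases keeps uninformative cell types from biasing the mean in either direction.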

Czekanowski–Dice distances of upstream and downstream regulatory networks among duplicate TF pairs (a) and TF families (b). (a) The shapes of the nodes correspond to the level of identity between duplicate TFs: 0–30% (circle), 30–50% (triangle point up), and 50–100% (plus). The colors depict the duplication times: Black (before Bilateria), red (Chordata), and green (after Vertebrata and Euteleostomi). (b) The nodes are colored according to TF family size: 2–3 (black), 4–5 (red), 6–10 (green), and 11–27 (blue). The shapes of the nodes correspond to the DNA-binding domains of each TF family: Basic domain (circle), zinc-coordinating DNA-binding domain (triangle point up), helix-turn-helix domain (plus), other all-alpha-helical DNA-binding domain (point up square), alpha-helices exposed by beta-structure (diamond), immunoglobulin fold (triangle point down), beta-hairpin exposed by an alpha/beta-scaffold (square cross), beta-sheet binding to DNA (star), and not yet defined DNA-binding domain (square). The sidebars show the number of TF pairs (a) or families (b) in each interval of Czekanowski–Dice distances (interval width 0.1).
We further calculated the mean Czekanowski–Dice distances of upstream regulatory circuits (D-up) and downstream regulatory circuits (D-down) among all 1,304 duplicate TF pairs. The overall upstream distance (D-upmean = 0.748 ± 0.008) was significantly smaller than the downstream distance (D-downmean = 0.838 ± 0.004) (Wilcoxon signed rank test, P = 9.1 × 10^−10), which indicates that the divergence of upstream regulatory circuits is generally smaller than that of downstream regulatory circuits between duplicate TF pairs. In short, our analysis shows that both the upstream and downstream regulatory divergence values are relatively large for a majority of duplicate TF pairs. Nevertheless, among duplicate TF pairs, upstream regulatory circuits are generally more conserved than downstream regulatory circuits, and this pattern is even more significant at the TF family level (Wilcoxon signed rank test, P = 1.1 × 10^−8).
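The paired comparison of D-up and D-down can be illustrated with a small pure-Python sketch of the Wilcoxon signed rank test. This is not the authors' implementation (the Tools section states only that R was used for statistics); it uses a normal approximation without tie or continuity corrections, so it will not reproduce exact P values, and the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, two-sided P) for paired samples x and y.

    Normal approximation only; zero differences are dropped.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_two_sided


# Toy distances for five hypothetical duplicate TF pairs.
d_up = [0.10, 0.20, 0.30, 0.40, 0.50]
d_down = [0.20, 0.40, 0.60, 0.80, 1.00]
w_plus, p = wilcoxon_signed_rank(d_up, d_down)
print(w_plus, p)
```

For real analyses, an established routine such as R's `wilcox.test(..., paired = TRUE)` would be preferable, because it handles ties and small samples properly.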
We further used the data from the work of Gerstein et al. (2012) to extract the promoter-proximal targets of the 119 TFs across five cell lines. We compared the genome-wide promoter-proximal targets of duplicate TF pairs, of which 46 duplicate TF pairs were also covered by footprint data. The mean Czekanowski–Dice distances of the regulatory networks between these 46 duplicate TF pairs are shown in figure 3a. From the footprint data, the mean upstream distance (D-upmean = 0.616 ± 0.046) was significantly smaller than the mean downstream distance (D-downmean = 0.782 ± 0.021) (Wilcoxon signed rank test, P = 5.0 × 10^−4) and was also significantly smaller than the downstream distance from the ChIP-seq data (D-down-ChIP-seqmean = 0.805 ± 0.029) (Wilcoxon signed rank test, P = 3.4 × 10^−6). These results support the view that promoter regions may be more conserved than the coding sequences of human TF gene families.

Czekanowski–Dice distances of upstream and downstream regulatory networks of duplicate TF pairs. (a) With both footprint and ChIP-seq data; (b) relationship with similarity; (c) relationship with gene age; (d) relationship with TF family sizes, and the number of TF families in each size: 2–3: 21, 4–5: 19, 6–10: 20, and 11–27: 21. P value was determined according to the Wilcoxon signed rank test.
Downstream Regulatory Circuits Diverge Faster in the Early Stage After Gene Duplication
We investigated the relationship between the divergence of regulatory circuits and the sequence similarity of duplicate TFs (fig. 3b). According to their sequence identities, the duplicate TF pairs were classified into three groups: 0–30%, 30–50%, and 50–100%. As shown in figure 3b, the mean D-up was smaller than that of the D-down for each group (Wilcoxon signed rank test). Moreover, we observed that the D-up decreased faster than the D-down, along with an increase in sequence identity between duplicate TF pairs. Different classifications of duplicate TF pairs show similar patterns (supplementary fig. S3a and Supplementary Data, Supplementary Material online).
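The grouping step described above can be sketched as a simple binning of duplicate TF pairs by sequence identity, followed by a per-group mean of D-up and D-down. This is an illustrative sketch: the tuple layout, function names, toy values, and the choice to make each bin's lower bound inclusive are assumptions, as the text does not say how boundary values were assigned.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (percent sequence identity, D-up, D-down) for one duplicate TF pair
Pair = Tuple[float, float, float]


def identity_bin(identity: float) -> str:
    """Assign a pair to one of the three identity groups used in the text."""
    if identity < 30:
        return "0-30%"
    if identity < 50:
        return "30-50%"
    return "50-100%"


def summarize(pairs: Iterable[Pair]) -> Dict[str, Tuple[float, float]]:
    """Return {group: (mean D-up, mean D-down)} for each non-empty group."""
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for identity, d_up, d_down in pairs:
        grouped[identity_bin(identity)].append((d_up, d_down))
    return {
        label: (mean([u for u, _ in values]), mean([d for _, d in values]))
        for label, values in grouped.items()
    }


# Toy data for four hypothetical duplicate TF pairs.
toy_pairs = [
    (25.0, 0.90, 0.95),
    (28.0, 0.80, 0.85),
    (40.0, 0.70, 0.80),
    (75.0, 0.30, 0.70),
]
summary = summarize(toy_pairs)
for label in ("0-30%", "30-50%", "50-100%"):
    print(label, summary[label])
```

The same pattern would apply to the other groupings in this section (duplication time or family size) by swapping the binning function.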
Additionally, we dated the gene duplication time for each duplicate TF pair back to Bilateria, Chordata, and Vertebrata stages, respectively. As shown in figure 3c, the mean D-up was smaller than that of the D-down at each stage (Wilcoxon signed rank test). The vertebrate duplicate TFs have smaller upstream and downstream regulatory circuit distances than the remaining ancient duplicate groups (Wilcoxon rank sum test). However, the distance of downstream regulatory circuits varies much more slowly than that of the upstream regulatory circuits for these three groups of duplicate TF pairs.
The relationship between TF family size and regulatory circuit divergence is depicted in figure 3d and supplementary figure S3c and Supplementary Data, Supplementary Material online. The mean Czekanowski–Dice distances of both upstream and downstream regulatory circuits increase with increasing TF family size. In particular, the upstream distance increases drastically when the family size is larger than five (Wilcoxon rank sum test, P = 5.0 × 10^−3). In addition, the difference between D-up and D-down is much smaller in large TF families than in small TF families.
These results indicate that the downstream regulatory circuits of duplicate TFs evolve faster in the early stage than in the late stage after gene duplication. The upstream regulatory circuits are generally more conserved than downstream regulatory circuits in the early stage and in small TF families. The difference between D-up and D-down of bilaterian duplicate genes is much smaller than that of vertebrate duplicate genes.
Concluding Remarks
Transcriptional regulatory networks have been shown to evolve very rapidly as a result of changes in cis elements and trans factors (Wray et al. 2003; Wray 2007; Tuch et al. 2008; Villar et al. 2014), which may experience different selection pressures (Emerson et al. 2010; Schaefke et al. 2013). Notably, in some cases, cis and trans mutations may coevolve (Kuo et al. 2010), whereas some phenotypic changes in closely related species are more likely to result from mutations in cis regulatory regions than in trans factors (Wray 2007).
Applying high-throughput technologies to comparative analyses of the binding profiles between duplicate TFs has yielded novel insights into the complexity of TF regulatory networks and their underlying evolutionary mechanisms. Our work focuses on the divergence of the cross-regulatory circuits of duplicate TFs among 475 human TFs using the ENCODE data. Our results show that the upstream and downstream regulatory circuits are dramatically divergent among a majority of duplicate TF pairs and that the downstream regulatory circuits are more divergent than the upstream regulatory circuits. The dramatic levels of divergence of upstream regulators and downstream targets between duplicate TFs imply that a rapid rewiring of both upstream and downstream regulatory circuits likely occurs; therefore, the TF regulatory networks may evolve rapidly after TF gene duplication. This finding indicates that the functional divergence between duplicate TFs may play an important role in the complexity and novelty of TF regulatory networks. After gene duplication, duplicate TFs may diverge in both regulatory regions and coding sequences. A divergence of promoter regions may result in a difference between upstream regulators, which may in turn lead to a divergence of gene expression. The divergence of coding regions, especially that of the DNA-binding domains or trans-activating domains, may lead to the recognition of different cis elements or to different regulatory interactions between duplicate TFs; these scenarios could in turn result in the divergence of downstream targets. Our results suggest that mutations in coding regions, which result in divergent downstream regulation, may contribute more to the functional divergence between duplicate TFs than mutations in promoter regions, which lead to divergent upstream regulation.
This finding provides novel clues for investigating the relationship between changes in cis elements and trans factors and highlights the importance of TF gene family expansion and its contribution to biological complexity.
Materials and Methods
Data
The DNase I footprint data of 475 human TFs were extracted from the work of Neph, Stergachis, et al. (2012). This data set includes 225,625 combinations of TF-TF regulatory interactions within promoter-proximal regions among 475 human TFs across 41 different cell and tissue types, including epithelial cells, endothelial cells, blood cells, cancer cells, fetal cells, and embryonic stem cells (Neph, Stergachis, et al. 2012). The ChIP-seq data of 119 human TFs were extracted from the work of Gerstein et al. (2012). This data set includes genome-wide binding profiles of 119 human transcription-related factors in five main cell lines: GM12878, K562, HepG2, HeLa-S3, and H1-hESC (Gerstein et al. 2012).
Identification of Duplicate Genes and Gene Families
Human paralogous genes and gene family information were downloaded from BioMart in the Ensembl database (Release 71) (Kinsella et al. 2011; Flicek et al. 2013). The sequence similarity and most recent common ancestor of each duplicate gene pair were also retrieved.
Defining Regulatory Divergence between Human Duplicate TFs

The Czekanowski–Dice distance between the two members of a duplicate TF pair (genes 1 and 2) is

$$D_{12} = \frac{\Delta_{12}}{\cup_{12} + \cap_{12}}$$

The value Δ12 is the number of upstream regulators or downstream targets that differ between the two members of a duplicate TF pair; ∪12 is the number of upstream regulators or downstream targets of at least one of the duplicate genes; and ∩12 is the number of shared upstream regulators or downstream targets between a duplicate pair. The Czekanowski–Dice distance ranges from 0 to 1, that is, from sharing all (D = 0) to sharing none (D = 1) of the upstream regulators or downstream targets between duplicate genes.
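The definition above maps directly onto set operations: the differing elements are the symmetric difference, and the denominator is the size of the union plus the size of the intersection. The following is a minimal sketch under that reading; the function name and the toy TF labels are hypothetical, and raising an error for two empty sets is an assumed convention.

```python
from typing import Set


def czekanowski_dice(set1: Set[str], set2: Set[str]) -> float:
    """Czekanowski-Dice distance between two sets of regulators or targets."""
    differing = len(set1 ^ set2)  # present in exactly one of the two sets
    union = len(set1 | set2)      # present in at least one set
    shared = len(set1 & set2)     # present in both sets
    if union + shared == 0:
        raise ValueError("distance is undefined for two empty sets")
    return differing / (union + shared)


# Toy examples with hypothetical regulator names.
identical = czekanowski_dice({"A", "B"}, {"A", "B"})
disjoint = czekanowski_dice({"A", "B"}, {"C"})
partial = czekanowski_dice({"A", "B", "C"}, {"B", "C", "D"})
print(identical, disjoint, partial)
```

Because the union plus the intersection equals the sum of the two set sizes, the distance can equivalently be written as one minus twice the shared count divided by that sum.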
Tools
R was used for the statistical analysis and plotting (http://www.R-project.org, last accessed May 23, 2014). Clustal X 2.1 was used for the multiple sequence alignment (Larkin et al. 2007). MEGA 6 was used for the phylogenetic analysis and tree reconstruction (Tamura et al. 2013). Cytoscape 3.0 was used for the network visualization (Cline et al. 2007; Saito et al. 2012).
Acknowledgments
The authors thank Yangyun Zou and Libing Shen for their critical reading of this manuscript and Gangbiao Liu for technical support. They also thank the reviewing editor and two anonymous reviewers for their constructive comments and valuable suggestions. This work was partly supported by grants from Fudan University and Iowa State University, a grant from the Ministry of Science and Technology China (2012CB910101), a grant from the National Science Foundation of China (31272299), the Shanghai Pujiang Program (13PJD005) to Z.S., and a General Financial Grant from the China Postdoctoral Science Foundation (2013M531117) to Z.Z.
References
Author notes
Associate editor: Gregory Wray