YHMI: a web tool to identify histone modifications and histone/chromatin regulators from a gene list in yeast

Abstract Post-translational modifications of histones (e.g. acetylation, methylation, phosphorylation and ubiquitination) play crucial roles in regulating gene expression by altering chromatin structures and creating docking sites for histone/chromatin regulators. However, the combination patterns of histone modifications, regulatory proteins and their corresponding target genes remain incompletely understood. Therefore, it is advantageous to have a tool for the enrichment/depletion analysis of histone modifications and histone/chromatin regulators from a gene list. Many ChIP-chip/ChIP-seq datasets of histone modifications and histone/chromatin regulators in yeast can be found in the literature. Knowing the needs and having the data motivate us to develop a web tool, called Yeast Histone Modifications Identifier (YHMI), which can identify the enriched/depleted histone modifications and the enriched histone/chromatin regulators from a list of yeast genes. Both tables and figures are provided to visualize the identification results. Finally, the high-quality and biological insight of the identification results are demonstrated by two case studies. We believe that YHMI is a valuable tool for yeast biologists to do epigenetics research.


Introduction
Histone modification and chromatin remodelling play an important role in DNA replication, transcription and DNA repair (1)(2)(3). The N-terminal and C-terminal tails of histones are subject to post-translational modifications, including acetylation, methylation, phosphorylation and ubiquitination (4,5). Several lines of evidence have shown that these modified histones provide the binding sites for effector proteins to elicit specific and selective effects on these biological processes (1,6). To date, a large number of domains within these effector proteins that can be associated with acetylated, methylated and phosphorylated histones have been characterized. For example, the bromodomain binds to acetylated histones (7); the BRCA1 C Terminus (BRCT) domain binds to the phosphorylated histones (8); and the plant homeodomain (PHD), chromodomain and tudor domains associate with methylated histones (9)(10)(11)(12)(13)(14). A recent study further revealed that tandem PHD fingers of MORF/MOZ acetyltransferases display selectivity for acetylated histone H3 (15). Many chromatin-associated proteins themselves contain these domains or are partnered with effector proteins containing these domains. Therefore, multivalent and combinatorial interactions are likely to be an important aspect of how these chromatin-associated proteins worka concept known as the histone code hypothesis (1,6). However, how these multivalent and combinatorial interactions contribute to various biological processes in response to environmental stimuli remains incompletely understood.
In yeast Saccharomyces cerevisiae, histones are modified by various histone modification complexes, including Spt-Ada-Gcn5-Acetyltransferase (SAGA), NuA4, COMPASS, Set2 and Dot1 (3,4). SAGA and NuA4 complexes are able to acetylate histone H3 and histone H4 (H4ac), respectively (3,16,17). The COMPASS complex, Set2 and Dot1 are able to methylate lysine 4 of histone H3 (H3K4me), lysine 36 of histone H3 (H3K36me) and lysine 79 of histone H3 (H3K79me), respectively (3). Information about the other histone modifications and their related enzymes is listed in Table 1. The chromatin remodelling complexes include SWI/SNF, RSC, ISWI, CHD, INO80 and SWR1 that are able to slide, displace and exchange histones (3). The histone modification and chromatin remodelling complexes often form a large complex with many subunits containing bromodomain, chromodomain or PHD domain. Therefore, it is likely that the combination of these domains plays an important role in selectivity and specificity of these complexes to their target genes. There are about 400 proteins that regulate gene transcription (18). Therefore, many open questions remain. How many regulators and specific histone modifications are needed to transcribe a subset of genes in response to environmental stimuli? Are there combination patterns of these domains and their corresponding modified histones in order to transcribe a subset of genes?
Previous studies have produced several valuable genomewide ChIP-chip/ChIP-seq datasets of histone modifications and binding occupancy of histone/chromatin regulators in yeast Saccharomyces cerevisiae to facilitate epigenetics research (18)(19)(20)(21)(22)(23)(24). Since these datasets are scattered across the literature, several resources have been developed to provide histone modification information in yeast. For example, the Saccharomyces Genome Database (SGD) comprehensively collects the yeast histone modification datasets from the literature and allows users to visualize various histone modifications using JBrowse (a genome browser) (25). The Yeast Nucleosome Atlas (YNA) implements a tool for users to retrieve a list of yeast genes whose promoters and/or coding regions contain a specific combination of histone modifications (26). The ChromatinDB implements a tool for users to analyze specific histone modifications from the input gene list (27). These three resources altogether greatly help yeast biologists to do epigenetics research. Unfortunately, ChromatinDB is no longer available since 2014. Yeast biologists now are lacking a convenient tool to identify the enriched/depleted histone modifications in their gene lists routinely generated from   high-throughput experimental technologies (e.g. microarray or next-generation sequencing).
To fill this gap, we developed a web tool called Yeast Histone Modification Identifier (YHMI). YHMI uses ChIP-chip/ChIP-seq datasets of 32 histone modifications (15 histone acetylation, 13 histone methylation, 2 histone phosphorylation, 1 histone ubiquitination and 1 histone variant) and 83 histone/chromatin regulators (18)(19)(20)(21)(22)(23)(24). When a user inputs a gene list, YHMI will identify the enriched/depleted histone modifications in the promoters/coding regions and the enriched histone/chromatin regulators in the promoters of the genes in the input list. The identification results are shown both in tables and figures. Therefore, YHMI can be used to shed light on what is unknown in a gene list of interest. Several possible biological questions could be answered by YHMI. For example, what are the enriched/depleted histone codes in a gene list of a specific property (e.g. highly transcribed genes, stress-responsive genes or genes in a specific pathway)? What are the enriched/depleted histone codes in a gene list associated with a specific factor (e.g. target genes of a transcription factor, lipid-binding proteins or hexose transporter genes)?

Construction and contents
Collection of ChIP-chip/ChIP-seq datasets of histone modifications, histone regulators and chromatin regulators All the ChIP-chip/ChIP-seq data (18)(19)(20)(21)(22)(23)(24) used in YHMI were downloaded from SGD ( Figure 1). SGD (25) collected the raw data of ChIP-chip/ChIP-seq from GEO (28) and ArrayExpress (29), mapped the raw data to the latest yeast reference genome sacCer3 (R64) and allowed everyone to download the processed data. Therefore, we directly downloaded the ChIP-chip/ChIP-seq datasets of 32 histone modifications (Table 2 and Supplementary Table 1 for more details about the strain, reference genome, original data source, etc.) and 83 histone/chromatin regulators (Supplementary Table 2) from SGD.
Defining genes whose promoters/coding regions contain a specific histone modification Following previous studies (19,23), a gene's promoter is defined as the region between 500 bp upstream and 100 bp downstream of the start codon. A gene's coding region is defined as the region between the start codon and the stop codon. The procedure of defining a set of genes whose promoters/coding regions contain a specific histone modification (e.g. H3K4ac) is as follows. First, for each of the 6572 genes in the yeast genome, we extracted the maximal data value (log 2 (H3K4ac/H3) in this case) in its promoter/coding region from the corresponding ChIPchip/ChIP-seq dataset (Table 2). Second, the promoter/coding region of a gene is said to contain H3K4ac if it satisfies log 2 (H3K4ac/H3) ≥ threshold, where the threshold is set by the user. For example, when the threshold is set to one, 1656 genes' promoters and 977 genes' coding regions are said to contain H3K4ac.
Defining genes whose promoters are bound by a specific histone/chromatin regulator Venters et al. (18) identified high-confident [less than 5% false discovery rate (FDR)] interactions between a specific  (18) and based on the definition of a gene's promoter region (see the previous subsection), we can determine the genes whose promoters are bound by a specific histone/chromatin regulator. Supplementary Table 2 provides the names of these 83 histone/chromatin regulators and their target genes.

Identifying the enriched/depleted histone modifications and the enriched histone/chromatin regulators for the user's input genes
The main functionality of YHMI is to identify the enriched/depleted histone modifications and the enriched histone/chromatin regulators for the user's input genes. The procedure for checking whether a specific histone modification (e.g. H3K4ac) is enriched/depleted in the promoters of the user's input genes is as follows. Let S be the set of genes whose promoters contain the histone modification H3K4ac (see the subsection before the previous one for details), R be the set of the user's input genes, T = S ∩ R be the set of genes whose promoters contain H3K4ac and are also in the set of the user's input genes and F be the set of all genes in the yeast genome. Then H3K4ac is said to be enriched/depleted in the promoters of the user's input genes if the observed ratio (|T|/|R|) in the input genes is significantly higher/lower than the expected ratio (|S|/|F|) in the yeast genome. |S| stands for the number of genes in the set S and |F| = 6572. The statistical significance is calculated using the hypergeometric testing (30) as follows. The P value (enrichment) and P value (depletion) for rejecting the null hypothesis (H 0 : H3K4ac is not enriched/depleted in the promoters of the user's input genes) are calculated as where S means the number of genes in set S and is a binomial coefficient. The P value (enrichment) and P value (depletion) are then corrected by the Bonferroni correction or the FDR to represent the true alpha level in the multiple hypotheses testing. Finally, H3K4ac is said to be enriched/depleted in the promoters of the user's input genes if the corrected P value (enrichment) or corrected P value (depletion) is less than the user-defined threshold. Note that Bonferroni correction and FDR are two statistical methods for multiple hypotheses correction. Bonferroni correction is more conservative than FDR. That is, Bonferroni correction has a smaller type I error rate, resulting in a smaller power, than FDR does. The procedure for checking whether a specific histone modification is enriched/depleted in the coding regions of the user's input genes is the same as mentioned above except for the definitions of two terms. Now S becomes the set of genes whose coding regions contain the histone modification H3K4ac and T = S ∩ R becomes the set of genes whose coding regions contain H3K4ac and are also in the set of the user's input genes.
The procedure for checking whether a specific histone/chromatin regulator (e.g. Esa1) is enriched in the promoters of the user's input genes is the same as mentioned above except for the definitions of two terms. Now S becomes the set of genes whose promoters are bound by Esa1 and T = S ∩ R becomes the set of genes whose promoters are bound by Esa1 and are also in the set of the user's input genes. Implementation and maintenance of the web interface of YHMI Figure 1 illustrates the overall configuration of YHMI. The web interface of YHMI was developed in Python using the Django MTV framework. The processed histone modification data were deposited in MySQL. All tables, volcano plots and bar charts were produced by the JavaSscript and feature-rich JavaScript libraries (jQuery, DataTables and Plotly.js) to visualize data on the webpage. We also provide the command line version of YHMI (written in Python) for users who want to run YHMI on their local computers (see the Help page of YHMI website). YHMI will be maintained by our lab's research assistants and we have a backup site (http://cosbi5.ee.ncku.edu.tw/YHMI/). Therefore, the long-term stability of YHMI is guaranteed. In the future, we will keep updating YHMI once new histone modification datasets become available in the literature.

Utility and discussion
The usage of YHMI YHMI is a web tool for identifying the enriched/depleted histone modifications and the enriched histone/chromatin regulators in the input gene list. To use YHMI, users have  Histone variant H2AZ depleted in promoter Guillemette et al. (23) to go through a three-step process ( Figure 2). [Step 1] Users need to input a list of N genes, which will be analyzed by YHMI. Standard names, systematic names or aliases are all acceptable. [Step 2] Users need to define the sets of genes (in the yeast genome) whose promoters/coding regions contain specific histone modifications by setting the thresholds. For example, by setting log 2 (H3K9ac/H3) ≥ 1 (meaning 2-fold enrichment over the background) in the promoters, a set of 2129 yeast genes whose promoters contain H3K9ac could be defined. Then the expected ratio of promoters having H3K9ac in the yeast genome is equal to 0.32 (2129/6572). Further, by intersecting the input list of N genes and the set of 2129 genes, the number (denoted as M) of input genes whose promoters having H3K9ac can be calculated. Then the observed ratio of promoters having H3K9ac in the input list of genes is equal to M/N. Finally, the input list of N genes is said to be enriched with H3K9ac in the promoters if the observed ratio (M/N) is much larger than the expected ratio (2129/6572). The statistical significance is calculated using hypergeometric testing (30). [Step 3] Since YHMI tests the enrichment/depletion of many histone modifications (i.e. multiple hypotheses testing), users have to select a statistical method (Bonferroni correction or FDR) for multiple hypotheses correction and set the P-value threshold (0.01 is used as the default).
The P-value threshold determines the statistical significance of the identified enriched/depleted histone modifications. The more stringent the P-value threshold, the higher the statistical significance of the identified enriched/depleted histone modifications. Therefore, if a user cannot find any enriched/depleted histone modifications for their gene list, they may want to loosen the P-value threshold to find some less statistically significant enriched/depleted histone modifications. Note that if a user wants to see all the results without defining the statistical significance level, he/she can choose 'No P-value cutoff'. After submission, YHMI will return the identification results that can be divided into two parts. First, the information of the user's settings is shown. We allow users to download all the sets of genes containing specific histone modifications defined by the users for further investigation (Figure 3). Second, the identified enriched/depleted histone modifications (acetylation, methylation, phosphorylation, ubiquitination and histone variant) in the promoters/coding regions of the input gene list are shown as tables and figures. The table contains the following information: the name of the histone modification, trend (enriched/depleted), Pvalue, fold enrichment, observed ratio and expected ratio (Figure 4). Two kinds of figures (bar charts and volcano plots) are also provided for visualization (Figure 4). Two different bar charts (a fold enrichment bar chart and a P-value bar chart) are available. In the fold enrichment bar chart, the identified histone modifications are sorted by the fold enrichment values. In the P-value bar chart, the identified histone modifications are sorted by the Pvalues. The other kind of figure (the volcano plot) is a nice combination of the fold enrichment (x-axis) and P-value (yaxis). Moreover, the enriched histone/chromatin regulators in the promoters of the input gene list are provided as tables and figures ( Figure 5).

Two case studies
We use two case studies to show the high-quality and biological insight of YHMI's identification results for the user's input genes. Since several histone modifications are already known to be correlated with highly transcribed genes (19)(20)(21)(22)(23), we would like to check (in the first case study) whether YHMI can successfully identify these known histone modifications and their related enzymes. To do that, we retrieved a list of 174 highly transcribed (>50 mRNA/hr) genes under YPD condition from Holstege et al. (31). Strikingly, YHMI did successfully identify most histone modifications known to be enriched/depleted in highly transcribed genes (Table 3). For example, H3K4ac, H3K9ac and H3K14ac are identified to be enriched both in the promoters and the coding regions of the highly transcribed genes, consistent with existing knowledge (19). Moreover, YHMI successfully identifies the enrichment of Gcn5 [the known acetyltransferase which targets H3K4, H3K9 and H3K14 (19)] in the promoters of the highly transcribed genes, further supporting the biological significance of identifying the enrichment of H3K4ac, H3K9ac and H3K14ac. More examples could be found in Table 3. All these examples validate the high quality of YHMI's identification results.
In the second case study, we used a list of 538 genes whose promoters are bound by Esa1 at 25 • C (retrieved from Venters et al.'s study (18)) as the input to investigate the signature of histone modifications and histone/chromatin regulators of these Esa1-targeting promoters. Esa1 is a histone acetyltransferase specifically targeting histone H2A, H2AZ and H4 (17). Esa1 together with the other 12 subunits forms the NuA4 complex (17). YHMI identified several histone modifications and histone/chromatin regulators that are enriched in these Esa1-targeting promoters; in other words, the signature of histone modifications and regulators of these Esa1-targeting promoters.
For example, YHMI found significant enrichment of H3K4ac, H3K9ac, H3K14ac and H4ac in these Esa1targeting promoters (Figure 6a). The enrichment of H4ac at these promoters is consistent with the function of Esa1 (17). Additionally, H3K4ac, H3K9ac and H3K14ac are also enriched in these promoters, indicating that the SAGA complex (which targets and acetylates H3K4, H3K9 and H3K14) might also act in these promoters along with Esa1 (32). Therefore, YHMI provides a testable hypothesis that SAGA might bind to promoters along with Esa1. This hypothesis awaits further experimental validation. Interestingly, YHMI found that H3K4me3 and H3K36me3 are also enriched in these Esa1-targeted promoters (Figure 6b). Therefore, YHMI provides another testable hypothesis that H3K4me3 and H3K36me3 might be important for regulating the acetyltransferase activity of NuA4. Indeed, several studies have shown that the methylated H3K4 and H3K36 interact with PHD and  chromodomain within the NuA4 complex by using the GST pull-down, immunoprecipitation and protein array assays (9,11,12). Furthermore, the set1Δ set2Δ mutants or mutants with combined mutations of the PHD and chromodomains show defective acetyltransferases activity of NuA4 (33,34). Additionally, an experiment has shown that the interaction of H3K4me3 with Tudor domain in Sgf29 within the SAGA complex is also essential for the acetyltransferase activity of SAGA (35). These experiments suggest the usefulness of YHMI.
YHMI also found that H2BK123ub is enriched in these Esa1-targeting promoters (Figure 6c). H2BK123ub is a transient histone mark, which is established by the Rad6/Bre1 ubiquitin ligase complex during transcription initiation and elongation (36). Previous studies have shown that H2BK123ub is essential for establishing H3K4me3 and H3K79me3 (37). In that sense, H3K4me3 and H3K79me3 are expected to be enriched in these Esa1-targeting promoters. YHMI successfully identified H3K4me3 enriched but did not see H3K79me3 enriched in these Esa1-targeting promoters (Figure 6b). Surprisingly, YHMI identified H3K79me2 depleted in these Esa1targeting promoters (Figure 6b). The reason underlying H3K79me2 depletion is unclear. This awaits further experimental investigation. These examples illustrate how YHMI's identification results can provide testable hypotheses for experimental investigation.

Robustness of the identification results
Several histone modifications (e.g. H3K4me, H3K4me3 and H3K36me3) have ChIP-chip data in different sources and can be used to test whether YHMI's identification results are robust against different data sources. For example, we tested whether the coding regions of the 174 highly transcribed genes (same input as in the case study 1) are enriched/depleted in H3K4me3. Strikingly, no matter which one of three ChIP-chip data sources of H3K4me3 is used, YHMI always returns the same trend (enrichment of H3K4me3). This observation is also true for H3K4me (depletion) and H3K36me3 (enrichment) shown in Table 5. These examples illustrate that the YHMI's identification results are robust against different data sources.

Comparison with our previously published YNA website
In 2014, we published YNA (26) which allows users to retrieve a list of yeast genes whose promoters and/or coding regions contain a user-specified combination of histone modifications (e.g. H3K4ac and H3K4me3). Since we published YNA, we received many requests for a reverse use (i.e. identifying the enriched histone modifications for a user's gene list). This motivated us to develop YHMI. Compared to YNA, YHMI includes additional 17 histone modifications (9 histone acetylation, 6 histone methylation and 2 histone phosphorylation) from Weiner et al.'s ChIP-seq data (24). Moreover, users can download the lists of genes with a specific histone modification in YHMI. However, only YNA (but not YHMI) allows users to download a list of genes having a specific combination of histone modifications (e.g. H3K4ac and H3K4me3 in the promoters). To inform YHMI users about the related database YNA, we have provided a link and an introduction of YNA on the homepage of YHMI.

Conclusion
In this study, we developed a web tool called YHMI. YHMI uses the ChIP-chip/ChIP-seq datasets of 32 histone modifications (15 histone acetylation, 13 histone methylation, 2 phosphorylation, 1 histone ubiquitination and 1 histone variant) and 83 histone/chromatin regulators. When a user inputs a gene list, YHMI will identify the enriched/depleted histone modifications in the promoters/coding regions and enriched histone/chromatin regulators in the promoters of the genes in the input list. The identification results are shown both in figures and tables. The high quality of YHMI's results is validated by identifying most known histone modifications enriched/depleted in highly transcribed genes. The biological insight of YHMI's results is demonstrated by generating experimentally testable hypotheses of novel histone modifications and their enzymes enriched in the target genes of the histone acetyltransferase Esa1. We believe that YHMI is a valuable tool for yeast biologists to do epigenetics research.

Supplementary Data
Supplementary data are available at Database Online.