Abstract

DNA methylation is a key functional regulatory mechanism in human genome, which plays critical roles in development, differentiation and many diseases. With rapid progress of large-scale projects (e.g. ENCODE), many DNA methylation data across diverse cell lines have been produced. However, common methylation patterns, cell lineage- and cell line-specific DNA methylation patterns across multiple cell lines have not yet been explored completely. Using the DNA methylation data across 54 human cell lines, we identified 35 276 local DNA methylation regions called local clusters of CpG sites (LCCSs). We constructed an LCCS co-methylation network and investigated the common DNA methylation patterns across all cell lines, which reveal two distinct groups in terms of their methylation level and genomic characteristics. We further detected diverse sets of cell lineage-specific high- and low-methylation patterns, which were depleted in promoter, CpG island (CGI) and repeat regions but enriched in gene body and non-CGI regions, especially the CGI shore regions. We discovered that the cell lineage-specific low-methylated LCCSs were enriched with functional transcriptional factor binding motif regions. Moreover, the detected cell line-specific high- and low-methylated patterns show distinct enrichments in cell line-specific chromatin states and functional relevance with the corresponding cell lines.

Introduction

DNA methylation is a very important functional epigenetic modification of mammalian DNA, which has been implicated in embryonic development, chromosomal stability, transcription, X-chromosome inactivation (1,2) and complex diseases such as cancers (3,4). DNA methylation enables cells to control gene expression in a cell type-specific manner, which further leads to different phenotypes (5,6). It is essential to understand the roles of DNA methylation in regulating how to develop different cell types. With the rapid development of biological techniques [e.g. array-based HumanMethylation450 BeadChip (1), whole-genome bisulfite sequencing (7) and reduced representation bisulfite sequencing (RRBS) (8)], a number of studies have investigated the methylation profiles across diverse cell types. Most of such studies have focused on a set of special cell types, such as immune system cells (9), high-grade soft tissue sarcomas (10) and breast-associated cell lines (11,12), which have shown that DNA methylation plays important roles in the acquisition of cell type-specific features (11,12), especially in the early stage of lineage specification (9).

Moreover, increasingly epigenomic studies have explored tissue-specific methylation profiles across multiple cell types (1,13,14), demonstrating the correlations of tissue-specific differentially methylated regions (tDMRs) with gene expression, the enrichment of motifs in promoter tDMRs (13) and the functions of hypomethylated tDMRs (1). However, a number of questions remained to be answered, e.g. What the global correlation patterns of methylated regions across multiple cell lines or tissues are? What the function of specific DNA methylation patterns in different cell lineages and cell lines are? Which genes are specifically high or low methylated in different cell lineages and cell lines.

In this study, we aimed to explore the common, cell lineage-specific and cell line-specific DNA methylation patterns in 54 diverse cell lines obtained from ENCODE project. We first defined and identified local DNA-methylated regions named local clusters of CpG sites (LCCSs) across the human genome of all cell lines. We detected seven LCCS co-methylation modules through weighted LCCSs co-methylation network analysis to explore the common DNA methylation patterns across all cell lines. We further explored cell lineage- and cell line-specific methylated LCCSs. We found that cell lineage-specific high- or low-methylated LCCSs were depleted in promoter, CpG island (CGI) and repeat regions and enriched in gene body and non-CGI regions. More interestingly, the cell lineage-specific low-methylated LCCSs were shown to be enriched in functional motifs of different transcription factors, which were reported to be involved in regulating corresponding cell lineage development and differentiations. We noted that the cell line-specific high- and low-methylated patterns showed different enrichments in functional chromatin states. Finally, we explored the cell lineage- or cell line-specific high- and low-methylated genes individually and found most (>80%) of them were related to certain functions of their corresponding cell lineages or cell lines.

Results

We downloaded all the DNA methylation data of 54 normal cell lines and defined a local cluster of CpG sites named LCCS (Materials and Methods). In total, we obtained 425 115 CpG sites with at least 10× coverage across 54 cell lines and identified 35 276 LCCSs across all cell lines with a median length of 79 bp and a median number of CpGs of 7. For the majority of the LCCSs (21 417, 60.7%), their standard deviations of the percent methylated (PM) values are <5 (Supplementary Material, Fig. S1). Among them, 54.2% (19 115) of LCCSs locate in CGI, 37.7% (13 294) of LCCSs locate in non-CGI and 8.1% (2867) LCCSs cover both CGI and non-CGI sites. In addition, 35% (12 347) of LCCSs show low methylation (methylation level ≤10) across all cell lines. As expected, 80.8% (9977) of them locate in promoter regions of genes with housekeeping function (P < 9.52e-18), whereas only 6.2% (768) of them locate in gene body regions of genes with regulation of transcription function (P = 3.18e-4).

By clustering cell lines with the methylation levels of their LCCSs, we can see that the methylation level, which tends to be similar across all cell lines, is relatively low for a majority of LCCSs (Supplementary Material, Fig. S2). Several groups of cell lines with the same tissue of origins such as the blood cell lines (GM12878, GM12878-Ximat), smooth muscle cell lines (Myometr, AoSMC) and the skeletal muscle cell lines (BC_Skeletal_Muscle_H12817N, BC_Skeletal_Muscle_01-11002, HSMMtube, HSMM) were clustered together, implying the existence of cell lineage-specific patterns.

We quantified the relationships between LCCS methylation level and gene expression level. Similar to the previous observations on individual CpG sites (14), the methylation profiles of LCCSs located in promoter region regardless of in CGI or not are negatively correlated with gene expression level (median r = −0.1235) (Supplementary Material, Fig. S3A and B). The methylation profiles of CGI LCCSs located in gene body regions display a bimodal distribution of correlation with gene expression (Supplementary Material, Fig. S3C), whereas the methylation profiles of non-CGI LCCSs located in gene body regions are positively correlated with gene expression level (median r = 0.2114) (Supplementary Material, Fig. S3D).

Weighted LCCS co-methylation network analysis reveals distinct biological relevant co-methylation modules

To characterize the common methylation patterns among different cell lines, we constructed a weighted LCCS co-methylation network that shows a well-known scale-free property (Supplementary Material, Fig. S4A). Based on a pre-processing evaluation procedure, we detected seven LCCS modules via a typical hierarchical clustering method (Fig. 1A). We observed two distinct groups with diverse methylation level (Fig. 1B). One group with Modules 1 and 5 (named HM modules) shows high methylation level globally, and the other one including Modules 2, 3, 4, 6 and 7 (named LM modules) shows low methylation level. We found that the HM modules were significantly enriched in the non-coding, gene body and open sea, whereas the LM modules were enriched in promoter and CGI regions and depleted in open sea (Fig. 1C and Supplementary Material, Table S1). In addition, we also found that Module 2 was significantly depleted [fold change (FC) = 0.69, P-value = 3.34e − 16] in repeat elements, whereas Module 5 was significantly enriched (FC = 1.6, P-value = 4.42e − 12) in repeat elements (Supplementary Material, Table S1).

Modules in the weighted LCCS co-methylated network. (A) Heatmap of topological overlap and modules defined based on the hierarchical dendrogram of the LCCS co-methylated network. Each row and column corresponds to a LCCS, and each module covering their member LCCSs was represented by a color bar. The range of topological overlap similarity from low to high was represented with light yellow to dark red. These modules were detected after we deleted a very large redundant module with very weak inter-correlations (Supplementary Material, Fig. S4 and Supplementary Data). (B) The methylation profiles of each module across 54 cell lines. (C) Heatmap of –log10(p) of the significance of the genomic features enrichment in each module. We truncated the –log10(p) by 15. (D) The enrichment of LCCSs of each module in the chromatin states defined in (15). The number on the bottom of each column represents the module id.
Figure 1.

Modules in the weighted LCCS co-methylated network. (A) Heatmap of topological overlap and modules defined based on the hierarchical dendrogram of the LCCS co-methylated network. Each row and column corresponds to a LCCS, and each module covering their member LCCSs was represented by a color bar. The range of topological overlap similarity from low to high was represented with light yellow to dark red. These modules were detected after we deleted a very large redundant module with very weak inter-correlations (Supplementary Material, Fig. S4 and Supplementary Data). (B) The methylation profiles of each module across 54 cell lines. (C) Heatmap of –log10(p) of the significance of the genomic features enrichment in each module. We truncated the –log10(p) by 15. (D) The enrichment of LCCSs of each module in the chromatin states defined in (15). The number on the bottom of each column represents the module id.

We further observed that HM modules (Modules 1 and 5) were enriched in intergenic-, repressive-, spliced- and heterochromatin-associated chromatin states (Fig. 1D). Particularly, Module 1 was enriched in intergenic-associated states, and Module 5 was enriched in spliced exons/GC rich, heterochromatin-nuclear lamina-most A/T rich states (Fig. 1D), indicating that they play different roles in biological processes. In contrast, LM modules (Modules 2, 3, 4, 6 and 7) were enriched in promoter- and specific repression-associated chromatin states (Fig. 1D). Interestingly, unlike Module 7, Modules 2–4 were significantly enriched in specific repression state (P < 1.9e − 12).

Moreover, considering the fact that LCCSs of HM modules were enriched in gene body regions and LCCSs of LM modules were enriched in promoter regions, we compared the methylation level of LCCSs in both modules and the corresponding gene expression levels. We observed that the methylation level of LCCSs of HM modules was positively correlated with the expression level of their located genes, and the methylation level of LCCS of LM modules (except Module 7) was negatively correlated with their expression level of their located genes (Fig. 2), which were in line with previous reported results (14,16). These observations highlight the importance of intragenic methylation in the regulation of gene expression.

Distributions of the Pearson correlation coefficients between methylation level of LCCSs and expression level of associated genes in each module. The first two dark gray histograms are for HM modules, and the remaining ones are for LM modules.
Figure 2.

Distributions of the Pearson correlation coefficients between methylation level of LCCSs and expression level of associated genes in each module. The first two dark gray histograms are for HM modules, and the remaining ones are for LM modules.

In addition, five of the seven modules were significantly enriched in known functions (Benjamini corrected P < 0.05, and Supplementary Material, Fig. S5) and showed diverse functional relevance. Notably, the HM Module 1 was specifically characterized with ion transport, cell adhesion, etc., implying that hypermethylation may be common for genes present in the extracellular space to enable them affect cellular function. While LM modules (e.g. Modules 2 and 3) were mainly enriched in regulation of transcription, regulation of gene expression, etc. These findings suggested that the modular structure of the co-methylation network reveals common and distinct signatures of methylation profiles across diverse cell lines.

Cell lineage-specific LCCS methylation patterns

We identified cell lineage-specific high- and low-methylated LCCSs (Fig. 3). Nine of 12 cell lineages have at least one specific high-methylated LCCS, and all 12 cell lineages have at least one specific low-methylated LCCS (Fig. 3A and B). Globally, the mean PM values of the cell lineage-specific high-methylated LCCSs range from 55.7 to 78.6 and that of the low-methylated LCCSs range from 21.1 to 35.7 of all lineages (except epithelium with a mean PM of only 5.1) (Table 1). Interestingly, we observed that blood lineage has the most specific high- or low-methylated LCCSs (809 and 829, respectively), whereas brain and epithelium lineages have the fewest ones (2 and 3, respectively) (Table 1). Moreover, the distribution of average cell line specificity of each LCCS in a cell lineage shows that there is no LCCS that is cell line-specific high for a given cell line but low for another one within the same cell lineage (Supplementary Material, Fig. S6).

Table 1.

The basic statistics of cell lineage-specific high- and low-methylated LCCS patterns

LineagesCell lineage-specific high-methylated LCCS patterns
Cell lineage-specific low-methylated LCCS patterns
LCCSamPMAGaCSMEaLCCSamPMAGaCSMEa
Blood80977.85351982921.645917
Brain255.715123.4401
Breast6725.950
Epithelium35.11
Heart463.244821.5261
Liver7174.9457530.5615
Lung1578.674423.724
Skeletal muscle32775.9171648835.73017
Smooth muscle573.6280335.120
Pancreas1723.213
Skin1672.856027.132
Reproductive477.722521.12331
LineagesCell lineage-specific high-methylated LCCS patterns
Cell lineage-specific low-methylated LCCS patterns
LCCSamPMAGaCSMEaLCCSamPMAGaCSMEa
Blood80977.85351982921.645917
Brain255.715123.4401
Breast6725.950
Epithelium35.11
Heart463.244821.5261
Liver7174.9457530.5615
Lung1578.674423.724
Skeletal muscle32775.9171648835.73017
Smooth muscle573.6280335.120
Pancreas1723.213
Skin1672.856027.132
Reproductive477.722521.12331

aNumber. mPM, the mean of the PM values of all the specific LCCSs in a given cell lineage; AG, the LCCSs-associated genes determined based on whether such LCCSs locate in the promoter region or gene body; CSME, the overlapping genes between cell lineage-specific LCCS-associated genes and cell type-specifically expressed genes.

Table 1.

The basic statistics of cell lineage-specific high- and low-methylated LCCS patterns

LineagesCell lineage-specific high-methylated LCCS patterns
Cell lineage-specific low-methylated LCCS patterns
LCCSamPMAGaCSMEaLCCSamPMAGaCSMEa
Blood80977.85351982921.645917
Brain255.715123.4401
Breast6725.950
Epithelium35.11
Heart463.244821.5261
Liver7174.9457530.5615
Lung1578.674423.724
Skeletal muscle32775.9171648835.73017
Smooth muscle573.6280335.120
Pancreas1723.213
Skin1672.856027.132
Reproductive477.722521.12331
LineagesCell lineage-specific high-methylated LCCS patterns
Cell lineage-specific low-methylated LCCS patterns
LCCSamPMAGaCSMEaLCCSamPMAGaCSMEa
Blood80977.85351982921.645917
Brain255.715123.4401
Breast6725.950
Epithelium35.11
Heart463.244821.5261
Liver7174.9457530.5615
Lung1578.674423.724
Skeletal muscle32775.9171648835.73017
Smooth muscle573.6280335.120
Pancreas1723.213
Skin1672.856027.132
Reproductive477.722521.12331

aNumber. mPM, the mean of the PM values of all the specific LCCSs in a given cell lineage; AG, the LCCSs-associated genes determined based on whether such LCCSs locate in the promoter region or gene body; CSME, the overlapping genes between cell lineage-specific LCCS-associated genes and cell type-specifically expressed genes.

Cell lineage-specific LCCSs analysis. Cell lineage-specific (A) high- and (B) low-methylated LCCS patterns were clustered. Each row denotes one LCCS, each column denotes one cell line in the heatmap, and each cell lineage is represented by a color bar at the top of the heatmap. Enrichments of cell lineage-specific (C) high- and (D) low-methylated patterns in different genomic regions were calculated. Different cell lineages were represented by different color bars (the bar of Shelf in brain lineage was truncated to 3.8 for visualization, and its real number is 11.82, the bar of Open sea in reproductive lineage was truncated to 3.8 for visualization and its real number is 4.36). The black dash lines indicate FC = 1, ‘*’ represents P < 1.0e − 3 and ‘+’ represents 0.05 > P ≥ 1.0e − 3.
Figure 3.

Cell lineage-specific LCCSs analysis. Cell lineage-specific (A) high- and (B) low-methylated LCCS patterns were clustered. Each row denotes one LCCS, each column denotes one cell line in the heatmap, and each cell lineage is represented by a color bar at the top of the heatmap. Enrichments of cell lineage-specific (C) high- and (D) low-methylated patterns in different genomic regions were calculated. Different cell lineages were represented by different color bars (the bar of Shelf in brain lineage was truncated to 3.8 for visualization, and its real number is 11.82, the bar of Open sea in reproductive lineage was truncated to 3.8 for visualization and its real number is 4.36). The black dash lines indicate FC = 1, ‘*’ represents P < 1.0e − 3 and ‘+’ represents 0.05 > P ≥ 1.0e − 3.

It has been shown that most variations of DNA methylation in different tissues occur more often in gene bodies than in promoters (17) and occur more often in non-CGI regions than in CGI regions (9). We wonder whether these cell lineage-specific LCCSs are related to specific genomic features such as genic regions, CGI regions and repeat regions. We noted that both cell lineage-specific high- and low-methylated LCCSs were significantly depleted in promoter regions and enriched in gene body regions (Fig. 3C and D and Supplementary Material, Table S2). For example, both blood-specific high- and low-methylated LCCSs were depleted in promoter regions (FC = 0.59, P = 4.65e − 33 for high-methylated LCCSs and FC = 0.16, P = 1.17e − 155 for low-methylated LCCSs) (Fig. 3C and D and Supplementary Material, Table S2). In terms of CGI-associated regions, we found that most of the cell lineage-specific low-methylated LCCSs were significantly depleted in CGI regions and enriched in non-CGI regions, including shores, shelves and open seas (Fig. 3D and Supplementary Material, Table S2), which reflecting the cell lineage-associated potential regulatory hypomethylated regions were located either partially or completely outside CGIs. Moreover, the cell lineage-specific high-methylated LCCSs show different preferences with CGI regions. For instance, blood-specific high-methylated LCCSs were significantly enriched in non-CGI (FC = 1.58, P = 1.38e − 8 for shores; FC = 1.52, P = 0.004 for shelves; and FC = 1.34, P = 3.05e − 7 for open seas) and depleted in CGI (FC = 0.7, P = 1.7e-20), whereas skeletal muscle-specific high-methylated LCCSs were significantly depleted in open seas (FC = 0.39, P = 4.69e − 11) and enriched in CGI regions (FC = 1.12, P = 0.02) and shores (FC = 1.66, P = 3.37e − 5) (Fig. 3C and Supplementary Material, Table S2). In addition, we observed significant depletion in repeat regions for both cell lineage-specific high- and low-methylated LCCSs (Fig. 3C and D and Supplementary Material, Table S2). These results indicated that cell lineage-specific methylation regions were mostly enriched in gene body and non-CGI regions (especially in CGI shores) but depleted in promoter, CGI and repeat regions, implying the importance of exploring the emerging intragenic and CGI shore methylations.

We found that a set of transcription factor binding motifs were enriched in the cell lineage-specific low-methylated LCCSs, indicating potential implications of such LCCSs to corresponding cell lineages. Interestingly, most of the transcription factors were functionally associated with the corresponding cell lineages, and six examples were shown in Table 2 (Supplementary Material, Table S3). For instance, motifs of the top two factors EBF1 and E2A were significantly overrepresented in blood-specific low-methylated LCCSs (P = 1e-29 and P = 1e − 24, respectively). We noted that EBF1 is associated with B cell commitment and pro-B cell development (18,24), and E2A is a key regulator of the differentiation of lymphocyte (19). Furthermore, hypomethylation of the binding sites of EBF1 and E2A has been proved to be closely associated with B cell development (25). In another example, motifs of T-box and MYF5 were significantly enriched in skeletal muscle-specific low-methylated LCCSs (P = 1e − 20 and P = 1e − 19, respectively). T-box was reported to be required for the skeleton of limb development (21), and MYF5 was shown to regulate the skeletal myogenesis (22,26). Moreover, the hypomethylation of MYF5 was associated with human skeletal muscle myoblasts (27). Similarly, GATA4 were highly enriched in heart lineage and ZEB1 in smooth muscle lineage, which were all known to be functional in their respective cell lineages (20,23). These observations indicated that motifs enriched in cell lineage-specific low-methylated LCCSs were relevant to the development of corresponding cell lineages.

Table 2.

Examples of motifs enriched in cell lineage-specific low-methylated LCCSs

Cell lineageMotifP-valueTF(s)Brief descriptionReference
Bloodgraphic 1e − 29EBF1EBF1 is required for B-cell commitment, pro-B cell development, subsequent transition to the pre-B cell stage and B cell development(18)
graphic 1e − 24E2AE2A plays a wide variety of roles in directing lymphocyte differentiation and activation(19)
Heartgraphic 1e − 14GATA4GATA4 is a critical regulator of cardiac morphogenesis and a necessary regulator of cardiac gene expression, hypertrophy, stress compensation and myocyte viability(20)
Skeletal musclegraphic 1e − 20T-boxTbx15, a member of Tbox family, plays a key role in the development of the skeleton of the limb, vertebral column and head by controlling the number of mesenchymal precursor cells and chondrocytes(21)
graphic 1e − 19MYF5MYF5 is one of four MRFs (MYF5, MyoD, myogenin and MRF4) required for regulating skeletal myogenesis(22)
Smooth musclegraphic 1e − 24ZEB1Overexpression of deltaEF1 inhibited neointima formation and promoted SMC differentiation, whereas heterozygous deltaEF1 knockout mice exhibited exaggerated neointima formation(23)
Cell lineageMotifP-valueTF(s)Brief descriptionReference
Bloodgraphic 1e − 29EBF1EBF1 is required for B-cell commitment, pro-B cell development, subsequent transition to the pre-B cell stage and B cell development(18)
graphic 1e − 24E2AE2A plays a wide variety of roles in directing lymphocyte differentiation and activation(19)
Heartgraphic 1e − 14GATA4GATA4 is a critical regulator of cardiac morphogenesis and a necessary regulator of cardiac gene expression, hypertrophy, stress compensation and myocyte viability(20)
Skeletal musclegraphic 1e − 20T-boxTbx15, a member of Tbox family, plays a key role in the development of the skeleton of the limb, vertebral column and head by controlling the number of mesenchymal precursor cells and chondrocytes(21)
graphic 1e − 19MYF5MYF5 is one of four MRFs (MYF5, MyoD, myogenin and MRF4) required for regulating skeletal myogenesis(22)
Smooth musclegraphic 1e − 24ZEB1Overexpression of deltaEF1 inhibited neointima formation and promoted SMC differentiation, whereas heterozygous deltaEF1 knockout mice exhibited exaggerated neointima formation(23)

TF, transcription factor.

Table 2.

Examples of motifs enriched in cell lineage-specific low-methylated LCCSs

Cell lineageMotifP-valueTF(s)Brief descriptionReference
Bloodgraphic 1e − 29EBF1EBF1 is required for B-cell commitment, pro-B cell development, subsequent transition to the pre-B cell stage and B cell development(18)
graphic 1e − 24E2AE2A plays a wide variety of roles in directing lymphocyte differentiation and activation(19)
Heartgraphic 1e − 14GATA4GATA4 is a critical regulator of cardiac morphogenesis and a necessary regulator of cardiac gene expression, hypertrophy, stress compensation and myocyte viability(20)
Skeletal musclegraphic 1e − 20T-boxTbx15, a member of Tbox family, plays a key role in the development of the skeleton of the limb, vertebral column and head by controlling the number of mesenchymal precursor cells and chondrocytes(21)
graphic 1e − 19MYF5MYF5 is one of four MRFs (MYF5, MyoD, myogenin and MRF4) required for regulating skeletal myogenesis(22)
Smooth musclegraphic 1e − 24ZEB1Overexpression of deltaEF1 inhibited neointima formation and promoted SMC differentiation, whereas heterozygous deltaEF1 knockout mice exhibited exaggerated neointima formation(23)
Cell lineageMotifP-valueTF(s)Brief descriptionReference
Bloodgraphic 1e − 29EBF1EBF1 is required for B-cell commitment, pro-B cell development, subsequent transition to the pre-B cell stage and B cell development(18)
graphic 1e − 24E2AE2A plays a wide variety of roles in directing lymphocyte differentiation and activation(19)
Heartgraphic 1e − 14GATA4GATA4 is a critical regulator of cardiac morphogenesis and a necessary regulator of cardiac gene expression, hypertrophy, stress compensation and myocyte viability(20)
Skeletal musclegraphic 1e − 20T-boxTbx15, a member of Tbox family, plays a key role in the development of the skeleton of the limb, vertebral column and head by controlling the number of mesenchymal precursor cells and chondrocytes(21)
graphic 1e − 19MYF5MYF5 is one of four MRFs (MYF5, MyoD, myogenin and MRF4) required for regulating skeletal myogenesis(22)
Smooth musclegraphic 1e − 24ZEB1Overexpression of deltaEF1 inhibited neointima formation and promoted SMC differentiation, whereas heterozygous deltaEF1 knockout mice exhibited exaggerated neointima formation(23)

TF, transcription factor.

Finally, we found that a set of associated genes of cell lineage-specific LCCS were cell type-specifically expressed (28–31) (Table 1). These genes were examined individually with respect to their functions, and many of them had been proved to be functionally associated with the corresponding lineages (Supplementary Material, Table S4). For example, one such high-methylated gene of blood BLK coding the B lymphoid tyrosine kinase was reported to play key roles during B cell signaling (32). One such low-specific gene of blood LSP1 was known to mediate the apoptosis of B lymphocytes (33). The only such low-methylated gene of brain SOX8 was documented as a modulator of the specification and differentiation of glial in the peripheral nervous system (34). These evidences imply that the cell lineage-specific high- and low-methylated genes play functionally important roles in corresponding cell lineages.

Cell line-specific LCCS methylation patterns

We further examined cell line-specific LCCS methylation patterns in a more detailed view. We first found that cell lines belong to the same cell lineage could be clustered together based on the number of shared cell line-specific high- and low-methylated LCCSs (Supplementary Material, Figs S7 and S8). For example, several skin cell lines including NHDF-neo, BJ and Fibrobl, which shared a significant number of cell line-specific high-methylated LCCSs (P < 0.02), were clustered together (Supplementary Material, Fig. S7). Similarly, several skeletal muscle cell lines including HSMM, HSMMtube, BC_Skeletal_Muscle_01-11002 and BC_Skeletal_Muscle_H12817N were clustered together with a significant number of overlapping cell line-specific low-methylated LCCSs (P < 3.72e − 16) (Supplementary Material, Fig. S8). Thus, we believed that cell lines within the same cell lineage tend to share common patterns.

We then detected cell line-specific methylated LCCSs that are unique to a cell line. Six cell lines have the unique low-methylated ones, and other 48 have both high- and low-methylated unique ones. To better visualize the cell line-specific methylation pattern across all the cell lines, we binarized the methylation profiles of LCCSs (Fig. 4A and B; Supplementary Material, Figs S9 and S10). It is easy to see that blastula cell line HTR8svn has the most (1960) unique specific high-methylated LCCSs (Fig. 4A), and testis cell line BC_Testis_N30 has the most (695) unique specific low-methylated LCCSs (Fig. 4B). It was reported that HTR8svn, a trophoblast cell line which belonged to the blastula tissue, was formed during an early stage of embryonic development in animals, and it was heavily methylated in both Xenopus tropicalis (35) and zebrafish (36). Testis is an important component of male germ system, and it showed eightfold number of hypomethylated loci compared with somatic tissues of adult male mice (37). Thus, the identified cell line-specific LCCSs may relate to the underlying mechanisms of tissue development.

Cell line-specific LCCSs analysis. Cell line-specific (A) high- and (B) low-methylated LCCS patterns were plotted. Each row denotes a LCCS, and each column denotes a cell line. (C) Enrichment analysis of cell line-specific high- and low-methylated LCCSs in cell line-specific chromatin states. The lower triangle and the upper triangle correspond to the cell line-specific high or low ones, respectively. The state numbers are showed in front of the states. Txn represents Transcription here. (D) Scatter plot of methylation changes and gene expression changes of H1-hESC-specific high- and low-methylated genes between the H1-hESC and other cell lines. Gene symbols were labeled for the selected genes.
Figure 4.

Cell line-specific LCCSs analysis. Cell line-specific (A) high- and (B) low-methylated LCCS patterns were plotted. Each row denotes a LCCS, and each column denotes a cell line. (C) Enrichment analysis of cell line-specific high- and low-methylated LCCSs in cell line-specific chromatin states. The lower triangle and the upper triangle correspond to the cell line-specific high or low ones, respectively. The state numbers are showed in front of the states. Txn represents Transcription here. (D) Scatter plot of methylation changes and gene expression changes of H1-hESC-specific high- and low-methylated genes between the H1-hESC and other cell lines. Gene symbols were labeled for the selected genes.

More interestingly, we observed that cell line-specific LCCSs were enriched in distinct chromatin states. Specifically, cell line-specific high-methylated LCCSs were enriched in weak transcriptional state, whereas the cell line-specific low-methylated LCCSs were enriched in active promoter and strong enhancer states (Fig. 4C). Interestingly, H2A.Z has been reported to be enriched in active promoter and strong enhancer states (38), confirming the potential antagonism between DNA methylation and H2A.Z in eukaryotic genomes (39,40).

Comparing the difference of DNA methylation in promoter and difference of gene expression between the considered specific and non-specific cell lines, we observed that genes containing cell line-specific LCCSs tend to be differentially expressed and functionally associated with corresponding cell lines (Fig. 4D and Supplementary Material, Fig. S11). For example, one of the top H1-hESC-specific high-methylated genes, LGALS1, was low expressed compared with other cell lines, and the high expression of LGALS1 is associated with embryonal carcinoma in murine (41,42). Meanwhile, the top H1-hESC-specific low-methylated gene TRIM71 was high expressed, and the hypomethylation of TRIM71 can be considered as a DNA marker that associated with the fetus growth and development (43). These results indicated the functional relevance of detected cell line-specific LCCSs to downstream biological process.

Discussion

In this study, we aimed to reveal common and specific DNA methylation patterns across 54 cell lines. We first identified seven LCCS modules in a weighted LCCS co-methylated network, demonstrating two types of common methylation patterns with distinct methylation profiles and genomic characteristics (Fig. 1). It clearly showed that LCCSs located in different genomic regions would have very similar methylation patterns and those in the same module might have specific biological functional implications. Although this analysis showed that the majority of cell lines in different modules were either high methylated or low methylated, there must still be some differences among the modules in terms of how relatively much or little they were methylated on average in a given cell type. We calculated the average methylation level of all cell lines based on the methylation level of all LCCSs in a given module (Supplementary Material, Fig. S12A) and did a bi-clustering analysis for each module (Supplementary Material, Fig. S12B–D), which could characterize the hypo-, hemi- or hyper-methylation of cell lines in each module. For example, in Module 1, AoSMC (mean PM = 49.5) was the least methylated cell type, and GM12878 (mean PM = 56.07) was the hemi-methylated one (Supplementary Material, Fig. S12B). BC_Testis_N30 cell line (mean PM = 32.01) was the least methylated one in Module 5 (Supplementary Material, Fig. S12C). HTR8svn (mean PM = 75.94) was the most methylated one, and GM12878-Ximat (mean PM = 41.90) was the hemi-methylated one in Module 2 (Supplementary Material, Fig. S12D). We then identified cell lineage-specific and cell line-specific methylated LCCSs including both high- and low-methylated LCCSs (Figs 3 and 4) to investigate the functional diversity and specificity among them. Both cell lineage-specific and cell line-specific methylated LCCS-associated genes reflected distinct functions relating to corresponding cell lineage or cell lines. In addition, we also observed that similar cell lines within the same cell lineage would have more specific LCCSs, which highlighted the importance of examining the hierarchy structures of different cell lineages.

We detected cell lineage-specific high- and low-methylated LCCSs and found that the methylation level (PM) of specifically high-methylated LCCSs was generally higher than 55.7, whereas the one of specifically low-methylated ones were lower than 35.7. In contrast, a recent study, however, showed that loci with PM of ∼40–60 were the most informative predictors for determining cellular identity (44). It will raise the questions on what are the differences between the informative loci for cellular identity and the ones for determining specific cell lines, and how to combine these two different sets to better investigate the dynamic change of DNA methylation pattern across different cell lines and cell lineages, which would be one of our future works.

We explored the cell lineage- and cell line-specific LCCSs-associated genes individually, which were shown to be highly related to the corresponding cell lineage or cell lines. Notably, we reported a set of novel cell lineage- and cell line-specific genes from the view of DNA methylation specificity. For example, WT1 is specifically high methylated in heart lineage, and ZNF750 is specifically low methylated in epithelium-associated cell lines. In addition, we noted that the abnormal methylation statuses of some cell lineage- or cell line-specific methylated genes were associated with many complex diseases. For example, increased methylation of MYF6 was associated with Stage I non-small cell lung cancer (45), and frequently methylated RAX was associated with astrocytomas (46). These may provide new biomarkers for the studies of disease-related epigenetic mechanisms as revealed by differential chromatin modification analysis (47).

DNA methylation was thought as a biomarker of silencing gene expression by increasing methylation on CpGs in promoter regions. However, increasing studies have shown that DNA methylation events in enhancer and other genomic regions may also take important functions. In this study, we observed lots of cell lineage-specific and cell line-specific LCCSs outside of gene promoter regions and CGI regions, which confirming the importance of DNA methylation in enhancer and other regions such as non-coding regions (Supplementary Material, Fig. S13). Recently, it was also revealed that aberrant methylation patterns in lincRNAs might associate with cancer development and progression (48), bringing new clues for the potential function of DNA methylation on non-coding genes. Therefore, the methylation status, especially the tissue-specific methylation status of non-coding RNAs, will be important and valuable direction for further studies.

Materials and Methods

Data

We downloaded all the DNA methylation data of 54 normal cell lines profiled using RRBS and gene expression data of seven cell lines obtained using RNA-seq technique from ENCODE project (http://www.genome.ucsc.edu/ENCODE/) (Supplementary Material, Table S5). We obtained the CGI genomic regions, the genomic features including regions from 2000-base pair (bp) upstream to the transcription start sites (TSS2000), regions from 200 bp upstream to the TSS (TSS200), 5′UTR, 3′UTR, exons, introns, regions from 2000 bp downstream to the transcription end sites (TES) (TES2000) and repeat elements based on hg19 human reference genome from the UCSC Genome Browser (49). The non-CGI regions including CGI shores (up or down to 2 kb from CGI), CGI shelves (up or down to 2–4 kb from CGI) and open seas (>4 kb from CGI) were also defined. We also downloaded known single-nucleotide polymorphism (SNP) from dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/, BUILD_ID=138). We determined the tissue-specific expressed genes by four tissue-related expression datasets (28–31). We obtained inferred chromatin states of human genome (hg18) (15) from http://compbio.mit.edu/ChromatinStates/ and the cell line-specific chromatin states (38) from UCSC Genome Browser. Note that four cell lines have the cell line-specific chromatin states.

Quantifying and filtering DNA-methylated sites

In the ENCODE project, the DNA methylation level of RRBS data at CpG was quantified by the PM value, which means the percentage of molecules that were methylated (14). There are two or three replicates in each cell line, and the correlations of PM values between replicates are 0.987 when restricted to CpG sites with at least 10× coverage (14). For each cell line, we obtained the CpG sites with at least 10× read coverage in each replicate and merged them into one dataset. We then used the mean PM value of each CpG site in different replicates as the final PM value. We removed CpGs that are overlapped with known SNPs from dbSNP138.

Definition of LCCS

We defined a local cluster of CpG sites named LCCS as a region that includes at least three CpG sites, and the range (defined as the maximum distance of any two sites in the LCCS) was <500 bp. Specifically, we first calculated the distances of any two consecutive CpGs and sorted them in an ascending order. We selected a CpG pair with the minimum distance as an initial LCCS and repeatedly added CpGs into it until the size of LCCS was not smaller than 500 bp. We repeated this procedure to identify all the LCCSs and deleted those with less than three sites. We used the mean PM value of all the sites in an LCCS as the PM values of the LCCS in one cell line.

Correlations between LCCS methylation and gene expression

Seven of the 54 cell lines have corresponding RNA-Seq data. We quantified the expression level of each gene by RPKM (reads per kilobase of transcript per million reads) values (50). We computed the Pearson correlation coefficient between methylation level of LCCSs and the RPKM values of the mapped genes to quantify the relationships between gene expression and LCCS methylation.

Construction of weighted LCCS co-methylation network and module detection

We constructed a weighted LCCS co-methylation network using the ‘WGCNA’ R package (51) for N LCCSs with the top 50% variable methylation level (N = 17 638). The constructed co-methylation network was represented by the adjacency matrix A = (aij)N×N, where aij[0,1]. The element aij represents the connection strength between LCCS i and j in the network, and it was calculated by aij=sijβ, where sij=|cor(li,lj)| represents the absolute value of the correlation coefficient between the methylation level profiles of LCCS i and j, and β was a parameter to enable the network approximate the scale-free topology criterion (52). In the WGCNA package, the function adjacency was used to construct the adjacency matrix of the LCCS co-methylation network, and the pickSoftThreshold function was used to estimate the parameter β (β = 3).

We defined LCCS modules as clusters of densely interconnected LCCSs and adopted a hierarchical clustering method using topological overlap matrix and a dynamic branch cut method to obtain it (53,54). The topological overlap measures the similarity of two nodes in the network in terms of commonality of the nodes they connected to (53), which has shown promising performance in several applications (51). The measurement of topological overlap was defined as follows (53):
where ki=uiaiu. To assess the significance of intra-module correlation, we calculated the median of the correlations between any two LCCSs in each module and compared it with those of 1000 random-generated modules of the same size (Supplementary Material, Tables S6 and S7).

Identifying cell line-specific LCCSs

For the l-th LCCS, we denoted the corresponding PM value vector as MEl={PM1,PM2,,PM54}, where PMi represents the PM value of the i-th cell line. We calculated the z-score vector of MEl,z_mel={z1,z2,,z54}, where zi=(PMiμ(MEl))/σ(MEl). We considered that the l-th LCCS was i-th cell line-specific high one if PMi is larger than 50 and zi is larger than 1.7 and that it was i-th cell line-specific low one if PMi is smaller than 50 and zi is smaller than −1.7.

Identifying cell lineage-specific LCCSs

Based on the tissue to which one cell line belongs, we defined 12 cell lineages with at least 2 cell lines in each, covering 43 cell lines in total (Supplementary Material, Table S8). As AoSMC and Myometr both consist mainly of smooth muscle, we have considered them together as members of the smooth muscle lineage. We also considered the BC_Testis_N30 and BC_Uterus_BN0765 in the reproductive cell lineage. Given a cell lineage including li cell lines, we considered an LCCS was cell lineage-specific one if it is cell line specific in at least si=max(2,li/2) cell lines.

Analysis of overlapped specific LCCSs in different cell lines

We performed the hypergeometric test to see whether the number of overlapped LCCSs for both high-specific or low-specific ones of two cell lines i and j were significant:
where M is the total number of LCCSs, Ni and Nj are the number of high-specific (or low-specific) LCCSs in cell line i and j, respectively, and xij is the number of overlapped high-specific (or low-specific) LCCSs between cell line i and j. We performed an unsupervised hierarchical clustering on –log10(pij) using average linkage with Euclidean distance to see whether it can reflect the hierarchical relationship of cell lines.

Mapping LCCSs to genes and different genomic regions

We examined the potential functional relevance of LCCSs by mapping them to CGI, RefSeq genes and other function elements. We assigned an LCCS into CGI or non-CGI groups based on where it fully locates. We mapped an LCCS to chromatin states (15) and different genomic features including TSS2000, TSS200, 3′UTR, 5′UTR, exon, intron, TES2000 and repeat elements as well as non-coding regions by bedtools in a fully overlap mode (55). We converted the coordinates of chromatin states in hg18 to the ones in hg19 by ‘rtracklayer’ R package (56). For the cell line-specific LCCSs, we associated them with cell line-specific chromatin states (38). When examining LCCS-associated genes and their functions, we briefly mapped an LCCS to two different gene regions based on whether it locates in a promoter region (2-kb region upstream and downstream of an RefSeq TSS) or a gene body region (≥2 kb from TSS to TES). We adopted the two-tailed Fisher exact test for all the enrichment analysis and FC to indicate it is depleted (FC < 1) or enriched (FC > 1).

GO enrichment analysis

To assess the functional relevance of LCCSs, we performed the gene ontology (GO) enrichment analysis for the LCCS mapped gene lists via DAVID bioinformatics resources (57). We obtained enrichment lists with Benjamin corrected P-value of < 0.05.

Motif analysis in cell lineage-specific LCCSs

We adopted the HOMER software (http://homer.salk.edu/homer/) (58) to detect the sequence motifs that were enriched in the cell lineage-specific low-methylated LCCSs. The background sequences were generated from all the specific low-methylated LCCSs accordingly.

Cell line-specific LCCSs methylation changes against gene expression changes

For the cell line-specific methylated LCCSs mapped genes, we compared the difference of methylations at their promoter regions as well as gene expressions between the specific and non-specific cell lines. The methylation level of a gene was carried out by averaging all cell line-specific LCCSs within the promoter of the gene. The median value was used to summarize the methylation and expression of genes in non-specific cell lines. The FCs of the methylation and expression of the specific cell line to those of non-specific cell lines were adopted to indicate the difference.

Funding

National Natural Science Foundation of China (No. 61379092, 61422309 to S.Z.; 61432010, 61303118 to L.G.; 31201002 to X.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (XDB13040600), the Outstanding Young Scientist Program of CAS and the Key Laboratory of Random Complex Structures and Data Science at CAS to S.Z.; the Fundamental Research Funds for the Central Universities (No. BDZ021404 to L.G.).

Acknowledgements

X.Y. would like to thank the support of National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, CAS during his visit.

Conflict of Interest statement. None declared.

References

1

Lokk
K.
Modhukur
V.
Rajashekar
B.
Martens
K.
Magi
R.
Kolde
R.
Kolt Ina
M.
Nilsson
T.K.
Vilo
J.
Salumets
A.
et al. (
2014
)
DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns
.
Genome Biol.
,
15
,
R54
.

2

Horvath
S.
(
2013
)
DNA methylation age of human tissues and cell types
.
Genome Biol.
,
14
,
R115
.

3

Sørensen
K.D.
Borre
M.
Ørntoft
T.F.
Dyrskjøt
L.
Tørring
N.
(
2008
)
Chromosomal deletion, promoter hypermethylation and downregulation of FYN in prostate cancer
.
Int. J. Cancer
,
122
,
509
519
.

4

Yamashita
S.
Tsujino
Y.
Moriguchi
K.
Tatematsu
M.
Ushijima
T.
(
2006
)
Chemical genomic screening for methylation-silenced genes in gastric cancer cell lines using 5-aza-2′-deoxycytidine treatment and oligonucleotide microarray
.
Cancer Sci.
,
97
,
64
71
.

5

Mohn
F.
Schübeler
D.
(
2009
)
Genetics and epigenetics: stability and plasticity during cellular differentiation
.
Trends Genet.
,
25
,
129
136
.

6

Wagner
J.R.
Busche
S.
Ge
B.
Kwan
T.
Pastinen
T.
Blanchette
M.
(
2014
)
The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts
.
Genome Biol.
,
15
,
R37
.

7

Ziller
M.J.
Gu
H.
Müller
F.
Donaghey
J.
Tsai
L.T.-Y.
Kohlbacher
O.
De Jager
P.L.
Rosen
E.D.
Bennett
D.A.
Bernstein
B.E.
et al. (
2013
)
Charting a dynamic DNA methylation landscape of the human genome
.
Nature
,
500
,
477
481
.

8

Meissner
A.
Mikkelsen
T.S.
Gu
H.
Wernig
M.
Hanna
J.
Sivachenko
A.
Zhang
X.
Bernstein
B.E.
Nusbaum
C.
Jaffe
D.B.
et al. (
2008
)
Genome-scale DNA methylation maps of pluripotent and differentiated cells
.
Nature
,
454
,
766
770
.

9

Deaton
A.M.
Webb
S.
Kerr
A.R.
Illingworth
R.S.
Guy
J.
Andrews
R.
Bird
A.
(
2011
)
Cell type-specific DNA methylation at intragenic CpG islands in the immune system
.
Genome Res.
,
21
,
1074
1086
.

10

Renner
M.
Wolf
T.
Meyer
H.
Hartmann
W.
Penzel
R.
Ulrich
A.
Lehner
B.
Hovestadt
V.
Czwan
E.
Egerer
G.
et al. (
2013
)
Integrative DNA methylation and gene expression analysis in high-grade soft tissue sarcomas
.
Genome Biol.
,
14
,
r137
.

11

Maruyama
R.
Choudhury
S.
Kowalczyk
A.
Bessarabova
M.
Beresford-Smith
B.
Conway
T.
Kaspi
A.
Wu
Z.
Nikolskaya
T.
Merino
V.F.
et al. (
2011
)
Epigenetic regulation of cell type-specific expression patterns in the human mammary epithelium
.
PLoS Genet.
,
7
,
e1001369
.

12

Bloushtain-Qimron
N.
Yao
J.
Snyder
E.L.
Shipitsin
M.
Campbell
L.L.
Mani
S.A.
Hu
M.
Chen
H.
Ustyansky
V.
Antosiewicz
J.E.
et al. (
2008
)
Cell type-specific DNA methylation patterns in the human breast
.
Proc. Natl Acad. Sci. USA
,
105
,
14076
14081
.

13

Rakyan
V.K.
Down
T.A.
Thorne
N.P.
Flicek
P.
Kulesha
E.
Graf
S.
Tomazou
E.M.
Backdahl
L.
Johnson
N.
Herberth
M.
et al. (
2008
)
An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs)
.
Genome Res.
,
18
,
1518
1529
.

14

Varley
K.E.
Gertz
J.
Bowling
K.M.
Parker
S.L.
Reddy
T.E.
Pauli-Behn
F.
Cross
M.K.
Williams
B.A.
Stamatoyannopoulos
J.A.
Crawford
G.E.
et al. (
2013
)
Dynamic DNA methylation across diverse human cell lines and tissues
.
Genome Res.
,
23
,
555
567
.

15

Ernst
J.
Kellis
M.
(
2010
)
Discovery and characterization of chromatin states for systematic annotation of the human genome
.
Nat. Biotechnol.
,
28
,
817
825
.

16

Yang
X.
Han
H.
De Carvalho
D.D.
Lay
F.D.
Jones
P.A.
Liang
G.
(
2014
)
Gene body methylation can alter gene expression and is a therapeutic target in cancer
.
Cancer Cell
,
26
,
577
590
.

17

Consortium
E.P.
(
2012
)
An integrated encyclopedia of DNA elements in the human genome
.
Nature
,
489
,
57
74
.

18

Vilagos
B.
Hoffmann
M.
Souabni
A.
Sun
Q.
Werner
B.
Medvedovic
J.
Bilic
I.
Minnich
M.
Axelsson
E.
Jaritz
M.
et al. (
2012
)
Essential role of EBF1 in the generation and function of distinct mature B cell types
.
J. Exp. Med.
,
209
,
775
792
.

19

Aspland
S.E.
Bendall
H.H.
Murre
C.
(
2001
)
The role of E2A-PBX1 in leukemogenesis
.
Oncogene
,
20
,
5708
5717
.

20

Perrino
C.
Rockman
H.A.
(
2006
)
GATA4 and the two sides of gene expression reprogramming
.
Circ. Res.
,
98
,
715
716
.

21

Singh
M.K.
Petry
M.
Haenig
B.
Lescher
B.
Leitges
M.
Kispert
A.
(
2005
)
The T-box transcription factor Tbx15 is required for skeletal development
.
Mech. Dev.
,
122
,
131
144
.

22

Francetic
T.
Li
Q.
(
2011
)
Skeletal myogenesis and Myf5 activation
.
Transcription
,
2
,
109
114
.

23

Nishimura
G.
Manabe
I.
Tsushima
K.
Fujiu
K.
Oishi
Y.
Imai
Y.
Maemura
K.
Miyagishi
M.
Higashi
Y.
Kondoh
H.
et al. (
2006
)
DeltaEF1 mediates TGF-beta signaling in vascular smooth muscle cell differentiation
.
Dev. Cell
,
11
,
93
104
.

24

Nechanitzky
R.
Akbas
D.
Scherer
S.
Györy
I.
Hoyler
T.
Ramamoorthy
S.
Diefenbach
A.
Grosschedl
R.
(
2013
)
Transcription factor EBF1 is essential for the maintenance of B cell identity and prevention of alternative fates in committed cells
.
Nat. Immunol.
,
14
,
867
875
.

25

Lee
S.T.
Xiao
Y.
Muench
M.O.
Xiao
J.
Fomin
M.E.
Wiencke
J.K.
Zheng
S.
Dou
X.
de Smith
A.
Chokkalingam
A.
et al. (
2012
)
A global DNA methylation and gene expression analysis of early human B-cell development reveals a demethylation signature and transcription factor network
.
Nucl. Acids Res.
,
40
,
11339
11351
.

26

Haldar
M.
Karan
G.
Tvrdik
P.
Capecchi
M.R.
(
2008
)
Two cell lineages, myf5 and myf5-independent, participate in mouse skeletal myogenesis
.
Dev. Cell
,
14
,
437
445
.

27

Lee
J.H.
Gaetz
J.
Bugarija
B.
Fernandes
C.J.
Snyder
G.E.
Bush
E.C.
Lahn
B.T.
(
2009
)
Chromatin analysis of occluded genes
.
Hum. Mol. Genet.
,
18
,
2567
2574
.

28

Pan
J.B.
Hu
S.C.
Shi
D.
Cai
M.C.
Li
Y.B.
Zou
Q.
Ji
Z.L.
(
2013
)
PaGenBase: a pattern gene database for the global and dynamic understanding of gene function
.
PloS ONE
,
8
,
e80747
.

29

Chang
C.W.
Cheng
W.C.
Chen
C.R.
Shu
W.Y.
Tsai
M.L.
Huang
C.L.
Hsu
I.C.
(
2011
)
Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis
.
PloS ONE
,
6
,
e22859
.

30

Liu
X.
Yu
X.
Zack
D.J.
Zhu
H.
Qian
J.
(
2008
)
TiGER: a database for tissue-specific gene expression and regulation
.
BMC Bioinformatics
,
9
,
271
.

31

Yang
X.
Ye
Y.
Wang
G.
Huang
H.
Yu
D.
Liang
S.
(
2011
)
VeryGene: linking tissue-specific genes to diseases, drugs, and beyond for knowledge discovery
.
Physiol. Genomics
,
43
,
457
460
.

32

Bernal-Quirós
M.
Wu
Y.-Y.
Alarcón-Riquelme
M.E.
Castillejo-López
C.
(
2013
)
BANK1 and BLK act through phospholipase C gamma 2 in B-cell signaling
.
PloS ONE
,
8
,
e59842
.

33

Wong
M.J.
Malapitan
I.A.
Sikorski
B.A.
Jongstra
J.
(
2003
)
A cell-free binding assay maps the LSP1 cytoskeletal binding site to the COOH-terminal 30 amino acids
.
Biochim. Biophys. Acta
,
1642
,
17
24
.

34

Kordes
U.
Cheng
Y.C.
Scotting
P.J.
(
2005
)
Sox group E gene expression distinguishes different types and maturational stages of glial cells in developing chick and mouse
.
Brain Res. Dev. Brain Res.
,
157
,
209
213
.

35

Bogdanović
O.
Long
S.W.
van Heeringen
S.J.
Brinkman
A.B.
Gómez-Skarmeta
J.L.
Stunnenberg
H.G.
Jones
P.L.
Veenstra
G.J.C.
(
2011
)
Temporal uncoupling of the DNA methylome and transcriptional repression during embryogenesis
.
Genome Res.
,
21
,
1313
1327
.

36

Andersen
I.S.
Reiner
A.H.
Aanes
H.
Alestrom
P.
Collas
P.
(
2012
)
Developmental features of DNA methylation during activation of the embryonic zebrafish genome
.
Genome Biol.
,
13
,
R65
.

37

Oakes
C.C.
La Salle
S.
Smiraglia
D.J.
Robaire
B.
Trasler
J.M.
(
2007
)
A unique configuration of genome-wide DNA methylation patterns in the testis
.
Proc. Natl Acad. Sci. USA
,
104
,
228
233
.

38

Ernst
J.
Kheradpour
P.
Mikkelsen
T.S.
Shoresh
N.
Ward
L.D.
Epstein
C.B.
Zhang
X.
Wang
L.
Issner
R.
Coyne
M.
et al. (
2011
)
Mapping and analysis of chromatin state dynamics in nine human cell types
.
Nature
,
473
,
43
49
.

39

Zilberman
D.
Coleman-Derr
D.
Ballinger
T.
Henikoff
S.
(
2008
)
Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks
.
Nature
,
456
,
125
129
.

40

Conerly
M.L.
Teves
S.S.
Diolaiti
D.
Ulrich
M.
Eisenman
R.N.
Henikoff
S.
(
2010
)
Changes in H2A.Z occupancy and DNA methylation during B-cell lymphomagenesis
.
Genome Res.
,
20
,
1383
1390
.

41

Lu
Y.
Lotan
R.
(
1999
)
Transcriptional regulation by butyrate of mouse galectin-1 gene in embryonal carcinoma cells
.
Biochim. Biophys. Acta
,
1444
,
85
91
.

42

Lu
Y.
Lotan
D.
Lotan
R.
(
2000
)
Differential regulation of constitutive and retinoic acid-induced galectin-1 gene transcription in murine embryonal carcinoma and myoblastic cells
.
Biochim. Biophys. Acta
,
1491
,
13
19
.

43

Wang
H.D.
Hou
Q.F.
Guo
Q.N.
Li
T.
Wu
D.
Zhang
X.P.
Chu
Y.
He
M.
Xiao
H.
Guo
L.J.
et al. (
2014
)
DNA methylation study of fetus genome through a genome-wide analysis
.
BMC Med. Genomics
,
7
,
18
.

44

Bock
C.
Beerman
I.
Lien
W.H.
Smith
Z.D.
Gu
H.
Boyle
P.
Gnirke
A.
Fuchs
E.
Rossi
D.J.
Meissner
A.
(
2012
)
DNA methylation dynamics during in vivo differentiation of blood and skin stem cells
.
Mol. Cell
,
47
,
633
647
.

45

Zhao
Y.
Zhou
H.
Ma
K.
Sun
J.
Feng
X.
Geng
J.
Gu
J.
Wang
W.
Zhang
H.
He
Y.
et al. (
2013
)
Abnormal methylation of seven genes and their associations with clinical characteristics in early stage non-small cell lung cancer
.
Oncol. Lett.
,
5
,
1211
1218
.

46

Wu
X.
Rauch
T.A.
Zhong
X.
Bennett
W.P.
Latif
F.
Krex
D.
Pfeifer
G.P.
(
2010
)
CpG island hypermethylation in human astrocytomas
.
Cancer Res.
,
70
,
2718
2727
.

47

Chen
C.
Zhang
S.
Zhang
X.S.
(
2013
)
Discovery of cell-type specific regulatory elements in the human genome using differential chromatin modification analysis
.
Nucl. Acids Res.
,
41
,
9230
9242
.

48

Zhi
H.
Ning
S.
Li
X.
Li
Y.
Wu
W.
Li
X.
(
2014
)
A novel reannotation strategy for dissecting DNA methylation patterns of human long intergenic non-coding RNAs in cancers
.
Nucl. Acids Res.
,
42
,
8258
8270
.

49

Karolchik
D.
Barber
G.P.
Casper
J.
Clawson
H.
Cline
M.S.
Diekhans
M.
Dreszer
T.R.
Fujita
P.A.
Guruvadoo
L.
Haeussler
M.
et al. (
2014
)
The UCSC Genome Browser database: 2014 update
.
Nucl. Acids Res.
,
42
,
D764
D770
.

50

Mortazavi
A.
Williams
B.A.
McCue
K.
Schaeffer
L.
Wold
B.
(
2008
)
Mapping and quantifying mammalian transcriptomes by RNA-Seq
.
Nat. Methods
,
5
,
621
628
.

51

Langfelder
P.
Horvath
S.
(
2008
)
WGCNA: an R package for weighted correlation network analysis
.
BMC Bioinformatics
,
9
,
559
.

52

Zhang
B.
Horvath
S.
(
2005
)
A general framework for weighted gene co-expression network analysis
.
Stat. Appl. Genet. Mol. Biol.
,
4
,
Article17
.

53

Yip
A.M.
Horvath
S.
(
2007
)
Gene network interconnectedness and the generalized topological overlap measure
.
BMC Bioinformatics
,
8
,
22
.

54

Langfelder
P.
Zhang
B.
Horvath
S.
(
2008
)
Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R
.
Bioinformatics
,
24
,
719
720
.

55

Quinlan
A.R.
Hall
I.M.
(
2010
)
BEDTools: a flexible suite of utilities for comparing genomic features
.
Bioinformatics
,
26
,
841
842
.

56

Lawrence
M.
Gentleman
R.
Carey
V.
(
2009
)
rtracklayer: an R package for interfacing with genome browsers
.
Bioinformatics
,
25
,
1841
1842
.

57

Huang da
W.
Sherman
B.T.
Lempicki
R.A.
(
2009
)
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources
.
Nat. Protoc.
,
4
,
44
57
.

58

Heinz
S.
Benner
C.
Spann
N.
Bertolino
E.
Lin
Y.C.
Laslo
P.
Cheng
J.X.
Murre
C.
Singh
H.
Glass
C.K.
(
2010
)
Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities
.
Mol. Cell
,
38
,
576
589
.

Supplementary data