Breast cancer is a highly heterogeneous disease that is characterized by genetic and epigenetic aberrations; however, our knowledge of epigenetic alterations of breast cancer subtypes remains limited. Here, we portrayed and compared the alterations of six types of histone modifications and DNA methylation between two breast cancer subtypes, luminal and basal. Widespread subtype-specific epigenetic alterations were observed in both subtypes, which preferentially occurred within CpG islands (CGIs) and promoter regions. Specifically, aberrant DNA methylation was mostly located inside CGIs in luminal subtype, whereas in basal subtype it was principally located within CGI shores. Moreover, different types and combinatorial patterns of epigenetic alterations were found to occupy at promoter regions between these two subtypes. And these epigenetic alterations regulated corresponding gene expression in a synergetic way in both subtypes. Functional enrichment analysis highlighted that epigenetically dysregulated genes were significantly involved in the hallmarks of cancers, most of which were subtype specific. Even genes involved in the same hallmarks associated biological processes were affected by various types of epi-modifications in different subtypes. Finally, we revealed distinct patterns of oncogenic pathways activation in different subtypes and provided novel insights into subtype specific therapeutic opportunities. In addition, genes in the key signaling pathways were able to discriminate between disease phenotypes, and subtype-specific progression associated genes were identified. This study presents the aberrant epigenetic patterns of breast cancer subtypes at a genome-wide level, which will be a highly valuable resource for investigations at understanding epigenetic regulation of breast cancer subtypes.

## INTRODUCTION

Breast cancer, a leading cause of cancer-related mortality in women worldwide (1), is considered to be a genetically heterogeneous group of diseases. Both genetic and acquired epigenetic abnormalities participate in the development of breast cancer, accurate understanding of subtype-specific epigenetic regulatory programs that depend on DNA methylation and chromatin (histone) modification, holds important implications for cancer diagnosis, and identification of subtype-specific new targets for therapy.

The classification of breast cancers on the basis of histological criteria is confounded by a number of factors [estrogen receptor (ER), progesterone receptor (PR) and HER2/ERBB2] (2). In recent years, the application of molecular profiling technologies, in particular DNA microarrays for measuring mRNA expression, has added to our knowledge of breast cancer subtypes. Breast cancers have been classified into distinguishing three molecular subtypes based on the levels of mRNA expression of specific genes-luminal (luminal A and luminal B), HER2 and basal (basal A and basal B) (35). The HER2 subtype is often associated with mutated or amplified ERBB2 (encoding HER2) and has had some degree of clinical success because of the effective therapeutics that can target ERBB2. The luminal subtype often shares expression markers with the luminal epithelial layer of cells lining normal breast ducts and is characterized by the expression of ER, which is not expressed in the basal subtype (6). The basal subtype is often ER negative, shares expression markers with the underlying basal (myoepithelial) layer of normal breast ducts, exhibits frequent chromosome segmental gains/losses and is associated with poor clinical outcome in most studies (6). This molecular approach of categorizing breast cancers represents a paradigm shift in how we consider the origins and categorization of breast cancer. However, we are far from having a complete picture of the diversity of breast tumors.

While numerous genetic mutations are linked to breast cancer (7), accumulating evidences have suggested that abnormal alterations of epigenetic modifications are hallmarks of human cancers (8). The epigenetic regulation by DNA methylation and post-translational modifications of histone are tightly associated with corresponding gene expression in human genome (9). Many studies have explored the mosaic patterns of DNA methylation and histone modification in cancer cells on a gene-by-gene basis (10). Among their results have indicated that cancers display perturbed epigenome compared with normal tissues. Moreover, increasing studies have shown that well-defined DNA methylation profiles enable breast cancer subtype prediction (11, 12). Using an array-based platform with more than 800 cancer-related genes, Holm et al. (13) have revealed that the methylation frequencies were significantly different between breast cancer subtypes, and a large fraction of genes reported as having subtype-specific expression patterns might be regulated through methylation. Besides DNA methylation, variations in global levels of histone marks in different tumor grades, morphologic types and subtypes of breast cancers have been observed and these differences have been shown to have clinical significance (14). All these observations point that broad epi-modification alteration events emerge in cancer cells and offer partially different visions of how DNA methylation and histone modifications are perturbed in various subtypes of breast cancer. However, the involvement of the epigenome in breast cancer and its contribution to the complexity of the disease are still poorly understood.

One practical way to approach these questions is through conducting an analysis that integrates multiple types of epigenomic and transcriptomic data from the same individual cancer cells. The development of high-throughput sequencing technology provides us an opportunity to investigate the epigenetic mechanism in breast cancer subtypes at a high resolution. In this study, we portrayed and compared the alterations of six types of histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3 and H3K27me3) and DNA methylation between two cell lines that have been shown to represent two different breast cancer subtypes—luminal and basal (1517). Studies have indicated that these cell lines, as a system, actually mirror many of the biological and genomic properties found within primary human tumors (16, 18). In order to keep the description compact, we referred these two cell lines as luminal and basal subtypes in our study, respectively. Comparative analysis of epigenetic alterations between the two subtypes revealed widespread distinct patterns of epi-modification alterations, particularly in promoter regions. In addition, we found a combinatory effect of different epigenetic modifications on gene expression in both subtypes. Finally, we showed that alterations of epi-modifications in the two subtypes not only affect genes involved in common cancer biology, but also regulate genes participating in subtype-specific functions. Through integrative analysis of different genetic and epigenetic data types, we revealed distinct patterns of oncogenic pathways activation in different subtypes and provided novel insights into subtype-specific therapeutic opportunities.

## RESULTS

### Breast cancer subtype-specific marker genes are characterized by epigenetic states

Breast cancers have been classified into three molecular subtypes—luminal, HER2 and basal (basal A and basal B); here, we mainly focused on two breast cancer cell lines (MCF-7 and HCC1954) that represent two subtypes, luminal and basal. As one of the typical representatives of luminal subtype, MCF-7 is characterized by ER+/PR+ and HER2-, and often associated with favorable outcome. While HCC1954 is ER-/PR- and HER2+ breast cancer, which is classified into basal subtype, and with poor clinical outcome. Although these two subtypes have been shown to be with different immunohistochemistry profiles, the underlying mechanism of this is largely unknown.

First, we examined the expression of ER, PR, ERBB2 genes in these two subtypes, and compared with one normal sample (HMEC cell line). In accordance with previous reports, ER and PR were found to highly express in luminal subtype than that of control (FCER = 855.5, FCPR = 1874.6), while ERBB2 was with similar expression between luminal subtype and control (Fig. 1A). In contrast, ERBB2 was highly expressed in basal subtype compared with control (FCERBB2 = 157.4), whereas ER and PR were with similar expression levels (Fig. 1A). Although some previous studies have observed genetic changes of ER, PR and ERBB2 in different breast cancer subtypes (19, 20), regulation of gene expression is a complex process controlled by several molecular mechanisms, including modification of chromatin structure and DNA methylation (21). Next, we explored the status of epi-modifications located within promoters of ER, PR and ERBB2 genes in these two subtypes, relative to control, respectively. In total, seven epigenetic modifications were investigated in our study, including six types of histone modification and DNA methylation. Of the seven epigenetic modifications, four (H3K4me1, H3K4me3, H3K27ac and H3K36me3) were epi-marks associated with active transcription while the other three (H3K9me3, H3K27me3 and DNA methylation) were associated with gene repression (22, 23). As a result, different patterns of alterations of epi-modifications between luminal and basal subtypes were observed (Fig. 1B and C). In line with the up-regulation of ER and PR in luminal subtype, we found that H3K4me3 and H3K27ac were up-regulated and H3K27me3 was down-regulated. However, the epigenetic patterns of ER and PR in this subtype were different from those in basal subtype (Fig. 1B). Consistent with ERBB2 high expression in basal subtype, epigenetic alterations were found within all the six types of histone modifications (Fig. 1B and C). However, only the level of H3K27ac for ERBB2 was minorly altered in luminal subtype, which might contribute to less change of gene expression (Fig. 1C). In addition, ERBB2 amplification was observed in basal subtype but not luminal (24). These observations demonstrated clear differences of epigenetic patterns in the promoters of known marker genes between breast cancer subtypes, and the expression of marker genes were regulated by genetics and epigenetics.

Figure 1.

Distinctive epigenetic alteration patterns of breast cancer marker genes. (A) The expression patterns of ER, PR and ERBB2 in control and two breast cancer subtypes. (B) The epi-modification alteration patterns of ER, PR and ERBB2 in luminal and basal subtypes. Red, green and black circles separately indicate up-regulation, down-regulation and no-difference of corresponding epi-modifications. (C) Examples of histone modification patterns of ER, PR and ERBB2. Y-axis shows tag counts across the promoter regions, and X-axis represents the promoter regions of each gene.

Figure 1.

Distinctive epigenetic alteration patterns of breast cancer marker genes. (A) The expression patterns of ER, PR and ERBB2 in control and two breast cancer subtypes. (B) The epi-modification alteration patterns of ER, PR and ERBB2 in luminal and basal subtypes. Red, green and black circles separately indicate up-regulation, down-regulation and no-difference of corresponding epi-modifications. (C) Examples of histone modification patterns of ER, PR and ERBB2. Y-axis shows tag counts across the promoter regions, and X-axis represents the promoter regions of each gene.

### Extensive subtype-related epigenetic dysregulation in breast cancer

To investigate subtype-specific epigenetic modification profiles, we employed a hidden Markov model to infer the states of histone modification changes at each genomic location in each subtype (details in Materials and Methods). As a result, thousands of differential histone modification sites (DHMSs) were identified in both luminal and basal subtypes, compared with control, respectively (Table 1). And then consecutive DHMSs with no gaps between them were merged into differential histone modification regions (DHMRs, details in Materials and Methods). In addition, 20 499 and 212 131 cytosines were differentially methylated (DMCs) in these two subtypes, respectively, when compared with control. These results suggested that epigenetic changes in breast cancers appeared to be considerably frequent events.

Table 1.

Numbers of differentially modified bins and regions of seven epi-modifications in breast cancer subtypes

Epi-modifications Luminal (ER+, PR+, HER2)

Basal (ER, PR, HER2+)

DHMSs DHMRs DHMSs DHMRs
H3K4me1 207101 (114038/93063) 74919 (34704/40215) 81743 (51575/30168) 29153 (16815/12338)
H3K4me3 39737 (21180/18557) 17248 (6935/10313) 22680 (10432/12248) 10607 (4051/6556)
H3K27ac 140932 (80289/60643) 41177 (16715/24462) 111450 (56243/55207) 43266 (19986/23280)
H3K36me3 68493 (51558/16935) 5795 (3028/2767) 77563 (67318/10245) 9250 (5858/3392)
H3K9me3 97522 (28713/68809) 13186 (3184/10002) 64537 (43138/21399) 6018 (3083/2935)
H3K27me3 168094 (98274/69820) 25636 (9587/16049) 162132 (101578/60554) 22349 (8143/14206)
Cm 20499 (13244/7255)  212131 (48050/164081)
Epi-modifications Luminal (ER+, PR+, HER2)

Basal (ER, PR, HER2+)

DHMSs DHMRs DHMSs DHMRs
H3K4me1 207101 (114038/93063) 74919 (34704/40215) 81743 (51575/30168) 29153 (16815/12338)
H3K4me3 39737 (21180/18557) 17248 (6935/10313) 22680 (10432/12248) 10607 (4051/6556)
H3K27ac 140932 (80289/60643) 41177 (16715/24462) 111450 (56243/55207) 43266 (19986/23280)
H3K36me3 68493 (51558/16935) 5795 (3028/2767) 77563 (67318/10245) 9250 (5858/3392)
H3K9me3 97522 (28713/68809) 13186 (3184/10002) 64537 (43138/21399) 6018 (3083/2935)
H3K27me3 168094 (98274/69820) 25636 (9587/16049) 162132 (101578/60554) 22349 (8143/14206)
Cm 20499 (13244/7255)  212131 (48050/164081)

Numbers in the bracket are the numbers of up-/down-regulated epi-modifications.

Although there were so many epigenetic changes in both subtypes, cancer subtype-specific epigenetic patterns were evident. Overall, we found that the luminal subtype displayed profound H3K4me1 changes, while the basal subtype tended to show prominent aberrant DNA methylation (Table 1). In particular, there were about three times as many H3K4me1 DHMRs in luminal as in basal subtype. In contrast, the number of DMCs in luminal subtype was just 1/10 of that in basal subtype. Another intriguing finding was that although there were more H3K27ac DHMSs in luminal subtype, the number of DHMRs was lower than that of basal subtype. This finding indicated that there was a more consecutive distribution of H3K27ac DHMSs in luminal subtype. We found that the average length of the H3K27ac DHMRs in luminal subtype is 3422.6 bp, which is significantly larger than that of basal subtype (average length is 2575.9 bp, P < 2.2E−16, rank sum test). In order to systematically understand how these epigenetic changes were distributed in these two subtypes, we interrogated genomic locations of DHMSs and DMCs in both subtypes. Notably, most of the DHMSs and DMCs were non-overlapping and thus unique to either luminal or basal subtype (Supplementary Material, Fig. S1). More specifically, 207 101 H3K4me1 DHMSs were identified in luminal subtype, of which the majority, or 84.1%, was unique and non-overlapping with basal subtype. Even more strikingly, 99.2% of the DMCs were specific to basal subtype, whereas 91.2% of cytosine sites were specifically aberrantly methylated in luminal subtype. Although the majority of DHMSs and DMCs in these two breast cancer subtypes affect different genomic regions, there remains a core set of commonly affected genomic regions. Of these, the vast majority of DHMSs (96.8–98.4%) were coordinately epigenetically changed in the same direction in both breast cancer subtypes. And among these concordant DHMSs, the majority of H3K36me3 DHMSs were up-regulated compared with control (74.0%, n = 12 841), whereas others were down-regulated (53.1–88.0%). In addition, of the 1803 common differentially methylated cytosine (DMC) sites in luminal and basal subtypes, 99.7% (n = 1798) were differentially methylated in the same direction in both subtypes, and the majority of these DMCs were aberrantly hypermethylated versus control (61.5%, n = 1106) (Supplementary Material, Fig. S1G).

Next, we examined the distributions of epigenetic changes across chromosomes. Specifically, we found that the luminal subtype displayed profound hypermethylation of cytosines distributed across all chromosomes detected (Supplementary Material, Fig. S2G). In contrast, comparison of the cytosine methylation profiles of the basal subtype with control identified a predominance of aberrantly hypomethylated sites. Moreover, the large difference of epigenetic changes of these two breast cancer subtypes was observed on chromosomes 8 and 20 (Supplementary Material, Figs S2A-F). Especially, most of the H3K4me1 and H3K4me3 DHMSs were up-regulated on chromosome 20 in luminal subtype, while most of the up-regulated DHMSs were located on chromosome 8 in basal subtype. Aberrations in chromosomes 8 and 20 are common in breast cancer, and numbers of genes (such as BCAS1 and RHOBTB2) located on these two chromosomes were found to be functionally dysregulated in breast cancer (25, 26). Of note, a previous study found that luminal breast cancers had more genomic alterations on chromosome 8 (27), indicating that genomic and epigenetic mechanisms may act in a complementary manner to some extent.

Altogether, these results suggest two layers of epigenetic programming in breast cancer; besides the common epigenetic defects representative of breast cancer, we demonstrated the existence of robust breast cancer subtype-specific epigenetic alterations, which may extend beyond distribution patterns to include regulation of genes in a subtype-specific way.

### Distinct genomic distribution of aberrant epigenetic modifications in breast cancer subtypes

Most of previous studies related to epigenetics in breast cancer were restricted to promoter microarrays (12) or locus specific assays (5) that do not provide wide-spread resolution. In this study, we integrated available epigenetic modifications, allowing us to explore distinct epigenetic patterns of these two breast cancer subtypes simultaneously. The pronounced inter-subtype differences of epi-modification alterations provoked the question that whether genomic features were involved in such differences. Then the distribution of the intensities for each epi-modification on each type of genomic feature was summarized (Fig. 2A). Our data revealed that although the consistency levels varied from modification to modification, the relative difference in epi-modification intensities on different genomic features are general across these two subtypes. Promoters and CpG islands (CGIs) represented the regions that are with the highest variability in chromatin modifications in both breast cancer subtypes. Besides these two genomic features, we found that most of the H3K36me3 DHMRs were mapped to coding sequences, which was consistent with its association with gene bodies (23). Although most of the epigenetically aberrant genomic features were with small overlap in breast cancer subtypes, the overlap of CGI associated features were relatively high in the H3K4me3, H3K27ac and H3K27me3. Specifically, about 8.54, 7.94 and 10.50% of the CGIs were covered by H3K4me3, H3K27ac and H3K27me3 DHMRs, respectively. In addition, most of the chromatin modifications of the overlap-CGIs were down-regulated in breast cancer.

Figure 2.

The epigenetic alteration patterns of genomic features in breast cancer subtypes. (A) The percent of genomic features with aberrant epi-modifications. Up-regulation and down-regulation were represented by red and green, respectively. ‘Overlap’ column shows proportion of genomic feature harboring altered epi-modifications shared by two subtypes. (B) Barplots representing the percentage of CGI, CGI shores overlapping with DMCs. Significantly higher proportion of CGIs were overlapping with DMCs in luminal subtype over CGI shores, while CGI shores were the most frequently affected regions in basal subtype. (C) Aberrant methylation targets a minimally overlapping set of CGIs in two subtypes. Venny diagram representing differentially methylated CGIs. The horizontal barplots compared the methylation status of CGIs in luminal (top) and basal subtypes (bottom). Although most CGIs are non-overlapping between the two subtypes, among the smaller set of CGIs that do overlap between the two breast cancer subtypes, the vast majority were concordantly changed, with a clear predominance of aberrant hypermethylation.

Figure 2.

The epigenetic alteration patterns of genomic features in breast cancer subtypes. (A) The percent of genomic features with aberrant epi-modifications. Up-regulation and down-regulation were represented by red and green, respectively. ‘Overlap’ column shows proportion of genomic feature harboring altered epi-modifications shared by two subtypes. (B) Barplots representing the percentage of CGI, CGI shores overlapping with DMCs. Significantly higher proportion of CGIs were overlapping with DMCs in luminal subtype over CGI shores, while CGI shores were the most frequently affected regions in basal subtype. (C) Aberrant methylation targets a minimally overlapping set of CGIs in two subtypes. Venny diagram representing differentially methylated CGIs. The horizontal barplots compared the methylation status of CGIs in luminal (top) and basal subtypes (bottom). Although most CGIs are non-overlapping between the two subtypes, among the smaller set of CGIs that do overlap between the two breast cancer subtypes, the vast majority were concordantly changed, with a clear predominance of aberrant hypermethylation.

When considering the relation of DMCs to genomic features, we found that the promoters and CGI-related features were the main target regions of aberrant methylation in both breast cancer subtypes (Fig. 2A). However, more detailed analysis identified markedly dissimilar distribution of DMCs at CGI-associated features between luminal and basal subtypes. In luminal subtype, CGIs were more frequently overlapping with DMCs, whereas CGI shores harbored more DMCs than CGIs did in basal subtype (Fig. 2B). This result demonstrated the preferential localization of DMCs in different breast cancer subtypes, where variability of methylation shifts its focus to different genomic features. Association between altered DNA methylation patterns and the CGIs has been found in many human tumors and other biological processes (22, 28, 29). Next, we comprehensively investigated the aberrantly methylated CGI-associated features in two breast cancer subtypes. Notably, 2136 (7.78%) CGIs were aberrantly methylated in luminal subtype, of which the majority of CGIs, or 89.5%, was unique. In the case of basal subtype, there were 600 (2.19%) CGIs covered by DMCs, 62.7% of which were unique and non-overlapping with luminal subtype (Fig. 2C). By examining the 224 common differentially methylated CGIs in both two subtypes, we found that 96.4% (n = 216) were coordinately differentially methylated in the same direction, and the majority of these CGIs were hypermethylated when compared with control (94%, n = 203). In addition, the results of aberrantly methylated 3′-shores and 5′-shores were similar as the CGIs (Supplementary Material, Fig. S3). The distribution of aberrantly methylated CGIs in these two breast cancer subtypes indicated that the methylation of these CGIs was not only perturbed in opposite patterns, but was also associated with an almost completely distinct set of CGIs and/or CGI shores.

### Subtype-specific epigenetic modification patterns of promoters

As a relatively high proportion of promoters had epigenetic alterations, we next dissected the epigenetic alterations in this special genomic region in details. First, promoter regions of genes overlapping with DHMRs or DMCs were identified. Using this approach, about 53.1 and 46.8% promoters that were covered by at least one type of epigenetic changes were identified as epigenetic-changing promoters in luminal and basal subtypes. Promoters with only one type of epi-modification alterations contributed a large splice (43.0 and 57.5% in luminal and basal subtype, respectively) of epigenetic-changing promoters in both breast cancer subtypes (Fig. 3A). The remaining large proportion of promoters was regulated by the combinations of two or more epi-modifications, indicating that a considerable amount of intricate ones that might regulate some special DNA segments.

Figure 3.

Distinctive co-occupancy patterns of epi-modification alterations in two breast cancer subtypes. (A) The proportion of promoters with different numbers of epi-genetic alterations in two subtypes. (B) The number of co-occupancy of the indicated two epi-modifications in two subtypes. (C) The number of co-occupancy of the indicated three epi-modifications in luminal and basal subtypes.

Figure 3.

Distinctive co-occupancy patterns of epi-modification alterations in two breast cancer subtypes. (A) The proportion of promoters with different numbers of epi-genetic alterations in two subtypes. (B) The number of co-occupancy of the indicated two epi-modifications in two subtypes. (C) The number of co-occupancy of the indicated three epi-modifications in luminal and basal subtypes.

Although some similar distributions of global epi-modification alterations within promoters were observed in luminal and basal subtypes, there still existed remarkably different patterns between them. In case of single epi-modification alterations, different types of modifications had distinctive extent of contributions to luminal or basal subtype. In particular, a large proportion of promoters were covered by H3K4me1, H3K27ac and H3K27me3 in luminal subtype, while in basal subtype, H3K27ac, DNA methylation and H3K4me3 made a chief contribution (Fig. 3A). It has been shown that chromatin features possess a certain level of redundancy and that certain chromatin features may work in a combinatorial fashion (30). We then asked whether the co-occupancy of two epi-marks is similar across these two breast cancer subtypes. About 32.6 and 26.3% of the epigenetic alteration promoters were bivalent alterations in luminal and basal subtypes, respectively (Fig. 3A). However, the bivalent alteration patterns in two subtypes were different. In luminal subtype, H3K4me1/H3K27ac, H3K4me1/H3K27me3 and H3K4me3/H3K27ac were the most frequently bivalent alterations. Whereas the most frequently bivalent alterations were H3K4me3/H3K27ac, H3K27ac/DNA methylation and H3K4me3/H3K27me3 in the basal subtype (Fig. 3B and Supplementary Material, Table S1). Moreover, H3K27ac was most likely to be altered together with other epi-modifications in basal subtype, while H3K4me1 co-altered with others most frequently in the luminal subtype (Fig. 3B and Supplementary Material, Table S1). In contrast, we found that some bivalent modification pairs rarely co-altered in both two subtypes, such as H3K4me3-H3K36me3. This observation is supported by various results of previous studies that H3K4me3 and H3K36me3 seem to have mutually exclusive localizations, and H3K4me3 might inhibit the formation of H3K36me3 (31). When we compared the distribution of three co-located epi-modification alterations, H3K4me1/H3K4me3/H3K27ac, H3K4me1/H3K27ac/H3K27me3 and H3K4me1/H3K4me3/H3K27me3 were the most frequently present in luminal subtype, while for basal subtype, the most frequently present ones were H3K4me1/H3K4me3/H3K27ac, H3K4me3/H3K27ac/DNA methylation and H3K4me1/H3K27ac/DNA methylation (Fig. 3C and Supplementary Material, Table S2).

However, these findings indicated that these two subtypes are under epigenetic regulation of discriminating types or patterns of epi-modification alterations, showing distinctive contributions to separate subtypes, which may lead to different biological consequences that possibly result in divergent progresses of these two subtypes.

### Coordinative epigenetic states correlate with transcriptional changes

The gene expression program is tightly managed by a dynamic chromatin environment, in which epigenetic factors, such as histone modifications and DNA methylation, play a crucial role in determining. Alterations of epi-modifications may lead to changes in gene expression (21, 22). Next, multiple approaches were used to assess the relationship between epigenetic alterations and gene expression. We found that correlations of global gene expression between two subtypes and the normal were elevated when genes with epi-modification alterations were not considered (Fig. 4A and B, in all instances except H3K9me3 the differences were significant-P < 0.05 by the Fisher R to Z transformation test), indicating that these epi-modification alterations might impact the transcriptional process of genes. Especially, the alterations of H3K4me3 showed the greatest impact on gene expressions in both subtypes. This result was consistent with the observation that H3K4me3 is one of the most important features for prediction of gene expression levels (32).

Figure 4.

Alterations of epi-modifications affect gene expression in a coordinative manner in both subtypes. (A) Global expression correlation coefficients between cancer and control for all genes and genes remaining following removal of each epi-alteration genes. In all instances except H3K9me3, the differences are significant (P < 0.05 by Fisher R to Z transformation test). (B) The expression alteration of genes with different chromatin pattern categories. X-axis indicates the genes with different numbers and types of epi-alterations and Y-axis shows the expression fold changes. (C) Relative expression of genes with the epi-alteration of H3K4me3 and H3K27ac. Genes were categorized into seven groups according to the epigenetic alteration patterns, i.e., H3K4me3 up-regulation and H3K27ac up-regulation abbreviated with (1,1).

Figure 4.

Alterations of epi-modifications affect gene expression in a coordinative manner in both subtypes. (A) Global expression correlation coefficients between cancer and control for all genes and genes remaining following removal of each epi-alteration genes. In all instances except H3K9me3, the differences are significant (P < 0.05 by Fisher R to Z transformation test). (B) The expression alteration of genes with different chromatin pattern categories. X-axis indicates the genes with different numbers and types of epi-alterations and Y-axis shows the expression fold changes. (C) Relative expression of genes with the epi-alteration of H3K4me3 and H3K27ac. Genes were categorized into seven groups according to the epigenetic alteration patterns, i.e., H3K4me3 up-regulation and H3K27ac up-regulation abbreviated with (1,1).

In order to investigate the associations between epigenetic alterations and gene expression patterns, we evaluated each epigenetic modification for each gene in relation to its expression pattern in two subtypes. We first categorized genes into four different groups according to the number of different types of epi-modification alterations co-located within regions of their promoters: genes with single type of epi-modification alterations, genes with two types of epi-modification alterations, genes with three types of epi-modification alterations and genes with no less than four types of epi-modification alterations. Genes in each of the four groups described above were subsequently divided into different subgroups, according to the effect of the alterations of epi-modifications on the transcription of corresponding genes. Marks of H3K4me1, H3K4me3, H3K27ac and H3K36me3 were generally associated with transcriptional activeness, whereas H3K9me3, H3K27me3 and DNA methylation were associated with transcriptional repression (23,33,34). In the ‘active’ subgroup, genes were all with up-regulated active epi-modifications and/or down-regulated repressive epi-modifications. Genes in the ‘repressive’ subgroup were with up-regulated repressive epi-modifications and/or down-regulated active epi-modifications. The ‘indeterminate’ subgroup encompassed genes with opposite regulation direction of epi-modification alterations, such as up-regulated active epi-modifications and up-regulated repressive epi-modifications, whose effect on gene expression was theoretically indeterminate. The global expression level of genes in ‘active’ groups was increased, while expression levels of genes within ‘repressive’ groups were found to decrease in both subtypes (Fig. 4C and D). Furthermore, as the number of types of altered epi-modifications involved within the promoters increased, the effect of the alterations on transcriptions became more apparent (Fig. 4C and D). Specifically, genes with H3K4me3 and H3K27ac alterations were with the highest changes in gene expression, which were higher than those with H3K4me3 or H3K27ac alterations in both subtypes (Fig. 4E and F). Although for H3K4me3, a function during transcription initiation has been proposed, a similar function has not been established for H3K27ac. A possible explanation of the effect of H3K27ac on gene expression is that H3K27ac might prevent the repressive tri-methylation of the same residue, because H3K27ac and H3K27me3 are mutually exclusive (32). Indeed, we found only few genes were with co-alterations of H3K27ac and H3K27me3 (Fig. 3B). Alternatively, H3K27ac itself may be recognized by a protein complex required for transcription, thus activating the expression of downstream genes (32). Although the alterations of a single modification can affect the gene expression, not both these two modifications are equally important to the effects of gene expression, and the effect of H3K4me3 on gene expression was greater than that of H3K27ac.

Our results indicated that various types of epi-modifications might alter synergistically or antagonistically to regulate the transcription of genes involved in biological processes of breast cancer, which was in agreement with previous study that has demonstrated that the combination of several histone modifications showed a cooperative manner in regulating transcriptional events (30).

### Subtype-specific epi-modification alterations involved in hallmarks of human cancers

To further investigate the functional roles of altered epi-modifications in breast cancer subtypes, we performed a functional enrichment analysis focusing on the genes showing both differential epi-modifications and a corresponding distinctive expression when compared with control. Although ‘cancer’ comprises a heterogeneous group of diseases, one characteristic and unifying feature is the creation of abnormal cells that grow beyond their natural boundaries, which was driven by the hallmarks of human cancers (8,35). Thus, we mainly investigated the cancer-relevant epi-modification alterations through analysis of target genes in the context of hallmarks of human cancer. In total, this analysis revealed 12 and 13 ‘hallmarks of cancer’-associated GO biological process terms in luminal and basal subtypes (FDR < 0.05), respectively. Not surprisingly, it revealed the fact that these two cancer subtypes share several biological processes commonly used by breast cancers (Fig. 5A). Among the shared six biological processes, four were involved in the hallmark of cancer ‘tissue invasion and metastasis’, one was involved in ‘self-sufficiency in growth signals’ and the other one was in ‘insensitivity to antigrowth signals’. Invasive potential is one of the features of the primary tumors where luminal and basal subtypes were derived from. These shared biological processes may partly represent ones that were frequently dysregulated in breast cancer, suggesting that some epigenetic modifications regulate common biological processes across multiple subtypes of breast cancer.

Figure 5.

Epigenetically dysregulated genes are involved in hallmarks of human cancers in breast cancer subtypes. (A) Functional enrichment analysis of genes with epi-alterations in each subtype. The X-axis indicates the –log10 P-values for enrichment. The functions associated with the same human cancer hallmarks were marked with the same colors. (B) Most genes associated with the same hallmarks are unique to either breast cancer subtype, with most amount of genes are with 1–3 epi-alterations. (C) The same genes associated with the same hallmarks in two subtypes are with distinct epigenetic patterns. Each row represents a type of epigenetic modification and each column represents a gene.

Figure 5.

Epigenetically dysregulated genes are involved in hallmarks of human cancers in breast cancer subtypes. (A) Functional enrichment analysis of genes with epi-alterations in each subtype. The X-axis indicates the –log10 P-values for enrichment. The functions associated with the same human cancer hallmarks were marked with the same colors. (B) Most genes associated with the same hallmarks are unique to either breast cancer subtype, with most amount of genes are with 1–3 epi-alterations. (C) The same genes associated with the same hallmarks in two subtypes are with distinct epigenetic patterns. Each row represents a type of epigenetic modification and each column represents a gene.

Interestingly, a considerable part of genes in the shared hallmarks associated biological processes were subtype-specific (Fig. 5B). For example, 194 and 178 genes with epi-alterations were annotated in the hallmark of ‘tissue invasion and metastasis’ in luminal and basal subtypes, respectively. However, only about 50% genes were common. Meanwhile, these common genes were regulated by various types of epigenetic modifications (Fig. 5C). These subtle differences between these two subtypes may be part of the intricate factors that lead to the distinctive degrees in capabilities of invasion, metastasis and abnormal cell growth. On the other hand, although some biological processes were associated with the same hallmarks of cancer, they were epigenetically perturbed in specific subtype (Fig. 5A). For example, the hallmark ‘evading apoptosis’ was epigenetically dysregulated in both luminal and basal subtypes, but involved biological processes were different, viz. that ‘negative regulation of programmed cell death’ in luminal and ‘negative regulation of apoptosis process’ in basal subtypes. Furthermore, specific cancer hallmarks were epigenetically dysregulated only in one of these two subtypes. In particular, the hallmarks ‘sustained angiogenesis’ and ‘reprogramming energy metabolism’ were found to be epigenetically dysregulated in luminal and basal subtypes, respectively. The hallmark ‘sustained angiogenesis’ is deemed to be a preponderant capability of relatively incipient cancer cells that require expansion. Cancer cells initially lack angiogenic ability, limiting their potential to expand. In order to progress, they must develop a blood supply. Whereas ‘reprogramming energy metabolism’ hallmark indicates adjustments of energy metabolism in order to adapt the cancer cells for the current microenvironment and ensuing development (8). During the progressive stages of tumorigenicity and tumor development, to sustain uncontrolled proliferation, cancer cells make adjustments to their energy production by reprogramming energy metabolism. These findings indicated that different types of tumor cells take advantage of distinct molecular strategies to activate and sustain the development of tumor.

Collectively, these observations suggested that most of the biological processes that were associated with hallmarks of cancer were epigenetically dysregulated in specific subtype. Even in the commonly involved biological processes, different subsets of genes or the same subsets of genes with distinctive epi-modification alterations participated in the oncogenic processes of these two subtypes.

### Epigenetic assessment reveals distinct patterns of oncogenic pathways activation in breast cancer subtypes

Both genetic and epigenetic changes contribute to development of human cancers. Systematic characterization of breast cancer genomes has identified somatic mutations in several key signaling pathways; next, we explored whether these genes were also regulated by epigenetics and further investigated that if the epigenetic regulations were subtype specific. Globally, most of the frequently somatic-mutated genes, such as ER, HER2, AKT and MAPKs, were also regulated by various types of epi-modifications in both subtypes (Fig. 6A), implying the interactions of genetic and epigenetic processes in tumor onset and progression. It is well-established that the genetic change not only plays a significant role in tumorgenesis, but also it is associated with inter- and intra-tumor heterogeneity (36). Interestingly, we found that these genetic signatures were also epigenetically regulated in a subtype specific manner. In the key signaling pathways of breast cancer, receptors are molecular gates on the surface of cells, which receive chemical signals from external substances and direct cells to response, such as cellular division, apoptosis or permissions for entrance or exit of specific molecules. Thus, different epigenetic dysregulation of receptor genes may result in distinctive biological consequences. For example, CDH1, encoding cadherin-1 (also known as E-cadherin), was with significantly increased H3K4me1 and H3K27ac signals within its promoter region in luminal subtype (Fig. 6A), but there were no epigenetic changes in basal subtype. Cadherin-1 is a calcium-dependent cell–cell adhesion glycoprotein, and activation of this gene is thought to weaken the potential of proliferation, invasion and metastasis of cancer cells (37,38). Therefore, the epigenetic activation of CDH1 in luminal might be contributed to the relatively weaker ability of proliferation, invasion and metastasis compared with the basal subtype. In addition, the gene AKT1, which is a key intermediate of signaling pathways, was specifically mutated in luminal breast cancers. A large body of literature has documented frequent hyperactivation of AKT kinases in a wide assortment of human solid tumors (39). We found that this gene was epigenetically dysregulated in luminal subtype (Fig. 6A), with significantly increased signals of H3K4me1 and H3K27ac. Although it is not genetically or epigenetically dysregulated in basal subtype, the upstream molecules, including epidermal growth factor receptor (FGFR), phosphatidylinositol 3-kinase (PI3K) and PTEN (Phosphatase and Tensin homologue deleted on chromosome 10) were frequently dysregulated (Fig. 6A). Dysregulation of these upstream regulators may further result in perturbing the AKT pathway in basal subtype. Other candidate driver genes that fit into an emerging pattern of subtype-specific pathway activation were ESR1, NRAS and HRAS, which were all with distinct epigenetic regulation patterns.

Figure 6.

The distinct patterns of oncogenic pathway activation in different breast cancer subtypes. (A) The key signaling pathways of breast cancer were distinctively epi-regulated in two subtypes. The digits on the top and bottom of each gene marked the type of epi-alterations in luminal and basal subtypes, respectively. The red color indicated up-regulation while the green represented down-regulation of epigenetic modifications. The subtype-specific drug targets were also marked in the figure. (B) Hierarchical clustering of 330 breast cancer samples (232 luminal and 98 basal samples) using expression profiles of epigenetically regulated genes in key signaling pathways. (C) Subtype-specific progression associated genes. (D and E) Kaplan–Meier estimates of overall survival of luminal and basal subtype patients according to the subtype-specific genes.

Figure 6.

The distinct patterns of oncogenic pathway activation in different breast cancer subtypes. (A) The key signaling pathways of breast cancer were distinctively epi-regulated in two subtypes. The digits on the top and bottom of each gene marked the type of epi-alterations in luminal and basal subtypes, respectively. The red color indicated up-regulation while the green represented down-regulation of epigenetic modifications. The subtype-specific drug targets were also marked in the figure. (B) Hierarchical clustering of 330 breast cancer samples (232 luminal and 98 basal samples) using expression profiles of epigenetically regulated genes in key signaling pathways. (C) Subtype-specific progression associated genes. (D and E) Kaplan–Meier estimates of overall survival of luminal and basal subtype patients according to the subtype-specific genes.

These results suggested the distinct patterns of oncogenic pathways activation in different breast cancer subtypes, which is likely to generate more insight into identifying subtype-specific drug targets. As a result, we found that evidences have shown that AKT1 is a luminal-specific drug target (40). In contrast, most of the upstream regulators of AKT1 are basal subtype-specific drug targets, such as EGFR and ERBB2. Erlotinib has been shown to be an effective drug to target EGFR and could inhibit tumor growth and metastasis in basal subtype. These different therapies were consistent with the distinct patterns of oncogenic pathways activation. Additionally, as epigenetic regulation of gene expression is a dynamic and reversible process, assessment of the epigenetic patterns may predict more subtype-specific epigenetic drugs. As EGFR is significantly regulated by H3K27ac, we proposed that histone deacetylase inhibitors may be other alternative drugs for treatment of basal subtype breast cancers. Indeed, evidences have shown that inhibition of histone deacetylase can suppress EGF signaling pathways by destabilizing EGFR mRNA in ER-negative human breast cancer cells (41).

Genes with subtype-specific epigenetic regulation may have an important role in clinical practice; we found that the 39 genes in the key signaling pathways showed significant subtype-specific expression patterns in clinical samples. These genes were able to distinguish between luminal and basal subtypes (Fig. 6B and Supplementary Material, Table S3). Next, we further explored which genes can be effectively used as a prognosis signature for breast cancers. To demonstrate this, we performed Cox regression analysis to evaluate the significance of the associations between gene expression and overall survival. As a result, no genes were identified for all breast cancer patients. However, one and six genes were identified as subtype-specific progression biomarkers in luminal and basal breast cancers, respectively (Fig. 6C). In both subtypes, patients were further divided into high- and low-risk groups based on their risk scores [detailed methods were described in ref. (42)]. Consequently, patients with high-risk scores had shorter median survival than those with low-risk scores (Fig. 6D and E).

Taken together, these observations suggested that the cumulative changes in genetic and epigenetic influence gene expression during tumorgenesis. New models of oncogenomic progression must consider the combined effect of epigenetic and genetic changes which contribute to the tumor heterogeneity collectively. Systematic analyses of the epigenetic patterns of different types of cancers may open another door to epigenetic targeted therapy.

## DISCUSSION

In this study, we have characterized epigenetic alterations associated with the two major breast cancer subtypes, defined by ER, PR and HER2 status. We found that in both subtypes, the expression of these three marker genes was strictly regulated by epigenetics. ERBB2 (or HER2) gene amplification occurs in most of breast cancers, and is thought to be the major reason of its overexpression in breast cancer. We found that ERBB2-overexpressing breast carcinomas acquire the H3K4me3 mark on the ERBB2 promoter, which is consistent with one recent study that has demonstrated that in addition to genetic alterations such as amplification, expression of ERBB2 is also regulated by epigenetic factors independent of gene amplification (43). The H3K4me3 enrichment on the ERBB2 promoter has been demonstrated to be responsible for enhanced AP-2 promoter occupancy and elevated ERBB2 expression. In addition, it has been demonstrated that down-regulation of ERBB2 by reduced H3ac was sufficient to effectively inhibit colonogenicity (44). In our study, we also observed elevated H3K27ac in the ERBB2 overexpression cell line. Thus, ERBB2 activation in human breast cancers can involve a number of discrete epigenetic as well as genetic alterations. Genome-wide comparison of epi-modification alterations in two breast cancer subtypes, we found that even though the two breast cancer subtypes were with different epigenetic patterns, they still shared a core set of epigenetic dysregulated genomic regions (9.87–52.4 MB, Supplementary Material, Fig. S1). In addition, most (96.8%∼99.7%) of these genomic regions were epigenetically dysregulated in the same direction. These observations point toward the existence of a core of epigenetic lesions that are a necessary event during malignant transformation. However, most of the abnormal epigenetic patterning does not occur in a stereotypical manner in cancer (45), instead adopts distinct and specific distributions in different cancers or the same cancer with different genetic background.

The studies of gene promoters and CGIs under the assumption that epigenetic variation at these locations would have functional importance have been the focus of most cancer-related epigenetic studies (46). In both breast cancer subtypes, a relatively high portion of promoters and CGIs were with epi-modification alterations. However, more detailed analysis identified markedly dissimilar distribution of DMCs at CGI-associated features between luminal and basal subtypes (Fig. 2B). We also have shown that integration of the epigenomic and transcriptional data for both subtypes reveals that epi-modification alterations affect gene expression in a cooperative fashion. Furthermore, these epigenetically regulated genes were significantly enriched in some hallmarks of cancer. Even though they shared some hallmarks of cancer, the underlying biological processes or even genes in the same hallmarks were altered by distinctive epi-modifications.

Finally, we related our results to distinct patterns of oncogenic pathway activation in different subtypes and showed that conducting the analysis in each tumor subtype pointed to genes and associated pathways that may represent therapeutic targets for specific groups of patients. In a test of 77 therapeutic compounds, nearly all drugs showed differential responses across different breast cancer subtypes, and approximately one-third showed subtype, pathway-specific responses (17). Our systematical analysis of the epigenetic patterns may also reveal some subtype-specific drug targets, such as AKT1, EGFR and MAP2K1 (Fig. 6). Given the significant role that has been uncovered for epigenetic modifications in cancer development, coupled with the reversible nature of epigenetics, our observations could have enormous significance for the prevention and control of cancer.

The goal of our study was to discover, in an unbiased fashion, epigenetic patterns of different breast cancer subtypes. For a more comprehensive understanding of how epigenetic alterations drive cancer cell survival and proliferation, it is imperative to conduct an analysis that integrates epigenetic alteration information and transcriptional data from the same individual samples. However, limitations on the abundance of tissue samples with both histone modification, DNA methylation and transcriptional data necessitate the use of cell lines in the study of breast cancer subtype-related epigenetic patterns. And evidences have shown that the cell lines represent a suitable in vitro model system to study the underlying mechanisms of tumor development (18,47). With the increase of epigenetic data set of tissue samples, this paradigm can be expanded. In addition, we also validate our observations in a series of real patient breast cancers of these different molecular subtypes. As a result, we found that in luminal subtype, CGIs were more frequently overlapping with DMCs, whereas CGI shores harbored more DMCs than CGIs did in basal subtype (Supplementary Material, Fig. S4). Moreover, in a series of real patient breast cancers of these two subtypes, different epigenetic modifications regulated the expression of gene in a coordinate manner (Supplementary Material, Fig. S5). It is clear that the epigenetic regulation patterns of the two subtypes are different. This is evident that the epigenetically dysregulated genes that were involved in the key signaling pathways can significantly distinguish breast tumor samples into luminal and basal subtypes (Supplementary Material, Fig. S6). In addition, we found that some genes were associated with the survival of patients in specific subtype, such as MDM2, RB1 and CDK4. Most of these genes were epigenetically dysregulated in specific subtype. These results indicated that systematical analysis of the epigenetic patterns provided more accurate classification and survival predictions for patients with breast cancer, aiding more appropriate for adjuvant therapy.

The reason behind the different epigenetic patterns in the breast cancer subtypes is unknown but could reflect different cellular origins. It has been suggested that basal-like tumors originate from an aberrant population of luminal progenitor cells (48). Therefore, basal-like breast cancer has a more stem cell-like characterization (48). Nevertheless, luminal tumors arise from differentiated luminal cells (13). To further explore the underlying epigenetic mechanism, we investigated the PRC2 target genes that are involved in embryonic development, differentiation and cell fate decisions (49). We used a gene set for PRC2 targets in human embryonic stem (ES) cells consisting of 654 genes identified by Lee et al. (50). For this PRC2 target gene set, we identified 391 (59.8%) and 336 (51.4%) genes that harbor epi-modification alterations in luminal and basal subtypes, respectively. This observation suggested a role for epigenetic events in the process of regulation of alternative cell lineages. In particular, of these PRC2 targets, there were more genes having reduced H3K27me3 signal in luminal subtype than that in basal subtype (228 and 146 genes in luminal and basal, respectively). Remarkably, trimethylation of H3K27 is a known PRC2-mediated silencing mechanism essential for maintaining stem cell in an undifferentiated state (49). This finding is in line with the fact that basal breast cancer has more undifferentiated features than luminal subtype does. Additionally, an analysis of gene expression for PRC2 target genes in ES cells revealed high expression in luminal subtype and low expression in basal subtype (Supplementary Material, Fig. S7), which is consistent with the process of normal differentiation during which PRC2 is removed and these PRC2 targets are preferentially activated (51).

In our study, seven types of epigenetic modifications were analyzed, due to the limitation of accessible data. And furthermore, different modifications may target different function-rich genomic sequence compartments, including both proximal and distal regions relative to genes, such as promoters and enhancers, respectively. Thus, the present study partially illuminated epigenetic steps in the convergence and divergence between breast cancer subtypes. Our results add a novel layer of information to the differences between the molecular subtypes and the heterogeneous nature of breast cancers.

## MATERIALS AND METHODS

### Epigenetic and transcriptomic data sets of two breast cancer subtypes

Here, we portrayed and compared the alterations of six types of histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3 and H3K27me3) and DNA methylation between two cell lines (MCF-7 and HCC1954) that represent two breast cancer subtypes—luminal and basal. As a control, the human mammary epithelial cell (HMEC) was used. The ChIP-Seq data corresponding to H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3 and H3K27me3 in HCC1954 cell line were obtained from NCBI Gene Expression Omnibus (GEO) database (52) (Supplementary Material, Table S4). The ChIP-Seq data corresponding to these six histone modification in HMEC and MCF-7 cell lines were all downloaded from the UCSC (generated by the ENCODE group) (53) except for H3K4me1 in MCF-7 cell line which was downloaded from GEO database (54). Download files are all in FASTQ format.

In addition, the DNA methylation data sets for these three samples were all obtained from GEO (52,55) (see details in Supplementary Material, Table S4), which were generated by the MethylC-Seq approach. Gene expression data for MCF-7 was generated from ENCODE (53), and those for HCC1954 and HMEC were both obtained from GEO (52) (Supplementary Material, Table S4).

### Annotation of genomic features

The genomic features, including 5′-untranslated regions (5′-UTRs), 3′-untranslated regions (3′-UTRs), exon, intron, coding sequences (CDSs) and CGIs for the hg18 genome, were downloaded from UCSC (56). CGIs were defined as regions of DNA of greater than 500 bp with a G + C equal to or greater than 55% and observed CpG/expected CpG of 0.65 (57). In addition, the CGI shores were defined as the 2 kb regions upstream or downstream of CGIs (58). Similar as previous studies, promoters were defined as regions 2 kb up- and downstream of the transcription start sites (TSSs) of Refseq genes (59).

### Methods

Reads generated from ChIP-Seq were mapped to human reference genome (hg18) by Bowtie program (60), and only reads mapped to hg18 with, at most, three mismatches were retained for subsequential analysis. RNA-Seq reads were mapped to hg18 by the Tophat program (61) with default parameters. To quantify FPKM expression (Fragments Per Kilobase of exon model per Million mapped fragments) at gene level, the Cufflinks program (62) was applied to assemble the mapped Tophat reads with default parameters.

#### Identification of differential histone modification sites

The ChIPDiff program (63) was used with default options for quantitative comparison of all the six histone modification levels between breast cancer cells and normal breast cells (comparisons between MCF-7 and HMEC, HCC1954 and HMEC, separately). ChIPDiff is a tool designed for the genome-wide comparison of histone modification sites identified by ChIP-Seq, which is based on the observations of ChIP fragment counts and employs a hidden Markov model to infer the states of histone modification changes at each genomic location. The whole genome was partitioned into 1 kb bins by ChIPDiff, and we removed bins located on the Y chromosome. One kilobit bins that were identified as DHMSs by ChIPDiff and DHMRs that were merged from consecutive DHMSs with no gaps were kept for downstream analysis.

#### Identification of DMCs

Reads of MethylC-Seq data were aligned to the reference genome (human hg18) using Bismark alignment package v.0.7.1 (64) with a maximum of three mismatches, and only uniquely aligned reads were kept. Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and performs methylation calls in a single step. Percent methylation values for CpG dinucleotides are calculated by dividing the number of methylated Cs by total coverage on that base. Then methylKit (65) which is an R package for epigenome analysis was used with default parameters to identify the DMCs. Briefly, we used the Fisher's exact test to compare the fraction of methylated Cs in test and control samples. The CpGs with a q-value < 0.01 and %methylation difference > 25% were identified as DMCs.

#### Identification of epigenetically dysregulated features

To discern the biological impact of differential epi-modification events, each event must be put into its genomic context for subsequent analysis. Genomic features (such as, promoters, 5′-UTR, CDS, CGI) overlapping with DHMRs or DMCs were regarded as epigenetically dysregulated features. The identification was conducted by BEDTools (66).

#### Identification of enriched hallmarks of cancer

Enrichment of Gene Ontology (GO) biological process terms in both epigenetically regulated gene sets was assessed by computing a hypergeometric P-value with the Benjamini–Hochberg correction (FDR ≤ 0.05). In addition, a list of GO terms that were related to the hallmarks of cancer were obtained from a previous study (67). In both subtypes, we selected all the GO terms that pass the significance threshold and are in the list of cancer hallmarks related GO terms. The statistical analysis was conducted using the R software package (68).

#### Key signaling pathways in breast cancer and subtype-specific drug targets

Systematic characterization of breast cancer genomes has identified somatic mutations in several key signaling pathways. The key signaling pathways in breast cancer were obtained from a SnapShot of the Cancer Cell journal (69).

The subtype-specific drug targets were obtained from previous studies (40). Briefly, for a drug, the IC50 values between luminal and basal cell lines were compared. Kruskal–Wallis ANOVAs were used to evaluate the statistically significant differences in IC50 between the subtypes.

### Clustering analysis

Hierarchical clustering of the gene expression arrays was done using the bioconductor package ‘gplots’, using hierarchical agglomerative clustering (complete linkage) and Euclidean distance to perform the hierarchical clustering.

### Identification of the subtype-specific prognostic biomarkers

In order to explore whether the genes with epigenetic alterations can be used as prognostic biomarkers, Cox regression analysis was used to evaluate the association between survival and the expression level of each gene in all breast cancer samples, luminal subtypes or basal subtypes. Genes that were significantly correlated with survival (P < 0.05) were identified as members of the gene signature. More specifically, we assigned each patient a risk score according to a linear combination of the gene-expression values weighted by the regression coefficients from the aforementioned univariate Cox regression analyses. The risk score for each patient was calculated as follows:

$Risk_score=∑i=1nβi×Expsignature(i),$
where βi is the Cox regression coefficient of gene i, and n is the number of genes that are significantly associated with survival. All patients were thus divided and dichotomized into high-risk and low-risk groups using the median risk score as the cut-off point. The Kaplan–Meier method was further used to estimate the overall survival time for the two subgroups. The differences in the survival times were analyzed using the log rank test.

## SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

## FUNDING

This work was supported in part by the National High Technology Research and Development Program of China (863 Program, Grant Nos. 2014AA021102), the National Program on Key Basic Research Project (973 Program, Grant Nos. 2014CB910504), the National Natural Science Foundation of China (Grant Nos. 91129710, 61170154 and 61203264), and China Postdoctoral Science Foundation (2012M520764).

Conflict of Interest statement. None declared.

## REFERENCES

1
Jemal
A.
Bray
F.
Center
M.M.
Ferlay
J.
Ward
E.
Forman
D.
Global cancer statistics
CA

2011
61
69
90
2
Stingl
J.
Caldas
C.
Molecular heterogeneity of breast carcinomas and the cancer stem cell hypothesis
Nat. Rev. Cancer

2007
7
791
799
3
Parker
J.S.
Mullins
M.
Cheang
M.C.
Leung
S.
Voduc
D.
Vickery
T.
Davies
S.
Fauron
C.
He
X.
Hu
Z.
et al.
Supervised risk predictor of breast cancer based on intrinsic subtypes
J. Clin. Oncol.

2009
27
1160
1167
4
van't Veer
L.J.
Dai
H.
van de Vijver
M.J.
He
Y.D.
Hart
A.A.
Mao
M.
Peterse
H.L.
van der Kooy
K.
Marton
M.J.
Witteveen
A.T.
et al.
Gene expression profiling predicts clinical outcome of breast cancer
Nature

2002
415
530
536
5
Chin
K.
DeVries
S.
Fridlyand
J.
Spellman
P.T.
Roydasgupta
R.
Kuo
W.L.
Lapuk
A.
Neve
R.M.
Qian
Z.
Ryder
T.
et al.
Genomic and transcriptional aberrations linked to breast cancer pathophysiologies
Cancer Cell

2006
10
529
541
6
Kao
J.
Salari
K.
Bocanegra
M.
Choi
Y.L.
Girard
L.
Gandhi
J.
Kwei
K.A.
Hernandez-Boussard
T.
Wang
P.
Gazdar
A.F.
et al.
Molecular profiling of breast cancer cell lines defines relevant tumor models and provides a resource for cancer gene discovery
PLoS ONE

2009
4
e6146
7
Comprehensive molecular portraits of human breast tumours
Nature

2012
490
61
70
8
Hanahan
D.
Weinberg
R.A.
Hallmarks of cancer: the next generation
Cell

2011
144
646
674
9
Rodriguez-Paredes
M.
Esteller
M.
Cancer epigenetics reaches mainstream oncology
Nat. Med.

2011
17
330
339
10
Esteller
M.
Cancer epigenomics: DNA methylomes and histone-modification maps
Nat. Rev. Genet.

2007
8
286
298
11
Bediaga
N.G.
Acha-Sagredo
A.
Guerra
I.
Viguri
A.
Albaina
C.
Ruiz Diaz
I.
Rezola
R.
Alberdi
M.J.
Dopazo
J.
Montaner
D.
et al.
DNA methylation epigenotypes in breast cancer molecular subtypes
Breast Cancer Res.

2010
12
R77
12
Dedeurwaerder
S.
Desmedt
C.
Calonne
E.
Singhal
S.K.
Haibe-Kains
B.
Defrance
M.
Michiels
S.
Volkmar
M.
Deplus
R.
Luciani
J.
et al.
DNA methylation profiling reveals a predominant immune component in breast cancers
EMBO Mol. Med.

2011
3
726
741
13
Holm
K.
Hegardt
C.
Staaf
J.
J.
Jonsson
G.
Olsson
H.
Borg
A.
Ringner
M.
Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns
Breast Cancer Res.

2010
12
R36
14
Elsheikh
S.E.
Green
A.R.
Rakha
E.A.
Powe
D.G.
Ahmed
R.A.
Collins
H.M.
Soria
D.
Garibaldi
J.M.
Paish
C.E.
Ammar
A.A.
et al.
Global histone modifications in breast cancer correlate with tumor phenotypes, prognostic factors, and patient outcome
Cancer Res.

2009
69
3802
3809
15
Finn
R.S.
Dering
J.
Conklin
D.
Kalous
O.
Cohen
D.J.
Desai
A.J.
Ginther
C.
Atefi
M.
Chen
I.
Fowst
C.
et al.
PD 0332991, a selective cyclin D kinase 4/6 inhibitor, preferentially inhibits proliferation of luminal estrogen receptor-positive human breast cancer cell lines in vitro
Breast Cancer Res.

2009
11
R77
16
Neve
R.M.
Chin
K.
Fridlyand
J.
Yeh
J.
Baehner
F.L.
Fevr
T.
Clark
L.
Bayani
N.
Coppe
J.P.
Tong
F.
et al.
A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes
Cancer Cell

2006
10
515
527
17
Heiser
L.M.
A.
Kuo
W.L.
Benz
S.C.
Goldstein
T.C.
Ng
S.
Gibb
W.J.
Wang
N.J.
S.
Tong
F.
et al.
Subtype and pathway specific responses to anticancer compounds in breast cancer

2012
109
2724
2729
18
Charafe-Jauffret
E.
Ginestier
C.
Monville
F.
Finetti
P.
J.
Cervera
N.
Fekairi
S.
Xerri
L.
Jacquemier
J.
Birnbaum
D.
et al.
Gene expression profiling of breast cell lines identifies potential new basal markers
Oncogene

2006
25
2273
2284
19
Curtis
C.
Shah
S.P.
Chin
S.F.
Turashvili
G.
Rueda
O.M.
Dunning
M.J.
Speed
D.
Lynch
A.G.
Samarajiwa
S.
Yuan
Y.
et al.
The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups
Nature

2012
486
346
352
20
Stephens
P.J.
Tarpey
P.S.
Davies
H.
Van Loo
P.
Greenman
C.
Wedge
D.C.
Nik-Zainal
S.
Martin
S.
Varela
I.
Bignell
G.R.
et al.
The landscape of cancer genes and mutational processes in breast cancer
Nature

2012
486
400
404
21
Jenuwein
T.
Allis
C.D.
Translating the histone code
Science

2001
293
1074
1080
22
Jones
P.A.
Takai
D.
The role of DNA methylation in mammalian epigenetics
Science

2001
293
1068
1070
23
Kouzarides
T.
Chromatin modifications and their function
Cell

2007
128
693
705
24
Sircoulomb
F.
Bekhouche
I.
Finetti
P.
J.
Ben Hamida
A.
Bonansea
J.
Raynaud
S.
Innocenti
C.
Charafe-Jauffret
E.
Tarpin
C.
et al.
Genome profiling of ERBB2-amplified breast cancers
BMC Cancer

2010
10
539
25
Beardsley
D.I.
Kowbel
D.
Lataxes
T.A.
Mannino
J.M.
Xin
H.
Kim
W.J.
Collins
C.
Brown
K.D.
Characterization of the novel amplified in breast cancer-1 (NABC1) gene product
Exp. Cell Res.

2003
290
402
413
26
Ling
L.J.
Lu
C.
Zhou
G.P.
Wang
S.
Ectopic expression of RhoBTB2 inhibits migration and invasion of human breast cancer cells
Cancer Biol. Ther.

2010
10
1115
1122
27
J.
Finetti
P.
Bekhouche
I.
Repellini
L.
Geneix
J.
Sircoulomb
F.
Charafe-Jauffret
E.
Cervera
N.
Desplans
J.
Parzy
D.
et al.
Integrated profiling of basal and luminal breast cancers
Cancer Res.

2007
67
11565
11575
28
Deaton
A.M.
Webb
S.
Kerr
A.R.
Illingworth
R.S.
Guy
J.
Andrews
R.
Bird
A.
Cell type-specific DNA methylation at intragenic CpG islands in the immune system
Genome Res.

2011
21
1074
1086
29
Suzuki
M.M.
Bird
A.
DNA methylation landscapes: provocative insights from epigenomics
Nat. Rev. Genet.

2008
9
465
476
30
Wang
Z.
Zang
C.
Rosenfeld
J.A.
Schones
D.E.
Barski
A.
Cuddapah
S.
Cui
K.
Roh
T.Y.
Peng
W.
Zhang
M.Q.
et al.
Combinatorial patterns of histone acetylations and methylations in the human genome
Nat. Genet.

2008
40
897
903
31
Yu
H.
Zhu
S.
Zhou
B.
Xue
H.
Han
J.D.
Inferring causal relationships among different histone modifications and gene expression
Genome Res.

2008
18
1314
1324
32
Karlic
R.
Chung
H.R.
Lasserre
J.
Vlahovicek
K.
Vingron
M.
Histone modification levels are predictive for gene expression

2010
107
2926
2931
33
Barski
A.
Cuddapah
S.
Cui
K.
Roh
T.Y.
Schones
D.E.
Wang
Z.
Wei
G.
Chepelev
I.
Zhao
K.
High-resolution profiling of histone methylations in the human genome
Cell

2007
129
823
837
34
Zhou
V.W.
Goren
A.
Bernstein
B.E.
Charting histone modifications and the functional organization of mammalian genomes
Nat. Rev. Genet.

2011
12
7
18
35
Hanahan
D.
Weinberg
R.A.
The hallmarks of cancer
Cell

2000
100
57
70
36
Bayani
J.
Selvarajah
S.
Maire
G.
Vukovic
B.
Al-Romaih
K.
Zielenska
M.
Squire
J.A.
Genomic mechanisms and measurement of structural and numerical instability in cancer cells
Semin. Cancer Biol.

2007
17
5
18
37
Semb
H.
Christofori
G.
Am. J .Human Genet.

1998
63
1588
1593
38
Andrews
J.L.
Kim
A.C.
Hens
J.R.
The role and function of cadherins in the mammary gland
Breast Cancer Res.

2012
14
203
39
Cheung
M.
Testa
J.R.
Diverse mechanisms of AKT pathway activation in human malignancy
Curr. Cancer Drug Targets

2013
13
234
244
40
Zaman
N.
Li
L.
Jaramillo
M.L.
Sun
Z.
Tibiche
C.
Banville
M.
Collins
C.
Trifiro
M.
Paliouras
M.
Nantel
A.
et al.
Signaling network assessment of mutations and copy number variations predict breast cancer subtype-specific drug targets
Cell Reports

2013
5
216
223
41
Zhou
Q.
Shaw
P.G.
Davidson
N.E.
Inhibition of histone deacetylase suppresses EGF signaling pathways by destabilizing EGFR mRNA in ER-negative human breast cancer cells
Breast Cancer Res. Treatment

2009
117
443
451
42
Li
Y.
Xu
J.
Chen
H.
Bai
J.
Li
S.
Zhao
Z.
Shao
T.
Jiang
T.
Ren
H.
Kang
C.
et al.
Comprehensive analysis of the functional microRNA-mRNA regulatory network identifies miRNA signatures associated with glioma malignant progression
Nucleic Acids Res.

2013
41
e203
43
Mungamuri
S.K.
Murk
W.
Grumolato
L.
Bernstein
E.
Aaronson
S.A.
Chromatin modifications sequentially enhance ErbB2 expression in ErbB2-positive breast cancers
Cell Rep.

2013
5
302
313
44
Falahi
F.
Huisman
C.
Kazemier
H.G.
van der Vlies
P.
Kok
K.
Hospers
G.A.
Rots
M.G.
Towards sustained silencing of HER2/neu in cancer by epigenetic editing
Mol Cancer Res.

2013
11
1029
1039
45
Akalin
A.
Garrett-Bakelman
F.E.
Kormaksson
M.
Busuttil
J.
Zhang
L.
Khrebtukova
I.
Milne
T.A.
Huang
Y.
Biswas
D.
Hess
J.L.
et al.
Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia
PLoS Genet.

2012
8
e1002781
46
Dai
W.
Zeller
C.
Masrour
N.
Siddiqui
N.
Paul
J.
Brown
R.
Promoter CpG island methylation of genes in key cancer pathways associates with clinical outcome in high-grade serous ovarian cancer
Clin. Cancer Res.

2013
19
5788
5797
47
Wistuba
I.I.
Behrens
C.
Milchgrub
S.
Syed
S.
M.
Virmani
A.K.
Kurvari
V.
Cunningham
T.H.
Ashfaq
R.
Minna
J.D.
et al.
Comparison of features of human breast cancer cell lines and their corresponding tumors
Clin. Cancer Res.

1998
4
2931
2938
48
Lim
E.
Vaillant
F.
Wu
D.
Forrest
N.C.
Pal
B.
Hart
A.H.
Asselin-Labat
M.L.
Gyorki
D.E.
Ward
T.
Partanen
A.
et al.
Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers
Nat. Med.

2009
15
907
913
49
Bracken
A.P.
Dietrich
N.
Pasini
D.
Hansen
K.H.
Helin
K.
Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions
Genes Dev.

2006
20
1123
1136
50
Lee
T.I.
Jenner
R.G.
Boyer
L.A.
Guenther
M.G.
Levine
S.S.
Kumar
R.M.
Chevalier
B.
Johnstone
S.E.
Cole
M.F.
Isono
K.
et al.
Control of developmental regulators by Polycomb in human embryonic stem cells
Cell

2006
125
301
313
51
Sparmann
A.
van Lohuizen
M.
Polycomb silencers control cell fate, development and cancer
Nat. Rev. Cancer

2006
6
846
856
52
Hon
G.C.
Hawkins
R.D.
Caballero
O.L.
Lo
C.
Lister
R.
Pelizzola
M.
Valsesia
A.
Ye
Z.
Kuan
S.
Edsall
L.E.
et al.
Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer
Genome Res.

2012
22
246
258
53
Consortium
E.P.
Bernstein
B.E.
Birney
E.
Dunham
I.
Green
E.D.
Gunter
C.
Snyder
M.
An integrated encyclopedia of DNA elements in the human genome
Nature

2012
489
57
74
54
Joseph
R.
Orlov
Y.L.
Huss
M.
Sun
W.
Kong
S.L.
Ukil
L.
Pan
Y.F.
Li
G.
Lim
M.
Thomsen
J.S.
et al.
Integrative model of genomic factors for determining binding site selection by estrogen receptor-alpha
Mol. Syst. Biol.

2010
6
456
55
Sun
Z.
Asmann
Y.W.
Kalari
K.R.
Bot
B.
Eckel-Passow
J.E.
Baker
T.R.
Carr
J.M.
Khrebtukova
I.
Luo
S.
Zhang
L.
et al.
Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing
PLoS ONE

2011
6
e17490
56
Dreszer
T.R.
Karolchik
D.
Zweig
A.S.
Hinrichs
A.S.
Raney
B.J.
Kuhn
R.M.
Meyer
L.R.
Wong
M.
Sloan
C.A.
Rosenbloom
K.R.
et al.
The UCSC Genome Browser database: extensions and updates 2011
Nucleic Acids Res.

2012
40
D918
D923
57
Takai
D.
Jones
P.A.
Comprehensive analysis of CpG islands in human chromosomes 21 and 22

2002
99
3740
3745
58
Irizarry
R.A.
C.
Wen
B.
Wu
Z.
Montano
C.
Onyango
P.
Cui
H.
Gabo
K.
Rongione
M.
Webster
M.
et al.
The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores
Nat. Genet.

2009
41
178
186
59
Kim
M.
Park
Y.K.
Kang
T.W.
Lee
S.H.
Rhee
Y.H.
Park
J.L.
Kim
H.J.
Lee
D.
Lee
D.
Kim
S.Y.
et al.
Dynamic changes in DNA methylation and hydroxymethylation when hES cells undergo differentiation toward a neuronal lineage
Human Mol. Genet.

2014
23
657
667
60
B.
Trapnell
C.
Pop
M.
Salzberg
S.L.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Genome Biol.

2009
10
R25
61
Trapnell
C.
Pachter
L.
Salzberg
S.L.
TopHat: discovering splice junctions with RNA-Seq
Bioinformatics

2009
25
1105
1111
62
Trapnell
C.
Williams
B.A.
Pertea
G.
Mortazavi
A.
Kwan
G.
van Baren
M.J.
Salzberg
S.L.
Wold
B.J.
Pachter
L.
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Nat. Biotechnol.

2010
28
511
515
63
Xu
H.
Wei
C.L.
Lin
F.
Sung
W.K.
An HMM approach to genome-wide identification of differential histone modification sites from ChIP-seq data
Bioinformatics

2008
24
2344
2349
64
Krueger
F.
Andrews
S.R.
Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications
Bioinformatics

2011
27
1571
1572
65
Akalin
A.
Kormaksson
M.
Li
S.
Garrett-Bakelman
F.E.
Figueroa
M.E.
Melnick
A.
Mason
C.E.
methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles
Genome Biol.

2012
13
R87
66
Quinlan
A.R.
Hall
I.M.
BEDTools: a flexible suite of utilities for comparing genomic features
Bioinformatics

2010
26
841
842
67
Plaisier
C.L.
Pan
M.
Baliga
N.S.
A miRNA-regulatory network explains how dysregulated miRNAs perturb oncogenic processes across diverse cancers
Genome Res.

2012
22
2302
2314
68
Dessau
R.B.
Pipper
C.B.
‘R’—project for statistical computing
Ugeskr. Laeger

2008
170
328
330
69
Polyak
K.
Metzger Filho
O.
SnapShot: breast cancer
Cancer Cell

2012
22
562–562 e561

## Author notes

These authors contributed equally to this work.