High-resolution mass spectrometry (MS)-based proteomics is a powerful method for the identification of soluble protein complexes, and large-scale affinity purification screens can decode entire protein interaction networks. In contrast, protein complexes residing on chromatin have been much more challenging to characterize, because they are difficult to purify and often of very low abundance. However, this is changing due to recent methodological and technological advances in proteomics. Proteins interacting with chromatin marks can be identified directly by pulldowns with synthesized histone tails containing posttranslational modifications (PTMs). Similarly, pulldowns with DNA baits harbouring single nucleotide polymorphisms or DNA modifications reveal the impact of those DNA alterations on the recruitment of transcription factors. Accurate quantitation, either isotope-based or label-free, unambiguously pinpoints proteins that are significantly enriched over control pulldowns. In addition, protocols that combine classical chromatin immunoprecipitation (ChIP) methods with mass spectrometry (ChIP-MS) target gene regulatory complexes in their in vivo context. Similar to classical ChIP, cells are crosslinked with formaldehyde and chromatin is sheared by sonication or digested with nuclease. ChIP-MS baits can be proteins in tagged or endogenous form, histone PTMs, or lncRNAs. Locus-specific ChIP-MS methods would allow direct purification of a single genomic locus and the proteins associated with it. In these methods, loci can be targeted either by artificial DNA-binding sites and corresponding binding proteins or via proteins with sequence specificity such as TAL or nuclease-deficient Cas9 in combination with a specific guide RNA. We predict that advances in MS technology will soon make such approaches generally applicable tools in epigenetics.
Gene expression starts with regulatory proteins binding to DNA at promoter and enhancer regions, where they can alter the local chromatin environment or recruit polymerases and other proteins of the core transcriptional machinery. These proteins bind directly to specific DNA sequences or interact with other chromatin components such as specific posttranslational modifications (PTMs) on histone tails. The composition of complexes in the regulatory region of a gene decides whether a gene is actively transcribed, repressed, or held in an intermediate state. A true understanding of how a gene is controlled in normal function or disease requires identification of the complete inventory of regulatory proteins and complexes that reside in its regulatory regions, as well as their interactions and modifications.
High-resolution, quantitative mass spectrometry (MS)-based proteomics has turned into a powerful tool to study diverse aspects of proteins in a global and unbiased manner ( 1 ), and in particular protein-protein interactions ( 2 ). Initial strategies involved stringent purification of protein complexes followed by SDS-PAGE separation and MS-based analysis of visually selected gel bands. Today, dramatic improvements in shotgun proteomics allow direct in-solution digestion of immunoprecipitates and the identification of proteins that are significantly enriched over background binders ( 3 ). This requires accurate quantitation, which can be achieved either in isotope-labelled formats such as SILAC ( 4 ), or by the increasingly powerful label-free methods ( 5 , 6 ). Label-based quantitation methods allow a direct readout of relative peptide quantities within the same MS run and generally involve the determination of a ‘heavy to light ratio’ in the same mass spectrum. In contrast, modern label-free methods employ sophisticated bioinformatic normalization strategies to compare protein intensities between an unlimited number of MS runs. As sequencing information can be matched between runs ( 5 ), label-free quantification methods can also provide a higher sensitivity than label-based methods, which is especially helpful for the analysis of low-abundance protein interactions. Following such strategies, recent large-scale studies have charted the composition of soluble complexes and their networks on a global scale ( 7–10 ).
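The ‘heavy to light ratio’ described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a protein-level ratio is summarized as the median of the log2 ratios of its peptide pairs; the function name and the intensity values are hypothetical and do not come from any of the cited studies.

```python
import math
from statistics import median


def silac_protein_log2_ratio(peptide_pairs):
    """Return the median log2(heavy/light) over a protein's peptide pairs.

    peptide_pairs: iterable of (heavy_intensity, light_intensity) tuples,
    each pair read out from the same mass spectrum.
    The median summary is an assumption made for this sketch.
    """
    log_ratios = [
        math.log2(heavy / light)
        for heavy, light in peptide_pairs
        if heavy > 0 and light > 0  # skip pairs with a missing channel
    ]
    if not log_ratios:
        raise ValueError("no peptide pair with both channels quantified")
    return median(log_ratios)


# Hypothetical intensities: bait pulldown labelled heavy, control light.
example_pairs = [(8.0e6, 1.0e6), (4.0e6, 1.0e6), (1.6e7, 1.0e6)]
ratio = silac_protein_log2_ratio(example_pairs)
print(f"protein log2(H/L) = {ratio:.2f}")
```

A positive value indicates enrichment in the heavy-labelled pulldown; where to place a cut-off is an experimental decision that this sketch does not address.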
Compared to soluble complexes, the characterization of chromatin-associated ones is much more difficult, because the tight integrity of chromatin needs to be disrupted without affecting the complex of interest. Two main strategies are employed. The first simulates the interaction between (modified) histones or DNA and protein complexes in vitro using peptide or DNA baits. The second resembles classical chromatin immunoprecipitation (ChIP) and ChIP-derived protocols, except that it is followed by MS.
In this review, we will focus on the contribution of proteomics to the epigenetics field, in particular the identification of gene regulatory complexes on chromatin. Due to limits in length, we refer the reader to recent reviews for the use of proteomics for the identification of histone PTMs and the definition of soluble forms of chromatin-associated complexes ( 11–13 ).
Identification of the complete inventory of regulatory proteins and complexes associated with chromatin would be of tremendous help in understanding gene regulation and other chromatin-related processes. High resolution MS has already been applied to the DNA damage response ( 14 , 15 ), DNA repair ( 16 ), DNA replication ( 17–20 ) and mitosis ( 21 ). The analysis of the soluble Histone H3.1 interactome provided a comprehensive view of histone chaperones and components of the replication fork ( 22 ).
To identify proteins responsible for gene transcription, several studies characterized interphase chromatin, starting with a pioneering study of Aebersold and colleagues in which they analysed the chromatin fraction of human B lymphocytes before and after overexpression of the oncogene c-Myc ( 23 ). This revealed a large set of transcription factors and other chromatin-associated proteins, some of which displayed significant expression changes, including a 10-fold downregulation of the transcription factor ATF-3 and a 2-fold induction of the progesterone receptor. As part of an extensive chromatin characterization study in HeLa cells, Garcia and colleagues distinguished proteins differentially associated with euchromatin and heterochromatin by using partial MNase digestion ( 24 ). A recent study also applied MNase digestion to solubilize the chromatin fraction of embryonic stem cells (ESCs) and neural progenitor cells (NPCs), identifying a preferential chromatin association of the esBAF complex member Smarcd1 in ESCs ( 25 ).
Taking a different tack, Rappsilber and co-workers applied formaldehyde crosslinking and denaturing washing conditions to capture proteins that are tightly crosslinked to chromatin ( 26 , 27 ). Surprisingly, about half of these proteins did not have apparent chromatin-related functions. To define interphase chromatin-associated proteins, the authors therefore compared 28 different biological conditions, including different cell types, cell cycle phases, and drug treatments and applied a machine learning algorithm to identify proteins with common dynamic behavior that differed from contaminant proteins.
Histone Modification Readers
Histone proteins can be marked by phosphorylation, methylation, acetylation, ubiquitination and numerous other PTMs. These can directly affect chromatin structure, or be recognized by specific reader proteins, which themselves regulate chromatin structure to affect the expression of nearby genes ( 28 ). MS greatly contributed to the identification of these modifications ( 11 ), as well as of their respective reader proteins.
Typically, peptides representing parts of the histone tail sequence and containing one or more PTMs are coupled to beads and incubated with cell extracts. Proteins specifically recognizing the PTM are enriched and then analysed by MS ( Figure 1A ). Such assays identified key events such as the binding of Wdr5 and NURF to trimethylated lysine 4 on histone H3 ( 29 , 30 ). The introduction of quantitative proteomics in the form of SILAC labeling ( 4 ) dramatically increased sensitivity and specificity, revealing that the TFIID complex directly interacts with H3K4me3 ( 31 ) as well as interactions of other factors with trimethylated lysines ( 32 ). The same approach elegantly unraveled differential binding of different PRMT isoforms to symmetric and asymmetric dimethylation of H3R2 ( 33 ). Improvements in MS data quality and algorithms have led to a switch to label-free quantitation, which has successfully been applied to mouse tissues ( 34 ) and even birds ( 35 ). To more closely resemble binding in the context of chromatin, mono- or oligo-nucleosomes with modified histone tails have been generated by their chemical linkage to the globular domains ( 36–38 ) ( Figure 1B ). Although the differences in reader proteins in peptide and nucleosome pulldowns for a given histone PTM appear to be minor, an advantage of nucleosome pulldowns is the combined assessment of histone and DNA modifications. For instance, while PRC2 complex members effectively bound H3K27me3-modified nucleosomes, this interaction was lost upon introduction of DNA methylation, whereas the association with the ORC complex was increased ( 38 ).
Binding of proteins to chromatin in vivo is influenced by many different factors, such as the combinatorics of different modifications on histones and DNA, the presence of DNA-binding proteins other than histone core particles and the three dimensional structure of chromatin ( 39 ). In the classical chromatin-immunoprecipitation (ChIP) method chromatin is crosslinked by formaldehyde and sheared by nuclease treatment or sonication to a length of one to three nucleosomes, followed by antibody based affinity purification and analysis of the bound DNA by PCR or next generation sequencing (ChIP-seq) ( 40 , 41 ). Technological improvements in MS have recently allowed the application of these ChIP protocols to elucidate chromatin associated protein complexes, a technology termed ChIP-MS ( 42 , 43 ), ChroP ( 44 , 45 ), mChIP ( 46 , 47 ), or RIME ( 48–50 ) ( Figure 2A ). Two recent papers enriched nucleosomes containing particular histone modifications to study protein interaction networks of genomic regions marked with specific histone PTMs for promoter, enhancer and heterochromatin regions in mouse ESCs ( 51 , 52 ). Interestingly, the same method can also identify co-occurrence of histone marks, by analysing the histone PTM levels in the precipitated material ( 44 ). The application of ChIP-MS to histone variants elucidated nucleosomal ratios and histone PTMs of mononucleosomes containing non-canonical histone variants ( 43 ).
ChIP-MS for Transcription Factor Complexes and Other Chromatin Components
In addition to localizing histone PTMs across the genome, ChIP-seq can also elucidate the binding of transcription factors, coregulatory proteins and other chromatin components. While ChIP-seq infers protein associations by correlating their peak locations, ChIP-MS directly identifies chromatin-associated complexes without a priori assumptions ( Figure 2B ). A pioneering ChIP-MS study in yeast characterized interaction partners of histone H2A, the histone variant Htz1p and three chromatin-bound proteins that are difficult to purify by traditional co-immunoprecipitation (Co-IP) protocols ( 46 ). In this and a following study, in which they extended the screen to 102 different chromatin-associated yeast proteins including several transcription factors, the group employed TAP-tagged baits, which were precipitated using IgG-coated magnetic beads ( 47 ).
ChIP-MS also determined interactors of the MSL (male-specific lethal) complex in Drosophila, which is responsible for X-chromosome dosage compensation in male flies ( 42 ). MSL2 and MSL3, two subunits of the complex, were tagged with an HTB tag, which is biotinylated in vivo, transfected into S2 cells, and precipitated from cross-linked and sheared chromatin preparations with streptavidin beads. The authors identified a range of proteins involved in active gene transcription, including a putative H3K36me3-binding protein whose loss-of-function phenotype resulted in partial mislocalization of the MSL complex to autosomes ( 42 ). Interestingly, the H3K36me3 modification was enriched in the MSL ChIP-MS, whereas H3K4me3 was depleted, in accordance with the ChIP-seq results ( 42 ). By exchanging the HTB tag for a bioTAP tag ( 53 , 54 ), the method was recently applied to find interaction partners of PRC1 and PRC2 complex members in human cells ( 55 ) and of Drosophila HP1 from cells and from different life stages of flies ( 56 ). In a variant of ChIP-MS, a TAP-based immunoprecipitation is followed by an in vitro lysine acetyltransferase (KAT) reaction in the presence of isotopically labelled acetyl-CoA ( 57 ). This allows identification of substrate-KAT relationships of chromatin-associated protein complexes.
ChIP-MS for chromatin-associated factors can also be conducted without tags using antibodies directed against the endogenous protein ( 48–50 ). Carroll and co-workers focused on coregulatory proteins of oestrogen receptor (ER) and defined specific interactors by removing proteins that were identified in a control ChIP-MS experiment using total IgG ( 48 ). By further including SILAC, the authors defined specific interactions that depend on treatment with either oestrogen or the ER antagonist tamoxifen, which identified GREB1 as an oestrogen-specific ER-interacting protein ( 48 ). GREB1 ChIP-Seq revealed a strong overlap with the ER coregulators CBP and p300, while GREB1 downregulation by siRNA resulted in a displacement of those two factors from ER-binding sites. ER ChIP-MS in solid tumour samples confirmed the interaction with GREB1 in three out of six tumour samples, suggesting clinical relevance. Subsequently, the group reported an interaction of ER with progesterone receptor (PR) in response to progesterone treatment ( 49 ). Interestingly, activated PR directed ER to genomic loci such that the resulting gene expression program was associated with a good clinical outcome.
Long non-coding RNAs (lncRNAs) can directly act at the level of chromatin as functional regulators of gene expression ( 58 , 59 ). In analogy to ChIP-seq, the localization of those RNAs on the genome can be determined by a ChIP-like method called ChIRP (Chromatin Isolation by RNA Purification), which includes precipitation of a given lncRNA using biotinylated antisense oligos covering the whole length of the RNA sequence ( 60 ) ( Figure 2C ). To identify proteins associated with the Xist lncRNA, ChIRP was recently combined with MS and uncovered 81 proteins specifically interacting with Xist on chromatin. This included HnrnpK and Spen (/Sharp), which were both found to participate in Xist-mediated gene silencing ( 61 ). UV crosslinking and purification under denaturing conditions has also been used on Xist ( 62 ), and strikingly, 9 of 10 interactors overlapped with the factors identified by ChIRP-MS. Hence, formaldehyde crosslinking under ChIP conditions efficiently covers both direct lncRNA interactions as well as the chromatin background. Of note, while the UV crosslinking study postulated that the gene inhibitory mechanism of Spen (/Sharp) is mediated by an interaction with Smrt and Hdac3, neither factor was identified in the UV-crosslinking RNA pulldown or in the ChIRP-MS experiment. Hence, either ChIRP-MS was not sufficiently comprehensive, or Smrt and Hdac3 are not physically associated with Xist chromatin loci.
For all the advantages of ChIP-MS, it does not provide locus-specificity in the genome. However, other MS methods contribute to the understanding of protein-DNA interactions, most prominently DNA pull-down experiments followed by MS ( 63 ) ( Figure 1C ). DNA baits containing the DNA sequence of interest or a control sequence are biotinylated and incubated with nuclear extracts to enrich for specifically interacting proteins. Such an approach successfully identified new interactors of telomeres using polymerized biotinylated double-stranded oligonucleotides of the telomeric sequence TTAGGG as baits ( 64 , 65 ). Furthermore, it has shed new light on the enigma of ultra-conserved elements (UCEs) in the genome ( 66 ). Analysis of a total of 193 UCE sequences between 200 bp and 1000 bp in length identified a set of 425 proteins that robustly associated with those elements ( 67 ). Interestingly, UCEs in non-exonic regions were most enriched in intrinsic interactors and at the same time most refractory to the binding of PRC proteins, which was much higher on random genomic loci, suggesting that non-exonic UCEs act as integrators of enhancer-localized transcription factors.
In addition to their application in basic biology, DNA pulldowns linked to MS also have great potential in a genetic context, by identifying differential binders of single nucleotide polymorphisms (SNPs). This approach has been termed PWAS (proteome-wide analysis of SNPs). In an example from animal husbandry, PWAS determined the cause of the genetic difference responsible for the lean phenotype of European pigs, which turned out to be due to a repressor differentially binding to an imprinted locus at the IGF2 gene ( 68 , 69 ). In a clinical context, PWAS was first applied to identify differential interactors of SNPs that are highly associated with type 1 diabetes at the interleukin-2 receptor (CD25) locus ( 70 ). Using concatemerized DNA oligonucleotides representing the genomic sequence surrounding a particular SNP, PWAS identified four differential interactions of transcription factors at four out of 12 investigated SNPs. Note that differential binding should be investigated for more than the lead SNP in a haplotype block, as the lead SNP is not necessarily the functional one.
Another PWAS study focused on two somatic mutations in the TERT promoter region, which are frequently associated with oncogenesis (C228T and C250T) ( 71 ). Both mutations generate de novo consensus binding motifs for E-twenty-six (ETS) transcription factors ( 72 , 73 ), which are recognized in vivo by the transcription factor GABP ( 74 ). Performing PWAS with either of the two mutations revealed binding of the ETS transcription factors ELF1, ELF2, and ETV, while both mutations together bound GABP as a hetero-tetramer, which requires two adjacent ETS-binding motifs. Interestingly, GABP binding was even more favoured with a DNA bait containing two native upstream ETS-binding sites in combination with the C228T mutation. This highlights the importance of the combinatorial nature of transcription factor binding in DNA pull-down assays and suggests that oligonucleotides that only minimally cover an SNP could fail to identify biologically meaningful interactions.
To overcome limitations of SILAC labeling, which is most easily used in cell lines, a recent study applied label-free quantification and dimethyl labelling for DNA pulldowns, enabling the measurement of DNA interactions in extracts of primary cells ( 75 ). Using nuclear extracts of peripheral blood mononuclear cells (PBMCs), this revealed allele-specific binding to an SNP related to chronic lymphocytic leukaemia ( 75 ). Such a strategy may be interesting for future PWAS studies, as it allows direct screening for differential binding partners as long as they are expressed in available target cells.
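How a label-free comparison of two allele pulldowns might flag a differential binder can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the cited study: it takes already normalized log2 protein intensities from replicate pulldowns, and the two thresholds (`min_diff`, `min_abs_t`) as well as the example values are hypothetical. It reports Welch's t statistic rather than a p-value, to stay within the Python standard library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / se


def allele_preference(log2_ref, log2_alt, min_diff=1.0, min_abs_t=4.0):
    """Classify a protein as preferring the 'ref' or 'alt' bait, or 'none'.

    log2_ref, log2_alt: normalized log2 intensities from replicate pulldowns
    with the reference and the alternative allele. Thresholds are
    illustrative placeholders, not recommended values.
    """
    diff = mean(log2_ref) - mean(log2_alt)
    t_stat = welch_t(log2_ref, log2_alt)
    if abs(diff) >= min_diff and abs(t_stat) >= min_abs_t:
        return "ref" if diff > 0 else "alt"
    return "none"


# Hypothetical triplicate measurements for two proteins.
binder = allele_preference([24.1, 24.3, 23.9], [21.0, 21.2, 20.8])
background = allele_preference([22.0, 22.4, 21.8], [22.1, 21.9, 22.3])
print(binder, background)
```

Requiring both a minimum difference and a minimum test statistic mirrors the idea of calling a protein enriched only when the change is large as well as reproducible across replicates.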
Apart from the elucidation of sequence-specific binders, DNA pulldowns coupled to MS can identify proteins that specifically recognize DNA modifications, such as 5-methylcytosine (5-mC) ( 63 , 76 ). Vermeulen and co-workers characterized reader proteins of 5-mC and 5-hydroxymethylcytosine (5-hmC), as well as their oxidized derivatives 5-formylcytosine (5-fC) and 5-carboxylcytosine (5-caC) in mouse ESCs, NPCs and brain tissue ( 77 ). While 5-mC is a hallmark of epigenetic silencing, its oxidized derivatives are actively involved in gene activation ( 78 ). Interestingly, this study uncovered not only distinct binding proteins for 5-mC and its oxidized derivatives, but also highly dynamic binding of proteins in the three different cell types, indicating specialized roles in differentiation and pluripotency control ( 77 ).
Binding of protein complexes to DNA in vivo is strongly influenced by the chromatin environment. To study the protein repertoire at a particular genomic locus, one ideally would need to isolate just that region and all proteins associated with it. The main challenge is the required sensitivity, as a single copy-binding protein cannot yield more than two molecules per cell for a diploid locus ( Figure 2D ).
In the first locus-specific study, Kingston and colleagues used a desthiobiotinylated DNA probe complementary to telomere repeats, taking advantage of the repetitive character of telomere sequences ( 79 ). This uncovered a large set of proteins with a known or potential telomere-associated role, including the homeobox telomere-binding protein 1 (HOT1) also identified by the telomere DNA pull down approach mentioned above ( 64 ). The same group also identified proteins interacting with telomere-associated sequence (TAS) repeats in Drosophila ( 80 ).
Demonstrating proof of principle for single locus-specific ChIP-MS, Tackett and colleagues genetically engineered a yeast strain containing a LexA-binding site in close vicinity to the GAL1 promoter, together with constitutive expression of the LexA-PrA fusion protein serving as affinity handle. Analysis of proteins interacting under transcriptionally active and repressive conditions identified Gal3, two subunits of the RNA polymerase complex (Rpb1 and Rpb2) and Spt16, a component of the FACT complex ( 81 ). The low number of specific gene activators and especially of known repressor proteins indicates that the detection threshold was still a limiting factor despite using 2.5 × 10¹¹ cells.
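The sensitivity constraint can be made concrete with a back-of-the-envelope calculation, sketched below in Python. It converts a cell number into the theoretical maximum amount of a protein bound once per locus copy, assuming 100% recovery. The number of locus copies per cell is a parameter, because the ploidy of the strain is not stated here, and the second cell number (10⁸ cells) is a hypothetical value chosen only for comparison.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def max_bound_copies(cells, locus_copies_per_cell=2, proteins_per_locus=1):
    """Upper bound on the number of locus-bound protein molecules in a sample."""
    return cells * locus_copies_per_cell * proteins_per_locus


def copies_to_femtomoles(copies):
    """Convert a number of molecules into femtomoles."""
    return copies / AVOGADRO * 1e15


# Cell number quoted in the text for the yeast GAL1 experiment,
# evaluated for a diploid locus (two copies per cell).
yeast_fmol = copies_to_femtomoles(max_bound_copies(2.5e11))

# Hypothetical smaller experiment with 1e8 cells.
small_fmol = copies_to_femtomoles(max_bound_copies(1e8))

print(f"2.5e11 cells: at most {yeast_fmol:.0f} fmol")
print(f"1e8 cells:    at most {small_fmol:.2f} fmol")
```

These figures are upper bounds only; losses during crosslinking, shearing, affinity purification and digestion will reduce the amount that actually reaches the mass spectrometer.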
Two arrays of eight LexA-binding sites flanked on each side by six copies of the chicken HS4 insulator complex were integrated into a plasmid in an attempt to study novel proteins that could interact with insulator elements ( 82 ). The genomically integrated LexA arrays were then targeted by an exogenously expressed LexA-CBP-FLAG fusion protein and anti-Flag antibodies. Subsequent analysis was done by silver staining, which is presumably the reason that only RNA helicase p68 and the nuclear matrix protein Matrin-3 were followed up on as specific interactors.
Analogously, a binding site of the tetracycline repressor protein (TetR) inserted into a human γ-globin minilocus was integrated into transgenic mice ( 83 ). Affinity purification used a triple tag (CFP, HA, Bio) TetR protein (TetR3T), transgenically inserted into mice expressing the BirA enzyme and crossed into the mice containing the γ-globin minilocus. A cell line of erythroid progenitor cells was then generated and expanded from the transgenic mice. Treating cells with doxycycline provided an elegant control as it leads to release of the TetR3T protein from the TetR binding site. Performing HA/streptavidin tandem purification from crosslinked cells, the authors identified 14 candidate proteins specifically interacting under non-doxycycline conditions, among them several known to bind to the γ-globin promoter such as GATA1 and CHD4.
Transcription activator-like (TAL) effector proteins have also been targeted to specific genomic loci ( 84 ). Tackett and colleagues used a TAL-PrA fusion construct recognizing a specific sequence of the GAL1 promoter ( 85 ). This led to the identification of the same four transcriptional activating proteins that they had identified with the LexA-PrA system. However, under repressive conditions, this system did not enrich the GAL1 locus, indicating that the TAL-PrA protein was not able to recognize its target sequence. In another study, a TAL-3xFLAG fusion protein was used to isolate telomeres from the mouse hematopoietic cell line Ba/F3 ( 86 ).
CRISPR-based methods are versatile tools for genomic engineering and have recently been extended by nuclease-deficient Cas9 (dCas9) fusion proteins that are targeted to specific promoter regions to modify their gene expression ( 87–89 ). When this technology was employed to target the GAL1 locus, it successfully enriched the locus and revealed several enriched proteins related to transcription, but neither Gal3 nor the other three factors found before ( 90 ). As with the TAL-PrA construct, the CRISPR-based method was not able to enrich the locus under transcriptionally inactive conditions, probably due to a lack of DNA accessibility.
The IRF1 promoter in human embryonic kidney-derived 293T cells was targeted by expressing a dCas9-3xFlag fusion protein together with a specific gRNA or control gRNA ( 91 ). Although the authors identified a range of nuclear and chromatin associated proteins, no specific transcription factors known to recognize this locus were identified.
As we have documented here, there are now many proteomics methods to advance chromatin interaction research. The identification of reader proteins of histone PTMs and DNA sequences by pulldown approaches is the most mature. PWAS enables geneticists to follow up on their SNP of interest in a more direct way than just predicting transcription factor binding by consensus sequences. With label-free proteomics, primary cells and tissues can readily be used; the reported lower limit of input material for meaningful PWAS results is around 600 µg of nuclear extract for an experiment containing technical triplicates for each of the two alleles ( 71 ).
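For planning purposes, the quoted input amount can be broken down per pulldown. The short Python sketch below assumes that the extract is divided equally among all pulldowns, which is an assumption of this illustration rather than a statement from the cited study; the function name is hypothetical.

```python
def extract_per_pulldown_ug(total_extract_ug, alleles=2, replicates=3):
    """Nuclear extract available per pulldown, assuming an equal split."""
    pulldowns = alleles * replicates
    if pulldowns <= 0:
        raise ValueError("need at least one pulldown")
    return total_extract_ug / pulldowns


# 600 µg in total, two alleles, technical triplicates.
per_pulldown = extract_per_pulldown_ug(600.0)
print(f"{per_pulldown:.0f} µg of nuclear extract per pulldown")
```

Under this equal-split assumption, the reported lower limit corresponds to roughly 100 µg of nuclear extract per individual pulldown.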
ChIP-based techniques for targeting transcription factors, even at endogenous expression levels, are already showing great potential. Powerful label-free quantitative MS methods are making ChIP-MS increasingly applicable to clinical questions related to transcription factors in areas such as cancer or metabolic diseases. Special care should be taken when using antibodies directed against endogenous proteins, as these may have off-target specificities. Fortunately, such off-target effects can be excluded by elegant controls, such as knockdown of the bait or a second antibody reacting with a different epitope.
Locus-specific ChIP-MS so far works well only for repeat sequences with multiple copies in the cell and existing studies targeting individual loci are primarily conceptual. This is almost entirely due to the fact that a very low number of bound proteins are extracted from each cell. Therefore, this is clearly an area where future generations of MS technology could enable major progress.
Given their rapid development, CRISPR-based techniques will likely become the methods of choice also for locus-specific ChIP-MS. Using multiple guide RNAs to target a locus of interest, either in combination or in individual pull-downs, should decrease the impact of off-target binding of dCas9 seen with a single guide ( 92 ). This will be important to reduce false-positive hits in locus-specific ChIP-MS experiments. Studies with both TAL and dCas9 fusion proteins indicate problems accessing the target sequence under transcriptionally repressive conditions ( 85 , 90 ), and genomic binding of dCas9 positively correlates with DNase hypersensitivity ( 92 ) and low nucleosome occupancy ( 93 ). Therefore, care should be taken that the guide RNA has access to the genome, which can be done by taking into account recent recommendations for effective guide RNA design.
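One simple way to exploit individual pull-downs with several guide RNAs is to retain only proteins that are enriched with more than one independent guide, on the reasoning that off-target sites are unlikely to be shared between guides. The Python sketch below illustrates this filtering step; the guide names, protein names and the `min_guides` threshold are hypothetical placeholders, and the approach is an illustration rather than an established protocol.

```python
from collections import Counter


def consensus_hits(hits_by_guide, min_guides=2):
    """Return proteins enriched in pulldowns with at least `min_guides` guides.

    hits_by_guide: dict mapping a guide RNA name to the collection of
    proteins called enriched in the pulldown performed with that guide.
    """
    counts = Counter(
        protein
        for hits in hits_by_guide.values()
        for protein in set(hits)  # count each protein once per guide
    )
    return {protein for protein, n in counts.items() if n >= min_guides}


# Hypothetical enriched-protein lists from three individual pulldowns.
hits_by_guide = {
    "guide_1": ["ProteinA", "ProteinB", "OffTargetX"],
    "guide_2": ["ProteinA", "ProteinB", "OffTargetY"],
    "guide_3": ["ProteinA", "OffTargetZ"],
}
consensus = consensus_hits(hits_by_guide)
print(sorted(consensus))
```

Raising `min_guides` makes the filter stricter at the cost of sensitivity, which matters when only a few molecules per cell are available at the target locus.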
In conclusion, chromatin-associated interaction proteomics has already yielded many new discoveries in biology. Ongoing technological advances in high sensitivity, quantitative MS, combined with ingenious genomic and biochemical strategies have even greater potential for basic research and for the clinic.
We thank our colleagues at the Department of Proteomics and Signal Transduction for help and fruitful discussions.
Conflict of Interest statement. None declared.
M.W. is supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (grant 01ZX1313A-2014). Funding to pay the Open Access publication charges for this article was provided by the Max Planck Society for the Advancement of Science.