The internal interaction in RBBP5 regulates assembly and activity of MLL1 methyltransferase complex

Abstract The Mixed Lineage Leukemia protein 1 (MLL1) plays an essential role in the maintenance of the histone H3 lysine 4 (H3K4) methylation status for gene expression during differentiation and development. The methyltransferase activity of MLL1 is regulated by three conserved core subunits, WDR5, RBBP5 and ASH2L. Here, we determined the structure of human RBBP5 and demonstrated its role in the assembly and regulation of the MLL1 complex. We identified an internal interaction between the WD40 propeller and the C-terminal distal region in RBBP5, which assisted the maintenance of the compact conformation of the MLL1 complex. We also discovered a vertebrate-specific motif in the C-terminal distal region of RBBP5 that contributed to nucleosome recognition and methylation of nucleosomes by the MLL1 complex. Our results provide new insights into functional conservation and evolutionary plasticity of the scaffold protein RBBP5 in the regulation of KMT2-family methyltransferase complexes.


INTRODUCTION
The methylation of Histone H3 Lysine 4 (H3K4) is crucial to the epigenetic regulation of gene transcription (1,2). The dimethylation and trimethylation of H3K4 mainly occur at the promoters and coding regions of actively transcribed genes, while H3K4 monomethylation is primarily enriched at the enhancers (3,4). The methylation of H3K4 is predominantly mediated by the KMT2-family histone methyltransferases (5,6). Set1 is the only KMT2-family protein discovered in Saccharomyces cerevisiae (7,8). At least six Set1 homologs have been identified in mammalian cells, including SET1A, SET1B, and four Mixed Lineage Leukemia proteins (MLL1, MLL2, MLL3, and MLL4) (5,6). Of these, MLL1, the founding member of KMT2-family methyltransferase, has drawn the most attention because chromosome translocations of MLL1 lead to acute lymphoid and myeloid leukemia (9).
Recent studies on the structure of KMT2-family complexes have shed light on how these regulatory proteins bind and activate KMT2-family proteins (13)(14)(15). The splA and ryanodine receptor (SPRY) domain of ASH2L (ASH2L SPRY , residues 286-505) forms a heterodimer with a short fragment of RBBP5 (RBBP5 AS-ABM , residues 330-360) to stimulate the HKMT activity of MLL-family methyltransferases (15). The crystal structure of the minimized MLL3 SET -RBBP5 AS-ABM -ASH2L SPRY complex revealed that RBBP5 AS-ABM -ASH2L SPRY constrained the flexibility of the SET-I motif of MLL SET , thereby facilitating substrate binding and catalytic activation (15). Furthermore, the structures of yeast COMPASS complexes revealed the assembly and regulation mechanisms of an active holoenzyme (13,14). The yeast COMPASS complex adopts a Y-shaped structure in which Swd1 (an RBBP5 ortholog) and Swd3 (a WDR5 ortholog) form the two arms of the Y-shape. Set1 sits at the central fork and Bre2 (an ASH2L ortholog) is located at the bottom of the Y-shape (13,14). Low-resolution cryo-electron microscopy analysis suggested that the MLL1 core complex may adopt an architecture similar to that of the yeast COMPASS complex (16). However, detailed structural models of the different human KMT2-family complexes remain to be determined.
Among KMT2-associated regulators, RBBP5 and its yeast ortholog Swd1 play a pivotal role in the assembly and activity regulation of KMT2-family complexes (13)(14)(15). Both RBBP5 and Swd1 contain a WD40 propeller domain followed by a long C-terminal tail (17). The C-terminal tail can be further divided into two regions, a WD40 repeat proximal (WDRP) region and a C-terminal distal (CTD) region ( Figure 1A) (14). The conserved WDRP region of RBBP5/Swd1 contains three adjacent short motifs that mediate interactions with KMT2 SET , ASH2L/Bre2, and WDR5/Swd3, respectively (14,15). Although RBBP5 orthologs are highly similar in sequence, they exhibit variations that may relate to distinct roles of RBBP5 in different species. First, the WD40 propeller of human RBBP5 features a ring structure along the axis arising from arginine side-chains (hereinafter referred to as arginine-ring), which is involved in DNA/RNA-binding. This unique feature is absent in yeast Swd1 (17). Second, the C-terminal distal region of RBBP5 is highly variable in length in different species, from 25 aa (residues 403-427) in Saccharomyces cerevisiae to 158 aa (residues 381-538) in Homo sapiens. Kluyveromyces lactis COMPASS structure shows that the short CTD of Swd1 (residues 422-439) extending from the Swd3-binding motif is sandwiched between Swd3 WD40 and Swd1 WD40 to further secure Swd3 binding (14). Removal of the CTD 422-439 from Swd1 completely abolished the binding of Swd3 (14). However, in human RBBP5, the CTD region (residues 381-538) is dispensable for the binding of RBBP5 to WDR5 (18). Therefore, the conformation and function of the extended CTD region in human RBBP5 still require further characterization.
Although RBBP5 and its yeast ortholog Swd1 have been extensively studied, their structural features and functional roles in the regulation of KMT2-family complexes remain controversial. In this work, we elucidated an internal interaction between the WD40 propeller and the CTD region in human RBBP5 by structural analysis and biochemical assays. This internal interaction promotes a compact conformation of the MLL1 complex and fine-tunes its activity. We also uncovered a novel DNA-binding motif in the CTD region that is involved in nucleosome binding and in the methylation of nucleosomes by the MLL1 complex.

Protein expression and purification
MLL1 WIN-SET (MLL1 3754-3969 ) containing both the WIN motif and the SET domain, full-length RBBP5, full-length ASH2L, and WDR5 Δ22 (WDR5 23-334 ) were purified as previously described (15). RBBP5 WD40-containing constructs and CTD-containing fragments were expressed as His-SUMO fusion proteins in Rosetta cells. The proteins were purified on a Ni-NTA affinity column with on-bead digestion by Ulp1 protease. The eluted proteins were further purified by size-exclusion chromatography (SEC) on a HiLoad Superdex 75 or 200 column. Purified proteins were concentrated, snap-frozen in liquid nitrogen, and stored at −80 °C. The Se-Met derivative protein of RBBP5 was expressed by growing cells in M9 minimal media and purified using the same protocol as the native protein. All mutations were introduced by PCR-based site-directed mutagenesis, and mutant proteins were purified using the same protocol as described above.

Crystallization, data collection and structural determination
After extensive screening, we obtained crystals of the complex from the following constructs: the WD40 propeller comprising residues 10-325 (RBBP5 10-325 ) and the C-terminal distal region from 390 to 480 with the 422-443 loop deleted (RBBP5 390-480d ). The RBBP5 10-325 -RBBP5 390-480d complex was obtained by mixing the two separately purified proteins at a 1:2 molar ratio, followed by size-exclusion chromatography on a HiLoad Superdex 200 column in a buffer containing 150 mM NaCl, 25 mM Tris-HCl, pH 8.0 and 5 mM DTT. The native or Se-Met RBBP5 complex was crystallized in 0.1 M HEPES-Na, pH 7.5, 2% (w/v) PEG 400, 2.0 M ammonium sulfate at 16 °C by the vapor diffusion method. A 1.8 Å selenomethionine-SAD (single-wavelength anomalous dispersion) dataset was collected at the Se peak wavelength at beamline BL19U1 of the National Facility for Protein Science Shanghai (NFPS) at the Shanghai Synchrotron Radiation Facility (SSRF). The dataset was processed with HKL3000 (19). The crystal belongs to the P1 space group, and there are two complexes in one asymmetric unit. AutoSHARP was used for selenium site search, solvent flattening, and automatic model building (20). Eight selenium atoms were located and refined, and an initial model of good quality was automatically built using ARP/WARP (21). Crystallographic refinement was then carried out in PHENIX (22) together with manual model building in COOT (23).

Isothermal titration calorimetry
The equilibrium dissociation constants between RBBP5 2-333 and different C-terminal constructs of RBBP5 were determined on a MicroCal iTC200 calorimeter (Malvern Panalytical Ltd, UK). Different C-terminal constructs of RBBP5 (1 mM) were titrated into 0.1 mM RBBP5 2-333 . For RBBP5 381-538 , three different salt concentrations were tested (150 mM NaCl, 300 mM NaCl and 800 mM NaCl) in the 25 mM Tris-HCl, pH 8.0 buffer. To evaluate the effects of mutations in the CTD region of RBBP5, RBBP5 390-480d or its mutants (1 mM) were injected into a sample cell containing 0.1 mM RBBP5 2-333 in the assay buffer (300 mM NaCl, 25 mM Tris-HCl, pH 8.0). The dissociation constants between different RBBP5 constructs and WDR5 (WDR5 23-334 ) were determined in an assay buffer of 150 mM NaCl, 25 mM Tris-HCl, pH 8.0. A total of 20 injections were carried out at 20 °C with a reference power of 5 μcal/s. Binding data were plotted and analyzed in Origin 7 (OriginLab, USA). The ITC measurements for each interaction pair were repeated twice. The dissociation constants (K d ) and the fitting errors were derived from one representative ITC curve by fitting with a one-site binding model.
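As a rough illustration of what a one-site (1:1) binding model implies for these titrations, the sketch below computes the equilibrium concentration of the complex from the total concentrations and a K d . It is not the fitting routine used in Origin; the function name `complex_conc` and the chosen titrant concentrations are assumptions made for illustration, and dilution and heat effects are ignored. The K d value is the one reported in the Results for RBBP5 CTD binding to the WD40 propeller at 150 mM NaCl.

```python
import math

def complex_conc(p_total, l_total, kd):
    """Equilibrium [PL] for P + L <-> PL (one-site model); all values in the same units."""
    b = p_total + l_total + kd
    # Physically meaningful root of PL^2 - b*PL + p_total*l_total = 0
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0

cell_um = 100.0   # 0.1 mM protein in the sample cell, as in the text
kd_um = 6.8       # K d reported in the Results at 150 mM NaCl
# Hypothetical total titrant concentrations reached in the cell (illustrative only)
for ligand_um in (25.0, 50.0, 100.0, 200.0):
    pl = complex_conc(cell_um, ligand_um, kd_um)
    print(f"titrant {ligand_um:6.1f} uM -> fraction of cell protein bound {pl / cell_um:.2f}")
```

The ratio of cell concentration to K d here is roughly 15, which is why a sigmoidal isotherm with a fittable midpoint can be expected under these conditions.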

Biological Small-Angle X-ray Scattering
Small-angle X-ray scattering (SAXS) experiments were performed at beamline BL19U2 of the National Facility for Protein Science Shanghai (NFPS) at the Shanghai Synchrotron Radiation Facility (SSRF) (24). The X-ray wavelength was 0.918 Å. Scattered X-ray intensities were collected using a Pilatus 1M detector (DECTRIS Ltd, Switzerland). The sample-to-detector distance was set such that the detecting range of momentum transfer [q = 4π sin θ/λ, where 2θ is the scattering angle and λ is the X-ray wavelength] of the SAXS experiments was 0.008-0.47 Å⁻¹. A flow cell made of a cylindrical quartz capillary with a diameter of 1.5 mm and a wall of 10 μm was used. The MLL1 complexes were obtained by mixing MLL1 3754-3969 , WDR5 23-334 , ASH2L FL and RBBP5 at a molar ratio of 2:1:1:1, followed by purification on a Superdex 200 Increase (10/300 GL) column. They were then diluted to three different concentrations (0.5, 1 and 2 mg/ml) in a buffer of 300 mM NaCl, 25 mM Tris, pH 8.0, 4% glycerol and 1 mM TCEP. SAXS data were collected as 20 × 1-s exposures, and scattering profiles for the 20 passes were compared at 10 °C using 60 μl of sample. The 2D scattering images were converted to 1D SAXS curves by azimuthal averaging after solid angle correction and then normalized to the intensity of the transmitted X-ray beam, using the software package BioXTAS RAW (25). Background scattering was subtracted using PRIMUS in the ATSAS software package (26). Linear Guinier plots in the Guinier region (q·R g < 1.3) were confirmed in all experimental groups. Pair distance distribution functions of the particles, P(r), and the maximum sizes, D max , were computed using GNOM (27).
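The two relations used above, the momentum transfer q = 4π sin θ/λ and the Guinier approximation ln I(q) = ln I(0) − (R g ² q²)/3 for q·R g < 1.3, can be sketched in a few lines. This is only an illustration on synthetic, noise-free data and not the BioXTAS RAW or PRIMUS implementation; the function names and the hypothetical particle with R g = 50 Å are assumptions.

```python
import math

WAVELENGTH_A = 0.918  # X-ray wavelength in angstroms, as stated in the text

def momentum_transfer(two_theta_deg, wavelength=WAVELENGTH_A):
    """q = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength

def guinier_rg(q_values, intensities):
    """Least-squares line through (q^2, ln I); the slope equals -Rg^2/3."""
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    i0 = math.exp(my - slope * mx)
    return math.sqrt(-3.0 * slope), i0

# Synthetic Guinier-region data for a hypothetical particle (Rg = 50 A, I(0) = 1000)
rg_true, i0_true = 50.0, 1000.0
qs = [0.008 + 0.002 * k for k in range(9)]   # 0.008 to 0.024 A^-1, so q*Rg stays below 1.3
ints = [i0_true * math.exp(-(q * rg_true) ** 2 / 3.0) for q in qs]
rg_fit, i0_fit = guinier_rg(qs, ints)
print(f"Rg = {rg_fit:.2f} A, I(0) = {i0_fit:.1f}, max q*Rg = {qs[-1] * rg_fit:.2f}")
```

With real data the fit range has to be chosen iteratively, because the q·R g < 1.3 limit depends on the R g being estimated.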
The ab initio shapes were determined using DAMMIF (28), and a second program (29) was used to analyze the normalized spatial discrepancy (NSD) between the 15 models. Models with an NSD more than 2 standard deviations above the average NSD were excluded, and the remaining aligned models were averaged to obtain the representative molecular envelope.
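The outlier criterion for the ab initio models can be sketched as follows. The code reads the criterion as "exclude models whose NSD exceeds the mean plus two standard deviations" and uses the sample standard deviation; both choices are assumptions, since the text does not specify them. The NSD values are invented for illustration and are not data from this study.

```python
import statistics

def filter_by_nsd(nsd_by_model, n_sd=2.0):
    """Keep models whose NSD does not exceed mean + n_sd * sample standard deviation."""
    values = list(nsd_by_model.values())
    cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
    kept = {name: v for name, v in nsd_by_model.items() if v <= cutoff}
    return kept, cutoff

# Hypothetical NSD values for 15 ab initio models (illustrative only)
values = [0.61, 0.63, 0.60, 0.65, 0.62, 0.64, 0.66, 0.59,
          0.63, 0.62, 0.61, 0.64, 0.60, 0.65, 1.20]
nsd = {f"model_{i:02d}": v for i, v in enumerate(values, start=1)}
kept, cutoff = filter_by_nsd(nsd)
print(f"cutoff = {cutoff:.3f}; kept {len(kept)} of {len(nsd)} models")
```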

Cross-linking mass spectrometry
The MLL1 complex was obtained by mixing MLL1 3754-3969 , WDR5 23-334 , ASH2L FL and RBBP5 FL at a molar ratio of 2:1:1:1, followed by purification on a Superdex 200 Increase (10/300 GL) column. The purified complex was diluted to 5 μM in a buffer of 150 mM NaCl, 25 mM HEPES, pH 7.0 with a final volume of 20 μl. EDC (1-ethyl-3-[3-dimethylaminopropyl] carbodiimide hydrochloride) was first mixed with the MLL1 complex at a final concentration of 1 mM. After incubation at room temperature for 1 h, the reaction was quenched by addition of 20 mM 2-mercaptoethanol. Then Sulfo-NHS (N-hydroxysulfosuccinimide) was added to the reaction mixture at a final concentration of 2 mM, followed by incubation at room temperature for 2 h. Finally, the reaction was quenched by adding 20 mM Tris (pH 8.0). The reaction products were separated by SDS-PAGE on 8-10% Tris-glycine gels and visualized by staining with Coomassie blue. The gel band of the cross-linked MLL1 complex was excised and cut into small pieces. The gel sample was then destained with methanol/H2O, followed by TCEP reduction (10 mM), iodoacetamide alkylation (55 mM) and trypsin digestion. Trypsin digestion was performed with sequencing-grade modified trypsin (Promega, USA) at 37 °C overnight. The tryptic peptides were extracted and desalted with MonoSpin C18 columns (GL Science, Japan), followed by separation in a Proxeon EASY-nLC liquid chromatography system (ThermoFisher Scientific, USA) by applying a step-wise gradient of 0-80% acetonitrile in 0.1% formic acid. Peptides eluted from the LC column were directly electrosprayed into the mass spectrometer with a distal 2 kV spray voltage. Tandem mass spectrometry (MS/MS) analysis was performed on a Q-Exactive instrument (ThermoFisher Scientific, USA) in a 120-minute gradient. Cross-linked peptides were identified and evaluated using pLink2 software (30).

MALDI-TOF-based methyltransferase assay
The MLL1 complex was prepared by mixing MLL1 3754-3969 , WDR5 23-334 , ASH2L FL and RBBP5 at a molar ratio of 1:1:1:1 without further purification. For a 50 μl reaction, 250 μM SAM, 10 μM histone peptide (ARTKQTARY) and 1 μM MLL1 complex were mixed in a buffer containing 20 mM HEPES (pH 7.8), 10 mM NaCl, and 5 mM DTT and incubated at 25 °C. At different time points, 4 μl of the reaction mixture was taken out and treated with 0.5% trifluoroacetic acid to terminate the reaction. Then, 1 μl of the reaction mixture was mixed with 1 μl of 10 mg/ml α-cyano-4-hydroxycinnamic acid matrix and spotted onto the sample plate. The molecular mass of the histone peptides was determined on a TOF/TOF 5800 system (AB Sciex, USA) operated in reflectron mode. The assays were performed in triplicate, and the averaged data points were used to calculate pseudo-first-order rate constants and fitting errors using Origin 7 (OriginLab, USA) software.
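A pseudo-first-order rate constant can be estimated from such a time course as sketched below, assuming the fraction of unmodified peptide decays as exp(−k·t). The exact model and fitting settings used in Origin are not given in the text, so the log-linear fit through the origin, the function name and the sampling times are assumptions. The synthetic data are generated with k = 0.82 h⁻¹, the value reported in the Results for the complex containing full-length RBBP5.

```python
import math

def pseudo_first_order_k(times_h, frac_unmodified):
    """Fit ln(S/S0) = -k*t by least squares through the origin; returns k in h^-1."""
    num = sum(t * math.log(s) for t, s in zip(times_h, frac_unmodified))
    den = sum(t * t for t in times_h)
    return -num / den

# Synthetic, noise-free time course (hypothetical sampling times in hours)
k_true = 0.82
times = [0.25, 0.5, 1.0, 2.0, 3.0]
remaining = [math.exp(-k_true * t) for t in times]
k_fit = pseudo_first_order_k(times, remaining)
print(f"k = {k_fit:.3f} h^-1, substrate half-life = {math.log(2) / k_fit:.2f} h")
```

In practice the unmodified fraction would be taken from the relative peak intensities of the unmethylated and methylated peptide species in each MALDI-TOF spectrum.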

Western-blot-based methyltransferase assay
The MLL1 complex was prepared by mixing MLL1 3754-3969 , WDR5 23-334 , ASH2L FL and RBBP5 at a molar ratio of 1:1:1:1 and incubating on ice for 30 minutes without further purification. The reactions were performed with 20 μM SAM, 1 μM nucleosome and 1 μM MLL1 complex in a buffer containing 20 mM HEPES, pH 7.8, 10 mM NaCl and 5 mM DTT. The reaction was incubated at 25 °C for 1 h. Each reaction mixture was divided into three parts that were separated by 15% SDS-PAGE. The methylation products of H3K4 were detected by western blot using the corresponding antibodies (H3K4me1 antibody, #39297, Active Motif; H3K4me2 antibody, #07-030, Millipore; H3K4me3 antibody, #9751, CST).

MTase-Glo™ methyltransferase assay
To monitor the progression of the nucleosome-methylation reaction, we used the MTase-Glo™ Methyltransferases Assay Kit (Promega, USA). MLL1 complex (50 nM, containing MLL1 3754-3969 , WDR5 23-334 , ASH2L FL and RBBP5 at a molar ratio of 1:1:1:1) and nucleosome (1 μM) were mixed in a buffer of 25 mM HEPES-Na, pH 7.4, 100 mM NaCl, 0.1 mg/ml BSA and 1 mM DTT. The reaction was initiated by adding 20 μM SAM and incubated at 30 °C. At different time points (1, 3, 8, 15 min), 8 μl of the reaction mixture was taken out and stopped by adding 2 μl of 0.5% trifluoroacetic acid. We then followed the protocol described in the kit manual to measure the amount of S-adenosyl-L-homocysteine (SAH) produced. The assays were performed in triplicate, and the averaged data points were used for linear fitting of the amount of SAH generated to give an estimate of the initial reaction rate (nM SAH/min).
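The initial-rate estimate described above amounts to the slope of a straight line through the SAH time course. A minimal sketch with ordinary least squares follows; the sampling times are those given in the text, whereas the SAH readings and the function name are hypothetical and serve only to show the calculation.

```python
def initial_rate(times_min, sah_nm):
    """Ordinary least-squares line through (time, [SAH]); returns (slope in nM/min, intercept in nM)."""
    n = len(times_min)
    mt = sum(times_min) / n
    ms = sum(sah_nm) / n
    sxx = sum((t - mt) ** 2 for t in times_min)
    sxy = sum((t - mt) * (s - ms) for t, s in zip(times_min, sah_nm))
    slope = sxy / sxx
    return slope, ms - slope * mt

times = [1, 3, 8, 15]             # sampling times from the text, in minutes
sah = [12.0, 31.0, 84.0, 151.0]   # hypothetical averaged SAH readings, in nM
rate, offset = initial_rate(times, sah)
print(f"initial rate = {rate:.2f} nM SAH/min (intercept {offset:.2f} nM)")
```

A linear fit is only meaningful while substrate depletion is negligible, which is the reason for the low enzyme concentration (50 nM) relative to substrate.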

Preparation of nucleosome
Xenopus laevis histones were expressed in Escherichia coli BL21 (DE3) and purified from inclusion bodies as described before (31). The nucleosome was reconstituted from histone octamers and the 167-bp Widom 601 DNA (31). The reconstituted NCP was further purified through Superose 6 increase 10/300 gel filtration.

Electrophoretic mobility shift assay (EMSA)
The MLL1 complexes were obtained by mixing MLL1 3754-3969 , WDR5 23-334 , ASH2L FL and RBBP5 at a molar ratio of 2:1:1:1, followed by purification on a Superdex 200 Increase (10/300 GL) column. MLL1 complexes were diluted to a series of concentrations (4-0.125 μM) in six reaction tubes with a final volume of 5 μl in a buffer containing 20 mM Tris-HCl, pH 8.0, 150 mM NaCl, and 10% glycerol. Then 5 μl of 0.2 μM nucleosome was added to each tube. After incubation on ice for 1 h, the reaction mixture was separated on 6% native polyacrylamide gels and visualized by staining with ethidium bromide (EB).
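The concentrations in the binding reactions can be worked out with a short calculation. Six tubes spanning 4 to 0.125 μM are consistent with a twofold serial dilution, which is an inference rather than something the text states. Adding an equal volume of nucleosome then halves every concentration; the helper names below are hypothetical.

```python
def twofold_series(top_um, n_tubes):
    """Concentrations of a twofold serial dilution starting at top_um."""
    return [top_um / 2 ** i for i in range(n_tubes)]

def after_mixing(conc_um, volume_ul, added_volume_ul):
    """Concentration after diluting volume_ul of a solution with added_volume_ul."""
    return conc_um * volume_ul / (volume_ul + added_volume_ul)

stock = twofold_series(4.0, 6)                               # complex before adding nucleosome
final_complex = [after_mixing(c, 5.0, 5.0) for c in stock]   # 5 ul complex + 5 ul nucleosome
final_nucleosome = after_mixing(0.2, 5.0, 5.0)               # 5 ul of 0.2 uM nucleosome in 10 ul

print("complex before mixing (uM):", stock)
print("complex in the reaction (uM):", final_complex)
print("nucleosome in the reaction (uM):", final_nucleosome)
```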

Both WD40 and CTD of RBBP5 contribute to activity regulation of MLL1 complex
RBBP5 WDRP (residues 330-381) is the primary region required for activity stimulation of KMT2-family complexes (15). However, our previous data showed that an RBBP5 WDRP -ASH2L 286-505 complex stimulated MLL3 HKMT activity to only ∼70% of the level achieved by full-length RBBP5-ASH2L (15). Results from the Wilson lab also showed that the HKMT activity of the MLL1 complex assembled with RBBP5 325-380 was ∼60% of that with RBBP5 1-380 (17). These studies suggest that other regions of RBBP5, including the WD40 propeller and the C-terminal distal region (Figure 1A), may contribute to fine-tuning the activity of the KMT2-family complexes. Thus, we first quantitatively characterized the HKMT activities of the MLL1 core complexes in the presence of different RBBP5 domains by a mass-spectrometry-based methyltransferase assay, using an H3 peptide as the substrate. As expected, WDR5 23-334 -ASH2L FL -RBBP5 330-381 substantially stimulated the HKMT activity of MLL1 WIN-SET (k = 0.23 ± 0.02 h −1 ) (Figure 1B and Supplementary Figure S1), confirming the previous notion that RBBP5 WDRP is the major region required for activation of the MLL1 complex (15). We found that the activity of the MLL1 complex containing full-length RBBP5 (k = 0.82 ± 0.04 h −1 ) was ∼3.5-fold higher than that of the RBBP5 WDRP -containing complex, indicating that the WD40 propeller, the CTD region, or both may also participate in the activation of the MLL1 complex. However, the inclusion of either WD40 alone (RBBP5 2-381 ) or CTD alone (RBBP5 330-538 ) only slightly changed the activity, and the activity boost was observed only with RBBP5 2-538 , which contains both WD40 and CTD (Figure 1B). These results imply that the WD40 propeller and the CTD region of RBBP5 act synergistically to stimulate the methyltransferase activity of the MLL1 complex.
One possible explanation for the activity-stimulating effect of the WD40 propeller and the CTD of RBBP5 is that these two regions cooperatively facilitate MLL1 complex assembly. Indeed, GST pull-down assays showed that GST-ASH2L FL -RBBP5 2-381 pulled down more WDR5 and MLL1 than GST-ASH2L FL -RBBP5 330-381 did (Figure 1C), indicating that the WD40 propeller of RBBP5 enhanced the stability of the MLL1 complex. RBBP5 2-538 , which includes the CTD region, further increased the amount of WDR5 and MLL1 bound to GST-ASH2L FL (Figure 1C). Therefore, we hypothesized that the WD40 and CTD regions of RBBP5 might provide additional binding interfaces for other proteins in the MLL1 core complex to facilitate its assembly.

The internal interaction between RBBP5 WD40 and RBBP5 CTD
We then employed crosslinking mass spectrometry (CX-MS) to map potential RBBP5-mediated interaction sites in the MLL1 complex. We purified the MLL1 3754-3969 -WDR5 23-334 -RBBP5 FL -ASH2L FL complex and subjected it to chemical crosslinking using a zero-length crosslinker, EDC-Sulfo-NHS. The crosslinked products were digested with trypsin and then analyzed by liquid chromatography-mass spectrometry. From the MS data, we identified 37 intermolecular cross-links and 38 intramolecular cross-links (Figure 2A and Supplementary Figure S2). Most of the crosslinking patterns confirmed previously known interaction interfaces (Figure 2A). For example, ASH2L SPRY (residues 286-505) is in close contact with the RBBP5 ASH2L-binding motif (ABM, residues 344-360). The SET-N and SET-I modules of MLL1 SET were crosslinked with RBBP5 AS+ABM fragments. Unexpectedly, we observed extensive crosslinking between RBBP5 381-538 (named RBBP5 CTD ) and the WDR5 WD40 propeller. In addition, RBBP5 CTD was also intramolecularly crosslinked to its own WD40 propeller domain (Figure 2A and Supplementary Figure S2). This result indicates that RBBP5 CTD either physically interacts with these two WD40 propellers or is spatially close to them.
To test whether RBBP5 CTD can directly bind the WD40 propellers of WDR5 and RBBP5, we used isothermal titration calorimetry (ITC) to characterize their interactions. No interaction was detected between RBBP5 CTD and WDR5 WD40 (residues 23-334) under our assay conditions (Figure 2B), suggesting that the observed crosslinks reflect spatial proximity rather than direct binding. In contrast, RBBP5 CTD interacted with the RBBP5 WD40 propeller (residues 2-333) with a dissociation constant (K d ) of 6.8 ± 0.9 μM at 150 mM NaCl (Figure 2C). The binding of RBBP5 CTD to its WD40 propeller was primarily driven by hydrophobic interactions, because increasing salt concentrations only mildly reduced the affinity between these two domains (Figure 2C).

RBBP5 CTDM wraps around RBBP5 WD40
To further characterize the internal interaction between the WD40 propeller and the C-terminal regions of RBBP5, we generated a series of RBBP5 C-terminal constructs and assessed their interactions with RBBP5 2-333 by ITC assays (Supplementary Figure S3A and B). A minimal RBBP5 C-terminal construct (390-480) retained the ability to bind RBBP5 2-333 (Supplementary Figure S3A and B). After extensive screening, we found that deletion of a disordered loop (residues 422-443) from RBBP5 390-480 (RBBP5 390-480d ) promoted the crystallization of the complex composed of RBBP5 390-480d and RBBP5 10-325 (Supplementary Figure S3A and B). Hereafter, we refer to RBBP5 390-480d as RBBP5 CTDM (C-Terminal Distal Minimized) and RBBP5 10-325 as RBBP5 WD40 . Selenomethionine-substituted crystals diffracted to 1.8 Å resolution, and we were able to solve the complex structure by single-wavelength anomalous dispersion (Table 1). The electron density map unambiguously allowed us to build the atomic model of the majority of RBBP5 WD40 (residues 12-324) and two segments of RBBP5 CTDM (residues 396-404 and 452-474) (Figure 3 and Supplementary Figure S4A-C). RBBP5 WD40 forms a canonical β-propeller fold with seven WD40 repeats (Figure 3A). The seventh repeat is composed of three strands from the C-terminal region (RBBP5 296-324 ) and one strand from the N-terminal region (RBBP5 12-22 ), thereby sealing the β-propeller. The RBBP5 WD40 conformation is almost identical to that of apo-RBBP5 WD40 , with a root-mean-square deviation of 0.39 Å for 319 Cα pairs, indicating that CTDM binding induces no dramatic structural rearrangement of RBBP5 WD40 (Supplementary Figure S4D). Some loop regions of the WD40 propeller that contact CTDM slightly change their configurations (Supplementary Figure S4D). RBBP5 CTDM adopts an extended conformation and wraps around the WD40 propeller through two segments, burying 1472 Å² of surface area at the interface.
The first segment of RBBP5 CTDM (CTD1) contains residues 396-404 and sits on the top surface of RBBP5 WD40 . The second segment (CTD2) meanders its way around the side face of RBBP5 WD40 and also pairs with repeat 2 of the WD40 propeller to form a five-stranded β-sheet. After a sharp turn, CTD2 further extends along the bottom face and ends in the central cavity of the β-propeller fold (Figure 3). CTD1 and CTD2 are connected by a disordered loop which is invisible in the crystal structure.

The interfaces between RBBP5 CTD1 and RBBP5 WD40
The first segment of CTDM (CTD1) is rich in non-polar residues ( 396 SKALLYLPI 404 ) and runs across a hydrophobic groove formed by repeats 1 and 2 of RBBP5 WD40 (Figures 3 and 4A). Three leucine residues (L399, L400 and L402) in CTD1 fit snugly into the hydrophobic groove formed by residues from RBBP5 WD40 , including W35, L38, I50, W74, V95, L96, and the aliphatic side chain of K60 (Figure 4B). Consistent with the structural model, alanine substitution of hydrophobic residues in RBBP5 CTDM (L399A, L400A) resulted in an ∼13-fold decrease in binding affinity between RBBP5 2-333 and RBBP5 CTDM (Figure 4C). The activity assays showed that the L399A mutation in RBBP5 mildly decreased the HKMT activity of the MLL1 core complex, whereas the activity with the RBBP5 2-480 L400A mutant was not significantly different from that with wild-type RBBP5 2-480 (Figure 4D). This suggests that the weakened CTDM-WD40 interactions of the RBBP5 L399A and L400A mutants (K d ∼ 70 μM) are still strong enough to maintain the activity of the MLL1 core complex.
In addition to the hydrophobic contacts observed in the structure, electrostatic interactions also participate in stabilizing RBBP5 WD40 -RBBP5 CTD1 interaction. Although the acidic N-terminus of RBBP5 CTD1 (390DEELED395) is invisible in the crystal structure, this N-terminal extension is presumably located near the central arginine ring of the WD40 propeller and may interact with RBBP5 WD40 through electrostatic attractions ( Figure 4A). This model is partly supported by the observation that the RBBP5 393-480 construct with fewer acidic residues in CTD1 possessed a reduced affinity with RBBP5 2-333 (Supplementary Figure  S3A). In summary, the hydrophobic contacts and accompanying electrostatic interactions determine the specific recognition of RBBP5 CTD1 by RBBP5 WD40 .
It is worth noting that a similar CTD1 motif in yeast Swd1 also mediates an internal interaction with the Swd1 WD40 propeller (14). Superimposition of the WD40 domains from hRBBP5 and KlSwd1 reveals that a portion of Swd1 CTD1 adopts a configuration similar to that of RBBP5 CTD1 and binds to a hydrophobic cleft of Swd1 WD40 (Supplementary Figure S5A-D). Notably, RBBP5 CTD1 from higher organisms contains a hydrophobic core flanked by two acidic segments, but yeast orthologs have far fewer acidic residues (Supplementary Figure S5B). Correspondingly, the RBBP5 orthologs in yeast species also lack the arginine-ring residues that are involved in CTD1 binding (Supplementary Figure S5A), so an acidic character of Swd1 CTD1 may not be necessary for its association with the WD40 propeller. Another major difference between human RBBP5 CTD1 and yeast Swd1 CTD is that yeast CTD1 ends with a short α helix that binds to the WDR5 ortholog, Swd3 (Supplementary Figure S5C). Swd1 CTD1 is absolutely essential for binding Swd3, and removal of this short CTD1 completely abolished the Swd3-Swd1 interaction (14). In contrast, hRBBP5 CTD1 plays only a negligible role in stabilizing the WDR5-RBBP5 interaction, because full-length RBBP5 (RBBP5 2-538 ) and RBBP5 WDRP (RBBP5 330-381 ) have comparable binding affinities for WDR5 (Supplementary Figure S5E). Taken together, the CTD1 motif of RBBP5 is a conserved module that has gained structural and functional plasticity in the course of evolution.

The interface between RBBP5 CTD2 and RBBP5 WD40
The second segment of RBBP5 CTDM wraps around the side and bottom faces of RBBP5 WD40 . The absence of CTD2 in yeast species (Supplementary Figure S5B) indicates a more specific role of CTD2 in higher organisms. The CTD2 motif is docked in a relatively hydrophobic pocket formed by repeats 2, 3 and 4 (Figures 3 and 4E). The interface between RBBP5 CTD2 and RBBP5 WD40 consists of a set of hydrophobic residues, including I457, L459, V462, P469, L470 and L471 from RBBP5 CTD2 and L28, C70, F106, I110, L111, P127, V133, V144 and L299 from RBBP5 WD40 (Figure 4F). Mutations of the hydrophobic residues (L470A, I457A and L459A) in CTD2 diminished the interaction between RBBP5 CTDM and RBBP5 2-333 (Figure 4C). Combined mutation of CTD1 and CTD2 (L399A/L400A/I457A/L459A, named the RBBP5 CTDM -4A mutant) further disrupted the RBBP5 CTDM -RBBP5 2-333 interaction (Figure 4C) and reduced the HKMT activity of the MLL1 complex to a level comparable with that of the complete-CTD-deletion construct (RBBP5 2-381 ) (Figure 4D). These results reinforce the notion that both CTD1 and CTD2 are required for stable association with RBBP5 WD40 , and that this internal interaction fine-tunes the methyltransferase activity of the MLL1 complex.

The internal interaction of RBBP5 compacts the MLL1 complex
To further investigate how the internal interaction of RBBP5 contributes to the assembly of the MLL1 core complex, we analyzed MLL1 complexes assembled with different RBBP5 constructs by SAXS (Supplementary Figure S6A). The SAXS data were used to calculate the maximum particle dimension (D max ) and the radius of gyration (R g ). The complexes with RBBP5 2-480 and RBBP5 2-538 have similar D max and R g values (Figure 5A and Supplementary Table S1). In contrast, the MLL1 complex assembled with RBBP5 2-381 had much larger D max and R g values, indicating that deletion of RBBP5 CTD (residues 381-538) resulted in an extended conformation of the MLL1 complex (Figure 5A and Supplementary Table S1). The SAXS data were also used to calculate an ab initio model for each complex (Supplementary Figure S6B). Rigid body superposition showed that the molecular envelope of the RBBP5 2-480 -containing MLL1 complex is similar in shape to that of the RBBP5 2-381 -containing complex, but the longest dimension increases from 184 Å in the RBBP5 2-480 -containing complex to 210 Å in the RBBP5 2-381 -containing complex (Figure 5B). This result indicates that the presence of the CTD compacts the overall conformation of the MLL1 complex. In keeping with this idea, the 4A mutant of RBBP5 2-480 , which disrupted the RBBP5 CTDM -RBBP5 WD40 interaction (Figure 4C), generated an extended MLL1 complex with increased D max and R g values (Figure 5A and Supplementary Table S1). These SAXS analyses consistently suggest that the RBBP5 WD40 -RBBP5 CTD interaction plays a vital role in maintaining the compact conformation of the MLL1 complex that is required for efficient methylation.

A novel CTD3 motif is important for methylation on nucleosomes
The above studies of MLL1 activity regulation by RBBP5 were limited to the methylation of H3 peptides. We next asked whether the RBBP5 WD40 -RBBP5 CTD interaction is also crucial for histone methylation on nucleosomes. We first tested the HKMT activities of the MLL1 complexes on recombinant mononucleosomes by a western-blot-based methyltransferase assay. The MLL1 complex assembled with RBBP5 WDRP (RBBP5 330-381 ) has weak methyltransferase activity on nucleosomes (Figure 6A, lane 1). The inclusion of either the WD40 propeller or the CTD only slightly increased the activity of the MLL1 complex (Figure 6A, lanes 2 and 3). A considerable increase in the monomethylation and dimethylation levels of nucleosomal H3 was observed with the MLL1 complex assembled with RBBP5 2-480 , which retains the interaction between RBBP5 WD40 and RBBP5 CTDM (Figure 6A, lane 4). The MLL1 complex assembled with the RBBP5 4A mutant, which disrupts the RBBP5 WD40 -RBBP5 CTDM interaction, had much weaker methyltransferase activity than the complex containing RBBP5 2-480 (Figure 6A, lane 7). These results confirm the critical role of the RBBP5 WD40 -RBBP5 CTDM interaction in stimulating the methyltransferase activity of MLL1. Interestingly, the inclusion of the C-terminal tail (RBBP5 480-538 ) moderately increased the methylation of nucleosomal H3 (Figure 6A, lane 5), indicating an important role of this uncharacterized tail of RBBP5 in activity regulation of the MLL1 complex.
In the western-blot-based methyltransferase assay, equal concentrations of enzymes and nucleosome substrates were used. The result may not quantitatively describe the activities of the MLL1 complexes containing different RBBP5 fragments, since western blot signals are easily saturated. To accurately compare the activities of the MLL1 complexes, we performed steady-state kinetic analysis of the methyltransfer reaction using 50 nM enzymes and 1 μM substrates. We found that the MLL1 complex assembled with RBBP5 2-381 had negligible activity towards nucleosomes under these conditions, and the initial reaction rate could not be accurately measured (Figure 6B). Time-course monitoring of methylation showed that RBBP5 2-480 substantially increased the activity of the MLL1 complex on nucleosomes and that RBBP5 2-538 generated a ∼3-fold higher activity than RBBP5 2-480 (Figure 6B). It is noteworthy that RBBP5 2-538 and RBBP5 2-480 had the same methylation activities on H3 peptides (Figure 4D). These data suggest that RBBP5 480-538 plays an additional supporting role in mediating efficient methylation of nucleosomes. Here we name the RBBP5 480-538 segment RBBP5 CTD3 . RBBP5 CTD3 may stimulate the HKMT activity of the MLL1 complex by facilitating nucleosome binding to the MLL1 complex. To test this idea, we carried out an electrophoretic mobility shift assay (EMSA) to characterize the interactions between MLL1 complexes and nucleosomes. The EMSA data showed that the MLL1 complex assembled with RBBP5 2-538 had a much higher binding affinity for nucleosomes than the MLL1 complex assembled with RBBP5 2-480 (Figure 6C). This result implies that RBBP5 CTD3 contains extra binding interfaces for nucleosomes and increases the association of the MLL1 complex with nucleosomes.
Each data point is shown as mean ± s.d. from triplicate measurements. Some error bars are not visible because they are shorter than the size of the symbol. The dissociation constants (K d ) and the reported fitting errors were determined from the averaged data points by fitting with the sigmoidal dose-response model.
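As an illustration of how a dissociation constant can be extracted from a titration with a sigmoidal dose-response model, the following is a minimal sketch in plain Python. It is not the fitting procedure used in this study: the Hill slope fixed at 1, the grid-search strategy, the function names and the synthetic data are all assumptions made for the example.

```python
def fraction_bound(conc, kd, hill=1.0):
    """Sigmoidal dose-response (Hill) curve: fraction bound at a titrant concentration."""
    if conc <= 0:
        return 0.0
    return 1.0 / (1.0 + (kd / conc) ** hill)


def fit_kd(concs, signals, kd_min=0.01, kd_max=1000.0, steps=2000):
    """Estimate Kd by a log-spaced grid search (assumed strategy, Hill slope fixed at 1).

    For every trial Kd the baseline and amplitude of the signal are obtained
    by ordinary linear regression of the signal on the predicted fraction bound.
    Returns (kd, baseline, amplitude) for the trial with the smallest squared error.
    """
    n = len(concs)
    mean_y = sum(signals) / n
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        f = [fraction_bound(c, kd) for c in concs]
        mean_f = sum(f) / n
        sff = sum((fi - mean_f) ** 2 for fi in f)
        if sff == 0:
            continue  # flat prediction, regression undefined
        amp = sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, signals)) / sff
        base = mean_y - amp * mean_f
        sse = sum((base + amp * fi - yi) ** 2 for fi, yi in zip(f, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, base, amp)
    _, kd, base, amp = best
    return kd, base, amp


# Synthetic, noise-free titration (hypothetical values, NOT data from this paper).
# Concentrations and Kd are in micromolar; the signal is in arbitrary units.
true_kd, true_base, true_amp = 12.0, 50.0, 150.0
concs = [0.1, 0.3, 1, 3, 10, 30, 100, 300]
signals = [true_base + true_amp * fraction_bound(c, true_kd) for c in concs]

kd_fit, base_fit, amp_fit = fit_kd(concs, signals)
print(f"fitted Kd = {kd_fit:.2f} uM")
```

In practice a nonlinear least-squares routine that also reports parameter uncertainties would be preferable; the grid search is used here only to keep the sketch dependency-free.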

The CTD3 motif has a DNA-binding activity
Examination of the RBBP5 CTD3 sequence provided some clues as to how RBBP5 CTD3 contributes to nucleosome binding. We noticed that RBBP5 CTD3 is rich in basic residues: 14 of the 59 residues in RBBP5 CTD3 (residues 480-538) are K or R. We hypothesized that these K/R residues might be involved in DNA binding, thereby increasing the binding of the MLL1 complex to nucleosomes.
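The residue count quoted above (14 K/R out of 59, roughly 24%) can be reproduced for any sequence with a few lines of Python. This is only a sketch: the function name is hypothetical, histidine is deliberately excluded from the basic set to match the K/R count in the text, and the string used below is an arbitrary placeholder with the same length and K/R count, not the RBBP5 sequence.

```python
def basic_fraction(seq, basic="KR"):
    """Return (count, fraction) of residues in seq that belong to the basic set."""
    seq = seq.upper()
    count = sum(1 for aa in seq if aa in basic)
    return count, count / len(seq)


# Placeholder string, NOT the real RBBP5 CTD3 sequence: it only has the same
# length (59) and the same number of K/R residues (14) as stated in the text.
placeholder = "K" * 10 + "R" * 4 + "A" * 45

count, frac = basic_fraction(placeholder)
print(count, len(placeholder), round(frac, 3))
```

To apply this to the real motif, replace the placeholder with residues 480-538 taken from a sequence database entry for human RBBP5.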
To test this hypothesis, we examined the binding of RBBP5 to a fluorescently labeled 167-bp DNA by fluorescence polarization (FP) assay. As expected, RBBP5 CTD3 bound strongly to DNA (K d = 13.3 ± 2.4 μM), comparable to the full-length RBBP5 (K d = 11.5 ± 0.9 μM) (Figure 6D). Removal of CTD3 from RBBP5 dramatically disrupted its binding to DNA (the RBBP5 2-480 curve, Figure 6D). Additionally, we mutated ten lysine residues in CTD3 (K480/481/482/488/491/493/495/500/502/505) to alanine and found that this full-length RBBP5 mutant (RBBP5 KMA ) showed a substantially attenuated interaction with DNA (Figure 6D). This DNA-binding-deficient RBBP5 mutant also severely impaired the association of the MLL1 complex with nucleosomes (Figure 6C). As a result, the methylation activity of the MLL1 complex assembled with RBBP5 KMA was decreased (Figure 6A and B). Collectively, these data reveal that RBBP5 CTD3 is a novel DNA-binding motif in RBBP5. This new element can stabilize the association of the MLL1 complex with nucleosomes and stimulate the HKMT activity of the MLL1 complex on nucleosomal H3.

DISCUSSION
RBBP5 is a conserved scaffold protein that orchestrates the assembly of KMT2-family methyltransferase complexes (13)(14)(15). In the present work, we identified an internal interaction in RBBP5 and revealed its roles in MLL1 complex assembly and activity regulation. The dissociation constant for the interaction between RBBP5 WD40 and RBBP5 CTD is in the sub-micromolar range, as determined by ITC assays (Figure 2C). Given that this binding affinity was measured with two separately purified proteins in solution, the actual interaction between these two regions on a single RBBP5 polypeptide is expected to be much stronger than that reported here. Thus, this internal interaction in RBBP5 might be comparable with other RBBP5-mediated intermolecular interactions in the MLL1 complex (e.g. WDR5-RBBP5, K d = 1.8 ± 0.1 μM; ASH2L-RBBP5, K d = 0.46 ± 0.04 μM) (18,32). Such a robust internal interaction may have a profound effect on the assembly of the MLL1 complex. Indeed, we observed that the interaction between RBBP5 WD40 and RBBP5 CTD helps the MLL1 complex to form a compact conformation, as shown by the SAXS analyses (Figure 5). The compact conformation of the MLL1 complex driven by the RBBP5 WD40 -RBBP5 CTD interaction may facilitate the proper coordination of structural elements required for efficient methylation reactions (Figure 7). Another intriguing finding in our study is the novel DNA-binding CTD3 motif found at the C-terminal end of RBBP5. It is worth noting that Mittal et al. recently reported that the RBBP5 WD40 domain could bind dsDNA (K d : 29.7 ± 4.2 μM) and ssRNA (K d : 22.5 ± 3.2 μM), presumably mediated by the arginine-rich surface of the WD40 propeller (17). We also found that RBBP5 2-325 binds to dsDNA with a similar affinity (K d : 35.1 ± 8.7 μM) (Figure 6D). However, RBBP5 2-480 , which includes WD40, WDRP, CTD1 and CTD2, had negligible DNA-binding activity (Figure 6D).
This result can be explained by our structural model, in which the arginine-rich surface of RBBP5 WD40 is involved in binding the acidic segment of RBBP5 CTD1 (Figure 4A), thereby masking the potential DNA-binding surface on the WD40 propeller. Instead, the CTD3 motif serves as the primary DNA-binding element in RBBP5, as the mutation in CTD3 substantially decreased the DNA-binding activity of full-length RBBP5 (Figure 6D). This novel DNA-binding element is important for nucleosome recognition, as deletion or mutation of the CTD3 motif decreased the nucleosome-binding affinity of the MLL1 complex (Figure 6C). We propose that when the MLL1 complex binds to nucleosomes, RBBP5 is positioned close to the nucleosome and RBBP5 CTD3 directly contacts nucleosomal DNA (Figure 7). RBBP5 CTD3 acts as an additional buckle to secure nucleosome recognition and also helps orient the H3 tail towards the active site of MLL1 SET , thereby increasing the MLL1 methylation activity on nucleosomal substrates.
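The argument made earlier in this Discussion, that an interaction measured between two separate fragments is expected to be much stronger when both regions sit on one polypeptide, can be made concrete with the standard effective-concentration picture: tethering is treated as holding one partner at a high local concentration near the other. The sketch below is purely illustrative. The K d of 0.5 μM, the free partner concentration of 1 μM and the effective concentration of 1 mM are assumed values chosen for the example; none of them is measured or reported in this study, and the function names are hypothetical.

```python
def fraction_bound_free(partner_conc_um, kd_um):
    """Fraction of a protein bound when its partner is a separate molecule in solution."""
    return partner_conc_um / (partner_conc_um + kd_um)


def fraction_closed_tethered(c_eff_um, kd_um):
    """Fraction in the closed (bound) state when both partners are on one chain.

    The closed/open equilibrium constant is taken as c_eff / Kd, where c_eff
    is the effective local concentration imposed by the tether.
    """
    k_closed = c_eff_um / kd_um
    return k_closed / (1.0 + k_closed)


# All numbers below are assumptions for illustration, not values from the paper.
kd_um = 0.5          # hypothetical sub-micromolar intermolecular Kd
free_conc_um = 1.0   # hypothetical concentration of the free partner fragment
c_eff_um = 1000.0    # hypothetical effective concentration (1 mM) for a tethered partner

free = fraction_bound_free(free_conc_um, kd_um)
tethered = fraction_closed_tethered(c_eff_um, kd_um)
print(f"separate fragments: {free:.3f} bound; single polypeptide: {tethered:.4f} closed")
```

Under these assumed numbers the tethered form is almost entirely in the closed state, whereas the separate fragments are only about two-thirds bound; the real difference depends on linker length and flexibility, which this sketch does not model.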
Finally, the structural comparison of human RBBP5 and yeast Swd1 allowed us to understand the evolutionary plasticity of RBBP5. The primary sequences of the N-terminal WD40 propeller and the central WDRP region of RBBP5 are well conserved across species (Supplementary Figure S5A). In contrast, the C-terminal distal regions in different species are highly variable (Supplementary Figure S5B). Because the CTD2 and CTD3 motifs are absent from the yeast orthologs of RBBP5 (Supplementary Figure S5B), the internal interaction between WD40 and CTD is not necessarily conserved in yeast species. The CTD2 motif that wraps around RBBP5 WD40 is specific to animal and plant species. The CTD3 motif involved in nucleosome recognition is found exclusively in animal species. It is worth noting that the differences in CTD3 motifs correlate closely with the complexity of the animal species. For example, C. elegans RBBP5 has a very short CTD3, while vertebrate RBBP5 has a much longer CTD3 with more positively charged residues. These differences in the CTD regions support the idea that RBBP5 became more complex during evolution to accommodate its role in the precise regulation of gene expression in higher organisms. How RBBP5 mediates functional specification of the KMT2-family methyltransferase complexes in different species awaits further investigation.