The MLL1 trimeric catalytic complex is a dynamic conformational ensemble stabilized by multiple weak interactions

Abstract
Histone H3K4 methylation is an epigenetic mark associated with actively transcribed genes. This modification is catalyzed by the mixed lineage leukaemia (MLL) family of histone methyltransferases, including MLL1, MLL2, MLL3, MLL4, SET1A and SET1B. The catalytic activity of this family depends on interactions with additional conserved proteins, but the structural basis for subunit assembly and the mechanism of regulation are not well understood. We used a hybrid methods approach to study the assembly and biochemical function of the minimally active MLL1 complex (MLL1, WDR5 and RbBP5). A combination of small angle X-ray scattering, cross-linking mass spectrometry, nuclear magnetic resonance spectroscopy and computational modeling was used to generate a dynamic ensemble model in which subunits are assembled via multiple weak interaction sites. We identified a new interaction site between the MLL1 SET domain and the WD40 β-propeller domain of RbBP5, and demonstrate the susceptibility of the catalytic function of the complex to disruption of individual interaction sites.


INTRODUCTION
Post-translational modifications on histone tails are key epigenetic signals for regulation of chromatin structure and gene expression. H3K4 methylation is a complex, dynamic process that is strongly correlated with actively transcribed genes or those that are in a poised or bivalent state (1). Mono-, di- and trimethylated species of H3K4 exhibit a gradient distribution with respect to transcription start sites (TSSs): H3K4me3 is most abundant close to TSSs and in promoter regions, whereas H3K4me2/me1-marked histones are enriched further up- and downstream (2). H3K4 methylation is catalyzed by the MLL/SET1 family of histone methyltransferases (3,4) through their evolutionarily conserved SET domain (5,6). The founding member of this family of H3K4 methyltransferases is the yeast SET1 protein (7,8). In mammals, methylation of H3K4 is carried out by a family of six proteins: MLL (mixed lineage leukemia protein) 1 to MLL4, SET1A and SET1B (9-15). The MLL proteins play crucial roles in embryonic development and hematopoiesis through transcriptional regulation of the clustered homeobox (Hox) genes and other genes important for developmental regulation (10,16-19). Deletion of MLL1 and MLL2 can lead to severe defects in embryonic development in mice (18,20). The MLL1 gene is frequently rearranged in human acute leukemia in both adults and children (21-23). Recently, studies have identified inactivating mutations in MLL3 and MLL4 in different types of human tumors (24-27), as well as in Kabuki syndrome (28).
The catalytic activity of MLL/SET family members is dependent, to varying degrees, on the presence of additional evolutionarily conserved protein subunits, RbBP5, WDR5, ASH2L and DPY30, which together form the core complexes of MLL enzymes (29-34). A minimal core enzyme can be reconstituted with the C-terminal SET domain fragment of MLLs and at least two of the other subunits (29,31,35). In studies of these reconstituted core enzymes, MLL1 appears to be unique among the family members in its requirements for, and interactions with, other subunits. For example, relative to other MLLs, the catalytic activity of MLL1 is most strongly stimulated by WDR5 (31,36,37), whereas MLL1 binds the RbBP5-ASH2L heterodimer with the lowest affinity and is only weakly stimulated by it (35).
Crystallographic studies of MLL3 support a model in which the RbBP5-ASH2L heterodimer stabilizes the catalytically active conformations of MLL2/3/4 through interactions with conserved surfaces on their SET domain (35). However, it was suggested that two key variant residues on this surface of MLL1 dramatically weakened the interaction between MLL1 and RbBP5-ASH2L relative to that of other MLL members, thereby increasing the dependence of MLL1 on WDR5 (35). The unique dependence of MLL1 activity on WDR5 may be of therapeutic relevance, as we and others have shown that pharmacological targeting of the MLL interaction site on WDR5 can functionally antagonize MLL1 in cancers that are dependent on MLL1 activity (38)(39)(40).
While there are several structures of WDR5 bound to MLL and RbBP5 peptides (37,41-44), as well as a crystal structure of the apo-SET domain of MLL1 (45) and a 24 Å resolution cryo-EM model of the homologous yeast COMPASS (46), an atomic-level picture of a functional MLL1 catalytic complex is still lacking. There is evidence of a hierarchical organization, wherein WDR5 and RbBP5 jointly interact with MLL1 to form a stable species (29,31,34), which we refer to as the 'minimal catalytic complex'. This trimer can serve as a scaffold for the association of ASH2L and DPY30 (29).
Here, we report a hybrid methods study of MLL1 and its catalytic core components in solution. Using small angle X-ray scattering (SAXS), cross-linking mass spectrometry (XL-MS), nuclear magnetic resonance (NMR) spectroscopy and computational modeling, we derived a dynamic ensemble model for the WDR5-RbBP5-MLL1 complex and identified a new interaction site between the MLL1 SET domain and the N-terminal WD40 repeat domain of RbBP5. Our data support the notion that the functional MLL1 enzyme is held together by a collection of weak but specific interactions, and that disruption of individual interactions can have significant destabilizing effects on the entire complex.

Protein preparation
Individual components of the MLL1 complex were expressed in Escherichia coli and purified using an N-terminal GST-tag (for MLL1) or His-tag (for WDR5 and RbBP5). We found MLL1 RBS-SET to be better behaved and more stable than MLL1 WIN-SET; therefore, we used the MLL1 RBS-SET construct whenever possible. The characterization of complexes involving both MLL1 and WDR5 required the use of MLL1 WIN-SET. The dimeric and trimeric complexes of MLL1 used for SAXS and cross-linking studies were expressed in Sf9 cells. The dimeric complexes WDR5-MLL1 WIN-SET and WDR5-RbBP5 were purified using TALON affinity resin (Clontech), followed by gel filtration chromatography. Purified dimeric complexes were incubated together on ice for 2 h to reconstitute the trimeric complex, which was subsequently purified by gel filtration chromatography. Detailed procedures are described in the Supplementary Data.

SAXS data collection, analysis and modeling
SAXS measurements were carried out at beamline 12-ID-C of the Advanced Photon Source, Argonne National Laboratory. The X-ray beam energy was 18 keV (wavelength λ = 0.6888 Å), and two setups (small- and wide-angle X-ray scattering, SAXS and WAXS) were used, in which the distances from the sample to the charge-coupled device detector (MAR Research, Hamburg) were adjusted to cover scattering vectors of 0.006 < q < 2.3 Å⁻¹, where $q = (4\pi/\lambda)\sin\theta$ and $2\theta$ is the scattering angle. Data were analyzed using the program PRIMUS (ATSAS package, EMBL (47)). Detailed descriptions of the SAXS data collection, analysis and modeling protocols are provided in the Supplementary Data.
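The scattering-vector definition above, together with the standard Guinier estimate of Rg that programs such as PRIMUS report, can be illustrated with a short Python sketch. The function names and the synthetic profile (Rg = 25 Å) are hypothetical, and this is not the PRIMUS implementation; it only shows the arithmetic, assuming the fit is restricted to the Guinier region.

```python
import math


def q_from_angle(two_theta_deg: float, wavelength_A: float) -> float:
    """Scattering vector magnitude q = (4*pi/lambda) * sin(theta), in 1/Angstrom.

    two_theta_deg is the full scattering angle 2*theta, in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_A


def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) = ln I(0) - (Rg^2 / 3) * q^2; returns (Rg, I0).

    Assumption: the caller passes only points inside the Guinier region
    (commonly q*Rg < 1.3 for compact proteins)."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Wavelength quoted in the text; the Guinier curve below is synthetic (hypothetical Rg = 25 A).
WAVELENGTH = 0.6888  # Angstrom, 18 keV
q_small = q_from_angle(0.1, WAVELENGTH)
q_grid = [0.005 + 0.001 * i for i in range(40)]  # q*Rg stays below ~1.1
synthetic = [100.0 * math.exp(-((qi * 25.0) ** 2) / 3.0) for qi in q_grid]
rg, i0 = guinier_fit(q_grid, synthetic)
```

On real data the fit range, error weighting and outlier handling matter far more than the regression itself.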

Chemical cross-linking mass spectrometry
The reconstituted trimer of WDR5, RbBP5 and MLL1 WIN-SET was cross-linked at a concentration between 12 µM and 16 µM with 1 mM isotopically coded disuccinimidyl suberate (DSS-d0/DSS-d12) as described previously (48). Protease digestion was carried out with LysC and trypsin. After acidification, cross-linked peptides were purified on C18 cartridges and enriched by size-exclusion chromatography (SEC). SEC fractions were analyzed in duplicate by LC-MS (Easy-nLC 300; Orbitrap LTQ XL). For complete details, refer to the Supplementary Data.
Nucleic Acids Research, 2019, Vol. 47, No. 17 9435
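Isotopically coded DSS produces each cross-linked peptide as a light/heavy doublet separated by the mass of 12 deuterium-for-hydrogen substitutions, which is how cross-link candidates are typically flagged in the spectra. A minimal Python sketch of that spacing check follows; the function names and the 10 ppm tolerance are illustrative assumptions, not parameters of the search software used in this study.

```python
# Monoisotopic masses (Da) of protium (1H) and deuterium (2H).
MASS_1H = 1.00782503207
MASS_2H = 2.01410177812


def light_heavy_spacing(charge: int, n_deuterium: int = 12) -> float:
    """Expected m/z spacing between the DSS-d0 and DSS-d12 forms of one cross-linked peptide."""
    return n_deuterium * (MASS_2H - MASS_1H) / charge


def is_light_heavy_pair(mz_light: float, mz_heavy: float, charge: int,
                        tol_ppm: float = 10.0) -> bool:
    """True if mz_heavy equals mz_light plus the d12 shift within tol_ppm.

    The 10 ppm default is a hypothetical tolerance chosen for illustration."""
    expected = mz_light + light_heavy_spacing(charge)
    return abs(mz_heavy - expected) / expected * 1e6 <= tol_ppm
```

For a singly charged ion the spacing is about 12.075 Da, shrinking to about 4.025 m/z units at charge 3+.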

GST Pull-down experiments
Recombinant purified MLL1-GST proteins were incubated with various RbBP5 constructs in a 1:2 molar ratio at 4 °C for 1 h in assay buffer (20 mM Tris pH 7.7, 150 mM NaCl, 10 µM ZnCl2, 5 mM β-mercaptoethanol, 5 mM dithiothreitol (DTT) and 1 mM phenylmethanesulfonyl fluoride (PMSF)). Proteins were then incubated with 100 µL of glutathione-Sepharose beads (GE Healthcare) for an additional 1 h. The mixture was transferred to a micro-column and washed extensively with assay buffer. Bound proteins were eluted with 30 mM reduced glutathione and detected by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and Coomassie staining.

Biolayer Interferometry
The interaction of various RbBP5 constructs with GST-tagged MLL1 RBS-SET and WDR5 was measured using the Octet Red system (Forte Bio). All experiments were performed in phosphate-buffered saline containing 0.2 mg/ml bovine serum albumin and 0.1% (v/v) Tween-20, in a 96-well plate with 200 µL in each well and constant shaking (1000 rpm). GST-tagged constructs were loaded onto anti-GST antibody-coated biosensors (Forte Bio), and the sensors were washed extensively in buffer. Loaded sensors were then incubated with RbBP5 constructs at different concentrations before being transferred to separate buffer wells for dissociation. The binding affinity was determined by steady-state analysis using the program Gnuplot.
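Steady-state BLI analysis fits the equilibrium response at each analyte concentration to a binding isotherm. The sketch below assumes a simple 1:1 model, R_eq = R_max·C/(K_D + C), and uses a log-spaced grid search over K_D (with R_max solved by linear least squares at each grid point) in place of the Marquardt-Levenberg fit that Gnuplot's `fit` command performs; the function name, search bounds and synthetic data are hypothetical.

```python
import math


def steady_state_kd(conc, response, kd_min=1e-3, kd_max=1e3, n_grid=6001):
    """Fit R_eq = R_max * C / (K_D + C) to steady-state responses; returns (K_D, R_max).

    Assumes a 1:1 binding model. K_D is searched on a log10 grid between kd_min
    and kd_max (same units as conc); for each trial K_D, R_max has a closed-form
    least-squares solution."""
    best = None
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    for k in range(n_grid):
        kd = 10.0 ** (lo + (hi - lo) * k / (n_grid - 1))
        f = [c / (kd + c) for c in conc]  # fractional occupancy at each concentration
        r_max = sum(fi * ri for fi, ri in zip(f, response)) / sum(fi * fi for fi in f)
        sse = sum((ri - r_max * fi) ** 2 for fi, ri in zip(f, response))
        if best is None or sse < best[0]:
            best = (sse, kd, r_max)
    return best[1], best[2]


# Noise-free synthetic data (hypothetical K_D = 8 uM, R_max = 1.2 nm).
conc = [0.5, 1, 2, 4, 8, 16, 32]
resp = [1.2 * c / (8 + c) for c in conc]
kd_fit, rmax_fit = steady_state_kd(conc, resp)
```

A grid search is slower than Levenberg-Marquardt but cannot diverge, which makes it convenient for checking a fit by hand.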

Histone methyltransferase assay
Activity assays were performed in 50 mM Tris-HCl, pH 8.0, 5 mM DTT and 0.01% Triton X-100, using 5 µM ³H-SAM and 5 µM biotin-H3(1-25). Increasing concentrations of RbBP5 were added to 200 nM WDR5-MLL1 WIN-SET (with either wild-type or mutant MLL1). All reactions were incubated for 90 min at room temperature, and activities were determined using a scintillation proximity assay (SPA). Experiments were performed in triplicate. For assays with OICR-9429, increasing concentrations of the compound were incubated with 200 nM WDR5-MLL1 WIN-SET for 20 min before 400 nM RbBP5 was added.
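Triplicate SPA counts are usually reduced to a background-subtracted mean and standard deviation before being plotted against RbBP5 or compound concentration. The following Python sketch shows one such reduction; the use of a no-enzyme background control and of a reference condition set to 1.0 are assumptions for illustration, not details given in the protocol.

```python
from statistics import mean, stdev


def relative_activity(sample_cpm, background_cpm, reference_cpm):
    """Mean and sample SD of background-subtracted SPA signal relative to a reference (= 1.0).

    sample_cpm, reference_cpm: replicate counts (e.g. triplicates).
    background_cpm: replicate counts of a control without enzyme (hypothetical choice)."""
    bkg = mean(background_cpm)
    ref = mean(reference_cpm) - bkg
    values = [(c - bkg) / ref for c in sample_cpm]
    return mean(values), stdev(values)
```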

SAXS data reveal solution ensembles for WDR5, RbBP5 and MLL1 RBS-SET
To model catalytically active MLL1 complexes, we first collected reference solution data for the individual subunits, including the SET domain of MLL1, the WD40 repeat region of WDR5 (WDR5 WD40), the N-terminal domain of RbBP5 (RbBP5 NTD) and full-length RbBP5 (referred to from here on simply as RbBP5), followed by characterization of the dimeric and trimeric complexes. Figure 1A shows the protein constructs used in this study. Normalized Kratky plots of WDR5 WD40 and RbBP5 NTD exhibit the typical bell shape expected for globular proteins, with a maximum at (1.73, 1.1), and are nearly superimposable in the range 0 < qRg < 3 (Figure 1B). The experimental Rg values for WDR5 WD40 and RbBP5 NTD also agree with the theoretical values expected for globular proteins (Table 1 and Supplementary Figure S1). The normalized Kratky plot of MLL1 RBS-SET also exhibits a bell shape, but its maximum is shifted with respect to the globular protein position and it converges poorly at high q-values, indicating that MLL1 RBS-SET contains flexible regions. This flexibility could be attributed to the known inherent dynamics of the SET domain in the absence of cofactor (35) and to the disordered N-terminus of the MLL1 RBS-SET construct. The solution ensembles calculated for each protein, taking into account known or predicted disordered regions (see Supplementary Data for details), show good correspondence between our SAXS measurements and the crystal structures of WDR5 (53), the SET domain of MLL1 (45) and the WD40 domain of RbBP5 (54) (Supplementary Figure S2).
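The globular reference point (1.73, 1.1) in a normalized Kratky plot follows from the Guinier approximation I(q)/I(0) ≈ exp(-(qRg)²/3): the function (qRg)²·exp(-(qRg)²/3) peaks at qRg = √3 with height 3/e. The short Python check below confirms this analytically and numerically; it is an idealization, since real globular profiles deviate from the Guinier form at larger qRg.

```python
import math


def guinier_kratky(x: float) -> float:
    """Normalized Kratky value (q*Rg)^2 * I(q)/I(0) for a Guinier profile, with x = q*Rg."""
    return x * x * math.exp(-x * x / 3.0)


# d/dx [x^2 exp(-x^2/3)] = 2x (1 - x^2/3) exp(-x^2/3) vanishes at x = sqrt(3),
# where the curve reaches 3 * exp(-1) = 3/e.
x_peak = math.sqrt(3.0)
y_peak = guinier_kratky(x_peak)

# Brute-force check on a fine grid over 0 < q*Rg < 3.
grid = [i * 0.0001 for i in range(1, 30000)]
x_numeric = max(grid, key=guinier_kratky)
```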
Initially, one of the main challenges in modeling the MLL1 complex was the lack of structural information on RbBP5. For our characterization and modeling of RbBP5-containing complexes, we made use of a ROSETTA-derived homology model of its WD40 domain (i.e. RbBP5 NTD). However, late in the course of manuscript preparation, Mittal et al. (54) reported the crystal structure of the mouse RbBP5 WD40 repeat region, which forms a canonical seven-bladed β-propeller structure (PDB ID: 5OV3). The human and mouse WD40 domains of RbBP5 have 100% sequence identity, and there is excellent agreement between our homology model and the reported structure (r.m.s.d. ∼2.1 Å; Supplementary Figure S2H), which we believe validates the use of the model in our study. To help understand RbBP5 behavior in solution, we collected (¹H-¹⁵N)-TROSY spectra of a full-length construct, as well as constructs corresponding to the C-terminus (CT) and NTD (Figure 2A). The spectrum of RbBP5 NTD is consistent with our model and the reported β-propeller fold: there is considerable peak dispersion due to the high β-strand content, and we are able to identify ∼250 of 316 expected backbone amide signals. We see a similar level of peak dispersion in TROSY spectra of WDR5 WD40 (vide infra). We can distinguish approximately seven of nine expected tryptophan indole signals based on their position in the lower left corner of the spectrum; however, without resonance assignments, this cannot be unambiguously verified. Amides in long unstructured regions of proteins generally have poorly differentiated chemical environments and long relaxation times due to fast internal dynamics on the ps-ns timescale, resulting in sharp signals clustered between 7.5 and 8.5 ppm (55). The spectrum of RbBP5 CT indicates a lack of structure (Figure 2A). We are able to identify ∼120 peaks, excluding putative side-chain signals that are expected to appear in the upper left region of the spectrum (i.e. 7.8-6.6 ppm for ¹H and 115-110 ppm for ¹⁵N). The RbBP5 CT construct contains 199 residues, of which 19 are prolines and none are tryptophans, so it is likely that several peaks comprise signals from two or more amides.
Comparison of the TROSY spectrum of RbBP5 with those of RbBP5 NTD and RbBP5 CT indicates that the strongest amide signals for RbBP5 are clustered in the center of the spectrum and arise from residues in the C-terminus (Supplementary Figure S3A). Nevertheless, several resonances from the β-propeller region are visible and do not uniformly overlap with those in the spectrum of RbBP5 NTD. The NMR spectra indicate that RbBP5 exhibits a high degree of disorder, which is consistent with its gel filtration profile (Supplementary Figure S3E). This is also reflected in its Rg, calculated from SAXS measurements (Table 1 and Supplementary Figure S1), and in the shapes of the normalized Rg-based Kratky plot and the pair distance distribution function P(r) (Figure 1B and C). In particular, the P(r) function has an asymmetric shape with a long smooth tail at large r-values, and the position of its maximum is shifted only slightly (∼4 Å) with respect to that of RbBP5 NTD. These features indicate that RbBP5 has no globular content beyond its β-propeller domain. Moreover, sequence-based theoretical calculations of both secondary structure and order parameters also predict a rigid globular N-terminus and a flexible coil-like C-terminus (Supplementary Figure S3F). Molecular weight estimates derived from SAXS data indicate that both RbBP5 and RbBP5 NTD are monomeric in solution (Table 1). We used the sparse ensemble selection (SES) approach (56) to calculate a solution ensemble of RbBP5 that would satisfy the SAXS data. An initial ensemble consisting of 20 000 models with random conformations of its flexible regions (i.e. residues 1-23 and 326-538) did not fit the SAXS data well (goodness-of-fit χsaxs = 9.4).
We next generated an ensemble that better fits the SAXS data by calculating an optimal weight for each model in the initial ensemble using a multi-orthogonal matching pursuit algorithm (56) (see Supplementary Data for details). The resulting optimal ensemble fits the SAXS data very well, with χsaxs = 0.38 (Figure 2C and Supplementary Figure S2B); the most populated models are shown in Figure 2B and Supplementary Figure S2F and G. In these models, both the N- and C-terminal regions preferentially 'fold in' rather than adopt extended conformations (Supplementary Figure S2F). The optimal ensemble displays a much narrower Rg distribution than the initial random ensemble, with a major peak at 37 Å (Figure 2C). This indicates that RbBP5 is more compact than would be predicted if its C-terminus were fully random.
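Scoring a weighted ensemble against SAXS data involves averaging the model profiles with the ensemble weights and computing χ after least-squares scaling. The Python sketch below shows only that scoring step, assuming the common convention χ² = Σ[(I_exp − c·I_calc)/σ]²/(N − 1); it does not implement the multi-orthogonal matching pursuit that SES uses to choose the weights, and the two-model synthetic example is hypothetical.

```python
import math


def ensemble_profile(profiles, weights):
    """Weighted average of model intensity profiles (all on the same q grid)."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, profiles)) / total
            for i in range(len(profiles[0]))]


def chi_saxs(i_exp, sigma, i_calc):
    """Goodness of fit after applying the least-squares scale factor c to i_calc.

    Assumed convention: chi^2 = 1/(N-1) * sum(((I_exp - c*I_calc) / sigma)^2)."""
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2
               for e, c, s in zip(i_exp, i_calc, sigma)) / (len(i_exp) - 1)
    return math.sqrt(chi2)


# Hypothetical example: two Guinier-like models (Rg = 20 A and 40 A) mixed 30:70.
q = [0.01 * i for i in range(1, 21)]
models = [[math.exp(-((qi * rg) ** 2) / 3.0) for qi in q] for rg in (20.0, 40.0)]
weights = [0.3, 0.7]
target = [5.0 * v for v in ensemble_profile(models, weights)]  # arbitrary absolute scale
sigma = [0.01] * len(q)
chi_mixture = chi_saxs(target, sigma, ensemble_profile(models, weights))
chi_single = chi_saxs(target, sigma, models[0])
```

The mixture reproduces the synthetic curve almost exactly, while the single compact model does not, which mirrors how a weighted ensemble can fit data that no individual conformer explains.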

Binary subcomplexes have dynamic non-random solution conformations mediated by WD40 repeat domains
Our SAXS data for the binary complexes WDR5-MLL1 WIN-SET and WDR5-RbBP5 both suggest the presence of significant disorder, especially for WDR5-MLL1 WIN-SET (Figure 3A). The P(r) functions of WDR5-MLL1 WIN-SET and WDR5-RbBP5 are typical of proteins containing globular domains tethered by long disordered regions (Figure 3A). The positions of the major P(r) peaks for these complexes are close to the corresponding positions for the individual components (Figure 1C), indicating that in both complexes the globular domains are not in close contact and may not adopt a unique arrangement in solution. WDR5 is known to interact with RbBP5 and MLL1 through short peptide segments designated the WDR5 binding motif (WBM) (43) and the WDR5 interacting (WIN) motif (41), respectively (Figure 1A). Both interactions have reported dissociation constants on the order of 1-2 µM (36,41,43,44). To calculate solution ensembles of the binary complexes, we first used (¹H-¹⁵N)-TROSY titrations to verify that WDR5's interaction with the motifs, as observed in the crystal structures, is maintained in solution. To this end, we expressed a triply labeled (¹⁵N/¹³C/²H) WDR5 WD40 construct (which contains 311 residues) and assigned 254 backbone spin systems, representing 82% of the sequence (Supplementary Figure S4). The assignments have been deposited in the BMRB database (BMRB ID: 27528). Amide chemical shift perturbations (CSPs) were then quantified for WDR5 titrated with peptides corresponding to the two motifs. Residues with the highest CSPs (Supplementary Figure S5A) were mapped onto the WIN (PDB ID: 4ESG) and WBM (PDB ID: 2XL2) peptide-bound crystal structures (Figure 3B). For both titrations, all of the assigned WDR5 residues at the binding interface were among those with the highest CSPs (Supplementary Figure S5A).
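Amide CSPs from ¹H-¹⁵N spectra are commonly combined into a single value by scaling the ¹⁵N shift change, for example Δδ = √(Δδ_H² + (α·Δδ_N)²) with α ≈ 0.14. The weighting factor and the 'mean + 1 SD' cutoff used below to flag strongly perturbed residues are assumptions, since the text does not state how the highest CSPs were selected; the residue dictionaries in the test are hypothetical.

```python
import math
from statistics import mean, stdev


def combined_csp(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted combined amide CSP (ppm): sqrt(dH^2 + (alpha * dN)^2); alpha is an assumed weight."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def strongly_perturbed(free, bound, alpha=0.14):
    """Residues whose combined CSP exceeds mean + 1 SD (a hypothetical selection rule).

    free, bound: dict residue_number -> (1H ppm, 15N ppm) for assigned residues."""
    csp = {r: combined_csp(bound[r][0] - free[r][0], bound[r][1] - free[r][1], alpha)
           for r in free if r in bound}
    values = list(csp.values())
    cutoff = mean(values) + stdev(values)
    return {r for r, v in csp.items() if v > cutoff}
```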
We are therefore confident in using the crystal structures to define the restraints governing the interaction of these motifs with WDR5 in our modeling of the binary complexes. The binary subcomplexes, although both flexible, exhibit different structural organizations. The optimal ensemble for WDR5-MLL1 WIN-SET has an Rg distribution as broad as that of the initial random ensemble (Figure 3D), and the arrangement of the globular domains in the most populated models does not support the existence of additional interactions outside of the WIN motif (Supplementary Figure S5). In contrast, the optimal ensemble for WDR5-RbBP5 displays a relatively narrow Rg distribution, with a major peak at ∼41 Å (Figure 3D), indicating that the predominantly populated conformations are more compact than those in the initial random ensemble. The relative position of the WDR5 and RbBP5 WD40 domains in the ensemble is well defined, with a distance between their centres of mass (dWR) of 45.1 ± 0.7 Å (Figure 3C and Supplementary Figure S5G).
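The interdomain distance dWR is a distance between two mass-weighted centres. A minimal Python sketch of that calculation, plus an ensemble-weighted mean and standard deviation, is shown below; reporting the value as a population-weighted mean ± SD over the optimal ensemble is an assumption about how the summary statistic was formed, and the coordinates in the test are hypothetical.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted centre (x, y, z) of a set of atoms; coordinates in Angstrom."""
    total = sum(masses)
    return tuple(sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3))


def domain_distance(coords_a, masses_a, coords_b, masses_b):
    """Distance between the centres of mass of two domains (e.g. the two WD40 domains)."""
    return math.dist(centre_of_mass(coords_a, masses_a), centre_of_mass(coords_b, masses_b))


def weighted_mean_sd(values, weights):
    """Ensemble-weighted mean and SD of a per-conformer quantity (assumed summary statistic)."""
    total = sum(weights)
    mu = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total
    return mu, math.sqrt(var)
```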
There is no apparent direct contact between the domains, and their relative orientation is variable. The r.m.s.d. between highly populated conformers in the optimal ensemble is ∼18 Å due to interdomain dynamics. The preference for compact conformers may be explained by the formation of interactions between RbBP5 CT and the two β-propeller domains; these contacts cannot be more precisely defined because rigid-body models were used in the calculations. We used biolayer interferometry (BLI) to estimate the binding affinity of the WDR5-RbBP5 interaction and, in our hands, found the KD to be ∼0.3 µM (Supplementary Figure S5H and Table S1). This compares with a value of ∼2.4 µM estimated by analytical ultracentrifugation by Cosgrove and colleagues (31) (Supplementary Table S1). In summary, our structural analysis of the binary subcomplexes suggests that WDR5-RbBP5 is relatively compact, with a well-defined distance between the WD40 domains, whereas WDR5-MLL1 WIN-SET has a significantly higher degree of flexibility, with a broad interdomain distance distribution. It should be noted that SAXS measurements were also collected for a putative RbBP5-MLL1 RBS-SET complex; however, the data were not of high enough quality to proceed with structural analysis. We believe this is due to the weak affinity between MLL1 RBS-SET and RbBP5 compared with the intermolecular affinities of the WDR5-RbBP5 and WDR5-MLL1 WIN-SET pairs (see Supplementary Table S1).
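Why a micromolar KD can leave a sample poorly complexed can be seen from the exact 1:1 binding equilibrium. The Python sketch below solves KD = (A_tot − x)(B_tot − x)/x for the complex concentration x; the 10 µM component concentrations in the usage note are hypothetical and are not the concentrations used for SAXS.

```python
import math


def complex_concentration(a_total: float, b_total: float, kd: float) -> float:
    """Exact 1:1 equilibrium: concentration of the AB complex (same units as the inputs).

    Solves KD = (A_tot - x)(B_tot - x) / x and returns the physically meaningful root."""
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fraction_bound(a_total: float, b_total: float, kd: float) -> float:
    """Fraction of the limiting component that is in complex."""
    return complex_concentration(a_total, b_total, kd) / min(a_total, b_total)
```

With both components at a hypothetical 10 µM, a KD of 0.3 µM (the WDR5-RbBP5 value above) puts roughly 84% of the limiting component in complex, whereas a KD of 8 µM (as measured for MLL1 RBS-SET with RbBP5 NTD) gives roughly 42%.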

SAXS and cross-linking data suggest a dynamic triangulated ensemble for WDR5-RbBP5-MLL1 WIN-SET
Our SAXS data for the WDR5-RbBP5-MLL1 WIN-SET complex show significant flexibility in the sample. The shape of the experimental Kratky plots of the complex is typical of proteins with substantial interdomain flexibility (Figure 4A and Supplementary Figure S6A). In particular, the Rg-based Kratky plot is a bell-shaped curve with a maximum at (2.26, 1.27), coordinates that are shifted to higher values with respect to those expected for a globular protein. A high degree of flexibility is also evidenced by the poor convergence of the Kratky plots at high q-values. The low maximum value of 0.48 in the Vc-based Kratky plot (Supplementary Figure S6A), as well as the asymmetric shape of the P(r) function (Supplementary Figure S6B), suggests an elongated shape. This agrees with the averaged ab initio SAXS-derived molecular envelope, which shows an extended shape with approximate dimensions of 220 × 105 × 70 Å (Supplementary Figure S6C).
We note that pair distance distribution functions of proteins containing several globular domains connected by long disordered regions are characterized by peaks at low r-values, corresponding to intradomain distances. Therefore, if the three globular domains of WDR5, MLL1 WIN-SET and RbBP5 did not interact directly with each other within the trimer, we would expect the P(r) function to have peaks at 26-32 Å, reflecting the interatomic distances prevailing within these domains (Figure 1C and Supplementary Figure S6B). However, the experimental P(r) function has its maximum at a much larger distance of ∼47 Å (Supplementary Figure S6B), suggesting the existence of interdomain contacts.
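The reasoning about where P(r) should peak can be illustrated by histogramming all pairwise distances in a coordinate model. The Python sketch below treats every point as an equal-weight scatterer and ignores form factors and solvent, so it is only a crude, P(r)-like approximation rather than the indirect Fourier transform used to obtain P(r) from data; the function name and bin width are hypothetical.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width=2.0):
    """Normalized histogram of all pairwise distances between equal-weight points.

    Returns {bin_centre: fraction_of_pairs}. Form factors and solvent are ignored,
    so this only mimics the shape of P(r)."""
    counts = {}
    for a, b in combinations(coords, 2):
        k = int(math.dist(a, b) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    n_pairs = sum(counts.values())
    return {(k + 0.5) * bin_width: n / n_pairs for k, n in sorted(counts.items())}
```

Applied to a bead model, well-separated domains give most of the weight at intradomain distances, while domains in contact shift weight towards larger r, which is the qualitative effect the argument above relies on.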
To aid our modeling of the trimeric complex, we performed XL-MS studies. We observed many intramolecular cross-links within each of the three proteins. These were highly consistent with the available WDR5 (53), MLL1 WIN-SET (45) and RbBP5 NTD (54) crystal structures, indicating that the models are reliable representations of the domains within the complex in solution. We also observed a number of intermolecular cross-links, the largest number being between MLL1 and RbBP5, suggesting that they are in close proximity. Figure 4B shows the sequence mapping of both intra- and intermolecular DSS cross-links. Six intermolecular cross-links connect lysine residues within the globular subunits; these are shown in Figure 4B as solid blue lines. All 31 experimentally observed cross-links were used in the modeling (Supplementary Tables S2 and S3).
Using both SAXS and cross-linking data as conformational restraints, we applied the SES approach to calculate solution ensembles of WDR5-RbBP5-MLL1 WIN-SET that satisfy both sets of experimental data. An initial pool of representative structures was generated by combining rigid-body modeling and molecular dynamics simulations of both all-atom and coarse-grained models with cross-link-derived distance restraints (see Supplementary Data for details). MLL1 WIN-SET and RbBP5 were assumed to be tethered to WDR5 via the WIN and WBM motifs, respectively, as seen in crystal structures (36,41-43). It should be noted that individual members of the initial ensemble of conformers did not necessarily satisfy all intermolecular cross-links: each satisfied, on average, three to four.
The optimal ensemble of WDR5-RbBP5-MLL1 WIN-SET fits the SAXS data as a whole, with χsaxs = 0.23 in the q-range 0 < q < 0.23. Although only SAXS data were used to select the optimal ensemble, each experimentally observed cross-link is consistent with at least one member, so that the ensemble as a whole is consistent with all 31 cross-links. The expected Rg exhibits a wide distribution with a maximum at ∼48 Å (Figure 4C), and the SES-derived ensemble suggests that the complex can assume a range of interdomain arrangements in solution (Figure 4 and Supplementary Figure S6C). One notable feature of the optimal ensemble is that the β-propeller domains of WDR5 and RbBP5 adopt the same relative positions within the trimer as they do in the WDR5-RbBP5 dimer (Figure 4D and Supplementary Figure S6E). The placement of the MLL1 SET domain is more variable. Most conformers (∼80%) adopt a compact arrangement in which the SET domain and the two WD40 domains are in close proximity (Figure 4D). However, in a small population of conformers (∼20%), the SET domain is 'detached'. When the conformers adopt a compact conformation, the relative position of the SET and RbBP5 β-propeller domains is well defined (Supplementary Figure S6F), and this is supported by four interdomain cross-links (Figure 4B; Supplementary Figure S6D and Table S2). However, the position of WDR5 varies because its contacts with RbBP5 and MLL1 occur within their flexible linker regions. Only two interdomain cross-links involve WDR5, and they can be satisfied simultaneously in only ∼10% of the conformers of the optimal ensemble.
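The two ensemble-level cross-link criteria described above (every cross-link satisfied by at least one conformer, and the weighted fraction of conformers satisfying a given set of cross-links at once) can be sketched in Python as follows. The 30 Å Cα-Cα cutoff is a commonly used bound for DSS but is an assumption here, as are the cross-link identifiers and distances in the test.

```python
def crosslink_coverage(distances, cutoff=30.0):
    """For each cross-link, whether at least one conformer satisfies it.

    distances: list (one entry per conformer) of dicts {crosslink_id: Calpha-Calpha distance, A}.
    cutoff: assumed maximum Calpha-Calpha distance for a DSS cross-link."""
    ids = distances[0].keys()
    return {x: any(conf[x] <= cutoff for conf in distances) for x in ids}


def joint_satisfaction(distances, weights, crosslink_ids, cutoff=30.0):
    """Weighted fraction of conformers that satisfy all listed cross-links simultaneously."""
    total = sum(weights)
    hit = sum(w for conf, w in zip(distances, weights)
              if all(conf[x] <= cutoff for x in crosslink_ids))
    return hit / total
```

Separating the two checks makes the distinction in the text explicit: an ensemble can cover every cross-link even when no single conformer, or only a small population, satisfies a particular pair of them together.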

The WD40 β-propeller domain of RbBP5 has a unique interaction with MLL1
A crystallographic study by Li et al. (35) highlighted the critical role of the AS+ABM region of RbBP5 in stimulating SET domain methyltransferase activity in the MLL family. The catalytic activities of MLL2/3/4 were found to be highly dependent on the presence of RbBP5 AS+ABM-ASH2L SPRY. In contrast, the methyltransferase activity of MLL1 was only weakly stimulated by RbBP5 AS+ABM-ASH2L SPRY and was more dependent on WDR5. The authors identified a surface of the SET domain (in the I-SET motif) that serves as a hub for the MLL-RbBP5-ASH2L interaction. Two MLL1 residues at this surface (Asn3861 and Gln3867) have side-chain properties that differ from those of MLL2/3/4 (hydrophilic/bulky vs. hydrophobic) and prevent 'optimal' RbBP5 AS+ABM-ASH2L SPRY interaction. Mutation of these two residues to their MLL2 (or MLL3) counterparts restored the binding interface, such that MLL1 could be crystallized in complex with RbBP5 AS+ABM-ASH2L SPRY (PDB ID: 5F6L) and its methyltransferase activity was stimulated by the RbBP5-ASH2L dimer.
Our study of the WDR5-RbBP5-MLL1 WIN-SET complex provides a basis for understanding how MLL1 methyltransferase activity is stimulated by RbBP5 and WDR5. When the trimer adopts a compact configuration (found in ∼75% of the optimal ensemble conformers), we observe a direct interaction between the WD40 domain of RbBP5 and a short peptide sequence of MLL1 located between its WIN motif and SET domain. We refer to this 7-residue RbBP5-binding sequence as the RBS region of MLL1 (Figure 1A). To confirm this specific interaction, we performed GST pull-down and BLI binding studies with several RbBP5 and GST-tagged MLL constructs (Figure 5). Both MLL1 RBS-SET and MLL1 WIN-SET interacted exclusively with RbBP5 constructs containing the N-terminal WD40 domain (Figure 5A and B). No interaction was observed between MLL1 and AS-containing C-terminal RbBP5 constructs in either GST pull-down (Figure 5C) or BLI assays (data not shown). BLI was used to estimate the KD for the interaction of MLL1 RBS-SET with RbBP5 NTD, which was found to be ∼8 µM (Figure 5D and E).
The RBS is a unique feature of MLL1 and is not conserved in MLL2/3/4, which rely strongly on the ASH2L-RbBP5 dimer for activation (Figure 5F). We performed mutagenesis experiments to confirm the importance of the RBS in promoting MLL1 interaction with RbBP5 and in stimulating the methyltransferase activity of the SET domain. We constructed two MLL1 mutants: the first, MLL1 WIN-SET 7D, in which the 7-residue RBS was deleted; and the second, MLL1 WIN-SET 3M, in which three mutations (Q3787V, P3788L and Y3791G) convert the MLL1 RBS to the corresponding sequence of MLL2. Both mutants failed to bind RbBP5 constructs containing the N-terminal WD40 domain (Figure 5G). Furthermore, we found that either deletion or mutation of the MLL1 RBS decreased RbBP5's ability to stimulate the methyltransferase activity of the WDR5-MLL1 WIN-SET complex (Figure 5H).
Our primary model for WDR5-RbBP5-MLL1 WIN-SET, which represents the majority of compact conformers in our optimal ensemble, is presented in Figure 6. In this model, only the β-propeller domain of RbBP5 makes contact with the MLL1 SET domain; the AS is not positioned to contact the SET domain. The RBS binding surface of RbBP5 NTD consists of a number of hydrophobic residues (V249, W279, I283, L286, V287 and I289), as well as Q273, Y277 and P253 (Supplementary Figure S6G). The RBS may also participate in an intramolecular association with the SET domain that serves to bridge the RbBP5/SET interaction. Interestingly, in our model we see an interaction between RbBP5 NTD and Asn3861 of the SET domain, which, as noted above, was identified as one of the two critical residues that distinguish MLL1 from MLL2/3/4 in its ability to interact with RbBP5-ASH2L. SET domain residues that form the putative RBS + RbBP5 NTD binding interface (K3825, K3828, N3861, R3871, M3897, H3898, G3899, R3903 and F3904) are shown in Supplementary Figure S6H.
Within our compact trimer optimal ensemble (Figure 4D), we see a small population of conformers that adopt a domain arrangement in which the RbBP5 AS is favorably positioned to interact with the SET domain (Supplementary Figure S7). It is important to note that in these species the RBS maintains its contact with RbBP5 NTD, as seen in our primary model, but no longer forms the intramolecular bridge with the SET domain (Figure 6 and Supplementary Figure S7). This minor population of conformers highlights the potential for dual NTD/AS RbBP5 contacts with MLL1. Our GST pull-down and BLI binding studies show that the AS does not, on its own, measurably interact with the SET domain (Figure 5). However, there could be an avidity effect, in which RBS binding to RbBP5 NTD promotes SET domain/AS interaction. Our attempts to confirm this avidity under multiple buffer conditions using BLI were inconclusive: although we see potentially stronger SET domain binding with RbBP5 constructs containing both the NTD and the AS+ABM region (Supplementary Figure S8), the binding behavior gave rise to non-ideal BLI sensorgrams lacking the steady-state and complete dissociation phases needed for proper KD determination. We believe this is due to protein aggregation in the assays. At present, we can only speculate that, within the context of the trimer, the presence of full-length RbBP5 and WDR5 may facilitate some level of AS/SET interaction. This is supported by MD simulations of the all-atom model of the trimer, which were initiated with the three globular domains positioned according to the minor population in the ensemble (Supplementary Figure S7) and with the AS positioned in contact with MLL1 SET as in the crystal structure of MLL1 N3861I/Q3867L bound to the RbBP5 AS+ABM-ASH2L SPRY dimer (32). Throughout the MD trajectory (100 ns), the AS maintained constant contact with the SET domain.
Taken together, our structural characterization of WDR5-RbBP5-MLL1 WIN-SET suggests that its activation is mediated in part through the unique but weak interaction of the MLL1 RBS with RbBP5, which in turn stabilizes the I-SET motif of the catalytic SET domain. WDR5 serves as a hub to promote this interaction through its dual binding to the WIN (on MLL1) and WBM (on RbBP5) motifs. Hence, we hypothesize that a triumvirate of weak but specific intermolecular interactions is required to maintain the integrity of the MLL1 minimal complex, and that disruption of an individual interaction site may be sufficient to disrupt catalytic activity. To test this hypothesis, we used gel filtration to measure the ability of OICR-9429, a small-molecule antagonist of the WDR5-MLL1 WIN interaction, to disrupt the association of WDR5-RbBP5-MLL1 WIN-SET (Figure 7A). Disruption of the WDR5-MLL1 interaction by the compound compromised the assembly of the trimer (Figure 7A and B) and inhibited its catalytic activity (Figure 7C). This is consistent with our previous work showing that OICR-9429 can disrupt the assembly and function of endogenous MLL1 complexes in cells (38). Similar results have been reported for MM-401, a peptide-based antagonist of the WDR5-MLL WIN interaction (39,57). This has important implications for the development of pharmacological antagonists of the MLL1 complex, and further supports this approach for targeting other multiprotein complexes that depend on weak but druggable interactions.
During revision of our manuscript, the crystal (58) (PDB ID: 6CHG) and cryo-EM (59) (PDB ID: 6BX3) structures of yeast COMPASS, which comprises orthologues of SET1, WDR5, RbBP5, ASH2L and DPY30, were reported. Interestingly, the relative position of the WD40 domains of WDR5 and RbBP5 is conserved not only in our dimer and trimer models but also in the reported COMPASS structures (Supplementary Figure S9A). However, the relative positions of the SET and two WD40 domains adopted in COMPASS (Supplementary Figure S9B) are not consistent with our experimental SAXS and cross-link data for the MLL1 trimer (Supplementary Figure S9C). Moreover, the domain arrangement in COMPASS is not compatible with any of the conformers in our optimal ensemble of the minimal MLL1 trimer, nor with our preliminary SAXS and cross-link data for the MLL1 pentameric complex. These differences provide additional evidence of the distinct properties of MLL1 among the SET1 family of enzymes.