Microarray screening reveals two non-conventional SUMO-binding modules linked to DNA repair by non-homologous end-joining

Abstract SUMOylation is critical for numerous cellular signalling pathways, including the maintenance of genome integrity via the repair of DNA double-strand breaks (DSBs). If misrepaired, DSBs can lead to cancer, neurodegeneration, immunodeficiency and premature ageing. Using systematic human proteome microarray screening combined with widely applicable carbene footprinting, genetic code expansion and high-resolution structural profiling, we define two non-conventional and topology-selective SUMO2-binding regions on XRCC4, a DNA repair protein important for DSB repair by non-homologous end-joining (NHEJ). Mechanistically, the interaction of SUMO2 and XRCC4 is incompatible with XRCC4 binding to three other proteins important for NHEJ-mediated DSB repair. These findings are consistent with SUMO2 forming a redundant NHEJ layer with the potential to regulate different NHEJ complexes at distinct levels including, but not limited to, XRCC4 interactions with XLF, LIG4 and IFFO1. Regulation of NHEJ is not only relevant for carcinogenesis, but also for the design of precision anti-cancer medicines and the optimisation of CRISPR/Cas9-based gene editing. In addition to providing molecular insights into NHEJ, this work uncovers a conserved SUMO-binding module and provides a rich resource on direct SUMO binders exploitable towards uncovering SUMOylation pathways in a wide array of cellular processes.


INTRODUCTION
Posttranslational modification (PTM) with SUMO (small ubiquitin-like modifier) is key to regulating a panoply of cellular signalling pathways, including transcription, chromatin organisation, nuclear trafficking, DNA replication and DNA repair (1,2). It is therefore not surprising that deregulation of the SUMO system is associated with a range of prevalent human diseases including neurodegenerative disorders, cardiovascular diseases and cancer (1). In humans, the SUMO paralogues SUMO1-3 are ubiquitously expressed and established as posttranslational modifiers. SUMO2 and SUMO3 are almost identical, sharing 97% sequence identity, whereas SUMO1 and SUMO2/3 are ∼55% different. SUMOylation is mediated by an enzymatic triad: an activating E1 enzyme (a heterodimer formed by UBA2 (aka SAE2) and SAE1), the conjugating E2 enzyme UBE2I (aka UBC9), and one of ∼10 E3 ligases. SUMOylation can occur on one or multiple lysines of substrate proteins as monomers or chains of multiple SUMO moieties, creating a complex SUMO code. PolySUMO chains in cells are primarily formed by SUMO2/3 linked via their internal K11 residues, with SUMO1 mainly being deemed a chain terminator. Biochemical outcomes for distinct SUMO architectures can differ, with polySUMO chains being formed particularly in response to certain types of stressors, suggesting their importance in responding to such stimuli. However, we still know little about how polySUMO chains regulate specific cell signalling events (3). By translating SUMOylations into defined biochemical actions, SUMO receptors (proteins that non-covalently bind to and recognise SUMO topologies) play key roles in determining the functional outcomes of SUMOylation events. Despite their importance and the large number (>7000) of substrate SUMOylations existing in human cells, only a few (several tens) SUMO receptors have been validated, and even fewer have been characterised for their binding to different SUMO topologies.
As a consequence, little is known about length- and paralogue-selective recognition of SUMO topologies. Indeed, knowledge of different SUMO-binding modules is mostly limited to a small number of varying themes centred on 4-5 hydrophobic amino acids called SUMO interacting motifs (SIMs) (4,5,6). This knowledge gap limits our understanding of how SUMO functions at mechanistic levels and how it can best be exploited for treating human diseases associated with SUMO dysfunctions.
Here, we systematically screen the human proteome for receptors of polySUMO2 chains, identifying hundreds of candidates with diverse roles in established and emerging areas of SUMO biology. We validate a substantial and functionally varied set of SUMO receptors followed by in-depth characterisation of the SUMO-binding modules of one of the identified receptors, XRCC4. XRCC4 is a core DNA repair factor known for its importance in DNA double-strand break (DSB) repair by non-homologous end-joining (NHEJ). DNA damage occurs frequently and can be caused by endogenous and exogenous sources. DSBs are the most cytotoxic DNA lesions and if left mis- or unrepaired, they can lead to cell death, mutagenesis or chromosomal translocation, and in turn cancer (7). Cells have evolved two major pathways to repair DSBs: homologous recombination (HR), which repairs DSBs with high fidelity in late S/G2 cell cycle phases using a homologous sequence as a template, usually the sister chromatid; and NHEJ, which is less accurate than HR, but functions throughout interphase and repairs the large majority of DSBs in mammalian cells (8). The importance for, and underlying mechanisms of, the SUMO system for key aspects of DSB repair by HR are well established, with SUMOylations of various HR factors and their decoding mechanisms via downstream receptors characterised (9). By contrast, little is known about how SUMOylation regulates NHEJ. Here, we identify and characterise two distinct non-conventional polySUMO2-binding modules on XRCC4 located in its head and coiled-coil domains and demonstrate the relevance of one of them for DNA repair by NHEJ, as well as for our understanding of SUMO-binding in other, non-DNA repair-related proteins. Due to their locations, XRCC4 interactions with SUMO2 represent prime targets for regulating NHEJ at the level of three other key NHEJ factors, XLF, LIG4 and IFFO1, and/or via XRCC4 complexes functioning independently of these proteins (10).

PolySUMO2 microarray staining and analysis
PolySUMO2 chains (#ULC-220, Boston Biochem) were directly labeled with Cy5 following the manufacturer's guidelines (GE Healthcare). After 45 min incubation in the dark, 10% reaction volume of 2 M Tris-HCl pH 7.5 was added to quench unreacted dye, and the incubation extended in the dark for 10 min. PolySUMO2 chains were then purified in PD25 spin columns according to the manufacturer's recommendation (GE Healthcare). The purified and labeled polySUMO2 chains were immediately applied to blocked human proteome microarrays (HuProt™ v2.0, CDI Laboratories) that contain >21 000 protein spots, representing ∼15 000 unique proteins with one or more isoforms. Microarrays were removed from −20 °C storage and placed at room temperature (RT) for 15 min before opening, to avoid condensation. Arrays were then blocked for 1 h at RT in PBS containing 0.05% Tween-20, 20 mM reduced glutathione, 1 mM DTT, 3% BSA, and 25% glycerol. Three PBS washes preceded a 90 min incubation step at RT with labelled polySUMO2 chains (or Tris-quenched Cy5 dye as a reference). After two washing steps with PBS containing 0.05% Tween-20, two PBS washes, and two washes with water, centrifugal drying (1000 rpm for 5 min at RT) was performed and the arrays scanned using a GenePix scanner (4100A by Molecular Devices). Microarray images were gridded and quantitated using GenePix Pro (v7) software. Median intensities (features and local backgrounds) were utilised, and signal-to-noise ratios calculated. Values were then normalised to biological controls within each array and duplicate features (representing identical proteins) summarised by average. These values were compared between arrays (polySUMO2-bound minus mock-treated array) then Loess transformed by print tip and location to remove technical sources of error (11), resulting in the final estimate of magnitude change (M-value).
The threshold for proteins classifying as polySUMO2 receptor candidates was set to 1 standard deviation of the population above the population average. Given that the M-value is a twice-normalised (biologically and for technical sources of error) difference between mean signal-to-noise ratios generated from relative fluorescence units, it is reported/graphed as 'M-value' without units.
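The hit-calling rule described above (M-value more than 1 standard deviation above the population average) can be illustrated with a minimal Python sketch. The function name `call_receptor_candidates`, the protein labels and the M-values below are hypothetical and are not data from the screen; the sketch also assumes the population (rather than sample) standard deviation and omits the upstream normalisation and Loess steps.

```python
from statistics import mean, pstdev


def call_receptor_candidates(m_values, n_sd=1.0):
    """Return (threshold, hits) for a dict mapping protein name -> M-value.

    A protein is called a candidate when its M-value lies more than
    n_sd population standard deviations above the population mean.
    """
    values = list(m_values.values())
    threshold = mean(values) + n_sd * pstdev(values)
    # Report hits ordered from strongest to weakest M-value.
    hits = sorted(
        (name for name, m in m_values.items() if m > threshold),
        key=lambda name: -m_values[name],
    )
    return threshold, hits


# Hypothetical M-values for illustration only (not taken from the screen).
example = {
    "PROT_A": 0.1,
    "PROT_B": -0.2,
    "PROT_C": 0.0,
    "PROT_D": 6.0,
    "PROT_E": 0.3,
}
threshold, hits = call_receptor_candidates(example)
print(f"threshold = {threshold:.2f}; candidates = {hits}")
```

Because the threshold is defined relative to the array-wide distribution, the number of candidates depends on the spread of M-values across the whole array rather than on a fixed intensity cut-off.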

Carbene footprinting
Samples were prepared and analysed as previously described (12). Briefly, 20 µM full-length (FL) XRCC4 or 25 µM XRCC4 1-164 were mixed with 20 or 25 µM of mSUMO2, respectively, in a buffer containing 20 mM HEPES pH 6.8, 140 mM NaCl, 1 mM EDTA, 2 mM DTT and 0.02% NaN3, as well as 10 mM of aryldiazirine probe (total volume, 20 µl). The mixture was left to equilibrate for 5 min at RT before 6 µl aliquots were placed in crystal-clear vials (Fisher Scientific UK) and snap-frozen in liquid nitrogen. The labelling reaction was initiated by photolysis of the mixture using the third harmonic of a Nd:YLF laser (Spectra Physics, repetition frequency 1000 Hz, pulse energy 125 µJ) at a wavelength of 347 nm. The frozen samples were irradiated for 10 s. All experiments were performed in triplicate. Following irradiation, samples were thawed, reduced (10 mM DTT in 10 mM ammonium bicarbonate), alkylated (55 mM iodoacetamide in 10 mM ammonium bicarbonate) and incubated at 37 °C with trypsin overnight (1:20 protease/protein ratio in 10 mM ammonium bicarbonate). The analysis of the digests was carried out on a Bruker MaXis II ESI-Q-TOF-MS connected to a Dionex 3000 RS UHPLC fitted with an ACE C18 RP column (100 × 2.1 mm, 5 µm, 30 °C). The column was eluted with a linear gradient of 5-100% MeCN containing 0.1% formic acid over 40 min. The mass spectrometer was operated in positive ion mode with a scan range of 200-3000 m/z. Source conditions were: end plate offset at −500 V; capillary at −4500 V; nebulizer gas (N2) at 1.6 bar; dry gas (N2) at 8 l/min; dry temperature at 180 °C. Ion transfer conditions were: ion funnel RF at 200 Vpp; multiple RF at 200 Vpp; quadrupole low mass at 55 m/z; collision energy at 5.0 eV; collision RF at 600 Vpp; ion cooler RF at 50-350 Vpp; transfer time at 121 µs; prepulse storage time at 1 µs. A previously described method was used to quantitate the fraction of each peptide modified (13).
Briefly, the chromatograms for each singly-labelled and unlabelled peptide were extracted within a range of ±0.1 m/z and the spectrum for each peak was manually inspected to ensure the sampling of the correct ion only. The peptide fractional modification was calculated using Equation 1 below, where A(labelled) and A(unlabelled) correspond respectively to the peak area of each labelled and unlabelled peptide. Differences in the extent of labelling between peptides were considered significant when the P value obtained from a Student t-test was <0.05.
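As an illustration of the quantitation step, the sketch below assumes that the fractional modification of a peptide is the labelled peak area divided by the sum of the labelled and unlabelled peak areas, which is the form implied by the definitions of A(labelled) and A(unlabelled) above; the equation itself is not reproduced in this text. The function names, the triplicate values and the use of a two-sided critical value for 4 degrees of freedom (2.776 at P = 0.05) are assumptions made for the example only.

```python
from math import sqrt
from statistics import mean, variance


def fractional_modification(area_labelled, area_unlabelled):
    """Assumed form: A(labelled) / (A(labelled) + A(unlabelled))."""
    total = area_labelled + area_unlabelled
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return area_labelled / total


def student_t_statistic(group_a, group_b):
    """Two-sample Student t statistic with pooled (equal) variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical triplicate fractional modifications for one peptide,
# without and with the binding partner (illustrative values only).
free = [0.30, 0.32, 0.31]
bound = [0.20, 0.21, 0.19]

t_stat = student_t_statistic(free, bound)
# Two-sided critical value of Student's t for 4 degrees of freedom at 0.05.
T_CRIT_DF4 = 2.776
is_masked = t_stat > T_CRIT_DF4  # reduced labelling in the complex
print(f"t = {t_stat:.2f}, significant masking: {is_masked}")
```

In this convention a positive, significant t statistic corresponds to masking (less labelling in the presence of the partner), whereas a significant negative value would correspond to unmasking.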

NMR spectra acquisition and analysis
Protein spectra were recorded at 310 K on a Bruker 800 MHz spectrometer with a 1H/13C-15N TCI cryoprobe equipped with z-gradients in 20 mM HEPES pH 6.8, 140 mM NaCl, 1 mM EDTA, 2 mM DTT, 100 mM arginine, 100 mM glutamic acid, 0.02% NaN3, unless otherwise specified. XRCC4 1H-15N spectra were standard Bruker BEST-TROSY with phase-sensitive Echo/Antiecho gradient selection. SUMO 1H-15N spectra were standard Bruker sensitivity-enhanced, phase-sensitive HSQC spectra using Echo/Antiecho gradient selection. 1D 1H spectra were recorded using excitation sculpting water suppression. The assignments for mSUMO1 and mSUMO2 were taken from BMRB entries 25576 and 6801, respectively, and temperature and buffer conditions were incremented from the conditions used in the assignments to those used for this study, to allow for the associated chemical shift changes. Assignment of 6×His-XRCC4 1-164 was carried out as described (14). Assignment of 6×His-XRCC4  was attempted by the same methods using a 1H-15N-13C-labelled protein sample. The docked protein structures were based on PDB entries 1IK9 for XRCC4 (chain A) and 2D07 for SUMO2. In order to assess the reproducibility, the HADDOCK docking was repeated with one active residue omitted from each binding partner for each possible pair. The 63 docked structures generated clustered into seven classes, and these seven clusters showed only two orientations of mSUMO2 relative to XRCC4. These two orientations were used to generate a model of 4×SUMO2 binding to XRCC4, adding two intervening copies of SUMO2 with no interactions, placed arbitrarily except to ensure the continuity of the peptide chain. This model was then minimised and equilibrated by molecular dynamics using GROMACS (MD step used 5 ns in 2 fs steps using the AMBER99SB-ILDN forcefield and TIP3P water).
For the interaction between the coiled-coil region of XRCC4 and SUMO2, the 'active' residues for XRCC4 were 164, 165, 166 and 167, with no 'passive' residues, and the same set of 'active' and 'passive' residues was used for SUMO2. For this complex, there were too few interacting residues to attempt the omission strategy. HADDOCK generated a single cluster for this complex. This complex and the most common (model 1) complex of the SIM56/SIM101 interaction were used to generate models of di- and tri-SUMO2 complexes, which span both binding sites. These models were minimised and equilibrated using the same molecular dynamics strategy.

Surface plasmon resonance (SPR)
Experiments were performed on a ProteOn XPR36 instrument (BioRad Laboratories) using a running buffer containing 100 mM NaCl, 10 mM HEPES pH 7.

Biolayer interferometry (BLI)
BLI was performed using an Octet RED96 instrument (ForteBio). 50 µg of recombinant 6×His-4×SUMO2-Strep were biotinylated using EZ-link NHS-PEG4-Biotin (Thermo Fisher) following the manufacturer's protocol. Excess biotin was removed using Zeba desalting spin columns (Thermo Fisher). 1 µg of biotinylated 6×His-4×SUMO2-Strep was immobilised on streptavidin (SA) biosensors (ForteBio) until an approximately 1000 nm response was reached. The baseline was set by submerging 4×SUMO2-captured sensors in kinetics buffer (PBS + 0.02% Tween-20, 0.1% BSA, 0.05% NaN3) in a 96-well plate, integrating an orbital shake function. The binding curves were obtained by dipping the sensors in 96-well plates containing the analytes diluted in kinetics buffer, or kinetics buffer only as reference. Finally, the sensors were dipped in fresh kinetics buffer for the dissociation step. Sensors were regenerated using 100 mM glycine pH 2.5 prior to reuse. Unloaded SA biosensors were used as controls and subtracted where unspecific binding was observed.

GFP immunoprecipitations
HEK293T cells transfected with the desired expression construct were washed with ice-cold PBS and scraped into lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 10% glycerol, 2 mM MgCl2, 10 mM N-ethylmaleimide) with 1× Complete EDTA-free protease inhibitors (Roche) and 6 µl benzonase (Millipore) and rotated at RT for 15 min. Subsequently, the lysates were centrifuged at 16 000 g for 60 min and the supernatant bound to 25 µl of GFP-Trap magnetic beads (Chromotek) for 1 h with end-over-end rotation at 4 °C. Protein-bound beads were then washed 5 times with lysis buffer and resuspended in 2× SDS Laemmli buffer (120 mM Tris-HCl pH 6.8, 4% SDS, 20% glycerol, 0.02% bromophenol blue and 2.5% β-mercaptoethanol). For competition experiments, the indicated amount of recombinant 6×His-4×SUMO2-Strep was added to the lysates prior to the incubation with the beads. 4% of input lysate was loaded unless stated otherwise.

GST precipitations
Pulldowns with cellular extracts were carried out as described for GFP immunoprecipitations using 1-3 µg of GST-fused proteins bound to glutathione magnetic beads (Promega). For immunoprecipitations of recombinant proteins, 1-3 µg of GST-fused proteins bound to glutathione magnetic beads were resuspended in 600 µl of binding buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 10% glycerol, 2 mM MgCl2, 10 mM N-ethylmaleimide) containing 2 µg or equimolar concentrations of His-tagged recombinant proteins and incubated for 1 h with end-over-end rotation at 4 °C. Protein-bound beads were then washed 5 times with lysis buffer and resuspended in 2× SDS Laemmli buffer (120 mM Tris-HCl pH 6.8, 4% SDS, 20% glycerol, 0.02% bromophenol blue and 2.5% β-mercaptoethanol). 4% of input lysate was loaded unless stated otherwise. Proteins were visualised with InstantBlue stain (Expedeon) or by immunoblotting.

Aggregation onset temperature (T agg ) determination
An Uncle (Unchained Labs) platform was used to measure intensities of static light scattering (SLS) at 260 nm over a temperature ramp ranging from 20 °C to 95 °C. Light scattering of a 260 nm laser was monitored from samples at a concentration of 0.5 mg/ml, which were forced to aggregate by raising the temperature at 1 °C intervals. The aggregation onset temperatures (T agg) were calculated using Uncle software (version 2.0). SLS (260 nm) values were set to zero at the start of the temperature ramp.

Microscale thermophoresis (MST)
Recombinant XRCC4 was fluorescently labeled using the Monolith protein labeling kit RED-NHS (amine reactive) dye (NanoTemper Technologies), following the manufacturer's guidelines. Labeled XRCC4 was diluted in the appropriate buffer (PBS, 0.05% Tween-20) to a final concentration of 5 nM. Non-labeled recombinant 4×SUMO2 protein was serially diluted 1:1 in the same buffer, using 16 doubling dilutions from a starting concentration of 500 nM. Equal volumes of the labeled protein were mixed with each dilution, before sample loading into Monolith Pico premium capillaries (NanoTemper Technologies). Thermophoresis was performed at RT, using a Monolith PICO instrument controlled with NT Control software version 1.0.1 with 5, 30, 5 s laser off, on, off times, respectively, and 5% LED power and medium IR-laser (MST) power. Experiments were carried out in triplicate and analysed using the NT Affinity Analysis software version 2.0.2 and applying the K D model of fit. Data points displaying irregularities or where the MST/TRIC traces showed bleaching and/or artefacts from aggregation were defined as outliers, as previously described (22).
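The titration design and a K D fit model of the kind referred to above can be sketched as follows. This is a minimal illustration, assuming a standard 1:1 binding model that accounts for ligand depletion (the quadratic solution of the law of mass action); the exact implementation in the analysis software is not described in the text. It is also an assumption here that the stated 500 nM and 5 nM are pre-mixing concentrations, so that both are halved after combining equal volumes. The function names are hypothetical.

```python
import math


def doubling_dilutions(start_nM, n_points):
    """Concentrations of a 1:1 serial dilution series, highest first."""
    return [start_nM / (2 ** i) for i in range(n_points)]


def fraction_bound(ligand_nM, target_nM, kd_nM):
    """Fraction of labelled target bound in a 1:1 binding model.

    Uses the quadratic solution of the law of mass action, which stays
    valid when the target concentration is comparable to the K_D.
    """
    s = ligand_nM + target_nM + kd_nM
    return (s - math.sqrt(s * s - 4.0 * ligand_nM * target_nM)) / (2.0 * target_nM)


# 16 doubling dilutions of unlabelled 4xSUMO2 starting from 500 nM.
stock_series = doubling_dilutions(500.0, 16)

# Assumption: mixing equal volumes halves both concentrations.
final_ligand = [c / 2.0 for c in stock_series]
final_target = 5.0 / 2.0  # labelled XRCC4, nM

# Predicted bound fraction for an illustrative K_D of 3.2 nM.
curve = [fraction_bound(c, final_target, 3.2) for c in final_ligand]
for conc, frac in zip(final_ligand, curve):
    print(f"{conc:10.4f} nM  ->  fraction bound {frac:.3f}")
```

With a target concentration close to the K D, the simple hyperbolic approximation would overestimate the bound fraction at low ligand concentrations, which is why the depletion-aware form is used in this sketch.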

Proteome microarray screening retrieves polySUMO2 receptors with diverse gene ontologies
SUMO receptors have mainly been identified with yeast two-hybrid systems or affinity purification from whole cell extracts using SUMO topologies as baits combined with mass spectrometry (5,29,31,32,33,34,35). These techniques are limited by their propensity to identify indirect SUMO binders in addition to direct, binary receptors. Additionally, mass spectrometry-based approaches are restricted by the cell- or tissue-specific proteomes used as starting materials and biased towards abundant proteins. A recent photo-crosslinking approach targeted binary receptors (direct SUMO binders as opposed to proteins indirectly associating with SUMO via SUMO-independent protein interactions with the direct binders), but was limited to the identification of receptors interacting with a specific region on SUMO2 (18). To overcome these restrictions, we systematically screened the human proteome for polySUMO2 receptors using microarrays containing duplicate protein spots for >21 000 proteins representing ∼15 000 unique full-length human genes with one or more isoforms. To this end, we incubated the arrays with enzymatically linked polySUMO2 chains fluorescently labelled with Cy5, followed by fluorescence scanning and background subtraction using soluble Cy5 as a reference (Figure 1A, Supplementary Table S1). The screen identified a total of 258 unique binary polySUMO2 receptor candidates (Figure 1B; Supplementary Table S2), featuring known SUMO2 receptors and numerous proteins with no previous SUMO-binding functions assigned to them. As expected, known/predicted SUMO receptors harboured components of the SUMO conjugation cascade, in addition to downstream receptors with no known SUMOylation roles (Figure 1B). Moreover, SUMO receptors with known preferences for SUMO1 binding, e.g. RGS17, DPP9 and PARK2, did not score as hits, suggesting that our approach was able to distinguish between different SUMO paralogues.
SUMOylation is known for its importance in regulating processes inside the nucleus and in response to stress (1,9). Consistent with this notion, gene ontology analysis of the polySUMO2 hits resulted in a significant enrichment of biological processes linked to stress-induced transcription and DNA repair, in particular in response to hypoxia and DNA damage (Figure 1C). These findings underpin the validity of our approach in identifying receptors in pathways associated with known SUMO functions, and in contexts where the formation of polySUMO chains is triggered, an achievement previously unattained with proteome microarrays (36). Other significantly enriched ontologies included chromatin-associated processes such as remodelling and nucleosome assembly/disassembly, also linked to SUMO function (37) (Figure 1C). Significant enrichment was also observed for processes less established for regulation by SUMOylation, such as cytoplasmic cytoskeletal organisation, consistent with SUMOylation emerging as an important regulatory layer in that arena (38), and tRNA aminoacylation. Finally, our analyses connected SUMO processes to phosphorylation (Figure 1C), with cross-talk between different PTMs representing an emerging and exciting theme.

Validation of polySUMO2 receptors with diverse functions and binding characteristics
With a range of biological processes significantly enriched amongst the hits, we tested if the candidates were functionally interconnected. STRING network analysis revealed a significant enrichment of protein:protein interactions (enrichment P value = 1.51 × 10^−12), and several gene clusters including a central hub of p53-associated processes (Figure 2A). Some, but not all, of these clusters were linked to SUMO-associated functions (known SUMO receptors highlighted with red borders in Figure 2A). For example, the coordinated functions of DAXX and USP7 to regulate p53 function (39) connected them to the central cluster, and PSMD4 linked to other proteasomal components not previously associated with SUMO binding. In this regard, both VCP (aka p97) and one of its interaction partners, NSFL1C, came up as hits. A different interaction partner of VCP, UFD1, functions as a SUMO receptor in yeast to help recruit the VCP complex to its targets (40), raising the possibility that the VCP complex takes on similar roles in humans. Other gene clusters formed by SUMO receptor candidates centred on nuclear functions including DNA repair, chromatin regulation and pre-mRNA splicing, consistent with enrichment of these pathways in our gene ontology analysis (Figure 1C). Interestingly, several serine/threonine kinases (p21-activating kinases; PAKs) formed part of a gene cluster linking cytoplasmic cytoskeleton functions with nuclear signalling. Moreover, five aminoacyl tRNA synthetases formed a separate cluster, representative of the strong enrichment of tRNA-associated processes in the identified gene ontologies (Figure 1C). While none of these proteins had previous SUMO receptor functions assigned to them, tRNA transcription is regulated by SUMOylation in response to stress in yeast (41).
This raises the possibility that SUMOylation and SUMO receptor interactions, in tRNA-mediated protein translation, could help cells respond to stress, thereby offering a starting point to assess such functions mechanistically in the future.
To assess the validity of the identified receptor candidates, we performed SUMO-binding assays on a range of candidates using biolayer interferometry (BLI). Using tetraSUMO2 chains, we validated eight candidates distributed across a range of gene clusters and M-values (Figure 2B; Supplementary Table S1). These included XRCC4, a DNA repair protein; STMN1, a cytosolic protein regulated by p53 and recently hypothesised to function via SUMO binding (42); PAK3, a serine/threonine protein kinase; DARS and WARS, two aminoacyl-tRNA synthetases; RBBP5, a transcriptional regulator associated with histone methyltransferase complexes; SBB, a protein important for RNA metabolism (Figure 2A, genes with blue frames); and TCEAL6, a transcriptional elongation factor for which no putative SIMs were retrieved using the SIM prediction servers JASSA and GPS-SUMO (4,43). The validated receptors displayed distinct association and dissociation profiles, with some associating more stably with 4×SUMO2 than others ( Figure 2B, compare e.g. WARS with TCEAL6). The presence of at least one validated SUMO2 receptor in every identified gene cluster emphasises the validity of our screening approach. Moreover, the findings demonstrate the ability of our setup to identify receptors with distinct binding characteristics, featuring diverse biological functions linked to established and hitherto undiscovered aspects of SUMO binding and functionality.

XRCC4 preferentially binds polySUMO2/3 chains
To further define the SUMO-binding characteristics of one of the receptors arising from our screen, we selected the core NHEJ factor XRCC4 for follow-on studies (M-values ∼6; Figure 1B and Supplementary Table S1). SUMOylation is crucial for efficient NHEJ (44,45,46). Despite this importance, knowledge of SUMO receptor roles for core NHEJ factors has long been lacking and only recently started to come to light during the course of this study. XRCC4 is a 38 kDa protein mainly existing as a homodimer in cells and featuring a number of functionally and structurally distinct domains (25). Its N-terminal head domain is important for interaction with another NHEJ core factor, XLF, followed by a coiled-coil domain that mediates binding with the NHEJ ligating enzyme LIG4 or the nucleoskeleton protein IFFO1 (25,47,48). A flexible C-terminal tail contains various phosphorylation sites important for mediating interactions with further NHEJ-associated DNA repair factors (49) and for removal of XRCC4 from chromatin (50) ( Figure 3A). NHEJ is initiated in response to DSBs with Ku, a heterodimer formed by Ku70 and Ku80, recognising and binding broken DNA ends. Amongst several functions, DNA-bound Ku acts as a recruitment hub for other NHEJ factors e.g. the DNA damage response kinase DNA-PKcs, which together with Ku forms the holoenzyme DNA-PK ( Figure 3B). DNA-PK itself can recruit other factors important for preparing the broken DNA ends for ligation (51). XRCC4 is key for facilitating distinct aspects of NHEJ. By interacting with XLF, XRCC4 is implicated in tethering DNA ends and thus, in promoting synapsis and subsequent DNA-end ligation, although the importance of this seems to depend on cellular context (52,53,54,55,56,57). Moreover, interaction with XRCC4 stabilises LIG4 and promotes its enzymatic activity, thereby facilitating the final NHEJ ligation step ( Figure 3B) (58).
Given that different topologies of SUMO are associated with distinct functions (59), we first assessed if XRCC4 displayed preferential binding to different SUMO topologies. Surface plasmon resonance (SPR) assays revealed preferential binding of XRCC4 to enzymatically linked polySUMO2 chains (Figure 3C, left) over SUMO1/2 monomers (mSUMO1/2) and SUMO2 dimers (diSUMO2; Figure 3C, right). In agreement with the high sequence identity between SUMO2 and SUMO3, XRCC4 bound polySUMO2 and polySUMO3 chains in a similar manner (Figure 3D). It is noteworthy that the heterogeneous nature of these chains (3-8 SUMO moieties at different quantities) precluded calculation of their dissociation constants (KD) in molar units. Using 4×SUMO2 chains in NanoTemper microscale thermophoresis (MST) experiments, we retrieved a KD of 3.2±0.7 nM (Figure 3E), which puts this interaction amongst the strongest known SUMO:receptor interactions (60). In this regard, we note that the strength of the XRCC4:4×SUMO2 interaction is specific for 4×SUMO2, with much weaker affinities applying to SUMO2 monomers (see NMR titrations below). These findings could be explained by local concentration effects or by the existence of multiple SIMs on XRCC4 that bind to distinct SUMO moieties of the same chain with increased avidity. Since SUMO belongs to the ubiquitin/UBL family, we next investigated if XRCC4 showed a preference for binding to SUMO over ubiquitin topologies. We detected no or only minor interactions of XRCC4 with a wide range of ubiquitin topologies (Figure 3F). Next, we performed tetraSUMO2 chain (4×SUMO2) co-precipitations with recombinant XRCC4, demonstrating that the two proteins can also interact in solution (Figure 3G).
Additionally, XRCC4 bound to SUMO2 in cells; similar to the binding characteristics we established in vitro, ectopically expressed XRCC4 preferentially co-precipitated from cellular extracts with poly-SUMO2 chains over SUMO2 monomers (mSUMO2), irrespective of the presence or absence of ionizing radiation (IR)-induced DNA damage ( Figure 3H). The interaction was also detectable with endogenous XRCC4 ( Figure 3I). Overall, these findings demonstrate that XRCC4 can preferentially bind polySUMO2 chains in vitro and in cells, and in the absence or presence of DNA damage.

XRCC4 lacks functional consensus SIMs
To characterise if and how the positioning of SUMO-binding regions on XRCC4 related to the structure and function of XRCC4's known domains (Figure 3A), we next investigated if XRCC4 contained conventional SIM sequences, which feature a core of 4-5 hydrophobic residues that can be intersected or framed by negatively charged residues on one or both sides (Supplementary Figure S1a) (5,6,33). Indeed, JASSA and GPS-SUMO predicted five putative SIMs (pSIMs) in XRCC4: three located in its head domain (pSIM8, pSIM33 and pSIM123), one in its coiled-coil region (pSIM181), and one in the C-terminal part of the protein (pSIM257; Supplementary Figure S1b). However, the crystal structures of XRCC4 (25,61,62) suggested that pSIM8, pSIM33, and pSIM123 are important for the structural integrity of the XRCC4 head domain by forming extensive interactions with nearby XRCC4 residues. Moreover, pSIM33 is almost completely buried inside the XRCC4 head domain, rendering this motif inaccessible to surface interactions (61) (Supplementary Figure S1b), as also supported by hydrogen exchange experiments, showing that pSIM33 is inaccessible to solvent, and stably so for several days (14). Consistent with these realisations, mutation of pSIM8, pSIM33, or pSIM123 residues to alanines abolished XRCC4 interactions with polySUMO2/3 (Supplementary Figure S1c), in agreement with recently published work on pSIM33 (35). However, interaction with XLF (Supplementary Figure S1d), which binds to a distinct region on XRCC4's head domain (26,52,63,64) (compare Supplementary Figure S1b to Figure 3A), was also abrogated, suggesting that these mutations seriously affect the structural stability of the N-terminal domain.
In line with this notion, the pSIM mutants started aggregating at markedly lower temperatures than the WT in response to thermal unfolding (>20 • C lower for pSIM8; >30 • C lower for pSIM33 and pSIM123), confirming the strong destabilising impact of the mutations on the XRCC4 fold (Supplementary Figure S1e). The stronger effects of pSIM33 and pSIM123 over pSIM8 are consistent with their more enhanced disruption of SUMO2 and XLF binding (Supplementary Figure S1c,d). By contrast, mutation of pSIM181 residues to alanines neither affected polySUMO2/3 nor XLF binding of XRCC4 (Supplementary Figure S1c,d).
In addition, deletion of XRCC4's C-terminal tail containing pSIM257 did not markedly affect polySUMO2 binding of XRCC4 (Supplementary Figure S1f). We conclude that XRCC4 most likely interacts with SUMO2/3 via one or multiple non-conventional, hitherto unidentified binding module(s).

Carbene footprinting determines distinct SUMO2-binding regions on XRCC4
In light of the absence of conventional SIMs in XRCC4, we employed a recently developed structural mass spectrometry approach, known as carbene footprinting (12,13,65), to map SUMO2 interaction regions along XRCC4 in an unbiased manner. The carbene footprinting methodology utilises covalent labelling of surface-exposed protein residues with a highly reactive carbene species, formed by photolysis of the corresponding diazirine. Labelling of the protein-of-interest both individually and with a binding partner, followed by proteolytic digestion and LC-MS analysis, allows differential labelling of the resulting peptides to be monitored. Reduced peptide labelling in the presence of a binding partner, indicates the residues of the peptide as potential binding sites due to surface masking. In addition, unmasking of peptide labelling can occur, and both masking and unmasking can further indicate interaction-induced conformational changes leading to a change in exposure of residues to solvent, and therefore labelling (12,13,65) (Figure 4A).
Carbene footprinting of XRCC4 in the presence of mSUMO2 revealed multiple XRCC4 tryptic peptides as potential SUMO2-interacting sites. Significant masking was detected in XRCC4's head domain for three peptides between residues 66-115, in the coiled-coil region (170-178 peptide; the preceding peptide was not detected in our analysis), and in the C-terminal tail in/around the 286-296 region. Significant unmasking occurred in the 8-26 region, an area on the head domain spatially proximal to the N-terminal part of the coiled-coil (Figure 4B, Supplementary Table S3). XRCC4 truncations lacking up to 121 amino acid residues of the C-terminus were precipitated by GST-4×SUMO2 at levels similar to full-length XRCC4 (Figure 4C), in agreement with the comparable polySUMO2 profiles we measured in SPR equilibrium analyses (Supplementary Figure S1f). These findings suggest that the masking of the 286-296 region was due to structural rearrangements of XRCC4's C-terminus rather than its direct involvement in SUMO binding. A 1-164 XRCC4 truncation (XRCC4 1-164) retained substantial polySUMO2 binding, consistent with XRCC4's head domain contributing to SUMO binding (Figure 4D). In addition to the head domain, increased precipitation of XRCC4 1-180 by GST-4×SUMO2 compared to XRCC4  confirmed the presence of residues important for SUMO binding in the coiled-coil (Figure 4E). Taken together, these findings point towards SUMO-interacting regions on XRCC4 in its head and coiled-coil domains, with potential allosteric changes occurring at/around positions 8-26 and in the flexible C-terminus (286-296 positions). To increase the resolution of the carbene footprinting approach in the head domain, we repeated the analyses with XRCC4 1-164, which retrieved an extended set of labelled peptides (Figure 4F, Supplementary Table S3).
Overall, the experiment consolidated the effects we observed with full-length XRCC4, and strengthened our conclusion of there being at least two potential distinct SUMO-binding regions on the head domain localised to/around residues 66-71 and 103-107.

Conserved non-conventional and paralogue-selective SUMO-binding module on XRCC4 head domain
Having narrowed down potential regions of SUMO binding to distinct and defined parts of XRCC4, we next performed NMR titrations of targeted XRCC4 truncations to map SUMO interactions at an increased (amino acid-level) resolution. To this end, we first established XRCC4 1-164 as the largest head domain-containing XRCC4 construct amenable to NMR analysis (14). Two amino acid residue stretches showed marked intensity loss in the 1H-15N BEST-TROSY spectra, consistent with specific interaction between mSUMO2 and XRCC4 via discrete binding sites. Residues 101-LKDVSFRLGSF-111 displayed the strongest effects, followed by residues 56-ADDMA-60 (henceforth termed SIM101 and SIM56, respectively). Both regions form coherent and spatially proximal surface sites on XRCC4's head domain (Figure 5A; Supplementary Figure S2a,b), with a roughly estimated KD in the high μM to low mM range. Equivalent experiments with diSUMO2 and 4×SUMO2 highlighted the same amino acid stretches (Figure 5B, Supplementary Figure S3a,b), further consolidating these regions as SUMO-binding surfaces on XRCC4. Additional affected residues likely reflect more extended contact regions due to the larger volumes occupied by the di-/4×SUMO2 topologies compared to mSUMO2. These findings confirmed our carbene footprinting results, highlighting peptide-level covalent fractional modification as a valuable technology for narrowing down interaction regions in an unbiased manner when little pre-existing information is available. In contrast to mSUMO2, addition of mSUMO1 resulted in fewer, different and substantially less pronounced changes in the XRCC4 1-164 1H-15N BEST-TROSY spectra at the same equimolar titration ratios as those of mSUMO2. Most of the residues in the SIM56 and SIM101 regions were unaffected by the presence of mSUMO1, and only showed minor chemical shift perturbations rather than the pronounced intensity changes induced by mSUMO2 (Supplementary Figure S3c).
Similarly, addition of XRCC4 1-164 led to fewer, distinct and insubstantial changes in the mSUMO1 1H-15N BEST-TROSY spectra at the same equimolar titration ratios compared to mSUMO2 (Supplementary Figure S3d). Similar results were obtained for full-length XRCC4 (data not shown). The affected residues did not form discrete or coherent surface regions on XRCC4 (Supplementary Figure S3d). Together, these data demonstrated that binding of mSUMO1 to XRCC4 is substantially weaker than that of mSUMO2 and is non-specific in nature for XRCC4. In agreement with these findings, XRCC4 was preferentially co-precipitated by GST-mSUMO2 over GST-mSUMO1 in pulldown experiments (Supplementary Figure S3e). Collectively, these data demonstrate SUMO2 paralogue selectivity of XRCC4 for a specific site and on a protein-wide level. The surface implicated in SUMO2 binding is negatively charged, with a positively charged patch on one side (Supplementary Figure S3f), differentiating this surface from that of established SIM classes (4,59). Consistent with our NMR analyses, individual or combined mutation of SIM56 and SIM101 abrogated binding of XRCC4 1-164 to 4×SUMO2 (Figure 5C). In contrast to the pSIM mutants, mutation of SIM56 and SIM101 did not have marked destabilising effects on the overall structural integrity of the mutated proteins, as indicated by their proton NMR spectra (Supplementary Figure S4a) and aggregation onset temperature determination (Supplementary Figure S4b). Given their close spatial proximity, SIM56 and SIM101 likely synergise to form a split SIM that facilitates SUMO2 binding via a non-conventional paralogue-selective SUMO2-binding module on XRCC4's head domain.
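The per-residue intensity-loss readout used in these titrations can be illustrated with a short sketch. It assumes that each residue is scored by the ratio of its peak intensity with and without the binding partner, that the uncertainty comes from the spectral noise via the standard error propagation formula for a quotient, and that a 30% loss is used as the cut-off, as stated in the figure legends of this article. The function names and the example numbers are hypothetical.

```python
import math


def intensity_ratio(i_free: float, i_bound: float,
                    noise_free: float, noise_bound: float) -> tuple[float, float]:
    """Return (ratio, sigma) for one residue.

    ratio = I_bound / I_free
    sigma = ratio * sqrt((noise_free / I_free)^2 + (noise_bound / I_bound)^2)
    Both intensities must be positive; peaks too weak to measure should be
    handled separately rather than passed in here.
    """
    if i_free <= 0 or i_bound <= 0:
        raise ValueError("peak intensities must be positive")
    ratio = i_bound / i_free
    sigma = ratio * math.sqrt((noise_free / i_free) ** 2
                              + (noise_bound / i_bound) ** 2)
    return ratio, sigma


def is_affected(ratio: float, loss_threshold: float = 0.30) -> bool:
    """True when the intensity loss (1 - ratio) exceeds the threshold."""
    return (1.0 - ratio) > loss_threshold
```

For a peak that drops from 100 to 50 arbitrary units with a noise level of 5 in both spectra, the ratio is 0.5 and the residue counts as affected under the 30% cut-off.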
SUMO receptors can bind to different surfaces on SUMO1 and SUMO2 (5,59). To analyse how XRCC4-targeted SUMO2 surfaces correlate to the ones bound by other SUMO receptors, we compared the 1H-15N HSQC spectra of mSUMO2 in the absence and presence of increasing concentrations of XRCC4 1-164. This revealed binding of XRCC4's head to the β2/α1-groove on SUMO2, which is also targeted by other known SUMO receptors such as PIAS2 (5,6) (Figure 5D, Supplementary Figure S5a,b). Notably, several SUMO2 residues affected by XRCC4 binding are not conserved in SUMO1, leading to alterations in charge (e.g. R36 in SUMO2 versus M40 in SUMO1) as well as in size (e.g. A23 in SUMO2 versus I27 in SUMO1; Supplementary Figure S5c), and resulting in charge distribution changes across the corresponding SUMO2/SUMO1 surfaces. For example, the relatively weak and evenly distributed positive charges on the XRCC4-bound SUMO2 surface (Supplementary Figure S5d) match the homogeneously distributed negative charges on the reciprocal XRCC4 surface (Supplementary Figure S3f). By contrast, the equivalent SUMO1 surface possesses a strongly positively charged patch on one side (Supplementary Figure S5d), which could explain the unfavoured binding of SUMO1 to this XRCC4 region. Collectively, these analyses suggest that SUMO2 binding to XRCC4 is stabilised by ionic interactions that help achieve paralogue selectivity.
Given the unusual nature of the identified SUMO2:XRCC4 binding module, we performed photocrosslinking experiments to further probe the binding surfaces. Recombinant XRCC4 incorporating the unnatural photo-crosslinkable amino acid para-benzoylphenylalanine (BpF) (16) at R107, proximal to SIM101, specifically crosslinked to the same β2/α1-groove of 4×SUMO2 after UV exposure (Figure 5E, Supplementary Figure S6a). Vice versa, various SUMO2 topologies integrating BpF at R50 (18), proximal to the identified XRCC4-binding region, also crosslinked specifically to SIM101 of XRCC4 (Figure 5F, Supplementary Figure S6b). Together, these findings consolidate the non-conventional nature of the identified SUMO-binding module on XRCC4's head domain. Finally, to test if this SUMO-binding module is conserved in other proteins, we screened the human proteome for motifs similar to SIM101 (K-[SDE]-[VLI]-[DES]-[FVLI]), identifying STMN1 as a potential candidate, which we had previously validated as a SUMO receptor in our screen (Figure 2B). Mutation of the XRCC4-like SIM101 residues (43-KDLSL-47, SIM43) to alanines markedly decreased binding to 4×SUMO2 (Figure 5G), confirming the conservation of XRCC4-like SIMs in other proteins. Moreover, amongst the >1000 proteins containing such sequences, known SUMO receptors were enriched, particularly when the XRCC4 SIM101-like residues were surrounded by acidic residues, suggesting the presence of supporting acidic residues for some of these SIMs, reminiscent of the acidic residues in SIM56 and their auxiliary role for XRCC4 SIM101.
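The SIM101-like motif search can be illustrated as a simple pattern scan. The sketch below translates the motif K-[SDE]-[VLI]-[DES]-[FVLI] into a regular expression and applies it to the two sequence stretches quoted in the text; it is not the screening pipeline used in the study, and the function name and the `first_residue` numbering offset are hypothetical conveniences.

```python
import re

# K-[SDE]-[VLI]-[DES]-[FVLI], wrapped in a lookahead so that
# overlapping occurrences would also be reported.
SIM101_LIKE = re.compile(r"(?=(K[SDE][VLI][DES][FVLI]))")


def find_sim101_like(sequence: str, first_residue: int = 1) -> list[tuple[int, str]]:
    """Return (residue number of the lysine, matched pentapeptide) pairs.

    first_residue gives the residue number of sequence[0], so that
    fragments can be reported in full-length protein numbering.
    """
    seq = sequence.upper()
    return [(m.start() + first_residue, m.group(1))
            for m in SIM101_LIKE.finditer(seq)]


# XRCC4 residues 101-111 and STMN1 residues 43-47, as quoted in the text.
print(find_sim101_like("LKDVSFRLGSF", first_residue=101))  # [(102, 'KDVSF')]
print(find_sim101_like("KDLSL", first_residue=43))         # [(43, 'KDLSL')]
```

Running the same function over full proteome sequences would list every candidate site, which could then be filtered, for example by the presence of flanking acidic residues.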

SUMO2 interaction of XRCC4's head domain is incompatible with XLF binding
Because the SUMO2 interaction surface on the head domain of XRCC4 overlaps with XRCC4 binding to another NHEJ core factor, XLF (Figure 6A), we next investigated if SUMO2 and XLF binding were compatible. To help visualise the complex formation indicated by the NMR intensity losses, these losses were used to generate interaction restraints for molecular docking, using HADDOCK (15). The resulting structural models clustered into two orientations, both with several surface-exposed hydrophobic residues (L101, V104 and F106) of SIM101 located in the β6-β7 hairpin of XRCC4 stacked up with a hydrophobic patch centred on F32 on the reciprocal SUMO2 surface. The two orientations showed different stabilisation mediated by ionic interactions, e.g. D103 (XRCC4) with K42 (SUMO2) in model 1, and D57 (XRCC4) with K42 (SUMO2) in model 2 (Figure 6B, Supplementary Figure S7a, Supplementary Table S4). Although not a truly rigid body docking, no substantive changes in the domains' architectures resulted from the HADDOCK forcefield. The production of two orientations by the docking procedure may be a result of the limited number of restraints, or might indicate that both interactions take place in solution. The latter scenario is compatible with two SUMO2 moieties from the same polySUMO chain interacting with each head domain of the XRCC4 dimer, but in different orientations. We generated a structural model on this basis, and equilibrated it using molecular dynamics calculations via GROMACS (5 ns, 2 fs steps). Notably, no long-lived interactions were present in the simulations between the intervening SUMO2 moieties and XRCC4, consistent with the lack of NMR perturbations outside the major interaction surface formed by SIM56/SIM101. A 4×SUMO2 complex of this nature is consistent with the increased affinity of 4×SUMO2 for XRCC4, relative to mSUMO2.
Both the mSUMO2 and 4×SUMO2 complexes are inconsistent with a ternary complex including XLF (Figure 6C), which suggests competition between these proteins for XRCC4. Moreover, mutation of SIM56 and/or SIM101 abrogated XRCC4 binding to both SUMO2 (Figure 5C) and XLF (Figure 6D), similar to a known XLF binding-deficient mutant (53) (2KE: K65 and K99 mutated to glutamic acids; Figure 6D,E), without substantially affecting the overall structural integrity of the mutant, as indicated by its proton NMR spectrum (Supplementary Figure S7b) and aggregation onset temperature determination (Supplementary Figure S7c). In line with these findings, 4×SUMO2 showed a trend of competing with XRCC4-XLF interaction when added to whole cell extracts at 1 μM concentration (=25 μg) prior to GFP-XRCC4 pulldowns (Figure 6F). Altogether, these findings support a SUMO2-binding model that is incompatible with simultaneous binding of XLF to XRCC4.

SUMO2 binding to XRCC4's coiled-coil is incompatible with LIG4 interaction
In addition to SUMO2 binding to XRCC4's head domain, our carbene footprinting and truncation studies pointed towards a SUMO2-binding region in the XRCC4 coiled-coil located in/around the 170-178 region. Having successfully assigned the majority of residues in XRCC4 1-164, we extended our NMR analyses to XRCC4 1-180. Taking the XRCC4 1-164 assignment as a basis, we were able to assign the majority of the additional 16 residues and detected intensity loss and/or perturbation shifts after addition of mSUMO2 for E163, S167 and A168 (Figure 7A,B), and possibly K164, C165 and V166, which could not be assigned, but were positioned in the centre of the affected residues (Figure 7A). Indeed, mutation of these residues (SIM163) to alanines reduced SUMO binding to XRCC4, albeit to a lesser extent than mutation of SIM101, with double mutation of SIM163 and SIM101 completely abrogating SUMO binding (Figure 7C), consistent with the carbene footprinting data. In contrast to SIM101 mutation, mutating SIM163 did not affect XLF binding, and LIG4 binding was unaffected in all SIM101/SIM163 single and double mutants (Supplementary Figure S7d). Interestingly, the interaction surface on SUMO2 did not markedly change based on intensity losses in the 1H-15N HSQC spectra of mSUMO2 after addition of full-length XRCC4 compared to XRCC4 1-164 (compare Figure 7D and Supplementary Figure S7e to Figure 5D and Supplementary Figure S5a). We conclude that SIM56/SIM101 and SIM163 on the coiled-coil bind to comparable interaction surfaces on SUMO2 with similar surface charge profiles (compare Figure 7E to Supplementary Figure S5d). The presence of multiple SUMO-binding regions on XRCC4 opens the possibility for polySUMO2 chains not only crossing XRCC4's head domain but spanning its head and coiled-coil domains, with a minimum of three individual SUMO moieties required for non-distortional binding (Figure 7F), providing an explanation for preferential binding of XRCC4 to longer SUMO topologies.
Strikingly, the affected coiled-coil region overlaps with XRCC4 binding to two other proteins important for NHEJ: LIG4 and IFFO1 (Figure 7G). Our model for SUMO binding to XRCC4 SIM163 suggests that SUMO interaction in the coiled-coil region of XRCC4 is incompatible with XRCC4 interactions with LIG4 and IFFO1. To test this hypothesis, we performed GST-4×SUMO2 pulldown assays, demonstrating that XRCC4, but not LIG4, could be co-precipitated from whole cell extracts (Figure 7H), while LIG4 was successfully co-precipitated from whole cell extracts in GFP-XRCC4 pulldowns (Figure 7I). Collectively, these data indicate that LIG4 and polySUMO2 binding to XRCC4 are incompatible and similar principles likely apply to IFFO1 binding. This does not necessarily mean that 4×SUMO2 prevents XRCC4 binding to LIG4 or IFFO1. Instead, or in addition, 4×SUMO2 binding to XRCC4 may affect functions of XRCC4 that are not related to LIG4-/IFFO1-XRCC4 complexes, with a fraction of XRCC4 having been shown to exist in cells without for example being bound to LIG4 (10).

[Figure legend fragment] Error bars represent 1 standard deviation from the plotted value, as calculated from the noise levels in the TROSY spectra using the standard error propagation formula. Grey bars indicate residues with too low intensity for reliable measurement (arbitrarily set to 1); lack of bars represents unassigned residues. Red and blue bars indicate assigned residues in XRCC4 1-180 and levels of intensity ratio loss of more or less than 30%, respectively.

Nucleic Acids Research, 2022, Vol. 50, No. 8

XRCC4 SIM101 plays a role in non-homologous end-joining
To assess if the SUMO:XRCC4 interaction plays a role in NHEJ, we used a recently established GFP reporter platform (EJ7-GFP) that measures the efficiency of XRCC4-dependent distal end-joining of broken DSB ends without indels (Figure 8A) (66). To do so, we initially measured NHEJ efficiencies in HEK293 XRCC4 KO cells stably integrating the EJ7-GFP reporter. We complemented the cells with XRCC4 (WT, SIM101 or SIM163, singly or doubly mutated). All XRCC4 SIM mutants fully complemented the NHEJ efficiencies to WT levels, indicating potential redundancies of the assessed SIMs (data not shown). Given that the SIM101 region also mediates binding to XLF, an NHEJ core factor known for its redundancy in NHEJ, we wanted to test if any of the XRCC4 SIMs were redundant with certain aspects of XLF function. To assess this, we generated a HEK293 XRCC4/XLF double KO cell line, stably integrating the EJ7-GFP reporter system (Supplementary Figure S7f). While the immunoblot signals from each of the mutants were slightly reduced compared to XRCC4 WT, each of the mutants was detected at similar levels (Figure 8B). By contrast, the mutants were distinct in their abilities to promote NHEJ. Complementation with WT XLF and XRCC4 (WT, SIM101 or SIM163, singly or doubly mutated) reconstituted NHEJ efficiencies to similar levels (Figure 8C), confirming our initial observations. By contrast, complementation with XRCC4, either SIM101 singly or SIM101/SIM163 doubly (D) mutated, together with XLF L115D, a well-established XLF mutant unable to bind XRCC4 and diminished in its XRCC4/LIG4-stimulating capacity (53,67), was significantly less efficient (∼2-fold) in reconstituting NHEJ efficiency (Figure 8C). Interestingly, XRCC4 singly mutated for SIM163 did not show any reductions in NHEJ efficiency in these settings (Figure 8C). Collectively, these findings indicate that SIM101, but not SIM163, is important for promoting NHEJ in circumstances when certain aspects of XLF are compromised.

DISCUSSION
SUMOylations affect thousands of proteins regulating a gamut of cellular processes, but only tens of SUMO receptors decoding these SUMOylations have been validated. Using human proteome microarrays with fluorescently labelled SUMO topologies, we uncover >200 new binary polySUMO2/3 receptor candidates and validate a substantial fraction of them, markedly extending our known human SUMO receptor pool. Given the involvement of the identified receptors in diverse cellular pathways, these results serve as a platform for breaking new ground in SUMO biology. Indeed, numerous further opportunities now await exploration to uncover mechanisms underlying established as well as under-studied areas of SUMO biology, including epigenetics, pre-mRNA splicing, transcriptional regulation, DNA repair, cytoskeleton organisation and protein synthesis in health and disease. Our screening pipeline is widely applicable to ubiquitin and other UBLs. By demonstrating its utility for paralogue- and topology-selective receptor discovery for polySUMO2, we provide a paradigm for narrowing down ubiquitin/UBL-binding regions for proteins lacking conventional binding modules, using a recently developed structural mass spectrometry technique, carbene footprinting, that can be applied independently of protein size, quantity of starting material, and interaction affinity, factors commonly limiting high-resolution analytical methodologies (12,13,65).
By combining carbene footprinting and mutational and structural analyses with genetic code expansion, we characterise XRCC4 as the first core NHEJ factor containing non-conventional SUMO-binding regions, revealing two distinct SUMO-binding modules along its sequence. NHEJ has long remained under-studied for its regulation by SUMOylation compared to other DSB and DNA repair pathways (9). Only recently, during the course of our work, has another study explored XRCC4 as a SUMO receptor, focussing on pSIM33 and its neighbouring W43 as the relevant SUMO-binding region. The authors propose that disrupted SUMO binding contributes to the pathogenesis of the XRCC4 W43R patient mutation (35). However, W43 and L36 along with several other residues make up the hydrophobic core of XRCC4's head domain, consistent with extensive hydrogen exchange protection in this area (14). As such, these residues are surface-inaccessible (14) and critical for XRCC4's structural integrity and stability, consistent with the markedly decreased aggregation onset temperature we observed for the pSIM33 mutant (Supplementary Figure S1e). Indeed, XRCC4 protein levels are substantially reduced in W43R patient cells (68), making this scenario and the direct relevance of this region for SUMO binding highly unlikely.
Intriguingly, the XRCC4 SUMO-binding module we identified on its head domain features paralogue selectivity for SUMO2/3 over SUMO1, and as a whole, XRCC4 preferentially bound to polySUMO2 chains over shorter topologies. While SUMO binding relies on a hydrophobic patch in line with conventional SIMs, a positive charge at the core of the SUMO-binding module, K102, represents an unprecedented characteristic for this binding. Given the importance of acidic residues for mediating interactions with SUMO1 (5), this positive charge along with additional characteristics likely contributes to the paralogue selectivity we observed.
Despite the surprising features of XRCC4's SUMO-binding surface, the reciprocal region on SUMO2 was similar to the one targeted by other SUMO receptors, albeit with different nuances in the precise residue involvement. These data highlight the versatility of SUMO to utilise the same region for interactions with a wide range of distinct SUMO-binding modules.
Notably, XRCC4-like SIM features are enriched in known SUMO receptors and can be detected in a large number of proteins overall. Indeed, we showed XRCC4 SIM101-like SUMO binding in another SUMO receptor identified in our screen, raising the possibility that this type of binding module contributes to SUMO decoding across wide and diverse areas in cell biology. Together with validating a receptor lacking both conventional and XRCC4-like SIMs (TCEAL6), our results suggest an unanticipated spectral plasticity of SUMO-binding modules that has remained undiscovered, and for which our screen of binary SUMO receptors now provides a rich resource.
Mechanistically, the two SUMO-binding regions on XRCC4 overlap or involve common features with other XRCC4-interaction regions important for NHEJ. The use of common interaction sites has emerged as an intriguing concept to make efficient use of a limited number of binding sites available on NHEJ core factors to provide functional redundancy. In line with this notion, we find that XRCC4's SIM101 becomes important for promoting NHEJ when certain features of XLF are impaired. How exactly SIM101 contributes to NHEJ under these circumstances, perhaps by stabilising certain NHEJ complexes, their conformations and/or their activity, will be an interesting topic to address in future investigations. In this regard, it is noteworthy that only one of the SUMO-binding modules, SIM101, but not SIM163, participated in this function, raising the possibility that different XRCC4 SIMs and/or their combinations might be implicated in distinct aspects of NHEJ regulation and/or associated pathways, perhaps by recognising different SUMOylated substrates. In such circumstances, the identified interaction sites could lead to diversity that can help cells deal with different types of DNA damage arising in distinct chromatin contexts and with varying sets and/or levels of functional repair factors. Such scenarios could apply to, and be relevant for, different tissues and developmental stages, as well as in different cancer settings due to differential regulation of NHEJ factors, their downregulation and/or their dysfunction.
Linking common binding sites to recognition of SUMOylations occurring in a spatiotemporally regulated manner, such as in response to DNA damage (9), could help cells coordinate the use of common binding regions in an optimal manner, enabling them to target distinct repair complexes to the most appropriate types of DSBs in varying chromatin environments and at different repair stages, while preventing harmful competition between them. Our findings provide possible future avenues for exploring the mechanistic basis of these processes, which remain a puzzling phenomenon in NHEJ. Another possibility is that XRCC4 interactions with SUMO negatively regulate NHEJ by disrupting or preventing interactions with XLF, LIG4 and IFFO1. Such a mechanism could complement the recently demonstrated NHEJ barrier mediated by RIG1, which blocks XRCC4 interactions with XLF and LIG4 to counteract viral integration (69). Given the known involvement of SUMO in viral defence mechanisms, our work provides an attractive lead to explore such regulatory layers in future investigations. Alternatively, disruption of XRCC4 complexes after completion of repair could be important for finalising NHEJ, in analogy to the release of Ku after repair has taken place (70,71,72), and in that way contribute to other mechanisms negatively regulating XRCC4 interactions (50,73).
Finally, we note that our work may have medical applications because targeting DDR and ubiquitin/UBL system components can be exploited to treat cancer. Indeed, targeting NHEJ at the level of XRCC4 interactions represents an attractive approach to sensitise cancer cells, commonly displaying cryptic DNA repair pathway defects including NHEJ, via synthetic lethality and/or other mechanisms (74). Similarly, given the importance of DSB repair pathway choice for determining CRISPR-Cas9 genome editing outcomes, targeting specific XRCC4 interactions important for NHEJ may also be relevant for increasing the efficiency of precise gene editing processes relying on homology-dependent repair.

DATA AVAILABILITY
Assignments of XRCC4 1-180 residues 167-180 are deposited with BMRB code 50742. PDB files for the different XRCC4:SUMO models are available as supplementary files. Other original data are available upon request.