FuzPred: a web server for the sequence-based prediction of the context-dependent binding modes of proteins

Abstract Proteins form complex interactions in the cellular environment to carry out their functions. They exhibit a wide range of binding modes depending on the cellular conditions, which result in a variety of ordered or disordered assemblies. To help rationalise the binding behavior of proteins, the FuzPred server predicts their sequence-based binding modes without specifying their binding partners. The binding mode defines whether the bound state is formed through a disorder-to-order transition resulting in a well-defined conformation, or through a disorder-to-disorder transition where the binding partners remain conformationally heterogeneous. To account for the context-dependent nature of the binding modes, the FuzPred method also estimates the multiplicity of binding modes, the likelihood of sampling multiple binding modes. Protein regions with a high multiplicity of binding modes may serve as regulatory sites or hot-spots for structural transitions in the assembly. To facilitate the interpretation of the predictions, protein regions with different interaction behaviors can be visualised on protein structures generated by AlphaFold. The FuzPred web server (https://fuzpred.bio.unipd.it) thus offers insights into the structural and dynamical changes of proteins upon interactions and contributes to development of structure-function relationships under a variety of cellular conditions.


INTRODUCTION
After the initial discovery of complexes formed by intrinsically disordered proteins, it has been recognised that proteins can sample a wide range of states in their bound forms, ranging from ordered to disordered assemblies (Figure 1) (1). Although disordered complexes were initially considered non-specific or non-functional, ample experimental evidence from biophysics, structure analysis and functional mutagenesis demonstrates that protein regions that remain conformationally heterogeneous in their specific complexes contribute to a wide range of biological activities (2, 3). Advances in structure determination techniques, in particular solution and single-molecule methods, are enabling the characterisation of the conformational heterogeneity of disordered complexes (Figure 1) (4).
It is also becoming evident that different binding modes are associated with distinct biological functions. Disordered regions that undergo disorder-to-order transitions and adopt well-defined structures upon binding usually serve as recognition elements, which can be identified based on transient conformations in their unbound forms. For example, the tumor suppressor p53 binds to the Mdm2 ubiquitin ligase through a short α-helical segment, which can also be observed in solution (Figure 1) (5, 6). Protein regions that remain heterogeneous in the bound state usually coordinate different activities or pathway components and organise higher-order assemblies (7). The formation and regulation of different kinds of higher-order protein structures, from amyloid fibrils to signaling assemblies and liquid-like condensates, are all associated with conformationally heterogeneous, or fuzzy, regions (7). For example, in the assembly of the AIM2 inflammasome, the linker region between the PYD and CARD domains serves as a switch to expose these domains for intermolecular interactions (8). The disorder-to-disorder binding mode provides a key contribution to protein phase separation (9, 10).
The complexity of interactions is underscored by the sampling of multiple binding modes under different cellular conditions. In particular, protein regions can be induced to adopt an ordered structure upon binding, but can also remain heterogeneous under different conditions (Figure 1). Actin polymerisation, for example, is assisted by WH2 domains, which remain partly disordered upon interaction (11). WH2 domains are anchored by a single salt bridge, the stability of which is modulated by ionic strength. High ion concentrations weaken the charge interactions and increase mobility, leading to the departure of WH2 domains and elongation of the actin chain (12). At low ionic strength, in contrast, the salt bridge stabilises the complex with the actin monomer, leading to sequestration. Structural examples representing different binding modes can be found in the Protein Data Bank (PDB) (13).
In this article, we describe the FuzPred web server (https://fuzpred.bio.unipd.it), which provides two key sequence-based predictions concerning the interaction behavior of proteins (Figure 1): (i) the probability of undergoing disorder-to-order or disorder-to-disorder transitions and (ii) the likelihood of sampling a multiplicity of binding modes. The web server thus provides insights into the spectrum of interactions underlying the complex cellular behaviors of proteins.
W200 Nucleic Acids Research, 2023, Vol. 51, Web Server issue

Local sequence complexity determines the binding mode
Analysis of over 2000 specific protein complexes showed that the degree of order upon binding is only weakly correlated with the presence of secondary structure elements (13). Importantly, different binding modes are associated with distinct contact patterns. The disorder-to-order binding mode is associated with well-defined contacts, formed by residues with physico-chemical features distinct from those of their flanking residues (13). In contrast, disorder-to-disorder binding modes are characterised by heterogeneous, alternative contact patterns (14), formed by a set of residues that can establish chemically similar interactions. For example, highly polar and charged transcription factors can bind in shallow, hydrophobic clefts of their transactivators via multiple configurations, anchored by a few hydrophobic residues (15). The KIX domain of the CREB-binding protein (CBP), for instance, interacts with the kinase-inducible domain (KID) of the cAMP response element-binding protein CREB, as well as with the cMyb transcription factor, in a disorder-to-disorder binding mode (16). In general, the binding modes reflect the entropy change upon binding; in particular, the disorder-to-order binding mode corresponds to decreasing entropy upon interaction. Therefore, the binding modes are applicable to both structured and disordered protein regions.
We evaluate the local sequence composition of the interaction site, which determines the binding mode (13), based on the difference between the composition of the putative interacting motif and its flanking sequence, defined as a local sequence bias (Figure 1) (13). A few residues can generate a strong ordering bias in the binding site, leading to a well-defined structure and contact pattern. In contrast, similar residues or repetitive motifs can generate a weak bias leading to heterogeneous bound states, realised via a multiplicity of bound configurations and ambiguous contact patterns. These binding modes can be predicted based on the local sequence bias, which can be evaluated without considering a specific partner (13). The sequence bias is evaluated using a window of five to nine residues representing a putative binding region (Figure 2A). To this end, the differences in the frequencies of the 20 amino acids between the binding window and the 20-residue N- and C-terminal flanking sequences are combined (13). Similarly, the sum of the differences in the Kyte-Doolittle hydrophobicity is computed for the same flanking regions. In addition, the difference in the tendency to form a well-defined or disordered structure is also evaluated. The method was trained using a logistic regression model with a scoring function comprising three terms derived from the differences between the binding window and its flanking sequence, as previously described in detail (13). The ability to discriminate between disorder-to-order and disorder-to-disorder binding modes was evaluated over 2000 protein complexes, resulting in an area under the curve (AUC) of 0.85 using all PDB data, and 0.92 using protein regions that are represented in at least three complexes (13).
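As an illustration of the window-versus-flank comparison described above, the following Python sketch computes two of the three difference terms for a single window. It is not the FuzPred implementation: the function names, the use of summed absolute frequency differences and of mean hydropathy are assumptions made for the example, the disorder-tendency term is omitted, and the trained logistic regression weights are those described in (13) and are not reproduced here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(segment):
    """Frequency of each of the 20 standard amino acids in a segment."""
    counts = Counter(segment)
    n = len(segment)
    return {aa: (counts[aa] / n if n else 0.0) for aa in KYTE_DOOLITTLE}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle hydropathy of a segment (0.0 if empty)."""
    if not segment:
        return 0.0
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def local_bias(sequence, start, length, flank=20):
    """Compare a putative binding window with its N- and C-terminal flanks.

    `start` is 0-based; `flank` is the number of residues taken on each side.
    Returns (composition_bias, hydropathy_bias); both definitions are
    illustrative assumptions, not the published scoring function.
    """
    window = sequence[start:start + length]
    flanks = (sequence[max(0, start - flank):start]
              + sequence[start + length:start + length + flank])
    comp_w, comp_f = composition(window), composition(flanks)
    composition_bias = sum(abs(comp_w[aa] - comp_f[aa]) for aa in KYTE_DOOLITTLE)
    hydropathy_bias = mean_hydropathy(window) - mean_hydropathy(flanks)
    return composition_bias, hydropathy_bias
```

A window whose composition matches its flanks gives a bias of zero in both terms, whereas a short motif embedded in a compositionally different environment gives a large bias, in line with the strong and weak bias cases discussed in the text.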

Evaluating the multiplicity of binding modes (MBM)
Small variations in the location of the binding site can lead to considerable changes in the local bias (14, 17). These variations can be caused by post-translational modifications, availability of further binding partners, or the presence of co-factors, metabolites or ions. The impact of such variations on the binding mode can be evaluated using multiple potential binding sites with different lengths and positions around the same residue (Figure 2A). This provides a distribution of binding modes that are available for the interactions of a given residue (Figure 2B) (18). The median of this distribution characterises the most likely binding mode for a given residue (18). The width of the distribution informs on the likelihood of sampling multiple binding modes. The MBM is quantified by the Shannon entropy computed from the binding mode distribution (Figure 2B) (18). The MBM characterises context-dependence, that is, the predicted impact of the cellular environment on the binding mode (19, 20). The FuzPred method achieves an AUC of over 0.90 in distinguishing context-dependent regions (CDR) from disorder-to-order and disorder-to-disorder regions using 750 protein complexes (18).
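The window enumeration and the entropy-based summary described in this section can be sketched as follows. The sketch is illustrative only: the function names and the choice of ten equal-width histogram bins are assumptions, and the per-window probabilities are taken as given rather than computed by the FuzPred scoring function.

```python
import math
from statistics import median


def windows_containing(position, seq_len, min_len=5, max_len=9):
    """All (start, length) windows of 5-9 residues that include `position`.

    Positions and starts are 0-based; windows running past either end of
    the sequence are skipped.
    """
    result = []
    for length in range(min_len, max_len + 1):
        first = max(0, position - length + 1)
        last = min(position, seq_len - length)
        for start in range(first, last + 1):
            result.append((start, length))
    return result


def normalised_entropy(probabilities, bins=10):
    """Shannon entropy of a histogram of values in [0, 1], scaled to [0, 1].

    The number of bins is an assumption made for this example.
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    counts = [0] * bins
    for p in probabilities:
        counts[min(int(p * bins), bins - 1)] += 1
    total = len(probabilities)
    entropy = sum(-(c / total) * math.log(c / total) for c in counts if c)
    return entropy / math.log(bins)


def summarise(probabilities, bins=10):
    """Median (most likely binding mode) and entropy (multiplicity) of a distribution."""
    return median(probabilities), normalised_entropy(probabilities, bins)
```

For a residue far enough from both termini, the enumeration yields 5 + 6 + 7 + 8 + 9 = 35 windows, which matches the 35 pDO and pDD values mentioned in the legend of Figure 2. A distribution concentrated in one bin gives an entropy of 0, whereas a distribution spread evenly over all bins gives 1.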

The interaction behavior is represented by a binding mode landscape
The binding mode landscape describes the interaction behavior of proteins by simultaneously characterising the binding mode and the MBM (18, 20). The x-axis displays the MBM and the y-axis displays the probability of forming disordered interactions (Figure 3B). These two parameters inform on the most likely binding mode and its sensitivity to different partners or cellular conditions. Protein regions that usually remain heterogeneous in their assemblies are located in the upper part of the landscape. Protein regions that preferentially undergo disorder-to-order transitions are located in the lower part of the landscape. Protein regions that exhibit similar binding modes in a variety of conditions are located in the left part of the landscape, and protein regions that change their binding modes with the cellular conditions can be found in the right part of the landscape (Figure 3B).
The regions that form disordered complexes with a variety of partners (upper left quadrant) often drive protein phase separation (9). In contrast, regions that tend to form disordered assemblies, but can be induced to form ordered structures, can be found in the upper right quadrant (21). These regions often serve as hot-spots of aggregation (22). Regions in the lower right quadrant may undergo disorder-to-order transitions upon binding, but can form polymorphic structures. These are typically the regions serving as amyloid cores (22). Finally, the regions in the lower left quadrant adopt a well-defined structure upon binding with all their interaction partners.
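The quadrant assignment can be expressed as a small lookup. The sketch below uses the cutoffs quoted in the legend of Figure 3 (pDD < 0.40 or pDD ≥ 0.60; MBM < 0.55 or MBM ≥ 0.65); the function name, the returned labels and the short descriptions are assumptions made for illustration, and residues falling between the cutoffs are reported as intermediate.

```python
def landscape_quadrant(p_dd: float, mbm: float) -> str:
    """Assign a residue to a quadrant of the binding mode landscape.

    y-axis: p_dd (probability of disordered binding); x-axis: mbm.
    """
    if p_dd >= 0.60:
        vertical = "upper"
    elif p_dd < 0.40:
        vertical = "lower"
    else:
        return "intermediate"
    if mbm >= 0.65:
        horizontal = "right"
    elif mbm < 0.55:
        horizontal = "left"
    else:
        return "intermediate"
    return f"{vertical} {horizontal}"


# Short descriptions paraphrasing the text; wording is illustrative.
QUADRANT_MEANING = {
    "lower left": "ordered binding with all partners",
    "upper left": "disordered binding with a variety of partners",
    "lower right": "ordered but polymorphic binding",
    "upper right": "disordered binding that can be induced to order",
    "intermediate": "mixed properties",
}
```

For example, `landscape_quadrant(0.8, 0.3)` places a residue in the upper left quadrant, the region associated in the text with phase separation.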

Predicted interaction characteristics available from the FuzPred web server
All predictions are based solely on the protein sequence, which can be provided as input either as a UniProt code (23) or as a FASTA file (Figure 4A). Importantly, no information on the partner identity is required. The results are displayed on a separate page.
Figure 2. (A) To predict the binding features of residue W53, for example, we consider different possible binding sites with sizes varying between five and nine residues. The local sequence bias is determined for all possible binding sites (windows), at all possible locations involving W53 (left panel). (B) Binding mode distribution analysis. This procedure generates the distribution of disorder-to-order (blue, right top) and disorder-to-disorder (orange, right top) binding modes as computed from 35 pDO and pDD values. The median of such pDO and pDD distributions provides the most likely pDO and pDD value, based on which the binding mode can be assigned (13). According to these predictions, W53 is likely to undergo a disorder-to-order transition upon binding. The information content of the pDO and pDD distributions, which can be computed as a Shannon entropy, informs on the likelihood of changing the binding mode with the conditions (18). This is represented by the multiplicity of binding modes (MBM) graph shown on the right bottom (middle panel), where the Shannon entropy is normalised into the [0,1] range (14). The MBM graph indicates that W53 likely changes binding modes and can possibly serve as an aggregation hot-spot.

Prediction of the binding mode
The binding mode is characterised by the disorder-to-order or disorder-to-disorder transitions upon binding. The user can choose the type of structural transition using the two tabs at the top of the page (Figure 4B). The disorder-to-order transition is characterised by the sequence-based probability of ordering upon binding (pDO, blue) (Figure 4B). In the case of disordered proteins, regions with pDO ≥ 0.6 likely adopt a folded structure upon binding. In the case of ordered proteins, regions with pDO ≥ 0.6 rigidify upon binding. The disorder-to-disorder transition is characterised by the sequence-based probability of decreasing order upon binding (pDD, orange) (Figure 4B). Protein regions with pDD ≥ 0.6 are likely to remain, or become, a heterogeneous conformational ensemble in the bound assembly. Folded protein regions with pDD ≥ 0.6 may unfold upon interaction. Protein regions with pDO and pDD in the range of 0.4-0.6 can usually sample both binding modes.
The graphs displaying the probabilities of disorder-to-order or disorder-to-disorder transitions are interactive (Figure 4B). Moving the cursor above the columns of the graph shows the pDO or pDD value belonging to each data point and the identity of the given residue. One can also zoom in on selected regions of interest.

Prediction of the multiplicity of binding modes (MBM)
The likelihood that a protein region samples multiple binding modes under different cellular conditions is shown in the second panel, below the graphs of the binding mode predictions (Figure 4B). The MBM value is derived from the Shannon entropy computed from the binding mode distributions in the presence of a series of hypothetical partners (21), normalised into the range of [0,1] (14). Protein regions with MBM ≥ 0.65 are sensitive to the cellular context and are expected to sample multiple binding modes (21). Protein regions with MBM < 0.55 likely sample one binding mode. The MBM graph is interactive (Figure 4B), showing the identity of residues and the corresponding MBM values. The MBM graph is identical for the disorder-to-order and disorder-to-disorder transitions.
Figure 3. (A) The binding modes are classified based on the probabilities of disorder-to-order (pDO, blue graphs) and disorder-to-disorder transitions (pDD, orange graphs), and the multiplicity of binding modes (MBM, green). The disorder-to-order binding mode is represented by the p53 oligomerisation domain (PDB:1c26, 325-356 residues, top left), which has a high probability of disorder-to-order transition (pDO) and low multiplicity of binding modes, and thus preferentially samples ordered binding modes, as observed (329-341 residues, magenta). Context-dependent binding modes (top middle) are represented by the p53 transactivation region in complex with the general transcription factor TFIIH (41-62 residues, PDB:2ruk) and in complex with HMGB1 (1-93 residues, PDB:2ly4). The interacting regions (green: 49-55 residues, and blue: 56-60 residues) are predicted to have high MBM values, and a considerable probability of sampling both ordered and disordered interactions. The disorder-to-disorder binding mode can be represented by the p53 transactivator region (2-61 residues, PDB:5phd, top right) in complex with the TAZ2 domain of the CBP/p300 coactivator, through a short peptide motif (2-14 residues, teal) exhibiting high pDD and low MBM values, in accord with the observed conformational heterogeneity. (B) The binding mode landscape characterises the interaction behavior. The binding mode landscape represents the binding mode (pDD, y-axis) together with its variability, as characterised by MBM (x-axis). The lower left quadrant (pDD < 0.40; MBM < 0.55, red circles) represents the classical view of binding, with preferentially ordered interfaces, as represented by the p53 oligomerisation domain (PDB:1c26). The upper left quadrant (pDD ≥ 0.60; MBM < 0.55, cyan pentagons) represents preferentially heterogeneous complexes, as represented by the p53 complex with TAZ2 of CBP/p300 (PDB:5phd). The lower right quadrant (pDD < 0.40; MBM ≥ 0.65, light blue triangles) represents polymorphic regions that form different structures in a context-dependent manner, for example amyloidogenic regions (45). The upper right quadrant (pDD ≥ 0.60; MBM ≥ 0.65, green squares) represents disordered binding regions that can conditionally convert from disordered to ordered binding modes, as in the case of aggregation hot-spots (21). Context-dependent binding modes are shown in the right part of the landscape, represented by the 49-55 residues interacting with the general transcription factor TFIIH (PDB:2ruk, green squares) and the 56-60 residues interacting with HMGB1 (PDB:2ly4, blue triangles). The more stable binding modes have low MBM values and can be found in the left part of the binding mode landscape. These include the disorder-to-order binding mode, represented by the 329-341 residues of the oligomerisation domain (PDB:1c26, magenta circles), and the disorder-to-disorder binding mode, represented by the 2-14 residues in complex with the TAZ2 domain of the CBP/p300 coactivator (PDB:5phd, cyan pentagons).

Identification of protein regions with different interaction behaviors
Protein regions with particular interaction characteristics are defined below the binding mode and MBM graphs (Figure 4B). At least five consecutive amino acid residues with the same property are considered a region, which is represented by a bar with the same color as the corresponding graph. Disorder-to-order regions (DORs) are defined as pDO ≥ 0.6 (blue) and disorder-to-disorder regions (DDRs) as pDD ≥ 0.6 (orange). Context-dependent regions are defined as MBM ≥ 0.65. In addition, protein regions predicted to be disordered in the unbound form (Espritz score ≥ 0.3085 (24)) are also shown. The boundaries of the regions are displayed above the bars and are also shown when the cursor is moved above them.
Figure 4. (A) Alternatively, the sequence can be provided, using only standard amino acids. In the case of a modified sequence, the cross-links with other databases, such as UniProt and Pfam, will not be displayed on the Results page. (B) Prediction of binding mode and multiplicity of binding modes. The user may choose from the upper tabs to display the residue-specific probabilities of disorder-to-order (pDO, left) or disorder-to-disorder (pDD, right) transitions. In both cases, the multiplicity of binding modes (MBM) is displayed below. This graph, reflecting the context-dependence of the interactions, does not depend on the binding mode analyzed. Regions undergoing disorder-to-order (left, blue) or disorder-to-disorder transitions (right, orange) are displayed under the MBM graph. In both cases, context-dependent regions (MBM ≥ 0.65, light green) and regions predicted to be disordered by the Espritz algorithm (light brown) are also shown. (C) Binding mode landscape. The binding mode (y-axis: disorder-to-order, left; disorder-to-disorder, right) is displayed as a function of the likelihood of changing the binding mode (x-axis: multiplicity of binding modes, MBM). The characteristics of the four regions of the landscape are labelled; the grey areas have mixed (intermediate) properties. (D) Protein information. Cross-links to experimental databases of protein disorder and liquid-liquid phase separation, and sites of post-translational modifications. (E) Visualisation of regions with different interaction characteristics. The disorder-to-order (left, blue), disorder-to-disorder (orange, right) and context-dependent (green, middle) regions are displayed on structures generated by AlphaFold.
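The region definition used in this section (at least five consecutive residues at or above a cutoff) can be reproduced from the residue-level scores with a simple run-length scan. The following Python sketch is an illustration under that reading of the text; the function name and the 1-based inclusive output convention are assumptions.

```python
def find_regions(values, threshold, min_length=5):
    """Return 1-based inclusive (start, end) stretches where every value is
    >= threshold and the stretch spans at least `min_length` residues."""
    regions = []
    run_start = None
    for i, value in enumerate(values):
        if value >= threshold:
            if run_start is None:
                run_start = i  # a new stretch begins here
        elif run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start + 1, i))
            run_start = None
    # Close a stretch that extends to the end of the sequence.
    if run_start is not None and len(values) - run_start >= min_length:
        regions.append((run_start + 1, len(values)))
    return regions
```

Calling `find_regions(p_do, 0.6)` would give disorder-to-order regions, `find_regions(p_dd, 0.6)` disorder-to-disorder regions and `find_regions(mbm, 0.65)` context-dependent regions, where `p_do`, `p_dd` and `mbm` are hypothetical lists of per-residue values.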

Binding mode landscape
The residue-specific interaction behaviors are shown on the binding mode landscape, which simultaneously displays the binding mode (y-axis) and the MBM, the multiplicity of binding modes that can be sampled in different cellular environments (x-axis) (Figure 4C). The MBM values are normalised into the range of [0,1]. All residues are represented by symbols, and their identity is shown by moving the cursor above the symbols. The graph is divided into four sections, corresponding to different interaction behaviors (Figure 4C). The lower left quadrant corresponds to structured interaction elements that rigidify or adopt structure upon binding (low pDD, high pDO, low MBM). The upper left quadrant corresponds to protein regions that remain disordered under a wide variety of cellular conditions and partners (high pDD, low pDO, low MBM). These regions play a prominent role in protein phase separation (9). The lower right quadrant displays the structured, polymorphic interaction elements (low pDD, high pDO, high MBM), for example amyloidogenic regions (22). The upper right quadrant shows those residues that dominantly sample disordered binding configurations, but can also be triggered to form ordered bound states (high pDD, low pDO, high MBM). These residues are hot-spots for aggregation (22). The boundaries between the regions are marked in grey. Protein regions falling on these borders usually exhibit a mixture of behaviors.

Sequence features and cross-links to other databases
The FuzPred server provides information on some sequence features that may be relevant for regulating binding characteristics. In this panel, the sequence corresponding to the UniProt code, or the sequence provided, is shown (25). Below it, the experimentally observed fuzzy regions with validated functional impacts are displayed, as derived from FuzDB, the database of fuzzy interactions (4). Below these, post-translational modification sites (PTMs) derived from the UniProt database are displayed as red dots (26). Positioning the cursor above the symbols displays the modified residue, the PTM type and the modifying enzyme. Evolutionarily conserved protein domains, which are derived from the Pfam database (27, 28), are shown in the last row (Figure 4D).

Graphical representation of disorder-to-order, disorder-to-disorder and context-dependent regions
Protein regions with different interaction characteristics are visualised by Mol* (34) on the structures predicted by AlphaFold (35) (AF, Figure 4E), which are accessed using the 3D-Beacons network (36). Disorder-to-order regions are shown in blue, disorder-to-disorder regions in orange and context-dependent regions (MBM ≥ 0.65) in green (Figure 4E). If the predicted structure is not available in the AlphaFold database (37), the user may carry out the structure prediction for the given sequence in a separate window (Figure 4E) through AlphaFold Colab, provided by DeepMind and Google (38). The coordinates derived from the AlphaFold predictions (35) can then be uploaded to visualise the interaction properties estimated by FuzPred (Figure 4E). The AlphaFold structures show disordered regions as extended chains, which may deviate significantly from their actual conformations, in particular in their bound states (39).

Download options
The FuzPred prediction results, that is, the residue-based pDO, pDD and MBM values, can be downloaded in .tsv format via the 'Download' tab at the top right of the page. The coordinates of the disorder-to-order, disorder-to-disorder and context-dependent regions can also be downloaded in .tsv format via the same tab. The graph displaying the pDO, pDD and MBM values, together with the bar representation of the different interacting regions, can be saved as an image using the camera icon below the 'Download' tab at the top of the page. A snapshot of the colored AlphaFold structures representing protein regions of different interaction behaviors can be generated using the 'Screenshot' tab above the image, and the coordinates of the predicted structure can be downloaded. This option is useful for further graphical analysis including other features, for example post-translational modification sites.
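The downloaded .tsv files can be processed with standard tools. The Python sketch below reads residue-level values with the standard library `csv` module; the column headers (`position`, `residue`, `pDO`, `pDD`, `MBM`) and the sample rows are hypothetical and were chosen for illustration, so they should be checked against the header line of the actual download.

```python
import csv
import io


def read_scores(handle, columns=("pDO", "pDD", "MBM")):
    """Read a tab-separated file with a header line into a list of dicts.

    The column names are assumptions; pass the names found in the real file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [{name: float(row[name]) for name in columns} for row in reader]


# Hypothetical file content, used here in place of a real download.
SAMPLE = (
    "position\tresidue\tpDO\tpDD\tMBM\n"
    "1\tM\t0.31\t0.69\t0.42\n"
    "2\tE\t0.35\t0.65\t0.47\n"
)

scores = read_scores(io.StringIO(SAMPLE))
mbm_values = [row["MBM"] for row in scores]
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="")`, where `path` is the location of the downloaded file.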

Tutorial page and references
The Tutorial provides a detailed description of the results shown on the Results page. This page is organised similarly to the above sections of the article. On the right side, a navigator bar facilitates orientation on the page. The References assemble the literature on fuzziness, organised thematically. A cross-link to the FuzDrop server (http://fuzdrop.bio.unipd.it) (40) is provided to further explore the liquid-liquid phase separation characteristics of the protein.

FuzPred server application areas
The FuzPred server has four main application areas, describing different interaction behaviors (Table 1).
(1) Identification of the binding modes of protein regions.
The probabilities of the disorder-to-order (pDO) and disorder-to-disorder (pDD) transitions inform on the binding elements that adopt a well-defined structure upon binding and on those that remain heterogeneous in the specific assembly.

) Identification of mutations inducing aggregation.
Mutations that cause protein aggregation can induce either unfolding (deposition) or maturation of condensates (condensation). In the first case, the mutation will shift the position in the binding landscape along the diagonal, whereas in the second case, the shift will be horizontal to the right, possibly in the lower part of the binding landscape.

CONCLUSIONS
It is increasingly recognised that proteins exhibit complex interaction behaviors in cells by sampling a wide range of binding modes, from ordered to disordered bound states, which may vary depending on the cellular conditions. The FuzPred web server provides a comprehensive description of the interaction behaviors by predicting both the binding mode and the multiplicity of binding modes without the need to specify the binding partner. Thus, the FuzPred web server identifies regions that are dominantly structured or disordered in their complexes, as well as context-sensitive sites that alternate between these binding modes. Such analysis, and the representation of the results on the structures predicted by AlphaFold, enables the identification of molecular recognition or regulatory sites as well as segments of proteins driving higher-order assemblies. The residue-based probabilities enable the analysis of mutants and help elucidate the impact of disease-associated mutations on protein interactions.

DATA AVAILABILITY
The authors confirm that the data in the article are publicly available.