DeepPPAPredMut: deep ensemble method for predicting the binding affinity change in protein–protein complexes upon mutation

Abstract Motivation Protein–protein interactions underpin many cellular processes and their disruption due to mutations can lead to diseases. With the evolution of protein structure prediction methods like AlphaFold2 and the availability of extensive experimental affinity data, there is a pressing need for updated computational tools that can efficiently predict changes in binding affinity caused by mutations in protein–protein complexes. Results We developed a deep ensemble model that leverages protein sequences, predicted structure-based features, and protein functional classes to accurately predict the change in binding affinity due to mutations. The model achieved a correlation of 0.97 and a mean absolute error (MAE) of 0.35 kcal/mol on the training dataset, and maintained robust performance on the test set with a correlation of 0.72 and a MAE of 0.83 kcal/mol. Further validation using Leave-One-Out Complex (LOOC) cross-validation exhibited a correlation of 0.83 and a MAE of 0.51 kcal/mol, indicating consistent performance. Availability and implementation https://web.iitm.ac.in/bioinfo2/DeepPPAPredMut/index.html.


Introduction
Protein-protein interactions (PPIs) are essential for many biological processes, such as cell signaling, metabolism, and gene expression (Gromiha 2020b).The binding affinity of proteinprotein complexes are dictated by various factors such as favorable noncovalent interactions (Gromiha 2020a), composition of amino acids at the interface for thermal adaptation (Ma et al. 2010) as well as conformational properties, ⍺-helical and β-strand tendencies, bulkiness, and the number of aromatic and charged residues (Yugandhar and Gromiha 2014a,b).Mutations in protein-protein complexes may disturb the functionality and affect the stability of the complex, leading to diseases (Nishi et al. 2013, Petukh et al. 2015, Pandey and Gromiha 2023).Although several experimental techniques are available to study the effect of mutation on binding affinity (change in binding free energy upon mutation, ΔΔG), these methods are labor-intensive, time-consuming, and expensive.Hence, several computational methods have been developed to predict the binding free energy changes upon mutation in protein-protein complexes (Gromiha et al. 2017).The available methods are broadly categorized into two classes based on input parameters, i.e. structure-based and sequence-based methods.
Further, sequence-based methods have been developed for predicting changes in binding affinity upon mutation.ProAffiMuSeq is the first sequence-based prediction method, focusing on mutations situated at the interface (Jemimah et al. 2020).Subsequently, SAAMBE-SEQ introduced a gradient-boosting decision tree model incorporating changes in physicochemical properties to predict alterations in binding affinity (Li et al. 2021).Despite these advancements, leveraging the information on comprehensive experimental datasets, sophisticated computational models of protein structures such as AlphaFold2 (Jumper et al. 2021) and RoseTTAFold (Baek et al. 2021), and deep learning methods enhance the efficiency and performance of predicting the changes in binding affinity caused by mutations.
In this work, we have addressed all these limitations and developed a deep ensemble model, DeepPPAPredMut, which takes a protein sequence and mutation as input from the user and predicts the change in the binding affinity upon mutation in the protein complex.Our method uses physicochemical properties, PSSM, and amino acid properties.In addition, we have used graph-based properties for mutation sites, such as the degree of the residue and hydrogen bond donor/acceptor of the residue.Our model showed a correlation and MAE of 0.83 and 0.51 kcal/mol, respectively in a set of 2591 mutants.

Dataset
To develop the prediction model, we utilized experimentally known binding affinity data collected from PROXiMATE (Jemimah et al. 2017) and SKEMPIv2 (Jankauskaite et al. 2019) and the literature.The SKEMPIv2 and PROXiMATE databases contain 7085 and 6293 single mutations from 348 and 173 protein-protein complexes, respectively and 3345 mutations in 131 complexes are common between them.Further, we have collected 473 mutations from 65 proteins from the literature.Initially, we consolidated data from these resources and excluded proteins with >25% sequence identity.Next, we refined the dataset by eliminating redundant mutations.Further, all the mutations, which are present in the validation dataset (see below) are removed from this dataset.The final dataset is comprising of 2591 mutations from 236 proteins.For data with binding affinity on dissociation constant (K d ), we calculated the binding free energy (ΔG) using the following formula: (1) R: gas constant; T: temperature in Kelvin; K d : binding affinity of the complex.Further, to calculate the change in binding free energy upon mutation (ΔΔG), we used the following equation: ΔG mut and ΔG wild are binding free energies of mutant and wild-type protein-protein complex, respectively.For validation purposes and to compare the model performance with existing state-of-the-art methods, we utilized the benchmark (validation) dataset created by Geng et al. (2019), which is used in other studies (Jemimah et al. 2020, Li et al. 2021, Behbahani et al. 2023).It includes 19 mutations from NM dataset for interleukin4-receptor complex (Benedix et al. 2009), S487 dataset with 487 mutations in 56 protein complexes (Jankauskaite et al. 2019) and 33 mutations in MDM2-p53 complex (Geng et al. 2019).The combined dataset contains a total of 473 mutations in 65 complex structures, which are not used for training and cross-validation.Supplementary Fig. S1 represents the distribution of ΔΔG in our dataset and benchmark dataset.
In our earlier studies, we observed that the binding affinity of protein-protein complexes depends on functional classes of proteins, which have different ranges of binding free energies (Yugandhar and Gromiha 2014a,b, Jemimah and Gromiha 2018, Nikam et al. 2022).Further, the classification of mutants based on functional classes of proteins enhanced the prediction performance (Jemimah et al. 2020).Hence, we divided the dataset into six subsets based on the functional class of each protein-protein complex: (i) antigen-antibody, (ii) enzyme-inhibitor, (iii) G-protein-containing, (iv) receptorcontaining, (v) other-enzyme containing, and (vi) miscellaneous.The distribution of ΔΔG values for individual protein functional classes is summarized in Fig. 1.We have noted a wide range of ΔΔG values across different classes.Specifically, receptor-containing complexes has the lowest ΔΔG value of − 2.25 kcal/mol, whereas the class "otherenzyme" has the lowest ΔΔG of − 3.75 kcal/mol.Enzymeinhibitor complexes have ΔΔG values, ranging from − 5.91 to 7.66 kcal/mol, with an average value of 0.95 kcal/mol.On the other hand, G-protein complexes have the range of − 1.78 to 5.33 kcal/mol with a similar average ΔΔG.

Feature calculation
To develop the model, we calculated various sequence and structure-based features.We considered the predicted structures from the AlphaFold database (Varadi et al. 2022) to calculate structure-based features.

Sequence-based features
We used a set of 103 amino acid properties (Supplementary Table S1), which are shown to be relevant for protein folding studies (Chaudhary et al. 2015).In addition, we calculated protein interface-based features for each amino acid from PIFACE (Cukuroglu et al. 2014), which are reported to be important to understand mutational effects on protein-protein complexes (Jemimah et al. 2020), PSSM matrix and conservation score using AACon (Manning et al. 2008, Valdar 2002).Further, we included mutation-based features such as net volume, flexibility of the residue, change in hydrophobicity, and fluctuation of amino acid upon mutation.These features have been used in the literature for predicting the effect of mutation on stability, folding rate and binding affinity (Pahari et al. 2020, Li et al. 2021, Gromiha 2024).The change in property value (ΔP) is computed as the difference between each property value of mutant (P mut ) and wild-type (P wild ) residues.

Structure-based features
To compute structure-based features, we utilized the predicted structures reported in the AlphaFold database (Varadi et al. 2022).With these predicted structures, we computed various amino acid features, such as solvent-accessible area, using Naccess (Hubbard and Thornton 1993), number of hydrogen bond acceptors and donors using HBPLUS (McDonald and Thornton 1994) and residue depth, using a python script adapted from the Biopython module.Furthermore, we calculated graph-based features for the mutation site, such as degree centrality, betweenness centrality, eigenvalue centrality, and closeness centrality.To compute graph-based features, we designed a graph in which each node corresponds to an amino acid.Edges are established between pairs of amino acids when their spatial distance is <8 Å.These features provide information regarding the connectivity of the mutated site, which could be helpful in predicting the change in binding affinity.In addition, we considered mutation position (core, interior, support, rim, and surface) as defined by Levy (2010).

Feature selection
Initially, we considered a set of 295 sequence and structurebased properties, which are relevant to protein-protein interactions.Further, we removed the highly correlated properties using a correlation coefficient cut-off > 0.85, which resulted in 90 properties.In addition, we utilized the "recursive feature elimination" from scikit-learn (Pedregosa et al. 2011) to select the topmost 50 important features.These selected features are used as input to the individual model development for each functional class.

Deep learning model
In this study, we developed a deep ensemble model to predict the effect of mutations on binding affinity.Deep learning is a division of machine learning that works by connecting a number of neurons in multiple layers to capture the nonlinear relationship between input features and the change in binding free energy (ΔΔG).In our previous work, we demonstrated the potential of deep learning to predict binding site residues accurately and to predict binding affinity (Nikam et al. 2022(Nikam et al. 2023)).Initially, we provided various curated features obtained after feature selection step to a Keras sequential model (https:// keras.io/),which consists of three layers (Input, hidden, and output layers).The input layer takes the calculated features and passes them to the hidden layer which has Relu as activation function, which, in turn, passes them to the output layer.We have adapted the hyperparameter optimization through grid search and the dimensionality of the output layer is optimized to 12.This 12D encoded vector serves as the input for the random forest model.We chose the random forest model as our final predictor due to its superior performance compared to other machine learning methods.A detailed comparison between the random forest and other approaches is provided in Supplementary Table S2.The overall structure of the prediction model is depicted in Fig. 2.

Performance validation
We utilized the 10-fold cross-validation technique to validate the performance of the method.The dataset was divided into 10 subsets, and during each iteration, 9 subsets were used for training, while one subset was used for testing the performance.This process was repeated 10 times to ensure reliable evaluation.
In addition, we used leave-out-one-complex cross-validation.In this approach, we removed mutations associated with one complex from the dataset and trained the model on the remaining data.Then, we tested the performance on the removed complex.We repeated this process for all available proteins in the dataset.This method allowed us to observe how well the model generalizes to unknown mutations in a new complex.Furthermore, the performance of the method has been assessed using the following measures: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi nΣx 2 À Σ x ð Þ 2 q ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi PCC or Pearson's correlation coefficient is denoted by R. MAE stands for mean absolute error, and ΔΔG exp (x) and ΔΔG pred (y) represent experimental and predicted ΔΔG values, respectively; n is the number of data.

Predicting the change in binding affinity upon mutation
We utilized the deep ensemble model for predicting the change in binding affinity upon mutation using various sequence and structure-based features as input (see Section 2).We classified the overall dataset into subsets based on their functions, which is shown to be important for improving the prediction performance (Yugandhar and Gromiha 2014a,b, Jemimah et al. 2020, Nikam et al. 2022).
The overall prediction performance of the method for training/test and LOOC cross validation is plotted in Fig. 3a  and b, respectively.Our model achieved a correlation of 0.97 with a MAE of 0.35 on training set and a correlation of 0.72 with a MAE of 0.83 on test set.Further, our model showed consistent overall performance on LOOC cross validation with a correlation of 0.83 and a MAE of 0.51.

Prediction based on the protein functional class
We evaluated the prediction performance for each protein functional class, and the results obtained with 10-fold cross validation along with overall performance (mean of all protein functional classes) is summarized in Table 1 along with number of features used in each class.Figure 4 shows the relationship between experimental and predicted ΔΔG values.

Antigen-antibody
The antigen-antibody dataset comprises 1032 mutations, and the experimental ΔΔG values span from − 4.91 to 7.6 kcal/ mol.Our model has demonstrated an impressive performance, achieving a correlation coefficient of 0.82 with a MAE of 0.53 kcal/mol during 10-fold cross-validation (Fig. 4a).In addition, in the LOOC cross-validation scenario, our model achieved a correlation coefficient of 0.83 with a MAE of 0.51 kcal/mol (Supplementary Fig. S2a).

Enzyme-inhibitor
Within this subset, there are 484 mutations, and the experimental ΔΔG values vary from − 5.91 to 7.66 kcal/mol.Our model displayed robust and reliable performance in both 10-DeepPPAPredMut fold and LOOC cross-validation, achieving a correlation coefficient of 0.86 with a MAE of 0.57 kcal/mol (Fig. 4b, Supplementary Fig. S2b).

G-protein-containing
The G-protein dataset encompasses 94 mutations for which experimental ΔΔG data are available and it ranges from − 1.78 to 5.33 kcal/mol.Remarkably, our model demonstrated a consistent performance in both 10-fold and LOOC cross-validation, yielding a correlation coefficient of 0.77 with a MAE of 0.47 kcal/mol (Fig. 4c, Supplementary Fig. S2c).

Receptor-containing
The receptor-containing complexes involve interactions between receptors and various ligands, including growth hormone, insulin, interleukin, viral proteins, plasminogen activator, and lysozyme.Within this category, the experimental ΔΔG values span from − 2.25 to 6.27 kcal/mol.Our model has showed consistent results, achieving a correlation coefficient of 0.81 and a MAE of 0.42 kcal/mol during 10-fold cross-validation, as depicted in Fig. 4d.Similarly, in LOOC, our model maintains a correlation of 0.81 with a MAE of 0.41 kcal/mol, as illustrated in Supplementary Fig. S2d.

Other-enzyme containing
Within this specific category, enzymes form complexes with noninhibitor proteins.The experimental ΔΔG values for this set of interactions span from − 3.75 to 5.90 kcal/mol.Our model demonstrated a correlation coefficient of 0.82 with a MAE of 0.59 kcal/mol during 10-fold cross-validation (Fig. 4e).In the LOOC scenario, the model maintained a strong correlation of 0.78 with a slightly increased MAE of 0.62 kcal/mol (Supplementary Fig. S2e).

Miscellaneous
The miscellaneous class includes protein complexes that do not fall into any of the previously mentioned categories.Within our dataset, there are 100 mutations, and the experimental ΔΔG values range from − 1.26 to 4.64 kcal/mol.Our model showed a strong performance, with a correlation coefficient of 0.86 between the experimental and predicted ΔΔG values and a MAE of 0.40 kcal/mol on10-fold cross-validation as depicted in Fig. 4f.Similarly, in LOOC, the model maintains a correlation of 0.82 with a MAE of 0.44 kcal/mol, as shown in Supplementary Fig. S2f.Further, we examined the performance of the method using two additional evaluation procedures such as (i) cross validations are performed randomly at the level of variations, allowing two different substitutions occurring at the same site to be placed in the training and testing set, respectively and (ii) splits are performed such that mutations occurring at the same site are all confined either in training or testing sets.The results obtained with these two scenarios are presented in Supplementary Table S3.We observed that overall correlation is 0.86 and 0.82, respectively, along with a MAE of 0.49 and 0.50 kcal/mol.

Importance of selected features
We utilized SHapley Additive exPlanations (SHAP) plots as a tool to gain insights into the significance of individual features selected by our method across all functional classes (Lundberg et al. 2018).In Supplementary Fig. S3a-f, we have visualized and presented the contributions of these important features for each specific functional class.
We observed that certain features play a prominent role in influencing the predictions.Notably, graph-based features like closeness centrality, degree centrality, and eigenvector centrality consistently exhibit substantial contributions across various functional classes.These features reflect the structural and connectivity aspects of the protein network and appear to be crucial in explaining the variations in ΔΔG values due to mutations.
In addition, we have found that other factors, such as the residue position within the protein, the hydrophobicity of the mutated residue at the mutation site, and the change in volume due to mutation, also make significant contributions.These features provide valuable information about the local environment and the physical changes induced by mutations, contributing to our understanding of the underlying mechanisms that affect binding affinity.
In summary, our SHAP plot analysis reveals that a combination of structural and biochemical features, including graph-based properties and residue-specific characteristics, plays a pivotal role in predicting the impact of mutations on binding affinity across different functional classes.These insights enhance our comprehension of the factors influencing protein interactions and mutation effects.

Performance comparison on blind datasets
To construct the validation dataset, we amalgamated mutations and predictions from three sources: the NM dataset, the S487 dataset, and the MDM_p53 dataset.This unified dataset comprises a total of 473 mutations extracted from 65 complex structures, distinct from those employed in the training and test datasets.It encompasses a wide range of ΔΔG values, spanning from − 4.38 to 6.08 kcal/mol, and comprehensively represents all functional classes.Our model achieved a correlation of 0.65 with a MAE of 0.83 kcal/mol on this benchmark dataset and the performance is consistent in all the three datasets.The overall performance of the model on the benchmark dataset is represented in Fig. 5.In addition, a comparison with existing methods is presented in Table 2. From Table 2, it is evident that our method (DeepPPAPredMut) outperforms other methods in the literature.

Webserver
We have developed a user-friendly web server to predict changes in binding affinity upon mutations and it is available at https://web.iitm.ac.in/bioinfo2/DeepPPAPredMut/index.html.It accepts the protein sequences in FASTA format, and mutations in a prescribed format, along with protein functional class from the dropdown menu.Upon submission, the server will compute the necessary features and display the ΔΔG (kcal/mol), along with information about whether the given mutation decreases or increases the affinity.For large scale analysis code will be provided on request.

Conclusion
We have developed DeepPPAPredMut, a robust deep ensemble model to predict the change in binding affinity in protein-protein complexes due to mutation.It exhibits remarkable predictive performance, as evidenced by its high correlation coefficients and minimal MAEs on both training and test datasets.Its reliability is further substantiated through Leave-outone-complex cross-validation, which simulates real-world conditions.The versatility of the method extends across diverse functional classes of proteins, catering to researchers with varying focus areas.Our use of SHAP-based feature analysis elucidates the critical factors such as structural and network-based features, driving predictions, offering valuable insights into the underlying mechanisms.Moreover, DeepPPAPredMut surpasses existing methods when evaluated on a benchmark dataset and includes a user-friendly web server, enhancing accessibility for researchers.a The PCC and MSE for iSee, FoldX, mCSM, BindProfX, ProAffiMuSeq, are taken from Jemimah et al. (2020) and SAAMBE-3D, SAAMBE-SEQ is taken from Li et al. (2021).

Figure 1 .
Figure 1.Distribution of ΔΔG values across different protein functional classes.

Figure 2 .
Figure 2. Overall workflow of the model.

Figure 3 .Figure 4 .6
Figure 3. Performance of the present method on (a) training and test, and (b) LOOC cross validation.

Table 2 .
Comparison of DeepPPAPredMut with other existing methods.a