DDMut: predicting effects of mutations on protein stability using deep learning

Abstract Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.


INTRODUCTION
Proteins are versatile and dynamic tools tailored by nature over the course of evolution to coordinate a range of biochemical processes central to life. They are involved in many biological processes, including cell signalling, proliferation, metabolism, and cell death (1-4). It is, therefore, essential to understand how changes in the protein sequence might impact its structure, function and interactions, giving rise to different phenotypes.
Characterising the molecular consequences of mutations can provide key insights into their biological outcomes. Over the past few years, missense mutations have been extensively studied due to the accumulation of data and their subtle effect on proteins (5). A single amino acid change at the protein sequence level can lead to local atomic changes in the 3D structure, thereby affecting the kinetics of protein folding, stability, flexibility and dynamics (6). Despite large advances in protein structural modelling, these changes are currently poorly captured by protein structure prediction tools.
Significant efforts have been invested into understanding and predicting the molecular consequences of mutations in protein coding regions. However, most approaches have been limited in their throughput (preventing genome-wide and saturation mutagenesis implementation), restricted to predicting consequences of single point missense mutations, and poorly predictive of stabilising mutations, which are essential for biotechnological applications, due to inherent biases in the data (7,8).
To fill this gap, here we report DDMut, a user-friendly web server that implements our well-validated concept of graph-based signatures within a novel deep learning framework (Figure 1), enabling us to rapidly screen both single and multiple point mutations, with comparable performance on both stabilising and destabilising mutations. DDMut was made available as an easy-to-use web server and API, for seamless integration with analytical pipelines, at https://biosig.lab.uq.edu.au/ddmut/.

Datasets
Training set. The DDMut training set for predicting the effects of single point mutations was curated from S2648 (9-12) (originally from ProTherm (13)) and FireProtDB (14). Redundant entries (at mutation level) in the blind test sets were removed from the training set if they had the same UniProt ID and the same mutation. Duplicates (at mutation level) were then removed from each dataset, keeping for each set of duplicates the ΔΔG measured under conditions closest to pH 7 and 25 °C. To balance the ΔΔG distribution, hypothetical reverse mutations were introduced into each dataset under the following scheme, where ΔG is defined as the unfolding free energy:

$$\Delta\Delta G_{\mathrm{Forward}} = \Delta G_{\mathrm{Mutant}} - \Delta G_{\mathrm{Wild\text{-}type}}, \qquad \Delta\Delta G_{\mathrm{Reverse}} = -\Delta\Delta G_{\mathrm{Forward}}$$

This led to our final DDMut training set S9028 (9028 mutations, across 153 proteins). The ΔΔG distribution of S9028 is shown in Supplementary Figure S1.
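The balancing step can be illustrated with a minimal sketch. It assumes that each record holds a wild-type residue, a position, a mutant residue and an experimental ΔΔG, and that a hypothetical reverse mutation swaps the two residues and negates the ΔΔG, as implied by anti-symmetry. The record layout and the name `add_reverse_mutations` are hypothetical and are not taken from the DDMut code base.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number in the structure
    mutant: str      # one-letter code of the substituted residue
    ddg: float       # experimental ΔΔG in kcal/mol


def add_reverse_mutations(forward):
    """Return the forward records plus hypothetical reverse records.

    Each reverse record swaps wild-type and mutant residues and
    carries the negated ΔΔG (assumed anti-symmetry).
    """
    augmented = list(forward)
    for m in forward:
        augmented.append(Mutation(m.mutant, m.position, m.wild_type, -m.ddg))
    return augmented


forward = [Mutation("F", 7, "A", -1.2), Mutation("V", 13, "M", 0.4)]
balanced = add_reverse_mutations(forward)
```

By construction the augmented set has a ΔΔG distribution that is symmetric about zero, which is the property the balancing step is meant to provide.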
For predicting the effects of multiple mutations, we removed duplicates and redundant entries (at multiple-mutation level) from the DynaMut2 training set (15), and included reverse mutations. This led to our training set SM1242 (98 structures across 94 proteins). We also implemented a protein-level non-redundancy split, which led to the training set SM1218 (67 structures across 60 proteins).
Blind test sets. For single point mutations, the five universal blind test sets that are non-redundant at protein or mutation level for most available protein stability predictors were used for benchmark comparisons. S276 (16) and S669 (17) include proteins which have low sequence identity with the original ProTherm (13) dataset and are non-redundant at protein level, whereas S1342 (18) is a blind test set non-redundant at mutation level, which means mutations in this dataset may occur on the same protein as mutations in the training set, but at different positions or at the same position with different mutant residues. Deep mutational scanning (DMS) datasets from the CAGI5 challenge (19) (including variants for PTEN and TPMT) and Gerasimavicius et al. (20), which includes functional scores of 161,441 variants across 45 independent assays, were also evaluated. After removing redundant data within the same test set (at mutation level) and including the hypothetical reverse mutations, our final blind test sets comprised 552 (37 structures on 37 proteins), 1,304 (94 structures on 87 proteins) and 2,024 (129 structures on 120 proteins) mutations. The overlaps between these three datasets at both mutation and protein level are shown in Supplementary Figure S2. Since the CAGI5 challenge data for PTEN (3,736 mutations) and TPMT (3,627 mutations) infer protein stability changes from the abundance of EGFP fused to the mutant protein, hypothetical reverse mutations were not included for these datasets. Similarly, reverse mutations were not included for the DMS datasets from Gerasimavicius et al., as they indicate the functional impacts of mutations and do not reflect the thermodynamic cycle of protein stability per se.
The blind test set for predicting multiple mutations was originally reported in DynaMut2 (15). Removing the duplicates and including the reverse mutations led to our multiple point mutation blind test set SM420 (61 structures in 63 proteins, 420 double and triple mutations). Under the protein-level non-redundancy split, the blind test set SM444 has 44 structures across 44 proteins.
The wild-type structures for all the datasets were downloaded from the Protein Data Bank (21), and the mutant structures were generated from the corresponding wild-type using MODELLER (22) with its default minimisation pipeline. Both wild-type and mutant structures were utilised for generating the features for both forward and reverse mutations.

Feature engineering
Two sets of features were generated, graph-based signatures and complementary features:
• The graph-based signatures were generated using mCSM (12), a Cut-off Scanning Algorithm (23) operated within a graph-based representation of the local residue environment, to capture the distance patterns between pairs of atoms labelled with eight different pharmacophores (Hydrophobics, Positives, Negatives, Hydrogen Acceptors, Hydrogen Donors, Aromatics, Sulphurs and Neutrals).
• The complementary features include both sequence- and structure-based features: (a) sequence-based features were calculated using substitution matrices such as AAindex (24), to capture changes in physicochemical and biochemical properties, and BLOSUM and PAM, which are based on sequence alignment; (b) structure-based features include solvent accessibility, residue depth, secondary structure, and atomic interactions between the residue of interest and its neighbouring residues calculated by Arpeggio, as well as the changes in interactions upon mutation (25).
The tools used to calculate each set of complementary features are detailed in Supplementary Table S1.
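The idea behind a cut-off scanning signature can be sketched as follows. This is a deliberately simplified illustration, not the mCSM implementation: it assumes atoms are given as (pharmacophore label, coordinates) pairs, counts atom pairs cumulatively within each distance cut-off, and uses a hypothetical function name `cutoff_scan` and toy cut-off values.

```python
import math
from itertools import combinations


def cutoff_scan(atoms, cutoffs):
    """Count atom pairs per pharmacophore pair within each distance cut-off.

    atoms: list of (pharmacophore_label, (x, y, z)) tuples.
    Returns {cutoff: {(label_a, label_b): count}}, with labels sorted so
    that (A, B) and (B, A) are counted together.
    """
    signature = {c: {} for c in cutoffs}
    for (label_a, pos_a), (label_b, pos_b) in combinations(atoms, 2):
        distance = math.dist(pos_a, pos_b)
        key = tuple(sorted((label_a, label_b)))
        for c in cutoffs:
            if distance <= c:
                signature[c][key] = signature[c].get(key, 0) + 1
    return signature


# Toy environment: two hydrophobic atoms 3 Å apart and one donor atom.
atoms = [
    ("Hydrophobic", (0.0, 0.0, 0.0)),
    ("Hydrophobic", (3.0, 0.0, 0.0)),
    ("Donor", (0.0, 5.0, 0.0)),
]
sig = cutoff_scan(atoms, cutoffs=[4.0, 6.0])
```

Scanning over several cut-offs turns the local environment into a fixed-length pattern of pair counts, which is the kind of distance pattern a convolutional layer can then process.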
W124 Nucleic Acids Research, 2023, Vol. 51, Web Server issue
Figure 1. DDMut workflow. There were four steps involved in the methodology. Firstly, datasets were curated from different sources, and protein structures were curated from RCSB PDB. Secondly, a set of features capturing both geometric and physicochemical properties was generated and normalised. These features were then input into neural networks, which were further optimised by tuning the hyperparameters and layers based on the training performance, and validated on non-redundant blind test sets. Finally, the predictive models were made freely available as easy-to-use web interfaces.
The generated features were then normalised by their mean values and standard deviations.
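A minimal sketch of this normalisation is given below, assuming standard z-scoring of each feature column. The use of the population standard deviation and the handling of constant columns (mapped to zero) are assumptions made for illustration; the source does not specify either.

```python
import statistics


def zscore_columns(rows):
    """Normalise each feature column to zero mean and unit variance.

    rows: list of equal-length lists of floats (one list per mutation).
    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


features = [[1.0, 10.0], [3.0, 10.0]]
normalised = zscore_columns(features)
```

In a train/test setting the means and standard deviations would normally be estimated on the training set only and reused unchanged for the blind test sets.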

Network architecture
For both predictive tasks, single and multiple point mutations, DDMut models were trained using siamese networks (Supplementary Figure S3), where two sub-networks with the same architecture and weights were applied separately to the features calculated from forward and reverse mutations. In each sub-network, graph-based signatures were processed with convolutional layers followed by a transformer encoder, whereas complementary features were processed with two dense layers. These two feature components were then concatenated along with residual connections, and followed by a dense layer. Finally, a contrastive loss function adapted and modified from (26) was calculated to consider not only the errors between the predicted and actual ΔΔG, but also the anti-symmetry (errors between forward and the corresponding reverse mutations). In this loss, $\Delta\Delta G_{\mathrm{Forward}}$ and $\Delta\Delta G_{\mathrm{Reverse}}$ are the predictions for the forward and the corresponding reverse mutations, and $y$ is the experimental ΔΔG for the forward mutation. For a perfectly anti-symmetric and accurate model, $\Delta\Delta G_{\mathrm{Forward}} = y$ and $\Delta\Delta G_{\mathrm{Reverse}} = -y$, which results in a loss of 0. During the training phase, the hyperparameters of the architecture were fine-tuned based on the cross-validation performance on the training set. This process was carried out independently for single and multiple point mutation models. The evaluation metrics used include Pearson's (r), Kendall's (k) and Spearman's (s) correlations, root mean square error (RMSE), mean absolute error (MAE) and mean signed error (MSE).
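To make the role of the anti-symmetry term concrete, the sketch below shows one possible loss with the properties described above. It is not the published DDMut loss, whose exact functional form is not reproduced here; the absolute-error terms, their equal weighting and the name `antisymmetric_loss` are assumptions chosen only so that the loss vanishes exactly when the forward prediction equals y and the reverse prediction equals −y.

```python
def antisymmetric_loss(pred_forward, pred_reverse, y):
    """Illustrative loss combining accuracy and anti-symmetry terms.

    pred_forward: predicted ΔΔG for the forward mutation.
    pred_reverse: predicted ΔΔG for the hypothetical reverse mutation.
    y: experimental ΔΔG of the forward mutation (reverse target is -y).
    """
    forward_error = abs(pred_forward - y)
    reverse_error = abs(pred_reverse + y)
    # Penalise predictions that do not cancel, regardless of accuracy.
    symmetry_error = abs(pred_forward + pred_reverse)
    return forward_error + reverse_error + symmetry_error


perfect = antisymmetric_loss(1.5, -1.5, 1.5)
biased = antisymmetric_loss(1.0, 1.0, 1.0)
```

In the second call the forward prediction is exact, but the reverse prediction has the wrong sign, so both the reverse-error and the symmetry terms contribute. In a siamese set-up, both predictions come from the same weights, so this penalty pushes the shared sub-network itself towards anti-symmetric behaviour.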

WEB SERVER
We deployed DDMut as a freely available and user-friendly web server at https://biosig.lab.uq.edu.au/ddmut/. The front end is built using MaterializeCSS (version 1.0.0), and the back end uses the Flask module (2.0.3) from Python. The web server is hosted on a Linux machine running Nginx.

Input
DDMut can be used to predict ΔΔGs for both single point mutations and multiple mutations under two different options. Users are required to provide a protein of interest by either uploading a PDB file or entering a valid PDB accession code. Mutation details for the 'Single Mutation' option (Supplementary Figure S4) can be provided manually as a text string (in the format of wild-type residue one-letter code followed by residue position and mutant residue one-letter code) and chain identifier, or by uploading a text file with a list of mutations. Alternatively, users can run automated alanine scanning. For the 'Multiple Mutations' option (Supplementary Figure S5), mutations should be separated by a semi-colon for each entry ('A F7A;A V13M' as an example, where two mutations F7A and V13M are on the same chain A). Here, we are considering double and triple mutations only. Although the web server does accept submissions for more than three simultaneous mutations, it is important to note that the model has only been validated on up to triple point mutations. Users should therefore exercise caution when submitting more than three simultaneous mutations. In both options, users may choose to include predictions for hypothetical reverse mutations, and provide an email address to which a notification will be sent once the job's results are ready.
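Before submission, entries in the 'Multiple Mutations' format can be checked locally. The sketch below is a hypothetical client-side helper, not part of the DDMut server: it assumes a single-character chain identifier followed by one space and the mutation code, exactly as in the 'A F7A;A V13M' example, and it does not handle residue insertion codes.

```python
import re

# chain, space, wild-type residue, position, mutant residue
_MUTATION = re.compile(
    r"^([A-Za-z0-9]) ([ACDEFGHIKLMNPQRSTVWY])(-?\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def parse_mutations(entry):
    """Split a semi-colon separated entry into (chain, wt, position, mt) tuples."""
    parsed = []
    for item in entry.split(";"):
        match = _MUTATION.match(item.strip())
        if match is None:
            raise ValueError(f"Malformed mutation: {item!r}")
        chain, wild_type, position, mutant = match.groups()
        parsed.append((chain, wild_type, int(position), mutant))
    return parsed


mutations = parse_mutations("A F7A;A V13M")
```

Counting the parsed tuples also gives a simple way to flag entries with more than three simultaneous mutations, for which the model has not been validated.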
To assist users with job submission, a help page is available at https://biosig.lab.uq.edu.au/ddmut/help.
… Figure S7). For 'Alanine Scanning', predictions are mapped onto the protein sequence and the 3D structure displayed using NGL viewer, and results are downloadable either as a table or as a 3D protein structure with the predictions annotated in the B-factor column (Supplementary Figure S8).
For 'Multiple Mutations', results are summarised as a downloadable table, and users can select specific entries from the table to be highlighted in the interactive viewer with residue contacts (Supplementary Figure S9).

API
DDMut provides an API (Application Programming Interface) to facilitate convenient integration into different research pipelines. A unique ID is assigned to each submitted job, and can be used to query the job status or access the website interface. Our API requires the same inputs as our website. More detailed explanations and examples using curl and Python can be found at https://biosig.lab.uq.edu.au/ddmut/api.

VALIDATION
DDMut was able to accurately and robustly predict the effects of both single and multiple point mutations. The performance on the training sets (under 10-fold cross-validation) and blind test sets is shown in Table 1.

Predicting the effects of single point mutations
We evaluated the performance of DDMut on our training set comprising 9,028 single point mutations under 10-fold cross-validation. Two different fold-splitting strategies were implemented, low redundancy at the amino acid level and low redundancy at the protein level. Our method achieved a Pearson's correlation of 0.77 (RMSE: 1.25 kcal/mol) under the amino acid low redundancy scheme, and 0.70 (RMSE: 1.37 kcal/mol) for the protein low redundancy split (Figure 2A). The comparable performance between the two splits provided confidence in the robustness of the models. DDMut also achieved consistent performance across both forward and reverse mutations (RMSE: 1.36 and 1.38 kcal/mol, respectively), with a Pearson's correlation of -0.93 between the forward and corresponding reverse mutations, indicating high model anti-symmetry (Supplementary Figure S11). To build a more robust model capable of predicting mutation effects on stability changes of a broader group of proteins, the hyperparameters in the neural network were tuned based on the cross-validation performance under the protein low redundancy split. The final model was trained on the entire training set, and then evaluated on blind test sets.
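The accuracy and anti-symmetry figures quoted above rest on a few simple statistics. The sketch below shows how they could be computed from paired forward and reverse predictions; the helper names and the toy values are illustrative assumptions and do not come from the DDMut evaluation scripts.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(pred, true):
    """Root mean square error in the units of the inputs (kcal/mol here)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def mean_signed_error(pred, true):
    """Average of (prediction - experiment); non-zero values indicate bias."""
    return sum(p - t for p, t in zip(pred, true)) / len(pred)


# Toy predictions for four forward mutations and their reverse counterparts.
forward_pred = [1.0, -0.5, 2.0, -1.5]
reverse_pred = [-1.0, 0.5, -2.0, 1.5]
antisymmetry_r = pearson(forward_pred, reverse_pred)
```

A perfectly anti-symmetric predictor gives a forward-reverse correlation of -1, so the reported value of -0.93 can be read against that limit.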
To fairly compare our model with other available methods, we tested DDMut on non-redundant blind test sets comprising 552, 1,304 and 2,024 mutations, and a DMS dataset from the CAGI5 challenge including variants for PTEN and TPMT. Our model achieved the top performance for three out of the four blind test sets (Figure 2B), with consistent performance on both forward and the hypothetical reverse mutations, and on both stabilising (ΔΔG ≥ 0 kcal/mol) and destabilising (ΔΔG < 0 kcal/mol) mutations (Figure 2B, Supplementary Tables S3-S9). This provided confidence in the generalisability of the DDMut model. We then further evaluated the capability of DDMut to predict the functional scores of 161,441 variants across 45 independent DMS assays, including protein abundance, protein binding, activity assays, growth experiments and viral replication (20). DDMut demonstrated competitive performance when compared to nine other protein stability predictors, although the performance was highly heterogeneous across different assay types (Supplementary Figure S12).
The contribution of each architecture component was evaluated using ablation studies. By disabling sub-blocks in the network architecture, we found that all the components were essential for the final predictions. The two dense layers contributed more to the final performance, whereas the transformer encoder and convolutional layers contributed less (Supplementary Table S10). This may also reflect the sets of features that each component processed. To further understand what makes a mutation stabilising (ΔΔG ≥ 0 kcal/mol) or destabilising (ΔΔG < 0 kcal/mol), we then evaluated the importance of each feature to the neural network using a model-agnostic approach, i.e. feature permutation importance. We randomly shuffled the values of each feature across all the mutations in the balanced blind test set S1304, which maintains the mean and variance of the feature. Notably, the top two important features (hydrophobic contacts; hydrophobic atoms) are both related to the changes in hydrophobicity upon mutation; shuffling each of them alone dropped the Pearson's correlation on forward mutations by around 0.05 (Supplementary Table S11).
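Feature permutation importance can be sketched in a few lines. The version below is a generic illustration under stated assumptions: `predict` stands for any trained model exposed as a function, `score` is any higher-is-better metric (the study reports the drop in Pearson's correlation, whereas the toy example uses negative mean absolute error), and all names are hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, feature_index, score, seed=0):
    """Drop in score after shuffling one feature column across samples.

    Shuffling keeps the column's mean and variance but breaks its
    association with the target.
    """
    rng = random.Random(seed)
    baseline = score(predict(rows), targets)
    column = [row[feature_index] for row in rows]
    rng.shuffle(column)
    permuted = [
        row[:feature_index] + [value] + row[feature_index + 1:]
        for row, value in zip(rows, column)
    ]
    return baseline - score(predict(permuted), targets)


def negative_mae(pred, true):
    """Higher-is-better score: negated mean absolute error."""
    return -sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def toy_model(rows):
    """Stand-in model that only looks at the first feature."""
    return [row[0] for row in rows]


rows = [[0.5, 9.0], [-1.0, 3.0], [2.0, 7.0], [-0.3, 1.0]]
targets = [0.5, -1.0, 2.0, -0.3]
used = permutation_importance(toy_model, rows, targets, 0, negative_mae)
unused = permutation_importance(toy_model, rows, targets, 1, negative_mae)
```

A feature the model ignores has an importance of exactly zero, while a feature the model relies on can only lower the score when shuffled. Repeating the shuffle with several seeds and averaging gives a more stable estimate.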

Predicting the effects of multiple point mutations
We then evaluated the performance of DDMut on predicting the effects of double/triple point mutations. To keep consistency with data used by other methods, DDMut was trained and optimised on a dataset consisting of 1242 entries (SM1242), and performance was assessed using a blind test set non-redundant at multiple-mutation level comprising 420 entries (SM420). On the training set SM1242, DDMut achieved a Pearson's correlation of 0.69 (RMSE: 1.83 kcal/mol) under 10-fold cross-validation using the amino acid low redundancy splitting scheme. On the blind test set SM420, DDMut achieved a Pearson's correlation of 0.70 (RMSE: 1.84 kcal/mol) (Figure 2C), outperforming previous methods on the forward mutations with consistent performance on both stabilising and destabilising mutations (Supplementary Table S12), and also on both double and triple point mutations (Supplementary Table S13).
Under the protein-level non-redundancy scheme, we trained and optimised DDMut on SM1218, and tested on SM444. On the training set SM1218, DDMut achieved a Pearson's correlation of 0.45 (RMSE: 2.17 kcal/mol) under 10-fold cross-validation. On the blind test set SM444, non-redundant at protein level, DDMut achieved a Pearson's correlation of 0.49 (RMSE: 2.45 kcal/mol) (Figure 2D), comparable to the cross-validation performance on the training set. The performance on stabilising and destabilising mutations, and on double and triple point mutations, is shown in Supplementary Tables S14 and S15, demonstrating that despite a drop in performance, DDMut outperforms previous methods including MAESTRO, FoldX, and DDGun. Due to the limited sample size for multiple mutations and a demand for scalability to unseen proteins, the final model deployed on our web server was built on the combined training and blind test sets.

CONCLUSION
Here we present DDMut, a fast and accurate tool to investigate the effects of single and multiple missense mutations on protein stability. DDMut is a siamese network that utilises both forward and hypothetical reverse mutations to account for model anti-symmetry, and integrates our well-established graph-based signatures with convolutional layers and a transformer encoder to better capture, in a stepwise manner, short- and long-range atomic interactions within a localised 3D residue environment. DDMut achieved consistent performance on both stabilising and destabilising mutations, and outperformed other tools on different blind test sets in terms of both accuracy and efficiency. By allowing users to perform alanine scanning, DDMut could also potentially be used for probing residue side chains important for protein folding and stability. We believe DDMut will be an invaluable tool for various applications such as detecting functional residues, inferring disease-associated missense mutations, and engineering more stable proteins. DDMut is freely available at https://biosig.lab.uq.edu.au/ddmut/.

DATA AVAILABILITY
The DDMut web server and all the datasets used in this study are freely available at https://biosig.lab.uq.edu.au/ddmut/.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.