DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability

ABSTRACT
Proteins are highly dynamic molecules, whose function is intrinsically linked to their molecular motions. Despite the pivotal role of protein dynamics, their computational simulation cost has led to most structure-based approaches for assessing the impact of mutations on protein structure and function relying upon static structures. Here we present DynaMut, a web server implementing two distinct, well-established normal mode approaches, which can be used to analyze and visualize protein dynamics by sampling conformations and to assess the impact of mutations on protein dynamics and stability resulting from vibrational entropy changes. DynaMut integrates our graph-based signatures along with normal mode dynamics to generate a consensus prediction of the impact of a mutation on protein stability. We demonstrate that our approach outperforms alternative approaches to predict the effects of mutations on protein stability and flexibility (P-value < 0.001), achieving a correlation of up to 0.70 on blind tests. DynaMut also provides a comprehensive suite for protein motion and flexibility analysis and visualization via a freely available, user-friendly web server at http://biosig.unimelb.edu.au/dynamut/.


INTRODUCTION
Proteins are dynamic macromolecules, whose function is intricately linked to their biological motions (1,2). We have shown previously that drug resistance and genetic disease mutations can both act through changes in protein conformational equilibria and dynamics (3–7). In order to fully understand the molecular consequences of a mutation it is, therefore, important to consider changes in protein dynamics. Despite their pivotal role, the computational cost of dynamics simulation has led to most structure-based approaches for assessing the effects of mutations on protein structure and function relying upon static structures.
Normal Mode Analysis (NMA) is a computational approach that approximates the dynamics of a system around a conformation through harmonic motion. It has been used to generate possible movements and therefore provide valuable insights into protein motions and their accessible conformational repertoires. Previous studies have shown that NMA can be a powerful tool to analyze protein structure-function relationships (8) and to predict the effects of single-point mutations on protein stability (9). Many NMA methods have been proposed (10–14) to address the lack of easy-to-use interfaces that limited their use to those with specialist knowledge. However, these are limited to the analysis of protein structures and do not provide approaches to evaluate the effect of mutations within their pipelines.
To fill this gap, we introduce DynaMut, a web server that brings a dynamics component to mutation analysis. This is achieved by implementing and integrating well-established normal mode approaches with our graph-based signatures in a consensus predictor for protein stability changes upon mutation, which we show optimizes overall prediction performance. DynaMut implements NMA through two different approaches, Bio3D (8) and ENCoM (9), providing rapid and simplified access to powerful and insightful analysis of protein motions. In addition, DynaMut enables rapid analysis of the impact of mutations on a protein's dynamics and stability resulting from vibrational entropy changes. Integration of these two approaches with other well-established methods and characteristics of the wild-type residue environment into a consensus prediction enables DynaMut to provide an accurate assessment of the impact of a mutation on protein stability, and to provide a comprehensive suite for protein motion and flexibility analysis and visualization via an easy-to-use web interface (http://biosig.unimelb.edu.au/dynamut/).
In this work, we used the previously established S2648 dataset (15–18), derived from the ProTherm database (19). This dataset comprises 2648 different point mutations across 131 globular proteins with experimentally determined structures, whose impact on protein stability has been experimentally measured (602 stabilizing and 2046 destabilizing). The DynaMut training set comprises 2297 mutations randomly selected from the original dataset. A blind test set composed of 351 non-redundant mutations derived from the S2648 set was also compiled. This blind test set has been widely used in the literature (15–18), enabling direct performance comparison of methods that quantify the impact of mutations on the folding free energy.
Previous studies have reported performance comparisons of different methods for predicting changes in folding free energy (ΔΔG) using these datasets (20–22). Given the unbalanced nature of the original dataset, here we have considered the hypothetical reverse mutations (22) in order to build a more robust, balanced and self-consistent predictive method. The change in folding free energy is a thermodynamic state function, and it has been proposed that the change in folding free energy of a mutation from a wild-type protein to its mutant (ΔΔG_WT→MT) should be equivalent to the negative of the change in folding free energy of the hypothetical reverse mutation, from the mutant to the wild-type protein (−ΔΔG_MT→WT) (16,22–24). Including the hypothetical reverse mutations, our predictive model was trained using 4594 mutations and our blind test set comprised 702 single-point mutations.
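The antisymmetry assumption above can be illustrated with a minimal sketch that doubles a dataset by adding each hypothetical reverse mutation with the sign of its ΔΔG flipped. The `Mutation` record, its field names and the example values are hypothetical and chosen only for illustration; in practice a reverse mutation also needs a structure of the mutant protein, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one single-point mutation."""
    pdb_id: str      # structure identifier (placeholder values below)
    chain: str       # chain identifier
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number
    mutant: str      # one-letter code of the new residue
    ddg: float       # change in folding free energy, kcal/mol


def reverse(m: Mutation) -> Mutation:
    """Hypothetical reverse mutation: swap the residues and negate ddg."""
    return Mutation(m.pdb_id, m.chain, m.mutant, m.position, m.wild_type, -m.ddg)


def with_reverse(mutations):
    """Return the forward mutations followed by their hypothetical reverses."""
    forward = list(mutations)
    return forward + [reverse(m) for m in forward]


# Made-up example values, not taken from S2648.
forward = [
    Mutation("XXXX", "A", "L", 10, "A", -1.2),
    Mutation("XXXX", "A", "G", 25, "V", 0.4),
]
augmented = with_reverse(forward)
```

Applied to the 2297 training mutations and the 351 blind test mutations, this doubling yields the 4594 and 702 data points quoted above, and it balances the numbers of stabilizing and destabilizing examples by construction.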

Normal mode analysis
NMA allows the study of harmonic motions in a system, providing insights into its dynamics and accessible conformations. It has been widely used for studies of protein dynamics as an alternative to more computationally intensive molecular dynamics approaches (25–28). While molecular dynamics approaches provide motion trajectories for a given molecule over time, conformational fluctuations can be evaluated by NMA via superposition of normal modes (eigenvectors) and their associated frequencies (eigenvalues) (29). NMA can also use simplified representations of the protein structure, such as modeling the amino acids using their Cα atoms, reducing computational cost. NMA has been successfully applied to the study of the effects of mutations on protein dynamics, with ENCoM (9) including the nature of the amino acids in the protein as an extra layer of information to compute the effects of single-point mutations on the vibrational entropy (ΔS) and protein stability.
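For readers less familiar with NMA, the harmonic approximation behind it can be written out as follows. These are the standard textbook expressions for a coarse-grained model with N nodes (for example, one per Cα atom) and unit masses, which is an assumption made here for brevity. They are not the specific potentials used by Bio3D or ENCoM.

```latex
% Harmonic approximation of the potential energy around a reference conformation r^0
V(\mathbf{r}) \approx \tfrac{1}{2}\, \Delta\mathbf{r}^{\mathsf{T}} \mathbf{H}\, \Delta\mathbf{r},
\qquad \Delta\mathbf{r} = \mathbf{r} - \mathbf{r}^{0},
\qquad H_{ab} = \left. \frac{\partial^{2} V}{\partial r_a\, \partial r_b} \right|_{\mathbf{r}^{0}}

% Normal modes u_k (eigenvectors) and eigenvalues \lambda_k of the 3N x 3N Hessian H
\mathbf{H}\, \mathbf{u}_k = \lambda_k\, \mathbf{u}_k, \qquad k = 1, \dots, 3N

% With unit masses the vibrational frequency of mode k is
\omega_k = \sqrt{\lambda_k}
```

Six eigenvalues are zero and correspond to rigid-body translations and rotations. These are the trivial modes, so the first non-trivial mode is the seventh when the modes are sorted by increasing eigenvalue.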

Other structure-based approaches
Structure-based approaches to predict the impact of mutations on stability utilize protein structural information from the 3D space of a natively folded protein. Even though these structure-based methods are essentially based on the same structural data, they are built using broadly different, sophisticated approaches, such as the statistical potential function energy calculations used in SDM (16) and structural pattern mining approaches such as mCSM-Stability (18). The consensus method DUET highlighted that these approaches were complementary, and that their integration provided more accurate and reliable predictions (17). This has been used to provide invaluable insights into disease and drug resistance mutations, and to help guide protein engineering efforts (30–39).

DynaMut: consensus predictions
Within DynaMut we have implemented a consensus estimate of the change in protein folding free energy upon mutation, which combines the effects of mutations on protein stability and dynamics calculated by Bio3D, ENCoM and DUET to generate an optimized and more robust predictor. Moreover, DynaMut includes a set of complementary information regarding the environment characteristics of the wild-type residue (e.g. relative solvent accessibility, residue depth and secondary structure) and graph-based signatures representing the wild-type structure. The graph-based signatures concept, used in the development of mCSM-Stability and to generate the consensus DUET predictions, has been widely applied to the study of protein structure, including protein-ligand interactions (40), and how mutations alter protein interactions with other molecules (23,24,41–43). These were supplied as evidence for training the consensus predictor using Random Forest (44). Figure 1 shows the workflow used to train the consensus predictions. The DynaMut consensus prediction was trained under 10-fold cross validation, and validated using the non-redundant blind test set (Supplementary Materials). The machine learning algorithm, evaluation procedures, performance metrics and details on the methods used in the consensus prediction are described in Supplementary Materials.
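As a rough illustration of 10-fold cross validation, the sketch below splits sample indices into ten disjoint folds and yields each fold once as the held-out set. It is a generic scheme and not the exact procedure used for DynaMut, which is described in Supplementary Materials. The function name, the fixed seed and the plain random shuffling are assumptions, and the Random Forest training step itself is not shown.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 4594 is the training set size including hypothetical reverse mutations.
splits = list(k_fold_indices(4594, k=10))
```

Each mutation appears in exactly one held-out fold, so every prediction used to score the model comes from a model that did not see that mutation during training.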

WEB SERVER
We have implemented DynaMut as a user-friendly, freely available web server (http://biosig.unimelb.edu.au/dynamut/). The server front end was built using the Bootstrap framework (version 3.3.7), while the back end was built in Python via the Flask framework (version 0.12.2). It is hosted on a Linux server running Apache.

Input
DynaMut can be used in two different ways: (1) to analyze protein dynamics or (2) to analyze the effect of point mutations on protein dynamics and stability. For protein dynamics analysis (Supplementary Figure S1), the server requires the user to input a protein structure by either uploading a file in PDB format or providing the four-letter accession code for any entry in the PDB database. In addition, users have the option to choose a specific force field, which is used to describe the molecular interactions within the structure for normal mode analysis. The force field options available are summarized in Supplementary Table S1 of Supplementary Materials.
Figure 1. Methodology workflow. The DynaMut methodology can be divided into four steps. In step 1, data was collected from the previously established S2648 subset of mutations with experimental evidence from ProTherm. In step 2, DynaMut combines the effects of mutations on protein stability and dynamics calculated by Bio3D, ENCoM and DUET. In addition, DynaMut also includes a set of complementary information regarding the environment characteristics of the wild-type residue (e.g. relative solvent accessibility, residue depth and secondary structure) and the graph-based signatures generated by mCSM. All these features are used as evidence for training supervised learning algorithms in step 3. After evaluating the performance of the predictive model, the consensus prediction was integrated into the DynaMut web server.
Alternatively, for assessing the effects of mutations on protein dynamics and stability, two different input options are available (Supplementary Figure S2). The 'Single mutation' option requires the user to provide a PDB file or PDB accession code and the point mutation, specified as a string containing the wild-type residue one-letter code, its corresponding residue number and the mutant residue one-letter code. The 'Mutation list' option allows users to upload a list of mutations in a file for batch processing. For both input options the user is also asked to specify the chain identifier in which the wild-type residue is located.
To help users submit their jobs for analysis and prediction, sample submission entries are available on both submission pages and a help page is available via the top navigation bar.
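The mutation string format described above (wild-type one-letter code, residue number, mutant one-letter code) can be validated with a short parser. This is an illustrative sketch only. The names `parse_mutation` and `PointMutation` are hypothetical, the example `A123G` is made up, and the restriction to positive integer residue numbers and the 20 standard amino acids is an assumption; the exact syntax accepted by the server is documented on its help page.

```python
import re
from typing import NamedTuple

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class PointMutation(NamedTuple):
    wild_type: str   # one-letter code of the wild-type residue
    position: int    # residue number in the structure
    mutant: str      # one-letter code of the mutant residue


def parse_mutation(text: str) -> PointMutation:
    """Parse a string such as 'A123G' into its three components."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid point mutation: {text!r}")
    wild_type, position, mutant = match.groups()
    if wild_type == mutant:
        raise ValueError("wild-type and mutant residues are identical")
    return PointMutation(wild_type, int(position), mutant)
```

For a 'Mutation list' style input, the same function could be applied to each line of the uploaded file, collecting any `ValueError` messages to report back to the user.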

Output
For the analysis of protein dynamics, the results are displayed in four tabs. In the first tab (Supplementary Figure S3), porcupine plots show the trajectory of movement according to the first non-trivial mode of the molecule. The second tab (Supplementary Figure S4) allows users to visualize the non-trivial modes generated, including an animated plot that describes the motion of the molecule. Visual representations of deformation energy and atomic fluctuation are displayed on the third tab (Supplementary Figure S5). Finally, the last tab shows the cross-correlation between residue movements as both a correlation matrix and the 3D structure of the submitted protein (Supplementary Figure S6).
The mutational analysis results are also split into tabs to enable users to easily navigate the different analyses available for evaluating the effects of mutations on protein stability and dynamics. For the 'Single mutation' option, the server outputs the predicted change in stability (in kcal/mol), along with the variation in entropy energy between wild-type and mutant structures (in kcal/mol/K) in the first tab (Supplementary Figure S7). For comparison purposes, the changes in stability calculated by structure-based methods (16–18) are shown in a separate panel. DynaMut enables visualization of the non-covalent molecular interactions calculated by Arpeggio (45) (Table S3, Supplementary Figure S9) in their respective 3D structures. For the 'Mutation list' option, the server output is summarized as a downloadable table, and users have the option to analyze each mutation separately, similar to the analysis of a single mutation (Supplementary Figure S10).
DynaMut also generates and makes available for download PyMOL sessions for flexibility analysis and for inter-residue interactions for both wild-type and mutant structures to facilitate easy visualization and figure preparation.
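The atomic fluctuation and cross-correlation plots can be related to the normal modes through standard expressions. Here u_{k,i} denotes the three components of mode k that belong to residue i, λ_k is the corresponding eigenvalue, and the sums skip the six trivial modes. These are general textbook forms given as a sketch, and the server's underlying tools may apply different scaling or normalisation.

```latex
% Mean-square fluctuation of residue i, summed over the non-trivial modes
\langle \Delta\mathbf{r}_i^{2} \rangle \;\propto\; \sum_{k=7}^{3N} \frac{\lVert \mathbf{u}_{k,i} \rVert^{2}}{\lambda_k}

% Cross-correlation between residues i and j, normalised to the range [-1, 1]
C_{ij} = \frac{\sum_{k=7}^{3N} \lambda_k^{-1}\, \mathbf{u}_{k,i} \cdot \mathbf{u}_{k,j}}
              {\sqrt{\sum_{k=7}^{3N} \lambda_k^{-1} \lVert \mathbf{u}_{k,i} \rVert^{2}}\;
               \sqrt{\sum_{k=7}^{3N} \lambda_k^{-1} \lVert \mathbf{u}_{k,j} \rVert^{2}}}
```

Low-frequency modes (small λ_k) dominate both sums. Values of C_ij close to 1 indicate residues that move together, and values close to -1 indicate residues that move in opposite directions.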

VALIDATION
The performance of DynaMut was compared to well-established methods that also provide measurements of the effects of single-point mutations on protein stability. All mutations from the data set described previously were submitted to each tool, and the Pearson's correlation coefficient and root mean squared error (RMSE) were used to compare the methods. Moreover, outliers were identified based on the absolute difference between predicted and actual values of ΔΔG. Since the sign convention can vary across methods, for comparison purposes we defined ΔΔG ≥ 0 as stabilizing and ΔΔG < 0 as destabilizing. Where a method does not follow this convention, its results were converted accordingly.
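The evaluation metrics described above can be sketched in a few lines of standard-library Python. The function names, the 10% default and the example values are illustrative assumptions, with outliers taken to be the points with the largest absolute error. The exact evaluation procedure is described in Supplementary Materials.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, actual):
    """Root mean squared error, in the units of the inputs (kcal/mol here)."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def drop_outliers(predicted, actual, fraction=0.10):
    """Drop the given fraction of points with the largest absolute error."""
    order = sorted(range(len(actual)), key=lambda i: abs(predicted[i] - actual[i]))
    n_keep = len(order) - int(len(order) * fraction)
    keep = sorted(order[:n_keep])   # restore the original ordering
    return [predicted[i] for i in keep], [actual[i] for i in keep]


def label(ddg):
    """Sign convention used in this work: ddg >= 0 is stabilizing."""
    return "stabilizing" if ddg >= 0 else "destabilizing"


# Made-up values for illustration; the last point is a deliberate outlier.
actual = [-2.0, -1.0, 0.0, 1.0, 2.0, -0.5, 0.5, 1.5, -1.5, 3.0]
predicted = [-1.8, -0.7, 0.2, 0.9, 1.6, -0.4, 0.1, 1.2, -1.1, -1.0]
trimmed_pred, trimmed_actual = drop_outliers(predicted, actual)
```

Reporting the metrics both on the full set and on the trimmed set, as done in this work, shows how much of the error is driven by a small number of poorly predicted mutations.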

Performance on cross validation
Across the full training set (forward and reverse mutations), DynaMut achieved a Pearson's correlation of r = 0.67, and RMSE = 1.31 kcal/mol (r = 0.79 and = 0.01 on 90% of the data) under 10-fold cross validation. This correlation was significantly higher than the individual methods used in the consensus prediction (P-value < 0.0001). Supplementary Table S1 in Supplementary Materials summarizes the performance for all the methods during training of DynaMut. Figure 2A shows the regression analysis for performance of DynaMut over the training set.

Blind test
The non-redundant blind test was used to evaluate the generalization of the consensus predictions. Across the complete blind test set of 702 mutations containing both forward and hypothetical reverse mutations, DynaMut obtained a Pearson's correlation coefficient of 0.70 (RMSE = 1.45; Figure 2B). After removing 10% of the data as outliers, DynaMut achieves a correlation of up to r = 0.79 (RMSE = 1.10; Figure 2B). This was significantly higher (P-value < 0.001) than comparable methods (Table 1). Looking specifically at those data points with experimental data, the original core 351 non-redundant mutations, DynaMut achieved a Pearson's correlation of r = 0.69 (RMSE = 1.39), significantly higher than the performance of ENCoM, FoldX, SDM and Maestro, but lower than I-Mutant2, DUET and mCSM (P-value < 0.001; Table 1). Considering the hypothetical reverse mutations alone, DynaMut significantly outperformed all other algorithms tested, achieving a Pearson's correlation of 0.58 (RMSE = 1.51; Table 1).
Previous studies have highlighted that many machine learning based structural approaches are unbalanced, and identify stabilizing mutations less accurately (16). We therefore considered method performance across stabilizing and destabilizing mutations separately (Supplementary Table S2). Considering the destabilizing mutations alone, DynaMut has a comparable correlation coefficient but higher RMSE (1.42) than mCSM (1.02), DUET (1.04) and I-Mutant2 (1.07), and outperformed the other methods tested. Across the stabilizing mutations, however, DynaMut achieved a correlation of r = 0.51 (RMSE = 1.48), significantly higher than all comparative methods (P < 0.01; Supplementary Table S3). This highlights that DynaMut provides the most accurate and balanced approach for the prediction of both destabilizing and stabilizing mutations.

CONCLUSION
Here, we present DynaMut, an integrated computational method that provides users with easy access to powerful and insightful analysis of protein motions and their changes upon mutation. By consolidating these insights with our graph-based signatures, DynaMut is able to accurately assess the effects of missense mutations on protein stability. This consensus approach allows for more accurate and reliable prediction of both stabilizing and destabilizing mutations. DynaMut is a valuable tool for a wide variety of applications, ranging from protein functional analysis and optimization of stability to understanding the role of mutations in disease. The method is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/dynamut/.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.