Minghui Li, Franco L Simonetti, Alexander Goncearenco, Anna R Panchenko, MutaBind estimates and interprets the effects of sequence variants on protein–protein interactions, Nucleic Acids Research, Volume 44, Issue W1, 8 July 2016, Pages W494–W501, https://doi.org/10.1093/nar/gkw374
Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/.
INTRODUCTION
A crucial prerequisite for proper biological function is a protein's ability to establish highly selective interactions with macromolecular partners. Sequence variants that alter protein interactions may cause significant perturbations or complete abolishment of function, potentially leading to diseases. The current era of genome sequencing has uncovered a large number of human genetic variations, many of which may affect protein binding and function. However, these advances are necessary but not sufficient for understanding the origins of allelic variations in human genes and the mechanisms of genetic diseases and phenotypes (1). Although a majority of variants are likely to be neutral, a substantial fraction of them may explain the origins of many complex traits and diseases. One possible way to assess the effect of a mutation on protein binding affinity is to measure it experimentally. However, while site-directed mutagenesis methods are inexpensive and fast, surface plasmon resonance, isothermal titration calorimetry, FRET and other methods used to measure binding affinity can be time-consuming and costly. Therefore, reliable computational approaches to predict changes in binding affinity upon mutation are urgently required. Several approaches have recently been proposed to classify mutations phenotypically into damaging and neutral categories and to calculate the impact of mutations on protein stability (2–4), but very few methods can actually predict the effects of point mutations on binding energy (5–10). Moreover, of these few methods, even fewer are available as open access websites.
To address this need we present a new accurate computational method and web server, MutaBind (http://www.ncbi.nlm.nih.gov/projects/mutabind/), which is based on molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. MutaBind evaluates the effects of sequence variants and disease mutations (both interfacial and non-interfacial) on protein interactions, calculates the quantitative changes in binding affinity upon single missense mutations, produces models of mutant proteins and estimates the confidence of predictions. MutaBind was validated using different types of cross-validation and independent test sets from the 26th round of the Critical Assessment of Predicted Interactions (CAPRI) (11) and compared to several other methods. MutaBind can be applied to a large number of tasks, including finding potential driver mutations in cancer, studying the effects of sequence variations on protein fitness in evolution and protein design.
MATERIALS AND METHODS
Experimental datasets of mutations for parameterization
The dataset used for parameterization was compiled from the SKEMPI database (12), which includes experimentally measured values of changes in binding free energy upon single and multiple amino acid substitutions (called ‘mutations’ hereafter) derived from the scientific literature for complexes with experimentally determined structures. SKEMPI contains all types of amino acid substitutions and is not limited to alanine scanning. From the SKEMPI data we removed proteins without wild-type crystal structures, proteins measured by an ‘unusual method’ as defined in SKEMPI and proteins with modified residues at the binding interface. We then eliminated SKEMPI entries with multiple mutations, restricting our set to single mutations. For some entries (211 mutations) several experimental values are available for the same mutation. Since these values are not drastically different from each other, we used the average of the experimental changes in binding free energy in these cases. As a result, the experimental set used for training in this study included 1925 single mutations from 80 wild-type protein–protein complexes (referred to as ‘Skempi’ hereafter). The number of mutations for each protein–protein complex is shown in Supplementary Figure S1. This set is very similar to the dataset used to test the BeAtMuSiC method (7) (<5% difference), and the performance of MutaBind does not change if the BeAtMuSiC test set is used for validation. Mutations, experimental and prediction data are accessible through ftp://ftp.ncbi.nih.gov/pub/panch/MutaBind.
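The averaging of repeated measurements described above can be illustrated with a minimal Python sketch. The record layout (complex identifier, mutation label, experimental value) and all names and numbers below are assumptions made for illustration; they are not the SKEMPI file format or the authors' code.

```python
from collections import defaultdict
from statistics import mean


def average_duplicate_measurements(records):
    """Collapse repeated measurements of the same mutation into their mean.

    `records` is an iterable of (complex_id, mutation, ddg_exp) tuples,
    with ddg_exp in kcal/mol. Returns a dict keyed by (complex_id, mutation).
    """
    grouped = defaultdict(list)
    for complex_id, mutation, ddg_exp in records:
        grouped[(complex_id, mutation)].append(ddg_exp)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical records (not taken from SKEMPI).
example_records = [
    ("complexA", "A:L45A", 1.0),
    ("complexA", "A:L45A", 2.0),  # second measurement of the same mutation
    ("complexA", "B:K12G", 0.5),
]
averaged = average_duplicate_measurements(example_records)
```

Keying on both the complex and the mutation keeps identical substitutions in different complexes apart.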
Structure optimization protocol
Crystal structures of wild-type protein–protein complexes of the Skempi set were obtained from the Protein Data Bank (PDB) (13). First, we introduced a single mutation into the wild-type structure using the BuildModel module of the FoldX (14) software package. Missing heavy side-chain atoms and hydrogen atoms were added for the wild type and mutant using the VMD program (15) based on the topology file from the CHARMM36 force field (16). Then a 100-step energy minimization in the gas phase was carried out for both wild type and mutant using harmonic restraints (with a force constant of 5 kcal mol⁻¹ Å⁻²) applied to the backbone atoms of all residues. Minimization was done only for protein–protein complexes, and the structures of the binding partners were retained, assuming rigid-body binding. The energy minimization was carried out with the NAMD program version 2.9 (17) using the CHARMM36 force field (16). A 12 Å cutoff distance for non-bonded interactions was applied. Lengths of hydrogen-containing bonds were constrained by the SHAKE algorithm (18). This structure optimization protocol was chosen for its accuracy and speed.
Calculating changes in binding affinity
Our goal is to design a method to assess the effects of mutations on protein–protein binding. There are different ways by which mutations can impact binding. A mutation may change the components of protein–protein interaction energies, may affect the solvation of a complex, may change the folding free energy of each of the partners and may directly disrupt binding hotspot sites (19). We analyzed different protein sequence and structural features (Supplementary Table S1) and found that only 10 features contributed significantly to the quality of the multiple linear regression model (MLR) for the calculation of |${\rm{\Delta \Delta }}G$| value (change in binding affinity upon mutation). The model was parameterized using the ‘Skempi’ set. The features which contribute significantly to the quality of the model are described below.
|${\rm{\Delta }}E_{{\rm vdw}}^{{\rm wt}}$| and |${\rm{\Delta }}E_{{\rm vdw}}^{{\rm mut}}$| are van der Waals interaction energies for wild-type and mutant protein complexes, respectively. They are calculated as differences between the van der Waals energies of a complex and each interacting partner, |${\rm{\Delta }}E = E_{{\rm com}} - E_{{\rm part}1} - E_{{\rm part}2}$|, using the ENERGY module of the CHARMM program (20). The minimized structure of the wild-type or mutant complex was used for the calculation.
|${\rm{\Delta }}G_{{\rm solv}}^{{\rm wt}}$| and |${\rm{\Delta }}G_{{\rm solv}}^{{\rm mut}}$| are the differences between polar solvation energies of a complex and each interacting partner (|${\rm{\Delta }}G = G_{{\rm com}} - G_{{\rm part}1} - G_{{\rm part}2}$|) in water for wild-type and mutant complexes, respectively. These terms are calculated by solving the Poisson–Boltzmann equation with the PBEQ module (21) of the CHARMM program using the minimized structure of the wild-type or mutant complex.
|${\rm{\Delta \Delta }}G_{{\rm fold}}$| is the difference between unfolding free energies of mutant and wild-type protein complexes (|${\rm{\Delta \Delta }}G_{{\rm fold}} = {\rm{\Delta }}G_{{\rm fold}}^{{\rm mut}} - {\rm{\Delta }}G_{{\rm fold}}^{{\rm wt}}$|), calculated using the BuildModel module of the FoldX software (14). FoldX calculates the unfolding free energy using an empirical force field. This term may account for those cases where partners are unfolded in unbound states and can only fold upon binding to each other, so the binding affinity cannot be explicitly calculated.
|$SA_{{\rm com}}^{{\rm wt}}$| and |$SA_{{\rm part}}^{{\rm wt}}$| are solvent accessible surface areas of the wild-type residues at the mutated sites in the complex and unbound state, respectively. They are calculated using the DSSP program (22) for the crystal structure of the wild-type complex.
|$CS$| is the conservation score of the mutated site calculated using the PROVEAN program (23), which accounts for the fact that binding hotspots (sites contributing the most to the energy of binding) are usually evolutionarily conserved. PROVEAN also takes into account the sequence context of the mutated site and therefore accounts for the alignment quality around a site of interest.
|${\rm{\Delta }}_{{\rm Pro}}^{{\rm wt}}$| and |${\rm{\Delta }}_{{\rm Pro}}^{{\rm mut}}$| terms account for the ability of proline's cyclic structure to introduce constraints on the main-chain dihedral angles which, in turn, can be structurally important for stability or binding. |${\rm{\Delta }}_{{\rm Pro}}$| is equal to 1 if proline is present at the mutated site of the wild-type (or mutant) protein and 0 if it is absent.
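Several of the terms above share the same arithmetic form, which the following Python sketch illustrates. It assumes that the energies have already been computed by external programs (CHARMM, FoldX); the function names and the numerical values are hypothetical and only demonstrate the bookkeeping, not the energy calculations themselves.

```python
def interaction_term(complex_value, partner1_value, partner2_value):
    """Return X_com - X_part1 - X_part2 for an energy-like quantity X.

    The same form is used for the van der Waals term (X = E_vdw) and for
    the polar solvation term (X = G_solv).
    """
    return complex_value - partner1_value - partner2_value


def folding_term(dg_fold_mut, dg_fold_wt):
    """Return ddG_fold = dG_fold(mutant) - dG_fold(wild type)."""
    return dg_fold_mut - dg_fold_wt


def proline_flag(residue_name):
    """Return 1 if the three-letter residue name is proline, otherwise 0."""
    return 1 if residue_name.upper() == "PRO" else 0


# Hypothetical energies in kcal/mol, for illustration only.
delta_e_vdw_wt = interaction_term(-120.0, -50.0, -40.0)
```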
In addition, the Random Forest (RF) supervised learning method was applied, and the final prediction of |${\rm{\Delta \Delta }}G$| by MutaBind was calculated as the average of the two |${\rm{\Delta \Delta }}G$| values produced by MLR and RF. Contributions of each term to the MLR and RF models are shown in Supplementary Table S2.
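The combination of the two models can be sketched as follows. This is a minimal illustration under stated assumptions: the MLR is written as a plain weighted sum, the RF prediction is taken as an already computed number, and the coefficients, intercept and feature values are placeholders rather than the fitted MutaBind parameters (those are reported in Supplementary Table S2).

```python
def mlr_predict(features, coefficients, intercept):
    """Multiple linear regression: intercept + sum_i coefficient_i * feature_i."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def combine_predictions(ddg_mlr, ddg_rf):
    """Final estimate as the average of the MLR and RF predictions."""
    return (ddg_mlr + ddg_rf) / 2.0


# Placeholder two-feature model (the real model uses 10 features).
ddg_mlr = mlr_predict([1.0, 2.0], [0.5, 0.25], intercept=1.0)
ddg_final = combine_predictions(ddg_mlr, ddg_rf=1.0)
```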
If we train and test our model on the ‘Skempi’ set, the Pearson correlation coefficient between experimental and calculated changes in binding free energy is R = 0.78 (Supplementary Figure S2a). In addition, we noticed that the performance of MutaBind in estimating the effects of mutations in protease–inhibitor complexes (named ‘SkempiPI’ hereafter; this set includes 862 single mutations from 16 protease–inhibitor complexes) was significantly higher than for other types of complexes, with a Pearson correlation coefficient of 0.86 (Supplementary Figure S2b). Therefore, we parameterized our model separately for protease–inhibitor complexes on the ‘SkempiPI’ set, so that this model can be selected on the MutaBind website to obtain more accurate predictions if the query complex is a protease–inhibitor complex. All correlation coefficients reported in the paper were significantly different from zero with P-values of less than 0.01.
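For reference, the Pearson correlation coefficient used throughout can be computed as in the sketch below. It is a generic textbook implementation in plain Python, not the authors' evaluation script, and the input values are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up experimental and predicted values (kcal/mol).
r_perfect = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```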
MutaBind takes about 15–30 min to perform calculations for a single mutation in a protein complex with 300 residues running on a single processor core, and it requires an additional 3–5 min for each additional mutation per complex.
VALIDATION
Evaluating the performance of MutaBind using cross-validation
Our goal is to construct a computational method that yields good prediction accuracy for diverse and large sets of single mutations. Overfitting may occur when the parameters of a computational method are tuned to minimize the mean square deviations of predicted from experimental values in the training set, which decreases generalization performance (24). At the same time, the training set should be as comprehensive as possible. To address this issue, we performed five types of cross-validation. In ‘CV1’ cross-validation (Figure 1A) we randomly chose 80% of all mutations from the ‘Skempi’ set for training and used the remaining 20% for testing; the procedure was repeated 100 times. In ‘CV2’ cross-validation, 50% of the mutations were used for training and the remaining mutations for testing, also repeated 100 times. The average Pearson correlation coefficients were R = 0.77 and R = 0.76 for ‘CV1’ and ‘CV2’, respectively, with small standard errors of 0.001–0.002 (Figure 1A). Since the distribution of the number of mutations over proteins is not uniform (Supplementary Figure S1), we performed a third type of cross-validation (‘CV3’) to take this bias into account. Namely, we produced a subset of 532 mutations from 80 protein complexes by sampling up to ten mutations for each protein complex from ‘Skempi’; this sampling was repeated 10 times. Then 80% of the mutations were randomly chosen from each subset for training and the remaining mutations were used for testing; this procedure was also repeated 10 times. It resulted in an average cross-validated correlation of R = 0.71 with SE = 0.02 (Figure 1A). The same procedure was performed for the ‘SkempiPI’ set and the results are shown in Figure 1B. Two other types of cross-validation are described in the following section.
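The random splits (‘CV1’, ‘CV2’) and the per-complex subsampling used in ‘CV3’ can be sketched in Python as follows. The data layout, function names and the toy data are assumptions for illustration; model fitting is omitted, and rounding of the training-set size is one possible choice that the text does not specify.

```python
import random


def random_split(items, train_fraction, rng):
    """Shuffle the items and split them into (train, test) by fraction."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def cap_per_complex(mutations, max_per_complex, rng):
    """Sample at most `max_per_complex` mutations from each complex.

    `mutations` is a list of (complex_id, mutation_id) tuples.
    """
    by_complex = {}
    for complex_id, mutation_id in mutations:
        by_complex.setdefault(complex_id, []).append((complex_id, mutation_id))
    subset = []
    for group in by_complex.values():
        subset.extend(rng.sample(group, min(max_per_complex, len(group))))
    return subset


# Toy data: complex "A" has 25 mutations, complex "B" has 3.
rng = random.Random(0)
toy = [("A", i) for i in range(25)] + [("B", i) for i in range(3)]
subset = cap_per_complex(toy, max_per_complex=10, rng=rng)  # CV3-style subset
train, test = random_split(subset, train_fraction=0.8, rng=rng)
```

Repeating the two calls in a loop with fresh random draws gives the repeated splits described above; using 0.5 as the training fraction on the full set corresponds to a ‘CV2’-style split.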
Figure 1. Pearson correlation coefficients between experimental and calculated |${\rm{\Delta \Delta }}G$| for three types of cross-validation tests for ‘Skempi’ (A) and ‘SkempiPI’ sets (B). See ‘Validation’ section for details.
Evaluating the performance of MutaBind using leave one complex or binding site out validation
Since the prediction accuracy of mutational effects largely depends on the sequence and structure of a protein complex, we performed a ‘leave-one-complex-out’ procedure (‘CV4’ cross-validation). Namely, we trained the parameters on experimental |${\rm{\Delta \Delta }}G$| values of mutations from 79 protein complexes and then applied the model to mutations from the remaining protein complex. This procedure was repeated for each complex. The Pearson correlation coefficient between experimental and computed |${\rm{\Delta \Delta }}G$| values using this procedure was 0.68 with an RMSE of 1.41 kcal mol⁻¹ (Figure 2A and Table 1). It yielded the following linear regression function: |${\rm{\Delta \Delta }}G_{{\rm exp}} = 1.21 \times {\rm{\Delta \Delta }}G_{{\rm MutaBind}} - 0.28$|; therefore, the predicted values are almost on the same scale as the experimental ones. The predictions achieved high accuracy for protease–inhibitor complexes (R = 0.76 and RMSE = 1.48 kcal mol⁻¹) (Figure 2A and Table 1). In addition, we performed a ‘leave-one-binding-site-out’ validation (‘CV5’ cross-validation), in which not only the complex in the validation set was removed from the training set, but also all other complexes with an identical or similar binding site. Namely, all ‘hold-out’ complexes that had identical or similar binding sites as defined in the SKEMPI database were removed from the training set, and testing was then performed on complexes that did not have similar binding sites (12). Even though the model was parameterized and tested using completely non-overlapping sets of binding sites, the correlation between experimental and estimated values of binding affinity changes was still statistically significant, with R = 0.57 and RMSE = 1.57 kcal mol⁻¹ (Figure 2B and Table 1). Prediction errors for different types of mutations in the ‘CV4’ set are shown in Supplementary Figure S5.
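The ‘leave-one-complex-out’ partitioning and the RMSE can be sketched as below. The record layout and names are hypothetical, and model training is left out; the sketch only shows how the held-out complex is separated from the training data. A ‘leave-one-binding-site-out’ split would follow the same pattern with a binding-site group label in place of the complex identifier.

```python
import math


def leave_one_complex_out(records):
    """Yield (held_out_id, train, test) once for every complex.

    `records` is a list of tuples whose first element is the complex id.
    """
    for held_out in sorted({record[0] for record in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Hypothetical (complex_id, mutation, ddg_exp) records.
records = [
    ("c1", "m1", 0.5),
    ("c1", "m2", 1.2),
    ("c2", "m1", -0.3),
    ("c3", "m1", 2.0),
]
folds = list(leave_one_complex_out(records))
```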
Figure 2. Correlation between experimental and predicted values of changes in binding affinity for all mutations in Skempi (black) and SkempiPI (orange) sets using CV4 and CV5 cross-validations corresponding to ‘leave-one-complex-out’ (A) and ‘leave-one-binding-site-out’ (B) procedures, respectively.
Table 1. Comparison of methods’ performances on Skempi and SkempiPI sets

| Test set | Method | R | RMSE (kcal mol⁻¹) | Slope |
|---|---|---|---|---|
| Skempi | MutaBind (CV4) | 0.68 | 1.41 | 1.21 |
| | MutaBind (CV5) | 0.57 | 1.57 | 1.27 |
| | BeAtMuSiC | 0.39 | 1.81 | 0.75 |
| | FoldX | 0.40 | 2.12 | 0.41 |
| | MMPBSA | 0.44 | 6.45 | 0.12 |
| SkempiPI | MutaBind (CV4) | 0.76 | 1.48 | 1.17 |
| | BeAtMuSiC | 0.44 | 2.16 | 0.88 |
| | FoldX | 0.40 | 2.57 | 0.38 |
| | MMPBSA | 0.65 | 5.66 | 0.27 |
R: Pearson correlation coefficient between experimental and predicted ΔΔG values. RMSE: root-mean-square error. The last column shows the slope of the regression line between experimental and predicted ΔΔG values. All correlation coefficients are statistically significantly different from zero (P-value << 0.01).
Comparison of MutaBind with other methods
We compared our method with three other methods: BeAtMuSiC (7), MMPBSA (25) and FoldX (14). BeAtMuSiC is a machine learning method that uses a combination of different statistical potentials to predict |${\rm{\Delta \Delta }}G$| values. It has the shortest processing time; a calculation for one mutation takes less than a second on its web server. It has been shown to outperform many other approaches in the 26th round of CAPRI, which is why we chose this method for comparison. The Molecular Mechanics Poisson–Boltzmann Surface Area (MMPBSA) method has previously been shown to yield good agreement with experimental studies in determining protein stability, binding affinity and ranking of docking templates (26,27). It should be mentioned that MMPBSA is not explicitly trained on any set to predict the effects of mutations on binding affinity. FoldX uses an empirical energy function that is parameterized on experimental changes of unfolding free energy. It is fast and takes about 5 min for a protein of about 300 residues. Although it is not parameterized to predict changes in binding energy, it is very powerful in predicting changes of unfolding free energy (14). We used FoldX to calculate the binding energy as |${\rm{\Delta \Delta }}G_{{\rm bind}} = {\rm{\Delta }}G_{{\rm fold}}^{{\rm com}} - {\rm{\Delta }}G_{{\rm fold}}^{{\rm part}1} - {\rm{\Delta }}G_{{\rm fold}}^{{\rm part}2}$|.
We then applied all methods to the Skempi and SkempiPI sets and calculated Pearson correlation coefficients between experimental measurements (|${\rm{\Delta \Delta }}G_{{\rm exp}}$|) and predictions. Table 1 shows that MutaBind outperforms the other methods on these test sets in predicting quantitative values of ΔΔG, as evident from the correlation coefficients and root-mean-square errors. It should be mentioned that machine learning methods assume that the conformation of a protein does not change upon mutation, although in many cases it does. MutaBind, on the other hand, does not make such an assumption and simulates structures of mutant proteins. Previously, we developed a method for predicting binding affinity changes upon mutations which used a modified MMPBSA, statistical scoring energy functions and a structure minimization protocol with an explicit solvent model without restraints on the backbone atoms (5). This former protocol was time-consuming, although it used only five parameters in calculating binding energy differences. It was applied to predict the effects of cancer mutations on the binding between CBL ubiquitin ligase and E2 conjugating enzyme, where predicted binding affinity changes were successfully compared with experiments in cancer and non-cancer cell lines (28). By contrast, MutaBind uses a 100-step energy minimization in the gas phase, which considerably increases the calculation speed.
Evaluating the performance of MutaBind using CAPRI targets
We performed two other independent tests using a dataset from the 26th round of the CAPRI blind prediction experiments (11), which allowed us to directly assess the performance of our method in comparison with 22 other approaches. The CAPRI set is composed of two targets (T55 and T56), de novo designed influenza inhibitors (HB36.4 and HB80.3) in complex with hemagglutinin (HA) (29). T55 includes 1007 mutations at 53 different positions (‘CAPRI1’) and T56 includes 855 mutations at 45 positions (‘CAPRI2’). These sets of experimental data include enrichment values calculated from deep sequencing as the binary logarithm of the ratio of the number of times the variant sequence was observed after and before the selection for binding (29,30). Although the enrichment value is not a direct measurement of the change in binding affinity, the two measures are well correlated with each other (29). Protein complexes HB36.4-HA (T55) and HB80.3-HA (T56) have not been crystallized; however, structures of very close homologs are available. We built models for the T55 and T56 protein complex structures by introducing one mutation (N64K) into the HB36.3-HA crystal structure (PDB ID: 3R2X (31)) and five mutations (K12G, I17L, I21L, K35A and K42S) into the HB80.4-HA crystal structure (PDB ID: 4EEF (29)), respectively. CAPRI test sets were not used in our model selection or parameterization.
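The enrichment value defined above is a base-2 logarithm of a ratio of counts, as in the following sketch. The function name and the counts are hypothetical, and any normalization by total read depth, which the text does not describe, is deliberately left out.

```python
import math


def enrichment(count_after, count_before):
    """Binary logarithm of the ratio of variant counts after and before selection."""
    if count_after <= 0 or count_before <= 0:
        raise ValueError("counts must be positive to take the logarithm")
    return math.log2(count_after / count_before)


# Hypothetical read counts for two variants.
enriched = enrichment(40, 10)   # observed four times more often after selection
depleted = enrichment(5, 20)    # observed four times less often after selection
```

Positive values indicate variants that became more frequent after selection for binding, and negative values indicate variants that were depleted.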
We applied all methods to the ‘CAPRI1’ and ‘CAPRI2’ sets and calculated Pearson and Kendall correlation coefficients between experimental measurements (enrichment values) and predictions. As can be seen in Figure 3, MutaBind compares very well to other methods, as evident from the values of correlation coefficients and root-mean-square errors (Supplementary Table S3). It should be mentioned that none of the methods (including MutaBind) could achieve high prediction accuracy for the ‘CAPRI1’ and ‘CAPRI2’ sets (Figure 3), probably because the enrichment value is not a direct measurement of binding affinity changes and T55 and T56 do not have wild-type crystal structures.
Figure 3. Kendall's tau rank correlation coefficients between predicted |${\rm{\Delta \Delta }}G$| and measured enrichment values for MutaBind, MMPBSA, FoldX, BeAtMuSiC and 21 other prediction methods of the 26th round of CAPRI experiments (http://www.ebi.ac.uk/msd-srv/capri/round26/). MutaBind is shown in black.
Evaluating the performance of MutaBind to predict deleterious effects of mutations
The requirement to predict the quantitative values of binding affinity changes is rather stringent. A much easier task, attempted by many studies, is to classify mutations by their effects as deleterious or neutral (see the definition in the caption of Supplementary Figure S3). Figure 4A and Supplementary Figure S3 demonstrate that MutaBind performs well in estimating deleterious effects (highly destabilizing) for all test sets and neutral effects for the ‘Skempi’ and ‘CAPRI1’ sets (but not the ‘CAPRI2’ set). It should be mentioned that since the number of highly stabilizing mutations was very small (Supplementary Figure S3d), the MutaBind prediction accuracy could not be reliably assessed for these mutations.
Figure 4. (A) ROC curves for predictions of deleterious mutations for different methods applied to the Skempi set. AUC = 0.85, 0.81, 0.70, 0.72, 0.71 for MutaBind (cross-validation CV4), MutaBind (cross-validation CV5), BeAtMuSiC, MMPBSA and FoldX, respectively. The AUC statistic was calculated as the area under the curve. (B) Comparison of MutaBind with other methods for prediction of interfacial and non-interfacial mutations for the Skempi set. Only statistically significant correlation coefficients are shown.
As was shown previously (5), mutations located on the interface region have on average larger effects on protein–protein interactions and are better predicted compared to non-interface mutations. Importantly, MutaBind yields statistically significant correlation for all targets in predicting non-interfacial mutations (Figure 4B and Supplementary Figure S4). As judged by the values of correlation coefficients, MutaBind is superior to BeAtMuSiC in this category, although the number of non-interfacial mutations with experimental values of |${\rm{\Delta \Delta }}G_{{\rm exp}}$| is also limited (Supplementary Figure S4d).
MutaBind classifies a mutation as deleterious if its predicted ΔΔG is greater than or equal to 1.57 kcal mol⁻¹. This threshold corresponds to 18% FPR and 82% TPR, which minimizes the error |${\rm ER} = \sqrt{(1 - {\rm TPR})^2 + {\rm FPR}^2}$| and thereby balances sensitivity and specificity. To define the confidence of prediction for deleterious interfacial mutations, we constructed ROC curves (Supplementary Figure S6) for predicted deleterious interfacial mutations. We defined a mutation as deleterious with high confidence if the predicted |${\rm{\Delta \Delta }}G$| was greater than or equal to 2.24 kcal mol⁻¹ (corresponding to the minimum in ER) and as deleterious with low confidence if |${\rm{\Delta \Delta }}G$| was lower than 2.24 and greater than or equal to 1.57 kcal mol⁻¹. Similarly, MutaBind defines an interfacial mutation as neutral with high confidence if |${\rm{\Delta \Delta }}G$| is lower than or equal to 0.86 kcal mol⁻¹ and as neutral with low confidence if the predicted |${\rm{\Delta \Delta }}G$| is higher than 0.86 and lower than 1.57 kcal mol⁻¹. We defined the confidence of all non-interfacial mutations as low.
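The threshold selection and the resulting classification rules can be sketched as follows. This is an illustrative reading of the paragraph above, not the server's code: the function names are hypothetical, candidate thresholds are simply the observed predicted values, and the toy data are made up. The cutoffs 1.57, 2.24 and 0.86 kcal/mol are the ones quoted in the text.

```python
import math


def roc_error(tpr, fpr):
    """Distance of a ROC point from the ideal corner (TPR = 1, FPR = 0)."""
    return math.sqrt((1.0 - tpr) ** 2 + fpr ** 2)


def best_threshold(predicted, is_deleterious):
    """Return (threshold, ER) minimizing ER, calling ddG >= threshold deleterious."""
    positives = sum(1 for flag in is_deleterious if flag)
    negatives = len(is_deleterious) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both deleterious and neutral examples")
    best = None
    for threshold in sorted(set(predicted)):
        pairs = list(zip(predicted, is_deleterious))
        tp = sum(1 for p, flag in pairs if flag and p >= threshold)
        fp = sum(1 for p, flag in pairs if not flag and p >= threshold)
        er = roc_error(tp / positives, fp / negatives)
        if best is None or er < best[1]:
            best = (threshold, er)
    return best


def classify(ddg, interfacial, cutoff=1.57, high_deleterious=2.24, high_neutral=0.86):
    """Return (label, confidence) following the thresholds quoted in the text."""
    label = "deleterious" if ddg >= cutoff else "neutral"
    if not interfacial:
        return label, "low"  # non-interfacial predictions are always low confidence
    if ddg >= high_deleterious or ddg <= high_neutral:
        return label, "high"
    return label, "low"


# Made-up predictions and labels for four mutations.
toy_threshold = best_threshold([0.2, 0.5, 1.8, 2.5], [False, False, True, True])
```

With the reported operating point (TPR = 0.82, FPR = 0.18), `roc_error` gives approximately 0.25.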
WEB SERVER
Server input
The main requirement of the web server is the availability of a 3D structure of a protein–protein complex. Users can either provide a PDB code, in which case structures of biological assemblies are retrieved from the Protein Data Bank, or upload their own coordinate file. In either case, the structure file should contain at least two protein chains. If a protein complex is classified as protease–inhibitor, a special model optimized for protease–inhibitor complexes can be selected with the option at the bottom of the entry page.
After the structure has been correctly retrieved, the server displays a 3D view of the complex colored by chains or partners using the GLmol software. Each chain is listed with the corresponding protein name. In the second step, two interaction partners should be defined. The user can assign one chain or multiple chains to either Partner 1 or Partner 2, but both partners should include at least one chain. Only the selected chains of the two partners are taken into account during the calculation. Interacting partners are defined if the interface size between them is more than 100 Å². Interface size is calculated as the difference between the solvent accessible surface areas of the unbound partners and of the proteins in the complex.
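The interface-size criterion can be written down in a few lines. The sketch below assumes that solvent accessible surface areas (in Å²) have already been computed by an external program and that the interface size is the plain difference without halving, which the text does not specify; the function names and numbers are hypothetical. The residue-level check mirrors the interface definition used in the server output.

```python
def interface_size(sasa_partner1, sasa_partner2, sasa_complex):
    """Surface area buried on binding: SASA(unbound partners) - SASA(complex)."""
    return sasa_partner1 + sasa_partner2 - sasa_complex


def are_interacting(sasa_partner1, sasa_partner2, sasa_complex, min_interface=100.0):
    """Partners count as interacting if the interface exceeds 100 square angstroms."""
    return interface_size(sasa_partner1, sasa_partner2, sasa_complex) > min_interface


def is_interface_residue(sa_in_complex, sa_in_unbound_partner):
    """A residue is interfacial if it is less accessible in the complex."""
    return sa_in_complex < sa_in_unbound_partner


# Hypothetical surface areas in square angstroms.
buried = interface_size(5000.0, 4000.0, 8200.0)
```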
The third step is to select mutations (Figure 5). Up to 16 single mutations can be selected for one submission. Each mutation will be treated independently. After the chain and the mutated position are selected, they can be visualized in the wild-type complex using the 3D viewer.
Figure 5. Left corner: the entry page of the MutaBind server; right corner: the third step for selecting mutations, with the wild-type residue (L45) at a mutated site shown in the 3D viewer; bottom: final results table and alignment of homologous binding sites.
Server output
For each mutation of a protein–protein complex, MutaBind server provides the following results:
ΔΔG (kcal mol⁻¹): predicted change in binding affinity induced by the mutation. Positive and negative signs correspond to destabilizing and stabilizing mutations, predicted to decrease and increase binding affinity, respectively.
Interface (yes/no): MutaBind defines a residue as located on a protein–protein interface if the residue's solvent accessibility in the complex is lower than in the corresponding unbound partner.
Deleterious (yes/no): the MutaBind server classifies a mutation as deleterious if |${\rm{\Delta \Delta }}G$| is greater than or equal to 1.57 kcal mol⁻¹. This threshold corresponds to the minimum value of ER, which balances sensitivity and specificity.
Confidence (high/low), for deleterious classification of interfacial mutations. For non-interfacial mutations prediction confidence is listed as low.
Coordinates of the minimized mutant structure are provided for download.
Protein–protein binding sites in protein complexes homologous to the query are identified using the Inferred Biomolecular Interactions Server (IBIS) at NCBI (32). This allows mutations of aligned binding site residues in homologous proteins to be tested in MutaBind.
Results can be viewed directly in the browser (Figure 5) or downloaded as a plain text file.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Intramural Research Program of the National Library of Medicine at the U.S. National Institutes of Health (to M.L., A.G., A.R.P.); Argentine Government/BEC.AR (to F.L.S.); Argentine Fulbright Commission (to F.L.S.); National Science Foundation [NSF PHY11-25915, in part]. Funding for open access charge: National Institutes of Health.
Conflict of interest statement. None declared.
REFERENCES