Maria Kadukova, Karina dos Santos Machado, Pablo Chacón, Sergei Grudinin, KORP-PL: a coarse-grained knowledge-based scoring function for protein–ligand interactions, Bioinformatics, Volume 37, Issue 7, March 2021, Pages 943–950, https://doi.org/10.1093/bioinformatics/btaa748
Abstract
Despite the progress made in studying protein–ligand interactions and the widespread application of docking and affinity prediction tools, improving their precision and efficiency still remains a challenge. Computational approaches based on the scoring of docking conformations with statistical potentials constitute a popular alternative to more accurate but costly physics-based thermodynamic sampling methods. In this context, a minimalist and fast sidechain-free knowledge-based potential with high docking and screening power can be very useful when screening a large number of putative docking conformations.
Here, we present a novel coarse-grained potential defined by a 3D joint probability distribution function that depends only on the pairwise orientation and position between protein backbone and ligand atoms. Despite its extreme simplicity, our approach yields results that are very competitive with state-of-the-art scoring functions, especially in docking and screening tasks. For example, we observed a twofold improvement in the median 5% enrichment factor on the DUD-E benchmark compared to AutoDock Vina results. Moreover, our results demonstrate that a coarse sidechain-free potential is sufficient for highly successful docking pose prediction.
The standalone version of KORP-PL, with the corresponding tests and benchmarks, is available at https://team.inria.fr/nano-d/korp-pl/ and https://chaconlab.org/modeling/korp-pl.
Supplementary data are available at Bioinformatics online.
1 Introduction
Binding processes under physiological conditions are driven by the laws of thermodynamics. Even though their physics is well understood at the theoretical level, practical application of these laws to computational docking problems requires exhaustive thermodynamic sampling, which makes most of the corresponding approaches computationally prohibitive. A popular alternative is to avoid exhaustive sampling and approximate binding free energies with knowledge-based and statistical potentials (Chen, 2019; Debroise et al., 2017; Huang and Zou, 2006, 2010; Kadukova and Grudinin, 2017; Neudert and Klebe, 2011; Velec et al., 2005; Verdonk et al., 2003). These are parameterized directly against available experimental data rather than derived from first principles. However, such potentials are often not physical. For example, it is easy to demonstrate that the binding free energy cannot be decomposed into a sum of pairwise interactions, since the desolvation term is not pairwise-additive (Ben-Naim, 1997). Moreover, statistical potentials perform much better in docking exercises than in screening tests, where their performance is rather moderate (Li et al., 2018; Su et al., 2019). These observations, together with the moderate performance of classical statistical potentials on some popular docking benchmarks, both protein–protein and protein–ligand, have triggered further community research in multiple directions, including the development of coarse-grained and orientation-dependent scoring functions (Elhefnawy et al., 2015; Karasikov et al., 2019; Lopez-Blanco and Chacon, 2019; Neudert and Klebe, 2011; Wang et al., 2013; Zhang and Zhang, 2010).
In addition to knowledge-based potentials, which are most often derived in a statistical, unsupervised manner, a considerable number of scoring functions are based on other principles (Liu and Wang, 2015; Shen et al., 2020). These include physics-based potentials (Brooks et al., 1983; Case et al., 2005; Ewing et al., 2001), which approximate energy terms and require very careful calibration, as well as a variety of scoring functions obtained with supervised machine learning. Starting from the classical empirical scoring functions trained to fit experimental binding constants with a linear combination of several physics-based descriptors (Böhm, 1994; Debroise et al., 2017; Friesner et al., 2006; Quiroga and Villarreal, 2016; Trott and Olson, 2010; Wang et al., 2002), more and more complex methods based on non-linear models and diverse descriptors have been developed (Ashtawy and Mahapatra, 2018; Jiménez et al., 2018; Karlov et al., 2020; Li et al., 2013; Lu et al., 2019; Ragoza et al., 2017; Shen et al., 2020; Wallach et al., 2015; Wang and Zhang, 2017). Although some of these methods demonstrate high performance in affinity prediction and virtual screening, they are also subject to a number of flaws. While classical statistical potentials tend to be biased toward the number of contacts between the two molecules, learning on the relatively small number of available high-quality binding constants introduces biases toward experimental affinities. Very complex models, especially deep-learning ones, may also be prone to overfitting. For example, some recent architectures demonstrate excellent results on the DUD-E virtual screening benchmark if they are trained on a part of it, but their performance is rather average if they are trained on other data sources (Chen et al., 2019). Surprisingly, the classical empirical AutoDock Vina scoring function and its modifications, while being physically interpretable, still achieve stable state-of-the-art results in both pose and affinity prediction.
Protein–ligand methods usually describe molecules in an all-atom representation. Therefore, incorrect positioning of sidechains inside the binding pocket may introduce steric clashes with the ligand and produce false-positive binding pose predictions. Some methods include sidechain optimization in the conformational search (DeLuca et al., 2015; Marze et al., 2018; Trott and Olson, 2010), but this makes docking much more computationally expensive. Furthermore, slight inaccuracies in the positions of the backbone atoms may translate into significant inaccuracies in the positions of the sidechains. A possible way to circumvent this problem is to model the protein without explicit positioning of its sidechains. Indeed, such representations have already been successfully used in various protein structure prediction applications (Karasikov et al., 2019; Kryshtafovych et al., 2019; Liwo et al., 2002; Lopez-Blanco and Chacon, 2019; Senior et al., 2019; Zheng et al., 2019).
Motivated by the excellent results obtained in protein and loop modeling with the sidechain-independent potential KORP (Lopez-Blanco and Chacon, 2019), we propose to adapt its methodology to protein–ligand interactions. The success of KORP is rooted in its full six-dimensional (6D) joint probability distribution function, which depends only on the relative orientation between protein residues. For protein–ligand interactions, we reduce the pairwise potential to a 3D joint probability of observing an interacting ligand atom at a given relative position and orientation with respect to a protein residue. The proposed method, called KORP-PL, does not require protein sidechain atoms; only three backbone atoms of each protein residue are needed. As a result, it is relatively fast, as each interaction involves only the computation of two spherical angles and a single distance. Despite its apparent simplicity, our approach yields state-of-the-art results.
2 Materials and methods
2.1 The KORP-PL model

Fig. 1. Schematic view of the relative orientation of a ligand molecule with respect to a protein residue. The residue is represented by a 3D oriented frame built from three backbone atoms. The relative orientation of a ligand atom is described by two spherical angles: θ, the polar angle between the r and z vectors, and φ, the azimuthal angle between x and the projection of r onto the xy plane
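To make the geometry concrete, below is a minimal sketch of how such a residue frame and the (r, θ, φ) coordinates of a ligand atom could be computed. The frame construction from the N, CA and C atoms and the placement of the origin at CA follow a common convention and are assumptions made for illustration; the exact construction used by KORP-PL may differ in details.

```python
import numpy as np

def residue_frame(n, ca, c):
    """Orthonormal frame (x, y, z) from the N, CA, C backbone atoms.

    A common convention, assumed here for illustration: x points along
    N->C, z is normal to the backbone plane, y completes the frame.
    """
    x = c - n
    x /= np.linalg.norm(x)
    v = ca - 0.5 * (n + c)       # in-plane vector, roughly orthogonal to x
    z = np.cross(x, v)           # normal to the N-CA-C plane
    z /= np.linalg.norm(z)
    y = np.cross(z, x)           # unit-length, since z and x are orthonormal
    return x, y, z

def relative_coordinates(n, ca, c, atom):
    """Distance r, polar angle theta and azimuthal angle phi of a ligand
    atom in the residue frame (origin assumed at CA)."""
    x, y, z = residue_frame(n, ca, c)
    r_vec = atom - ca
    r = np.linalg.norm(r_vec)
    theta = np.arccos(np.clip(np.dot(r_vec, z) / r, -1.0, 1.0))
    phi = np.arctan2(np.dot(r_vec, y), np.dot(r_vec, x))
    return r, theta, phi
```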
The 20 protein residue types correspond to the 20 standard amino acids. The set of 37 ligand atom types comprises 8 carbon types, 12 nitrogen types, 7 oxygen types, 4 sulfur types, 2 phosphorus types and 4 halogen types (see Supplementary Table S1 for more details). Each ligand atom type is assigned using the Knodle library (Kadukova and Grudinin, 2016), in the same manner as for the Convex-PL scoring function (Kadukova and Grudinin, 2017).
2.2 Training data
We derived KORP-PL using structures of protein–ligand complexes deposited in the PDBBind 2016 general dataset (Wang et al., 2005). We excluded 373 structures intersecting with the CASF-2013 and CASF-2016 benchmarks, which resulted in 12 910 selected examples. There were also no intersections between PDBBind 2016 and the examples from the D3R challenges that we used to compile our benchmark. We did not specifically preprocess the input structures. We did not remove homologous receptor structures, since their bound ligands can be very diverse. Indeed, we previously found that excluding training-set structures homologous to those in the test set had no effect on prediction accuracy (Kadukova and Grudinin, 2017). Nonetheless, we provide additional computational experiments that exclude test-set structures from the training set at different levels of similarity.
We collected the interaction statistics within the range of radial distances r of (2 Å, 11 Å), divided into 12 bins. The angular statistics were collected into 180 equiareal bins using a uniform angular sampling tessellation described elsewhere (Beckers and Beckers, 2012).
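For illustration, the mapping of an observed interaction into histogram bins could look as follows. The 12 radial bins are assumed uniform over (2 Å, 11 Å), and the equiareal tessellation of Beckers and Beckers (2012) is replaced here by a simpler equal-solid-angle split in cos θ and φ that also yields 180 cells; both simplifications are assumptions made for this sketch.

```python
import numpy as np

N_RADIAL, R_MIN, R_MAX = 12, 2.0, 11.0
N_THETA, N_PHI = 12, 15   # 12 * 15 = 180 angular cells (illustrative split)

def bin_indices(r, theta, phi):
    """Map (r, theta, phi) to (radial, angular) bin indices; None if out of range."""
    if not (R_MIN <= r < R_MAX):
        return None
    i_r = int((r - R_MIN) / (R_MAX - R_MIN) * N_RADIAL)
    # Equal-area split in cos(theta): every band covers the same solid angle.
    i_t = min(int((1.0 - np.cos(theta)) / 2.0 * N_THETA), N_THETA - 1)
    i_p = int((phi + np.pi) / (2.0 * np.pi) * N_PHI) % N_PHI
    return i_r, i_t * N_PHI + i_p
```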
2.3 Reweighing the potential for binding affinity predictions
2.4 CASF benchmarks
We assessed KORP-PL on the recent CASF-2016 benchmark (Su et al., 2019) and the smaller but more widely used CASF-2013 benchmark (Li et al., 2018). These benchmarks are sets of 285 and 195 high-quality crystal structures, respectively, with the corresponding binding affinities. Four metrics are used in these benchmarks: docking power, scoring power, ranking power and screening power. Docking power is the ability of a scoring function to predict the native or the best near-native docking pose among a set of computer-generated configurations. Scoring functions are evaluated by the number of top-ranked predictions (top-1, top-2 and top-3) below a predefined cutoff distance from the crystal structure (1.0, 2.0 and 3.0 Å). Scoring and ranking powers measure the quality of affinity prediction for complexes with known co-crystal structures. Scoring power assesses the correlation of scoring function predictions with the experimental binding affinity data. Ranking power is the capability of a scoring function to correctly rank a set of known ligands for a target protein. In CASF-2016, where five known ligands are available for each target protein, it is measured by Spearman's correlation coefficient. In CASF-2013, only three ligands per protein are available, and ranking power is represented by two numbers characterizing the success rates of either correctly ranking all the given ligands or finding the one with the highest affinity. Finally, screening power is the ability of a scoring function to identify true binders for a target protein among a set of small molecules. The CASF benchmarks suggest two metrics to evaluate this ability. The enrichment factor (EF) is calculated as the ratio between the number of true binders observed among a fraction of top-ranked candidates (1%, 5% and 10%) and the total number of true binders multiplied by this fraction; it represents the ability of a scoring function to find active compounds compared to a random selection. The 'best binder success rate' is the success rate of identifying the highest-affinity binder among the 1%, 5% or 10% of top-ranked ligands over all test cases.
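The EF definition above translates directly into code; a minimal sketch, assuming that lower scores indicate stronger predicted binding:

```python
def enrichment_factor(scores, is_active, fraction=0.05):
    """EF at a given top fraction: actives recovered among the top-ranked
    candidates, relative to the expectation for a random selection."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # best first
    n_top = max(1, round(fraction * len(scores)))
    found = sum(is_active[i] for i in order[:n_top])
    total = sum(is_active)
    return found / (total * fraction) if total else 0.0
```

For example, recovering 5 of 20 actives within the top 5% of a 1000-compound list gives EF5% = 5 / (20 × 0.05) = 5.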
2.5 D3R benchmarks
A number of community-wide blind protein–ligand docking challenges have been held in recent years. For example, the CSAR initiative (Carlson et al., 2016) was carried out in 2010–2014. Later, it was continued and further developed by the Drug Design Data Resource (D3R) (Gathiaka et al., 2016). The aim of these challenges was the evaluation of docking protocols on previously unpublished structural data. After all participants have submitted their predictions, the co-crystal structures are revealed and the submissions are evaluated. A considerable effort was made by the D3R community to host data from the previous challenges. In particular, this resource contains all user submissions and answers, i.e. native structures and binding constants, from the three recent blind challenges, namely Grand Challenge 2 (Gaieb et al., 2018), Grand Challenge 3 (Gaieb et al., 2019) and Grand Challenge 4 (Parks et al., 2020). Unfortunately, user submission data from the first D3R challenges are not publicly available.
We thus compiled a benchmark from the user submissions and published answers of these three blind challenges. Similar to the CASF benchmarks, it contains pose and affinity prediction exercises. However, it differs from CASF in several aspects. Unlike the CASF benchmarks, which were created from data deposited in the Protein Data Bank (Rose et al., 2017), the experimental data for each of the D3R challenge targets were provided by a single research group. Co-crystal structures were also visually inspected by the challenge organizers and participants. This allows us to expect higher quality and consistency of these data, especially for the binding constants, which are less trustworthy in the CASF benchmarks and in PDBBind in general. On the other hand, the D3R challenge data provide less diversity of both proteins and small molecules, since each of the three challenges was focused on one protein target binding with compounds of a few chemical series. Consequently, the affinity prediction test made from the D3R challenge data is closer to the CASF ranking test than to the scoring test.
For the pose prediction tests, we collected all available user submissions from the pose prediction stages of the three challenges. RMSD values were obtained from the D3R website when possible; otherwise, we computed them using a modified version of RDKit's symmetry-adapted GetBestRMS() function, in which we disabled the ligand alignment, together with PyMOL's (Schrödinger, 2020) align function to superpose each protein onto its native structure. We excluded several submissions, listed in Supplementary Table S15, because of various errors, and clustered the remaining submissions with a 0.1 Å threshold without the binding pocket alignment. This was mainly done to remove very similar or equivalent docking poses that were often present in submissions from the same users. Finally, we measured the pose prediction success rates on each test separately, with and without the inclusion of the native structures. For the affinity prediction tests, we selected only the native structures and then measured Spearman's correlation coefficients between the predicted and experimental binding constants for each of the Grand Challenges. When a ligand was present in several chains of the co-crystal structure, we scored all of the available complexes and took the average. The numbers of available submissions and binding constants are summarized in Supplementary Table S14.
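A similar alignment-free, symmetry-corrected RMSD can also be obtained with RDKit's stock API, as sketched below. rdMolAlign.CalcRMS enumerates symmetry-equivalent atom mappings without moving the probe, so it mirrors, but is not identical to, the modified GetBestRMS() used here; the file names are placeholders, and the proteins are assumed to have been superposed beforehand.

```python
from rdkit import Chem
from rdkit.Chem import rdMolAlign

def pose_rmsd(pose_sdf, ref_sdf):
    """Symmetry-corrected RMSD between a docked pose and the crystal ligand,
    computed in place (no ligand alignment). Assumes both structures are
    already in a common frame, e.g. after superposing the receptors."""
    pose = Chem.MolFromMolFile(pose_sdf, removeHs=True)
    ref = Chem.MolFromMolFile(ref_sdf, removeHs=True)
    # CalcRMS accounts for molecular symmetry but keeps coordinates fixed.
    return rdMolAlign.CalcRMS(pose, ref)
```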
2.6 DUD-E benchmark
The DUD-E benchmark (Mysinger et al., 2012), the successor of the DUD benchmark (Huang et al., 2006), is a very popular means of assessing the virtual screening abilities of scoring functions and docking protocols. It consists of 102 targets, a set of active compounds known to bind each target, and 50 inactive compounds, or decoys, per active compound. The total number of active compounds over all 102 targets is 22 886. For each target, one protein–ligand complex is provided and can be used for the identification of the binding pocket and for molecular docking. The benchmark also contains 3D conformers of all the active and inactive compounds. Unlike the CASF benchmarks and the D3R-based benchmark, which we derived specifically for the assessment of structure-based scoring functions, evaluation on DUD-E requires a pose sampling stage. Therefore, we first performed molecular docking using AutoDock Vina with default settings, except for the exhaustiveness, which was set to 10, and then re-scored the obtained poses with KORP-PL and KORP-PLw.
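Schematically, the per-compound evaluation could be driven as below. The AutoDock Vina command-line options are real, but the file names, the box configuration and in particular the korp-pl invocation are hypothetical placeholders; consult the KORP-PL distribution for its actual usage.

```python
import subprocess

def dock_and_rescore(receptor_pdbqt, ligand_pdbqt, box_cfg="box.txt"):
    """Dock one compound with AutoDock Vina (exhaustiveness 10, as in the
    text), then re-score the poses. box_cfg is assumed to define the search
    box around the co-crystal ligand (center_* and size_* entries)."""
    subprocess.run(["vina", "--receptor", receptor_pdbqt,
                    "--ligand", ligand_pdbqt, "--config", box_cfg,
                    "--exhaustiveness", "10", "--out", "poses.pdbqt"],
                   check=True)
    # Hypothetical re-scoring call; the real CLI may differ.
    subprocess.run(["korp-pl", receptor_pdbqt, "poses.pdbqt"], check=True)
```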
We should note that the DUD-E benchmark contains several targets with co-factors that seem to be crucial for binding. We have excluded from the evaluation 12 complexes listed in Supplementary Table S22 that contain HEM, NAD, NAP, FAD, ADP and FMN, since KORP-PL is not parametrized to predict interactions with co-factors.
3 Results and discussion
3.1 CASF benchmarks
Figure 2 shows the results obtained on the docking, scoring, ranking and virtual screening tests from the CASF-2016 benchmark. The results obtained on the exercises from the CASF-2013 can be found in Supplementary Figure S1 and Supplementary Tables S3–S5. We can see that KORP-PL performs exceptionally well in the pose prediction exercise, despite being a coarse-grained scoring function. Indeed, for the CASF-2016 benchmark, its success rate in finding a near-native pose within 2 Å RMSD as the best prediction is 85.6%. This is better than the success rates of all other tested scoring functions.

Fig. 2. CASF-2016 benchmark results. (a) The success rate of finding a near-native pose within 2 Å RMSD in Top 1 (blue), Top 2 (green) and Top 3 (yellow) predictions. Native poses are excluded. (b) Pearson's correlation coefficients, with confidence intervals, between predicted scores and experimental binding constants. Scoring functions sharing the same gray bar are statistically indistinguishable in the post-hoc Friedman test (Su et al., 2019). (c) Spearman's rank correlation coefficients, with confidence intervals, among the 57 clusters. (d) EFs computed considering 1% (blue), 5% (green) and 10% (yellow) of the top-ranked compounds. (e) The success rate of identifying the highest-affinity binder among the 1% (blue), 5% (green) or 10% (yellow) top-ranked ligands. All results except those of KORP-PL and Convex-PL were taken from the Supplementary Information of the CASF-2016 benchmark paper (Su et al., 2019). The results of KORP-PL, KORP-PLw and Convex-PL are available in Supplementary Tables S3–S5
Figure 2d and e demonstrates the top performance of KORP-PL in both screening tests. These results are especially notable for the EF metric, where all the other tested scoring functions perform rather poorly. For example, the CASF-2016 Top 1% EF for KORP-PL is 22.23, while the third-best Top 1% EF is 11.91, achieved by ChemPLP@GOLD. Figure 2b compares binding affinity predictions; those of KORP-PL turned out to be worse than average. As a consequence, the ranking power results (Fig. 2c) are also worse than or close to average when compared with the other scoring functions. To investigate the reasons for this rather poor performance, we plotted the binding affinities predicted by KORP-PL versus the experimental binding constants. Figure 3 shows them colored according to the hydrophobicity scale of the protein binding pockets suggested by Su et al. (2019). We can see that KORP-PL often underestimates affinity values for complexes with hydrophobic pockets. We suppose that this happens because of the way we compute the reference state, inherited from the original 6D KORP potential. Indeed, the 6D residue–residue interactions have a strong angular dependence, which is not the case in the protein–ligand setting. For example, the subtraction of the angular average in Eq. 2 results in a near-zero potential for non-directional contributions, which is precisely the case for some of the hydrophobic interactions. This motivated us to introduce the reweighing scheme (see Eq. 4), which allowed us to partially compensate for this effect. Indeed, the KORP-PLw potential performed considerably better than KORP-PL on the scoring tests. However, its performance is still far from perfect, and this is a subject for further investigation. We should also note that the moderate performance of various scoring functions in affinity prediction tasks can be partially explained by the fact that the experimental uncertainties of binding affinity data in current databases are often larger than one order of magnitude (Wätzig et al., 2015). Such significant scatter results from the different methodologies and accuracies of the binding assays used by different research groups. Supplementary Table S13 contains a further analysis of the correlation between KORP-PL scores and a number of ligand properties computed for the CASF-2016 complexes.

Fig. 3. Scatter plot of KORP-PL scores versus the experimental binding constants from the CASF-2016 benchmark. Each point is colored according to the hydrophobicity of the protein binding pocket (H-scale, in logD units) as defined in Su et al. (2019). The Pearson correlation coefficients between KORP-PL scores and binding constants, computed for three different H-scale groups, are: 0.63 for H-scales between -0.80 and -0.35, 0.45 for H-scales between -0.35 and -0.15, and 0.31 for H-scales between -0.15 and 0.80
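These per-group coefficients are straightforward to recompute from per-complex values; a minimal sketch, assuming arrays of KORP-PL scores, experimental constants and pocket H-scale values are available:

```python
import numpy as np
from scipy.stats import pearsonr

def correlations_by_hscale(scores, log_ka, h_scale):
    """Pearson r between scores and experimental constants, split by the
    pocket hydrophobicity groups used in Fig. 3."""
    groups = [(-0.80, -0.35), (-0.35, -0.15), (-0.15, 0.80)]
    scores, log_ka, h_scale = map(np.asarray, (scores, log_ka, h_scale))
    for lo, hi in groups:
        mask = (h_scale >= lo) & (h_scale < hi)
        r, _ = pearsonr(scores[mask], log_ka[mask])
        print(f"H-scale [{lo:+.2f}, {hi:+.2f}): r = {r:.2f}")
```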
The CASF benchmarks are derived from the PDBBind database and contain complexes similar to those in our training set. It is therefore interesting and important to learn to what extent our results may be affected by overfitting. We thus ran additional experiments in which we modified the training set, either augmenting it with its intersection with the test set, or removing a number of complexes based on protein (Ritchie et al., 2012; Zhang and Skolnick, 2004) and ligand (Landrum, 2006) shape similarity. We then recomputed the CASF docking and screening tests to investigate the possible overfitting. These results are listed in Supplementary Tables S6–S12 and discussed in the Supplementary Information. Overall, removing the closest complexes (pocket TM-score > 0.8 and ligand shape Jaccard distance < 0.2) affects the metrics only marginally. Further elimination of about a thousand more distant complexes (pocket TM-score > 0.5 and ligand shape Jaccard distance < 0.4) worsens the overall performance. Notably, high-quality docking predictions (Q1) are affected more than the low-quality ones (Q2–3). This indicates that, for successful high-resolution pose prediction, the training set must contain complexes with interactions that somewhat resemble those in the test set. Indeed, any statistical (Boltzmann, in our case) approximation is limited if some features are absent or their distribution is unbalanced in the training set.
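For the ligand side of this filtering, a shape-based Jaccard (Tanimoto) distance can be computed with RDKit, for instance as below; whether this matches the exact procedure behind Supplementary Tables S6–S12 is an assumption, and the file names are placeholders.

```python
from rdkit import Chem
from rdkit.Chem import rdShapeHelpers

def ligand_shape_distance(sdf_a, sdf_b):
    """Jaccard (Tanimoto) distance between two ligand shapes on a 3D grid.
    The poses are compared as given, so for ligands from different crystal
    structures the binding pockets should be superposed first."""
    mol_a = Chem.MolFromMolFile(sdf_a)
    mol_b = Chem.MolFromMolFile(sdf_b)
    return rdShapeHelpers.ShapeTanimotoDist(mol_a, mol_b)
```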
3.2 D3R benchmarks
Figure 4 demonstrates very good performance of KORP-PL in all pose prediction exercises derived from the D3R Challenges. KORP-PL also showed good results in the Grand Challenge 2 and Grand Challenge 4 affinity ranking tasks. However, we obtained near-zero correlations in affinity prediction of the cathepsin S complexes from the Grand Challenge 3.

Fig. 4. D3R pose prediction and scoring results. Success rates of finding a pose within 2 Å RMSD from the native conformation among the 1%, 5% and 10% of top-ranked poses are shown in blue, green and yellow, respectively. Scoring power is represented by Spearman's correlation coefficient between the predicted and experimental binding constants. The success rates are computed with respect to the actual number of ligands for which poses with the desired RMSD values were present in the user submissions; because of this, the KORP-PL success rate in Grand Challenge 2, for example, is higher when the native poses are excluded. The results of the KORP-PL, KORP-PLw, Convex-PL, AutoDock Vina and ΔSAS evaluations are listed in Supplementary Tables S16–S21. The pose prediction stage of all three challenges was called 'Stage 1', and the affinity prediction stage was called 'Stage 2'. However, receptor flexibility turned out to be a considerable issue for many approaches (Kadukova and Grudinin, 2018), and in both Grand Challenge 3 and Grand Challenge 4, Stage 1 was subdivided into Stage 1a, where neither the ligand nor the receptor 3D structure was known, and Stage 1b, where the receptor 3D structure was revealed. We evaluated these stages separately
The D3R Grand Challenge 3 pose prediction test turned out to be an interesting case. In this exercise, the binding site is exposed to solvent and is surrounded by water molecules in the co-crystal structure, as well as in some of the user submissions. We should specifically mention that we do not consider explicit water molecules. KORP-PL showed excellent results in this pose prediction exercise compared to the AutoDock Vina and Convex-PL scoring functions. Although we cannot directly compare our pose prediction results with the full protocols evaluated in the challenge, only a few of those were successful, especially when no visual inspection or ligand-based methods were used (Gaieb et al., 2019). This means that the selection of correct binding poses for the cathepsin S inhibitors could be a challenge for many scoring functions. For example, as shown in Figure 4, Convex-PL failed in many cases to detect the correct binding mode, while AutoDock Vina and the simplistic ΔSAS were almost completely incapable of doing so. This could be caused by a combination of the following factors. First, we believe that, by design, KORP-PL better captures directional interactions in target complexes, such as hydrogen and halogen bonds and π-stacking (Salentin et al., 2015). Second, all the incorrect poses are located deeper in the binding pocket, forming more contacts than the native conformation. Most scoring functions tend to be biased toward the total number of protein–ligand contacts, which could lead to incorrect predictions by Convex-PL, Vina and ΔSAS. As we have already discussed, KORP-PL underestimates some of the non-orientational hydrophobic interactions; in this particular case of D3R Grand Challenge 3, this helps it to predict ligand positions that are not deeply buried in the protein pocket.
3.3 DUD-E benchmark
To evaluate the performance of KORP-PL in large-scale virtual screening tasks, we assessed it on 90 targets from the DUD-E benchmark. As shown in Table 1 and Figure 5, KORP-PL and KORP-PLw outperform AutoDock Vina in all the metrics, being almost twice as good in terms of the EFs. This makes KORP-PL comparable to some recent structure-based deep-learning models that demonstrate excellent virtual screening performance (Ragoza et al., 2017). However, as reported by Chen et al. (2019) and mentioned in the Introduction, such scoring functions tend to achieve high performance on the DUD-E benchmark only if they were originally trained on it, and thus probably learn hidden biases such as the decoy selection criteria used in the benchmark construction.

Fig. 5. ROC AUC scores and 5% EFs computed for the 90 targets from the DUD-E dataset with KORP-PL and AutoDock Vina
Table 1. ROC AUC scores, 5% EFs and BEDROC (Truchon and Bayly, 2007) values computed for the 90 targets from the DUD-E dataset

| Scoring function | ROC AUC (median) | ROC AUC (average) | EF5% (median) | EF5% (average) | BEDROC (median) | BEDROC (average) |
| --- | --- | --- | --- | --- | --- | --- |
| AutoDock Vina | 0.731 | 0.714 | 3.691 | 4.528 | 0.234 | 0.264 |
| KORP-PL | 0.816 | 0.785 | 9.083 | 8.637 | 0.502 | 0.472 |
| KORP-PLw | 0.818 | 0.786 | 8.839 | 8.423 | 0.458 | 0.465 |

Note: Twelve targets with co-factors in the binding pocket were excluded from the 102 original targets. It is important to note here that our results for AutoDock Vina are slightly lower than those reported in Ragoza et al. (2017), where the median and average ROC AUC, and median and average EF5% are equal to 0.740, 0.717, 4.228 and 4.485, respectively. This could be caused by the differences in the binding pocket detection or other docking protocol settings. Per-target evaluation results can be found in Supplementary Table S23.
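For reference, the BEDROC metric reported in Table 1 can be computed directly from a ranked list using the closed form of Truchon and Bayly (2007). The sketch below uses the customary α = 20, which concentrates the weight on roughly the top 8% of the list; the α actually used for Table 1 is not stated here, so this value is an assumption.

```python
import math

def bedroc(scores, is_active, alpha=20.0):
    """BEDROC of a ranked list (lower score = better rank), following the
    closed-form expression of Truchon and Bayly (2007)."""
    n_total = len(scores)
    order = sorted(range(n_total), key=lambda i: scores[i])  # best first
    ranks = [pos + 1 for pos, i in enumerate(order) if is_active[i]]
    n_act = len(ranks)
    if n_act == 0:
        return 0.0
    ra = n_act / n_total
    s = sum(math.exp(-alpha * r / n_total) for r in ranks)
    rie = s / (n_act * (1.0 - math.exp(-alpha)) /
               (n_total * (math.exp(alpha / n_total) - 1.0)))
    return (rie * ra * math.sinh(alpha / 2.0) /
            (math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra)) +
            1.0 / (1.0 - math.exp(alpha * (1.0 - ra))))
```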
3.4 Computational details
KORP-PL is implemented in C++ and is available as a binary for the macOS and Linux operating systems. It takes about 25 ms on a single core of a Linux machine with an Intel(R) Xeon(R) E5-2609 CPU @ 2.40 GHz to score a protein–ligand complex from the CASF-2013 core set containing a single ligand pose of 25 heavy atoms on average. The energy computation itself takes only 2 ms; the rest of the runtime is spent parsing the complex file. As the method does not require the positions of the sidechain atoms, it can be readily applied to scoring protein models represented only by their backbones.
4 Conclusion
This article presents KORP-PL, a novel knowledge-based scoring function for protein–ligand interactions based on a backbone-only receptor representation and a full-atom ligand representation. The receptor representation is adopted from the KORP scoring function, which models interactions in a protein molecule with a set of oriented coordinate frames built on each protein residue. The KORP-PL interaction potential is then derived from the statistics of the relative orientations and positions of ligand atoms in the local coordinate systems of protein residues. We have demonstrated for the first time that a coarse-grained sidechain-free protein representation can be successfully used for very accurate predictions of ligand binding poses. Indeed, KORP-PL shows excellent pose prediction and screening results on the CASF-2013 and CASF-2016 benchmarks, as well as on the pose prediction benchmarks compiled from the D3R Grand Challenges. KORP-PL also demonstrates outstanding results on the DUD-E virtual screening benchmark, where it considerably outperforms AutoDock Vina. Its affinity prediction performance is, however, lower than average, and much more work is required to advance developments in this direction. Overall, this work proposes a very efficient way to circumvent the long-standing problem of sampling protein sidechain conformations in molecular docking, paving the way for a new generation of flexible docking approaches.
Funding
This work was partially supported by the Russian Foundation for Basic Research (RFBR), project #18-54-00030, Belarusian Republican Foundation for Fundamental Research (BRFFR), project #X18P-098, Spanish grants BFU2016-76220-P and PID2019-109041GB-C21 (AEI/FEDER, UE), and Inria associate team Flexmol. K.S.M. was supported by CAPES grant PE 88881.207869/2018-01.
Conflict of Interest: none declared.