Aeri Lee, Dongsup Kim, CRDS: Consensus Reverse Docking System for target fishing, Bioinformatics, Volume 36, Issue 3, February 2020, Pages 959–960, https://doi.org/10.1093/bioinformatics/btz656
Abstract
Identification of putative drug targets is a critical step for explaining the mechanism of drug action against multiple targets, finding new therapeutic indications for existing drugs and uncovering adverse drug reactions. One important approach is molecular docking. However, its widespread use has been hindered by the lack of easy-to-use public servers. Therefore, it is vital to develop a streamlined computational tool for large-scale target prediction by molecular docking.
We present a fully automated web tool named Consensus Reverse Docking System (CRDS), which predicts potential interaction sites for a given drug. To improve hit rates, we developed a strategy of consensus scoring. CRDS carries out reverse docking against 5254 candidate protein structures using three different scoring functions (GoldScore, Vina and LeDock from GOLD version 5.7.1, AutoDock Vina version 1.1.2 and LeDock version 1.0, respectively), and those scores are combined into a single score named Consensus Docking Score (CDS). The web server reports the top 50 predicted interaction sites, their docking conformations, the 10 most significant pathways and the distribution of consensus scores.
The web server is available at http://pbil.kaist.ac.kr/CRDS.
Supplementary data are available at Bioinformatics online.
1 Introduction
Target identification is a key early step for discovering clinically relevant targets of chemical compounds in the field of drug discovery and development (Chan et al., 2010; Schenone et al., 2013). Although high-throughput experimental techniques are becoming available, experimental target identification remains a time-consuming and expensive endeavor. Accordingly, there is an urgent need for practical computational tools that characterize a small molecule by identifying its interaction sites, and some web tools are already available (Peon et al., 2019).
Inverse or reverse docking is a powerful technique for in silico target fishing, in which a ligand is docked against a database of target proteins (Lee et al., 2016). The objective of reverse docking is to predict true targets among many clinically relevant protein targets. However, the scoring functions of current docking programs are known to be biased toward proteins with certain properties, which hinders accurate retrieval of target structures in reverse docking (Luo et al., 2017).
One way to address this problem is to employ machine-learning scoring functions (Wojcikowski et al., 2017; Yasuo and Sekijima, 2019). Another approach is to use consensus scoring (Luo et al., 2017). Consensus scoring evaluates poses of the docked ligand with multiple scoring functions and combines the docking scores to improve success rates. A consensus scoring scheme that incorporates dissimilar types of scoring functions has been reported to perform better than a single scoring function (Cheng et al., 2009). Hence, when docking is used to identify targets for a compound of interest, combining multiple scoring functions can be expected to increase the proportion of true targets that are recovered.
Consequently, we have constructed a web-based server named Consensus Reverse Docking System (CRDS), which conducts quantitative screening of ligand interaction sites by reverse docking with consensus scoring. It provides consensus ranks with docked ligand–receptor structures, the ranks from each of the three algorithms, pathway analysis results and the complete set of consensus scores (see Supplementary Fig. S1).
2 Materials and methods
2.1 Consensus Docking Score
We adopted three types of scoring functions: GoldScore from GOLD version 5.7.1 (force field-based) (Verdonk et al., 2003), Vina from AutoDock Vina version 1.1.2 (a combination of empirical and knowledge-based) (Trott and Olson, 2010) and LeDock from LeDock version 1.0 (a combination of physics and knowledge-based) (Wang et al., 2016). To combine the three docking values into a single score named Consensus Docking Score (CDS), we first normalized the docking scores from each scoring method using min-max scaling, and the sums of the three normalized docking values were then arranged in descending order (see Supplementary Fig. S2).
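The normalization and summation described above can be sketched in a few lines of Python. This is a minimal illustration and not the CRDS implementation: the function names and target identifiers are hypothetical, and the text does not state how the direction of each score is handled. The sketch assumes that GoldScore is a fitness value (higher is better) while Vina and LeDock report binding energies (more negative is better), so the two energies are flipped after scaling so that 1 always marks the most favorable target.

```python
from typing import Dict, List, Tuple


def min_max(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Scale scores to [0, 1]; after scaling, 1 is the most favorable value."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        # All targets scored identically, so the score carries no ranking signal.
        return {k: 0.0 for k in values}
    scaled = {k: (v - lo) / span for k, v in values.items()}
    if not higher_is_better:
        # Assumption: energy-like scores are flipped so that lower energy maps to 1.
        scaled = {k: 1.0 - s for k, s in scaled.items()}
    return scaled


def consensus_docking_scores(
    gold: Dict[str, float],
    vina: Dict[str, float],
    ledock: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Sum the three normalized scores per target and sort in descending order."""
    g = min_max(gold, higher_is_better=True)
    v = min_max(vina, higher_is_better=False)
    l = min_max(ledock, higher_is_better=False)
    shared = set(g) & set(v) & set(l)  # only targets scored by all three programs
    cds = {t: g[t] + v[t] + l[t] for t in shared}
    return sorted(cds.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for three hypothetical targets.
gold = {"target_a": 60.0, "target_b": 40.0, "target_c": 50.0}
vina = {"target_a": -9.0, "target_b": -6.0, "target_c": -7.5}
ledock = {"target_a": -7.0, "target_b": -5.0, "target_c": -6.0}
ranking = consensus_docking_scores(gold, vina, ledock)
```

Under these assumptions each CDS lies between 0 and 3, and a target ranked best by all three scoring functions receives the maximum of 3.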
2.2 Target database
It is desirable to execute reverse docking against a large and diverse target space. We built a human protein target database comprising a total of 5254 druggable binding sites from the sc-PDB (resolution < 2.5 Å) (Desaphy et al., 2015). Analysis of the frequency of unique UniProt IDs showed that these 5254 protein structures correspond to 869 different UniProt IDs. For more detailed results, see Supplementary Figs S9 and S10.
3 Validation results
The performance of our server was validated in two respects: target fishing and virtual screening. We first demonstrated that the consensus scoring scheme retrieved more known target proteins within the 10 highest-scoring proteins than each individual scoring function [CDS (n = 242), GOLD (n = 119), Vina (n = 123) and LeDock (n = 186)] when tested on 122 ligands with 6365 known targets compiled from DrugBank (http://www.drugbank.ca) and BindingDB (http://www.bindingdb.org) (see Supplementary Fig. S3 and Table S1). A second experiment, which evaluated the reliability of the consensus scores for virtual screening on the DUD-E dataset, showed that the CDS achieved the highest area under the curve (0.77) compared with the three existing scoring functions (see Supplementary Fig. S4). Furthermore, docking-based target prediction is most useful for targets with little ligand information, because similarity-based methods such as quantitative structure-activity relationship modeling cannot be applied to those cases. Therefore, we looked for such cases and demonstrated that our docking-based consensus scoring method was effective for targets with little ligand information (see Supplementary Material).
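The two validation metrics mentioned here, the number of known targets within the top 10 and the area under the ROC curve, can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' evaluation code: the function names are hypothetical, the text does not say whether hits are counted per structure or per protein, and the AUC is computed with the standard rank-based definition (the probability that a randomly chosen active outscores a randomly chosen decoy), assuming that higher scores are better.

```python
from typing import Sequence, Set


def hits_in_top_k(ranked_targets: Sequence[str], known_targets: Set[str], k: int = 10) -> int:
    """Count how many of the k best-ranked targets are known targets of the ligand."""
    return sum(1 for t in ranked_targets[:k] if t in known_targets)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC; labels are 1 for actives and 0 for decoys, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: one known target appears among the top 3 ranked targets.
n_hits = hits_in_top_k(["A", "B", "C", "D"], {"B", "D", "Z"}, k=3)
# Made-up example: three of the four active/decoy pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for a small illustration; a sort-based formulation would be preferable for a benchmark the size of DUD-E.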
4 Web server
4.1 Input
The input window on our job submission page requires a job name, an email address and an ID from public chemical compound databases. A newly synthesized small molecule or a natural compound can be uploaded in Tripos Mol2 (mol2) or Structure Data File (sdf) format. Currently, a job takes from 7 to 20 h to complete, depending on the molecular size and the server load. Users can monitor the progress of their job on the ‘Queue’ page.
4.2 Output
The web link to the results is reported to the user via email or through the ‘Queue’ page. The first result section lists the top 50 predicted interaction sites along with their corresponding PDB IDs, the CDSs, the ranks from GOLD, Vina and LeDock, UniProt IDs, gene symbols and PDB descriptions. Visualization buttons for the binding pose of the ligand are provided. In addition, all complex structures are downloadable. The second section presents the top 50 predicted interaction sites of each algorithm along with their docking types, docking scores, PDB IDs, UniProt IDs, gene symbols and PDB descriptions. The third section displays the pathway frequencies, which are based on mapping the UniProt IDs of the top 50 structures to pathway data in Reactome (http://reactome.org/) (Fabregat et al., 2018). The 10 most meaningful pathways in which the 50 predicted genes are involved are illustrated in a pie chart. The fourth result section shows the overall distribution of consensus scores.
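The pathway frequency analysis in the third result section can be pictured as a simple counting step. The sketch below is illustrative only: the function name, the UniProt accessions and the pathway labels are hypothetical, the mapping is passed in as a plain dictionary instead of being read from Reactome, and it assumes that each unique UniProt ID is counted once per pathway, which the text does not specify.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def top_pathways(
    uniprot_ids: Iterable[str],
    pathway_map: Dict[str, Set[str]],
    n: int = 10,
) -> List[Tuple[str, int]]:
    """Return the n most frequent pathways among the given proteins."""
    counts: Counter = Counter()
    for uid in set(uniprot_ids):  # assumption: each protein is counted once
        for pathway in pathway_map.get(uid, set()):
            counts[pathway] += 1
    # Sort by descending count, then by name so that ties are ordered reproducibly.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical protein-to-pathway mapping and hypothetical top-ranked proteins.
pathway_map = {
    "P00001": {"pathway_A", "pathway_B"},
    "P00002": {"pathway_A"},
    "P00003": {"pathway_C"},
}
top = top_pathways(["P00001", "P00002", "P00002", "P99999"], pathway_map, n=2)
```

Proteins without any pathway annotation, such as the last ID in the example, simply contribute nothing to the counts.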
5 Conclusion
We developed a large-scale predictive modeling tool named CRDS by implementing reverse docking with consensus scoring, which can help find probable interaction sites of small molecules such as existing drugs and natural products. We expect that the predicted drug interaction sites can be prioritized for identification of novel binding sites or used in extended applications such as drug repurposing or investigation of adverse drug effects.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grants (2017M3A9C4065952, 2019R1A2C1007951) funded by the Korea Government (MSIT).
Conflict of Interest: none declared.
References