AutoDock CrankPep: combining folding and docking to predict protein–peptide complexes

Zhang, Yuqi; Sanner, Michel F

doi:10.1093/bioinformatics/btz459

Abstract

Motivation

Protein–peptide interactions mediate a wide variety of cellular and biological functions. Methods for predicting these interactions have garnered a lot of interest over the past few years, as witnessed by the rapidly growing number of peptide-based therapeutic molecules currently in clinical trials. The size and flexibility of peptides has shown to be challenging for existing automated docking software programs.

Results

Here we present AutoDock CrankPep or ADCP in short, a novel approach to dock flexible peptides into rigid receptors. ADCP folds a peptide in the potential field created by the protein to predict the protein–peptide complex. We show that it outperforms leading peptide docking methods on two protein–peptide datasets commonly used for benchmarking docking methods: LEADS-PEP and peptiDB, comprised of peptides with up to 15 amino acids in length. Beyond these datasets, ADCP reliably docked a set of protein–peptide complexes containing peptides ranging in lengths from 16 to 20 amino acids. The robust performance of ADCP on these longer peptides enables accurate modeling of peptide-mediated protein–protein interactions and interactions with disordered proteins.

Availability and implementation

ADCP is distributed under the LGPL 2.0 open source license and is available at http://adcp.scripps.edu. The source code is available at https://github.com/ccsb-scripps/ADCP.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Protein–peptide interactions are essential to many biological functions (Petsalaki and Russell, 2008). Thus, peptide-based therapeutic approaches have recently attracted increasing interest (Fosgerau and Hoffmann, 2015; Lau and Dunn, 2018). Moreover, many protein–protein interactions, especially the ones involving intrinsically disordered proteins, are mediated by peptide-like segments (Stein and Aloy, 2008; Wright and Dyson, 2015). Predicting protein–peptide interactions using automated docking methods remains challenging mainly due to the significantly larger number of rotatable bonds in peptides, making them more flexible than small drug-like molecules. Small molecule docking methods have been shown to perform rather poorly for peptides longer than five amino acids (Hauser and Windshügel, 2016; Rentzsch and Renard, 2015). Meanwhile, efforts have been put into developing accurate and efficient peptide docking methods (Ciemny et al., 2018; London et al., 2013). These methods can be segregated into the following three categories: template docking, ensemble docking and de novo methods (see Table 1).

Table 1.

Summary of three categories of peptide docking methods

Category	Flexibility	Description	Examples
Template docking	None or little	Use sequence-based homology model to predict docking poses	GalaxyPepDock
Ensemble docking	Conformation ensemble	Prepare a conformation ensemble to describe peptide flexibility and then dock the conformations back into receptor	HPepDock MDockPep pepATTRACT
De novo method	Fully flexible	Model peptide flexibility with the respect to the receptor	*ADCP* AnchorDock CABS-Dock FlexPepDock HADDOCK

Category	Flexibility	Description	Examples
Template docking	None or little	Use sequence-based homology model to predict docking poses	GalaxyPepDock
Ensemble docking	Conformation ensemble	Prepare a conformation ensemble to describe peptide flexibility and then dock the conformations back into receptor	HPepDock MDockPep pepATTRACT
De novo method	Fully flexible	Model peptide flexibility with the respect to the receptor	*ADCP* AnchorDock CABS-Dock FlexPepDock HADDOCK

Open in new tab

Table 1.

Summary of three categories of peptide docking methods

Category	Flexibility	Description	Examples
Template docking	None or little	Use sequence-based homology model to predict docking poses	GalaxyPepDock
Ensemble docking	Conformation ensemble	Prepare a conformation ensemble to describe peptide flexibility and then dock the conformations back into receptor	HPepDock MDockPep pepATTRACT
De novo method	Fully flexible	Model peptide flexibility with the respect to the receptor	*ADCP* AnchorDock CABS-Dock FlexPepDock HADDOCK

Category	Flexibility	Description	Examples
Template docking	None or little	Use sequence-based homology model to predict docking poses	GalaxyPepDock
Ensemble docking	Conformation ensemble	Prepare a conformation ensemble to describe peptide flexibility and then dock the conformations back into receptor	HPepDock MDockPep pepATTRACT
De novo method	Fully flexible	Model peptide flexibility with the respect to the receptor	*ADCP* AnchorDock CABS-Dock FlexPepDock HADDOCK

Open in new tab

The success of template docking methods for docking peptides (Lee et al., 2015) depends on the availability of homologue structures for both the receptor and the peptide, thus limiting the range of their applicability. Ensemble docking methods sample peptide conformations as a pre-processing step without knowledge of the receptor. Next, these conformations are docked rigidly or semi-rigidly into the receptors (Schindler et al., 2015; Yan et al., 2016; Zhou et al., 2018a, b). While, these methods yield good accuracy for small and medium sized peptides (typically ≤ 9 amino acids), their success rates tend to drop rapidly with longer peptides. Finally, de novo methods sample the peptide’s conformation on-the-fly during the docking (Alam et al., 2017; Ben-Shimon and Niv, 2015; Kurcinski et al., 2015; Raveh et al., 2011; Trellet et al., 2013). While de novo methods yield high accuracy and are less affected by the length of the peptides, these methods tend to be computationally expensive and often rely on lengthy molecular dynamics simulations to refine solutions.

Here we present AutoDock CrankPep or ADCP in short, an efficient de novo method for protein–peptide docking that folds the peptide in the potential energy landscape created by a given receptor. ADCP provides an efficient and accurate way to dock flexible peptides into rigid receptors. We show that it achieves 85.7% success rate on the LEADS-PEP dataset within its top 10 predictions. Furthermore, while existing peptide docking methods have typically limited themselves to peptides with less than 16 amino acids, we evaluate ADCP’s ability to dock a set of peptides ranging in length from 16 to 20 amino acids. For these peptides, ADCP achieves re-docking success rates of 64% and 91% when considering the top or top 5 solutions, respectively. These results indicate that ADCP is a robust peptide docking tool that can be used to model protein–protein interactions mediated by protein segments such as loops or disordered fragments.

2 Materials and methods

Small molecule docking methods typically perform best with ligands containing less than 20 rotatable bonds (Hauser and Windshügel, 2016; Rentzsch and Renard, 2015). Peptides with five or more amino acids can easily exceed this number. A medium sized peptide of 10 amino acids typically has around 40 rotatable bonds, rendering these methods ineffective.

CRANKITE is an efficient software package originally developed for protein and peptide conformation sampling and folding (Burkoff et al., 2012; Podtelezhnikov and Wild, 2005, 2008; Várnai et al., 2013). It samples the conformational space of proteins or peptides using a Metropolis Monte Carlo (MC) search and a Gō-Type representation of amino acid side-chains (Takada, 1999; Taketomi et al., 2009). CRANKITE can rapidly explore the conformational space of sequences of amino acids by performing the two MC moves illustrated in Figure 1: (i) a crankshaft motion along two randomly selected C_α atoms and (ii) a rotation near the end of the chain.

Fig. 1.

Open in new tab Download slide

CRANKITE’s Monte Carlo moves. A crankshaft motion along two selected C_α atoms or a rotation near the end of the chain

ADCP combines CRANKITE’s conformation sampling ability with the grid-based AutoDock representation of a rigid receptor (Huey et al., 2007; Morris et al., 2009) to concurrently optimize the peptide conformation and its interactions with the receptor, thus yielding docking poses. ADCP was implemented based on CRANKITE. The notable modifications and additions are as follows: (i) the addition of new MC moves to boost the exploration of peptide position and orientation relative to the receptor; (ii) the addition of an energy term based on the AutoDock affinity grids to describe the peptide-receptor interactions; (iii) the use of a rotamer library (Dunbrack Jr and Cohen, 1997) to interactively construct side-chain atoms; and (iv) the addition of a pose cache swapping mechanism to enhance the search.

The overall workflow of the MC procedure implemented by ADCP is depicted in Figure 2. First, a randomly selected MC move is applied to alter the current pose. The altered pose is then scored, and the move is either accepted or rejected based on a metropolis-like MC criterion. If the move is rejected, the pose before the move is restored and another move is attempted. If it is accepted, the altered pose becomes the current one and is used to update the cache of docking poses. This procedure repeats until one of the termination criteria is met. More details about the various elements of this workflow are provided below.

Fig. 2.

Open in new tab Download slide

Flow chart of AutoDock CrankPep Monte Carlo procedure

Input : ADCP requires a description of the receptor and the peptide. The receptor is represented by affinity maps calculated using AutoGridFR (Ravindranath et al., 2015). AutoGridFR produces a single zip file that contains affinity maps for all atom types in the peptide calculated by AutoGrid4 (Morris et al., 2009), along with metadata about the docking box (e.g. the size and position of the box, a list of favorable locations in the affinity maps called translational points, etc.). The peptide can be specified as a 3D structure in the PDB file format or by its sequence of amino acids. In the latter case, a starting conformation is constructed automatically. This initial conformation can be generated in an extended or alpha helical conformation using lowercase and uppercase letters respectively. The user can also specify the maximum number of MC steps for the simulation. As conventional docking methods can dock short peptides with reasonable accuracy, ADCP was designed to support peptides with five or more amino acids.

Monte Carlo moves : During the MC search the docking pose is modified using MC moves. We extended the original set of MC moves with: (i) a local translation to perturb the peptide position; and (ii) a translational jump that translates the peptide to move a central peptide atom called the ‘root atom’ to a new ‘translational point’ in the docking box (Supplementary Fig. S1). The ‘translational points’ are a set of positions with high affinities identified by AutoGridFR (Ravindranath and Sanner, 2016). These moves were added to facilitate the peptide’s exploration of the potential field created by the receptor. Each MC move will trigger the reconstruction of the all-atom representation of the side-chains using a rotamer library (see below).

Scoring Function : The scoring function of ADCP consists of two components: a score for the conformation of the peptide; and a score for the interaction between the peptide and the receptor. We will refer to these scores as the internal and the interaction scores, respectively. The internal score is based on CRANKITE’s Gō-Type potential (Podtelezhnikov and Wild, 2005, 2008) where each side-chain is represented by a single bead. We extended the internal score function with a new term based on Ramachandran propensities for backbone φ and ψ angles (Lovell et al., 2003). These Ramachandran propensities are transformed into energies according to the Boltzman distribution. The interaction score between the peptide and the protein is calculated using the AutoDock affinity grids. Calculating this score requires a full-atom representation of the peptide which is constructed using a rotamer library (Dunbrack Jr. and Cohen, 1997). Every time a pose is scored, we iterate over the peptide either forward (N-to C-terminus) or backwards (C- to N-terminus) with the same probability. For each amino acid, we construct all rotameric conformations and score them using the affinity grids while avoiding clashes with the peptide backbone and already built side-chains. The energetically most favorable side-chain conformation is selected to represent this amino acid. Once the full-atom representation of the peptide is built, the interaction score between the peptide and the receptor is obtained by summing up the scores from the affinity grids for all atoms in the peptide.

Monte Carlo Criterion : ADCP uses a metropolis-like MC criterion given in Equation 1 to accept a move

exp (- k_{T} Δ E \times HillClimb) > random

(1)

where ΔE is the score difference between the pose before and after the MC move; k_T is a temperature factor that can be used to adjust the probability of the MC criteria; and random is a random number ranging from 0.0 to 1.0. We found that the traditional metropolis MC criterion did not yield an efficient exploration of solution space. The HillClimb term was introduced to boost the search power. If the score worsens by more than 5.9 kcal/mol (10 k_BT at room temperature), HillClimb is set to 0.05, otherwise HillClimb remains at 1. With this hill climbing feature, a score increase of 5.9 kcal/mol has a 60.6% probability to be accepted, and a score increase of 11.8 kcal/mol has 36.8% chance to be accepted and so on. Using a HillClimb factor ranging from 0.05 to 0.20 does not affect the results substantially.

Pose cache update : The presence of the receptor creates a complex energy landscape for the peptide to fold while maximizing its interaction with the receptor. We found that the traditional MC search often got stuck in local minima. Thus, we implemented the pose cache to maintain a pool of docking poses encountered during the search and restart the search under certain circumstances. Every pose accepted by the Metropolis criterion is compared with the ones present in the cache and can: either be added to the cache (e.g. if it has the best energy found so far) or it can replace an entry in the cache (e.g. a similar solution is in the cache but with a worse energy). More details on this process are provided in Chart S1. Every time a pose improves on the best score seen so far, it is appended to the output multi-model pdb file. If the search generates 100 000 consecutive poses each having a score 3 kcal/mol higher than the best score found so far, or the best score has not improved for one million steps, the search restarts from a randomly selected pose from the cache.

Termination criteria: The program stops if the maximum number of steps is reached or the best score has not been improved for 10 million steps.

Using the software: The affinity maps are calculated using AutoGridFR for a user-defined docking box. The results presented here were docked with a 4 Å padding on every side of the peptide. We also performed redockings with 6 and 8 Å padding, the latter often covering the bulk of the receptor. While we observed minor changes in the ordering of the docking poses due to the stochastic nature of the algorithm, these changes do not modify the reported sampling success rate. The stochastic nature of the MC search is usually addressed by performing multiple searches called replicas. The number of replicas is specified by the user in ADCP. Unless otherwise specified, the results presented here were obtained using 80 replicas with 60 simulations started from extended conformations and 20 simulations started from helical conformations constructed from amino acid sequence. Each replica performed 3 million MC steps per amino acid in the peptide (i.e. 15 million steps for a 5-mer peptide and 36 million steps for 12-mer peptide, etc.). While we routinely observed conformational changes between helical conformation and extended conformation and vice versa during the MC runs, we found that statistically, starting the MC with a mix of initial conformation speeds up the process of identifying the correct solution. Users can customize these parameters according to their specific needs. We suggest using more replicas and longer simulations for larger peptides and/or larger docking boxes.

After all replicas finish their search, the docking poses are clustered to produce the final docking poses. The clustering can be performed using the AutoDockFR clustering algorithm (Morris et al., 2009; Ravindranath et al., 2015), or using pairs of peptide-receptor residues in contact. See the Supplementary Information for a detailed description of the contact-based clustering algorithm.

2.1 Datasets

The peptiDB (London et al., 2010) and LEADS-PEP (Hauser and Windshügel, 2016) protein–peptide datasets have been widely used for benchmarking peptide docking methods (Raveh et al., 2011; Tubert-Brohman et al., 2013; Zhou et al., 2018a). PeptiDB contains 102 protein–peptide complexes varying from 3 to 15 amino acids. We benchmarked ADCP with the Glide SP-PEP dataset (Tubert-Brohman et al., 2013), a subset of peptiDB comprised of 19 high-quality, non-α-helical complexes ranging from 5 to 11 amino acids. The Glide SP-PEP dataset has been used to benchmark FlexPepDock, Glide SP-PEP and HPepDock. The LEADS-PEP dataset is a more recent, and manually curated dataset of 53 complexes with peptides ranging from 3 to 12 amino acids. In this study we used the subset of 42 complexes from LEADS-PEP containing peptides with 5 or more amino acids. We consider the peptides in these datasets as medium-sized peptides for docking purposes.

Current available peptide docking methods are mostly tested on peptides with 15 amino acids or less. To further test ADCP, we compiled at set of 11 protein peptide complexes from the protein data bank (PDB) (Berman et al., 2000) with longer peptides ranging in length from 16 to 20 amino acids. These structures were obtained by selecting PDB entries with crystallographic resolution of 2.2 Å or better and containing a peptide comprised of 16–20 standard amino acids. The peptides in this set are neither cyclic nor covalently bound to the receptor; they have no significant clashes between peptide and receptor atoms and have no significant contacts between the peptide and crystal mates of the receptor. These complexes are listed in Supplementary Table S1.

We found apo structures for 5 of the 11 protein–peptide complexes (Supplementary Table S1) by retrieving all PDB entries sharing the uniport accession code of the protein in the protein–peptide complex (holo protein). The sequences of the apo-candidates and holo proteins were then aligned to the UNIPROT (UniProt Consortium, 2019) sequence yielding a mapping of the apo-candidate sequence to the holo sequence. We only retained entries with 70% or more of the apo sequence mapped to the holo sequence. Next, we structurally aligned the apo-candidate structure with the holo protein and we identified overlaps between the peptide and atoms in the apo-candidate structure. We retained entries in which the peptide only overlap with water, ions , additives or apo receptor chain. For the three complexes 2IVZ, 5UWI and 6CIT, which resulted in more than one apo structure (2, 12 and 12 respectively), we selected the apo structure with the best crystallographic resolution.

2.2 Success metrics

All atom Root-Mean-Square Deviation (RMSD) is typically used to assess success while docking small molecules. As ligands grow larger this metric becomes less appropriate and RMSD of backbone atoms (N, CA, C) has been used for assessing docking success rate for small peptides (Irving et al., 2001; Méndez et al., 2003). For instance, (Hauser and Windshügel, 2016; Zhou et al., 2018a) used a 2.5 Å backbone RMSD cutoff to define successful peptide redocking on the Lead-Pep dataset. Other studies (Raveh et al., 2011; Tubert-Brohman et al., 2013) used the iRMSD (interface RMSD) defined as the RMSD of the backbone atoms of the ‘interface residues’. Interface residues are amino acids of the peptide having their C_β atom within 8 Å of any receptor C_β atom. Poses with iRMSD values under 3.0 Å are typically considered to be successful dockings. To facilitate the comparison with other methods, we used the same metric as used in previously published studies, i.e. backbone RMSD for comparison on the LEADS-PEP dataset and iRMSD for the Glide SP-PEP dataset.

When docking longer peptides, RMSD-based metrics do not provide a precise measure for success. For these cases, we assess success using native contacts: a metric borrowed from the protein–protein docking field (Irving et al., 2001; Méndez et al., 2003). Native contacts are defined as the list of all pairs of peptide-receptor amino acids located within 5 Å of each other. Similar to (Méndez et al., 2003; Peterson et al., 2017; Yan et al., 2016), we identify successful redockings of peptides ranging from 16 to 20 amino acids in length when the docking poses reproduce more than 50% of the native contacts (i.e. f_nc > 0.5).

3 Results and discussion

We compared the success rate of ADCP with previously published results from leading peptide docking methods on the datasets containing medium size peptides (i.e. up to 12 amino acids). We also demonstrate that ADCP achieves remarkable success rates in docking longer peptide (i.e. up to 20 amino acids).

3.1 Accurate docking of medium size peptides

HPepDock (Zhou et al., 2018a, b) is a recent peptide docking method that falls into the ensemble docking category. It uses MODPEP (Yan et al., 2017) to generate 1000 peptide conformations and then docks these peptide conformations semi-rigidly using MDock (Huang and Zou, 2006). HPepDock has been shown to achieve better accuracy than traditional small-molecule docking methods as well as other leading peptide docking methods including FlexPepDock, Glide-SP-PEP, HADDOCK and pepATTRACT (Zhou et al., 2018a, b). As such, it can be considered the state of the art at the time of writing this paper. Here we compare the success rates of ADCP with the HPepDock results on these datasets using the same metrics. For the complexes from the LEADS-PEP dataset, ADCP consistently outperforms HPepDock as shown in Figure 3. Considering the top 10 solutions, ADCP achieves 85.7% success rate for this dataset compared to 66.7% for HPepDock at 2.5 Å RMSD cutoff. ADCP significantly improves success rate at 1.0 Å RMSD cutoff, predicting 76.2% of the complexes with a sub-angstrom backbone RMSD precision within its top 1000 predictions, and 100% of the complexes have a correctly docked pose (backbone RMSD ≤ 2.5 Å). The complex-specific comparison is provided in Supplementary Table S2.

Fig. 3.

Open in new tab Download slide

The comparison of success rates for ADCP and HPepDock with different success criteria. Blue solid line represents the success rate for ADCP if the top 1000 solutions are considered and blue dashed line represents the success rate if the top 10 solutions are considered. Red solid and dashed lines represent the success rate for HPepDock if the top 1000 solutions and the top 10 solutions are considered, respectively (Color version of this figure is available at Bioinformatics online.)

HPepDock’s performance deteriorates significantly for the longer peptides in this dataset (Table 2). While it’s overall success rate at 2.5 Å backbone RMSD for the top 10 solution is 66.7% (28/42), when considering only peptides with 9 or more amino acids the success rate drops to 35% (7/20). ADCP on the other hand maintains its 85% overall success rate on the subset of peptides with 9 or more amino acids.

Table 2.

Success rates at 2.5 Å backbone RMSD considering the top 10 solutions

bbRMSD ≤ 2.5 Å	ADCP	HPepDock
All	85.7%	66.7%
Longer peptides	85.0%	35.0%

Note: Out of the 42 peptides, 20 have more than 8 amino acids (48%) and are classified as ‘longer peptides’.

Open in new tab

Table 2.

Success rates at 2.5 Å backbone RMSD considering the top 10 solutions

bbRMSD ≤ 2.5 Å	ADCP	HPepDock
All	85.7%	66.7%
Longer peptides	85.0%	35.0%

Note: Out of the 42 peptides, 20 have more than 8 amino acids (48%) and are classified as ‘longer peptides’.

Open in new tab

A possible explanation for this could be that for longer peptides, the conformational space, which increases exponentially with the length of the peptide, eventually requires a prohibitively large numbers of conformers to be used in ensemble docking methods. ADCP explores the peptide's conformational space during the docking process thus it is less affected by increasing peptide lengths. ADCP maintains a consistent success rate across the peptides lengths in this dataset.

On the smaller Glide SP dataset, ADCP performs similarly to other methods for both holo (19 complexes) and apo (10 complexes) receptor conformations, as shown in Table 3.

Table 3.

Success rates on the Glide SP-PEP dataset

iRMSD		ADCP	FlexPepDock	Glide SP-PEP	HPepDock
holo	≤2 Å	13	13	11	12
holo	≤3 Å	15	13	13	15
apo	≤2 Å	4	6	4	5
apo	≤3 Å	8	6	4	8

Note: Among the 19 complexes in the dataset, 10 receptors have apo conformation available. Here a docking is deemed successful if one of the top 10 solutions has an interface RMSD (iRMSD) lower than 2.0 Å or 3.0 Å respectively.

Open in new tab

Table 3.

Success rates on the Glide SP-PEP dataset

iRMSD		ADCP	FlexPepDock	Glide SP-PEP	HPepDock
holo	≤2 Å	13	13	11	12
holo	≤3 Å	15	13	13	15
apo	≤2 Å	4	6	4	5
apo	≤3 Å	8	6	4	8

Note: Among the 19 complexes in the dataset, 10 receptors have apo conformation available. Here a docking is deemed successful if one of the top 10 solutions has an interface RMSD (iRMSD) lower than 2.0 Å or 3.0 Å respectively.

Open in new tab

3.2 Reliable docking long peptides and protein segments

Currently available peptide docking methods have mostly been tested and benchmarked on small and medium-sized peptides with up to 15 amino acids in length. However, a considerable portion of protein–protein interactions are mediated by flexible protein loops, disordered chains segments, or intrinsically disordered proteins, involving sequences that can easily exceed 15 amino acids. Therefore, we tested ADCP on a set of 11 complexes containing peptides with 16–20 amino acids in length. For these dockings we performed 80 MC simulations, allotting 7×N million MC moves (where N is the number of amino acids in the peptide) to each run. The docking poses from the 80 MC runs were clustered using contacts with a cutoff value of 80%. Results are shown in Table 4.

Table 4.

Docking results for long peptides and protein segments

a

DoF: degrees of freedom including translation, rotation and rotatable bonds.

The cells are colored based on f_nc from red (0.0) to yellow (0.5) and to green (1.0).

Open in new tab

Considering only the top-ranking solution, ADCP identifies solution with at least 50% native contacts for 7 out of 11 complexes (63.6%). Within the top 5 solutions, the success rates increase to 90.9% (10 out of 11 complexes). Figure 4 shows some examples of docked pose with respect to the crystal structure along with the fraction of reproduced native and backbone RMSD values.

Fig. 4.

Open in new tab Download slide

Comparison of selected docked poses (cyan) and experimentally determined structures (magenta). The receptor surface is provided for context and colored using a polarity scheme developed by Dr. D. Goodsell. These Figures are generated using PMV 1.5.7 (Sanner, 1999) and MSMS (Sanner et al., 1996) (Color version of this figure is available at Bioinformatics online.)

For the 5 complexes for which apo structures are available, ADCP obtained good solutions for 2 complexes with f_nc of more than 0.5 and partially docked solutions for 2 complexes with f_nc of more than 0.3 within its 5 predictions. Docking failed for the 1CM1 complex in which the protein forms a tunnel where a substantial conformational change in the apo structure greatly shrinks the width of the tunnel, preventing the peptide from binding there. See Supplementary Table S3 for more details.

ADCP demonstrates a robust ability to dock long peptides and protein segments. With more MC replicas and more steps for each replica, ADCP could potentially be applied to even longer peptide-like segments.

It is noteworthy that the current scoring function in ADCP has not been optimized or calibrated for protein–peptide interactions. The interaction energy between peptide and receptor relies on AutoDock4 affinity maps that were initially developed and calibrated for docking small, drug-like molecules. While we have started incorporating peptide-specific elements, such as a potential for Ramachandran backbone angles into our scoring function, further refinements of the current scoring function could improve docking success rates. Alternatively, re-scoring top-ranked poses with scoring functions designed for protein–peptide interactions (Huang and Zou, 2008; Spiliotopoulos et al., 2016) could also improve the ranking of the docking predictions.

3.3 Timing

ADCP is computationally efficient compared with other de novo methods. Based on 3 million MC moves per amino acid, a MC search typically takes from ∼10 min for a 5-mer to ∼1 h for a 12-mer, using a single thread on an Intel Xeon E5-1620 processor (2014). These times are rough averages and can vary depending on the peptide sequence. See Supplementary Table S2 for detailed timing results on LEADS-PEP dataset. Each MC simulation can be run independently and they can execute in parallel locally or on high performance computing clusters. The time for the final clustering is a function of the clustering cutoff but typically only takes a few minutes. While HPepDock and other ensemble docking methods requires less computational resources, ADCP achieves better success rates especially for longer peptides. HPepDock remains a powerful peptide docking tool for medium size peptides.

4 Conclusions

In this paper, we presented ADCP, a novel approach for predicting protein–peptide interactions for peptides of substantial length. The approach leverages an algorithm developed initially for protein folding and combines it with a representation of a rigid receptor developed for docking. With a success rate of 85.7% on the LEADS-PEP dataset when considering the top 10 predictions, ADCP outperforms leading peptide docking approaches. Moreover, we show that ADCP is able to dock peptides with up to 20 amino acids to their receptors. ADCP expands peptide docking to the prediction of certain types of protein–protein interactions, e.g. disordered tails or flexible protein loops interacting with itself or another protein.

Acknowledgements

The authors thank Drs Wild, Podtelezhnikov and Várnai for the original implementation of CRANKITE and many helpful discussions and suggestions. The authors also thank the members of the Center for Computational Structural Biology for many fruitful discussions. This is manuscript 29784 from The Scripps Research Institute.

Funding

The research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM096888 to Dr. M. F. Sanner. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflict of Interest: none declared.

References

Alam

N.

et al. (

2017

)

High-resolution global peptide-protein docking using fragments-based PIPER-FlexPepDock

.

PLoS Comput. Biol

.,

13

,

e1005905.

Ben-Shimon

A.

,

Niv

M.Y.

(

2015

)

AnchorDock: blind and flexible anchor-driven peptide docking

.

Structure

,

23

,

929

–

940

.

Berman

H.M.

et al. (

2000

)

The Protein Data Bank

.

Nucleic Acids Res

.,

28

,

235

–

242

.

Burkoff

N.S.

et al. (

2012

)

Exploring the energy landscapes of protein folding simulations with bayesian computation

.

Biophys. J

.,

102

,

878

–

886

.

Ciemny

M.

et al. (

2018

)

Protein–peptide docking: opportunities and challenges

.

Drug Discov. Today

,

23

,

1530

–

1537

.

Dunbrack

R.L.

Jr ,

Cohen

F.E.

(

1997

)

Bayesian statistical analysis of protein side-chain rotamer preferences

.

Protein Sci

.,

6

,

1661

–

1681

.

Fosgerau

K.

,

Hoffmann

T.

(

2015

)

Peptide therapeutics: current status and future directions

.

Drug Discov. Today

,

20

,

122

–

128

.

Hauser

A.S.

,

Windshügel

B.

(

2016

)

LEADS-PEP: a benchmark data set for assessment of peptide docking performance

.

J. Chem. Inf. Model

.,

56

,

188

–

200

.

Huang

S.-Y.

,

Zou

X.

(

2008

)

An iterative knowledge-based scoring function for protein–protein recognition

.

Proteins Struct. Funct. Bioinform

.,

72

,

557

–

579

.

Google Scholar

Crossref

WorldCat

Huang

S.-Y.

,

Zou

X.

(

2006

)

Ensemble docking of multiple protein structures: considering protein structural variations in molecular docking

.

Proteins Struct. Funct. Bioinform

.,

66

,

399

–

421

.

Google Scholar

Crossref

WorldCat

Huey

R.

et al. (

2007

)

A semiempirical free energy force field with charge-based desolvation

.

J. Comput. Chem

.,

28

,

1145

–

1152

.

Irving

J.A.

et al. (

2001

)

Protein structural alignments and functional genomics

.

Proteins Struct. Funct. Bioinform

.,

42

,

378

–

382

.

Google Scholar

Crossref

WorldCat

Kurcinski

M.

et al. (

2015

)

CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site

.

Nucleic Acids Res

.,

43

,

W419

–

W424

.

Lau

J.L.

,

Dunn

M.K.

(

2018

)

Therapeutic peptides: historical perspectives, current development trends, and future directions

.

Bioorg. Med. Chem

.,

26

,

2700

–

2707

.

Lee

H.

et al. (

2015

)

GalaxyPepDock: a protein–peptide docking tool based on interaction similarity and energy optimization

.

Nucleic Acids Res

.,

43

,

W431

–

W435

.

London

N.

et al. (

2013

)

Peptide docking and structure-based characterization of peptide binding: from knowledge to know-how

.

Curr. Opin. Struct. Biol

.,

23

,

894

–

902

.

London

N.

et al. (

2010

)

The structural basis of peptide–protein binding strategies

.

Structure

,

18

,

188

–

199

.

Lovell

S.C.

et al. (

2003

)

Structure validation by Cα geometry: ϕ,ψ and Cβ deviation

.

Proteins Struct. Funct. Bioinform

.,

50

,

437

–

450

.

Google Scholar

Crossref

WorldCat

Méndez

R.

et al. (

2003

)

Assessment of blind predictions of protein–protein interactions: current status of docking methods

.

Proteins Struct. Funct. Bioinform

.,

52

,

51

–

67

.

Google Scholar

Crossref

WorldCat

Morris

G.M.

et al. (

2009

)

AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility

.

J. Comput. Chem

.,

30

,

2785

–

2791

.

Peterson

L.X.

et al. (

2017

)

Modeling disordered protein interactions from biophysical principles

.

PLoS Comput. Biol

.,

13

,

e1005485.

Petsalaki

E.

,

Russell

R.B.

(

2008

)

Peptide-mediated interactions in biological systems: new discoveries and applications

.

Curr. Opin. Biotechnol

.,

19

,

344

–

350

.

Podtelezhnikov

A.A.

,

Wild

D.L.

(

2008

)

CRANKITE: a fast polypeptide backbone conformation sampler

.

Source Code Biol. Med

.,

3

,

12

.

Podtelezhnikov

A.A.

,

Wild

D.L.

(

2005

)

Exhaustive Metropolis Monte Carlo sampling and analysis of polyalanine conformations adopted under the influence of hydrogen bonds

.

Proteins Struct. Funct. Bioinform

.,

61

,

94

–

104

.

Google Scholar

Crossref

WorldCat

Raveh

B.

et al. (

2011

)

Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors

.

PLoS One

,

6

,

e18934

.

Ravindranath

P.A.

et al. (

2015

)

AutoDockFR: advances in protein–ligand docking with explicitly specified binding site flexibility

.

PLoS Comput. Biol

.,

11

,

e1004586

.

Ravindranath

P.A.

,

Sanner

M.F.

(

2016

)

AutoSite: an automated approach for pseudo-ligands prediction—from ligand-binding sites identification to predicting key ligand atoms

.

Bioinformatics

,

32

,

3142

–

3149

.

Rentzsch

R.

,

Renard

B.Y.

(

2015

)

Docking small peptides remains a great challenge: an assessment using AutoDock Vina

.

Brief. Bioinform

.,

16

,

1045

–

1056

.

Sanner

M.F.

(

1999

)

Python: a programming language for software integration and development

.

J. Mol. Graph. Model

.,

17

,

57

–

61

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Sanner

M.F.

et al. (

1996

)

Reduced surface: an efficient way to compute molecular surfaces

.

Biopolymers

,

38

,

305

–

320

.

Schindler

C.E.M.

et al. (

2015

)

Fully blind peptide–protein docking with pepATTRACT

.

Structure

,

23

,

1507

–

1515

.

Spiliotopoulos

D.

et al. (

2016

)

dMM-PBSA: a new HADDOCK scoring function for protein–peptide docking

.

Front. Mol. Biosci

.,

3

,

46

.

Stein

A.

,

Aloy

P.

(

2008

)

Contextual specificity in peptide-mediated protein interactions

.

PLoS One

,

3

,

1

–

10

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Takada

S.

(

1999

)

Go-ing for the prediction of protein folding mechanisms

.

Proc. Natl. Acad. Sci. USA

,

96

,

11698

–

11700

.

Google Scholar

Crossref

WorldCat

Taketomi

H.

et al. (

2009

)

Studies on protein folding, unfolding and fluctuations by computer simulations

.

Int. J. Pept. Protein Res

.,

7

,

445

–

459

.

Google Scholar

Crossref

WorldCat

Trellet

M.

et al. (

2013

)

A unified conformational selection and induced fit approach to protein–peptide docking

.

PLoS One

,

8

,

e58769.

Tubert-Brohman

I.

et al. (

2013

)

Improved Docking of Polypeptides with Glide

.

J. Chem. Inf. Model

.,

53

,

1689

–

1699

.

UniProt Consortium (

2019

)

UniProt: a worldwide hub of protein knowledge

.

Nucleic Acids Res

.,

46

,

506

–

515

.

OpenURL Placeholder Text

WorldCat

Várnai

C.

et al. (

2013

)

Efficient parameter estimation of generalizable coarse-grained protein force fields using contrastive divergence: a maximum likelihood approach

.

J. Chem. Theory Comput

.,

9

,

5718

–

5733

.

Wright

P.E.

,

Dyson

H.J.

(

2015

)

Intrinsically disordered proteins in cellular signalling and regulation

.

Nat. Rev. Mol. Cell Biol

.,

16

,

18.

Yan

C.

et al. (

2016

)

Fully blind docking at the atomic level for protein–peptide complex structure prediction. Structure

24

,

1842

–

1853

.

Yan

Y.

et al. (

2017

)

Efficient conformational ensemble generation of protein-bound peptides

.

J. Cheminform

.,

9

,

59

.

Zhou

P.

et al. (

2018a

)

Hierarchical flexible peptide docking by conformer generation and ensemble docking of peptides

.

J. Chem. Inf. Model

.,

58

,

1292

–

1302

.

Zhou

P.

et al. (

2018b

)

HPEPDOCK: a web server for blind peptide–protein docking based on a hierarchical algorithm

.

Nucleic Acids Res

.,

46

,

W443

–

W450

.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Associate Editor:

Download all slides

Month:	Total Views:
June 2019	56
July 2019	22
August 2019	26
September 2019	38
October 2019	55
November 2019	41
December 2019	49
January 2020	95
February 2020	37
March 2020	37
April 2020	19
May 2020	12
June 2020	51
July 2020	30
August 2020	34
September 2020	32
October 2020	47
November 2020	46
December 2020	22
January 2021	27
February 2021	31
March 2021	34
April 2021	112
May 2021	94
June 2021	90
July 2021	92
August 2021	130
September 2021	107
October 2021	162
November 2021	96
December 2021	119
January 2022	177
February 2022	126
March 2022	150
April 2022	191
May 2022	166
June 2022	144
July 2022	158
August 2022	165
September 2022	175
October 2022	165
November 2022	172
December 2022	149
January 2023	112
February 2023	141
March 2023	165
April 2023	168
May 2023	114
June 2023	95
July 2023	155
August 2023	110
September 2023	102
October 2023	114
November 2023	144
December 2023	164
January 2024	140
February 2024	141
March 2024	115
April 2024	93

Article Contents

AutoDock CrankPep: combining folding and docking to predict protein–peptide complexes

Abstract

1 Introduction

2 Materials and methods

2.1 Datasets

2.2 Success metrics

3 Results and discussion

3.1 Accurate docking of medium size peptides

3.2 Reliable docking long peptides and protein segments

3.3 Timing

4 Conclusions

Acknowledgements

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

Article Contents

AutoDock CrankPep: combining folding and docking to predict protein–peptide complexes

Abstract

1 Introduction

2 Materials and methods

2.1 Datasets

2.2 Success metrics

3 Results and discussion

3.1 Accurate docking of medium size peptides

3.2 Reliable docking long peptides and protein segments

3.3 Timing

4 Conclusions

Acknowledgements

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

This Feature Is Available To Subscribers Only