Boris Vishnepolsky, Malak Pirtskhalava, Comment on: ‘Empirical comparison of web-based antimicrobial peptide prediction tools’, Bioinformatics, Volume 35, Issue 15, August 2019, Pages 2692–2694, https://doi.org/10.1093/bioinformatics/bty1023
Abstract
Supplementary information: Supplementary data are available at Bioinformatics online.
In the Bioinformatics article ‘Empirical comparison of web-based antimicrobial peptide prediction tools’ (Gabere and Noble, 2017), different antimicrobial peptide prediction tools were compared. This is an important task, as it helps scientists choose the appropriate tool for their purposes. The authors consider ten tools divided into three groups by prediction target: antimicrobial, antibacterial and bacteriocins. For each group, the authors use their own test sets. Our comments concern the threshold-based comparison of the tools in the first group. In this group, the authors consider six prediction tools: CAMPR3(RF) (Waghu et al., 2016), CAMPR3(SVM) (Waghu et al., 2016), ADAM (Lee et al., 2015), DBAASP (Vishnepolsky and Pirtskhalava, 2014), AMPA (Torrent et al., 2012) and MLAMP (Lin and Xu, 2016). Common sets of sequences were used to test the tools. We would like to note that these tools have specific areas of application, so not all sequences in the datasets can be used as input for each tool. Some tools place particular limitations on input sequences: ADAM, DBAASP, CAMPR3(RF) and CAMPR3(SVM) do not accept non-standard amino acids, and DBAASP limits peptide size (<100 amino acids). Although some other tools [CAMPR3(RF) and CAMPR3(SVM)] do not limit sequence size, their predictive models rely on training and test sets (Waghu et al., 2016) constructed from sequences of length <100. Therefore, we think that optimal test sets suitable for most tools must contain sequences of no more than 100 amino acids and without non-standard amino acids. It is not correct to test tools on sequences that they cannot accept as input (the programs return an error) and then calculate metrics relative to the full set of sequences in the datasets.
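The input requirements described above can be illustrated with a minimal sketch. It assumes that sequences are plain one-letter strings and that "standard" means the 20 canonical amino acids. The function names and the `max_length` parameter are hypothetical and are not taken from any of the tools.

```python
# Assumption: "standard" residues are the 20 canonical one-letter codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_input(sequence: str, max_length: int = 100) -> bool:
    """Return True if the sequence is non-empty, has at most max_length
    residues and contains only standard amino acids."""
    seq = sequence.strip().upper()
    return 0 < len(seq) <= max_length and set(seq) <= STANDARD_AMINO_ACIDS


def filter_test_set(sequences, max_length: int = 100):
    """Keep only the sequences that satisfy both requirements."""
    return [s for s in sequences if is_valid_input(s, max_length)]
```

Filtering the benchmark once, before any tool is run, ensures that every tool is scored on the same set of sequences and that the metrics are calculated against the same totals.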
To make reliable assessments, new test sets meeting the above-mentioned requirements were created from the datasets (DAMPD and APD3) of the original paper. According to the original paper, the positive sets of the DAMPD and APD3 benchmarks were downloaded from the DAMPD (Seshadri et al., 2012) and APD3 (Wang et al., 2016) databases, respectively, and made non-redundant using the CD-HIT software (Li and Godzik, 2006). The corresponding negative sets were constructed from sequences randomly extracted from the UniProt database (The Uniprot Consortium, 2015) that were not annotated as AMPs (Gabere and Noble, 2017). After taking into account the above-mentioned requirements, new benchmarks O-DAMPD and O-APD3 were created from DAMPD and APD3, respectively. The O-DAMPD benchmark consists of a positive set (O-DAMPD-P, 464 sequences) and a negative set (O-DAMPD-N, 2362 sequences) selected from DAMPD (Supplementary Tables S1 and S2). The O-APD3 benchmark consists of a positive set (O-APD3-P, 1682 sequences) and a negative set (O-APD3-N, 8409 sequences) selected from APD3 (Supplementary Tables S3 and S4).
For comparison of the different tools, the following performance measures were used: sensitivity [Sens = TP/(TP + FN)], specificity [Spec = TN/(TN + FP)], precision [Prec = TP/(TP + FP)] and balanced accuracy [Bal acc = (Sens + Spec)/2], where TP is true positive, TN is true negative, FP is false positive and FN is false negative.
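As an illustration, the four measures can be computed from a confusion matrix as in the following sketch. The function name and the returned dictionary keys are hypothetical, and the values are returned as percentages.

```python
def performance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute sensitivity, specificity, precision and balanced accuracy
    (all in percent) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("a denominator is zero; the metrics are undefined")
    sens = tp / (tp + fn)          # fraction of positives that are recovered
    spec = tn / (tn + fp)          # fraction of negatives that are rejected
    prec = tp / (tp + fp)          # fraction of positive calls that are correct
    bal_acc = (sens + spec) / 2    # mean of sensitivity and specificity
    return {
        "sens": 100 * sens,
        "spec": 100 * spec,
        "prec": 100 * prec,
        "bal_acc": 100 * bal_acc,
    }


# Counts reported for CAMPR3(RF) on the O-DAMPD benchmark.
metrics = performance_metrics(tp=433, fp=381, fn=31, tn=1981)
```

For these counts the function gives about 93.3% sensitivity, 83.9% specificity, 53.2% precision and 88.6% balanced accuracy. Because the negative sets are roughly five times larger than the positive sets, precision depends strongly on the class ratio, whereas balanced accuracy does not.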
The prediction results on O-DAMPD and O-APD3 are presented in Tables 1 and 2. Most metrics are close to the values in the original paper, but some differences occur. Specificity and balanced accuracy for CAMPR3(RF) and CAMPR3(SVM) are higher than in the original paper. This can be explained by the fact that CAMPR3(RF) and CAMPR3(SVM) predict almost all long sequences as antimicrobial, and sequences longer than 100 amino acids are excluded from the new test sets.
Table 1. Prediction results on the O-DAMPD benchmark
Tool | TP | FP | FN | TN | Total | Sens (%) | Spec (%) | Prec (%) | Bal acc (%) |
---|---|---|---|---|---|---|---|---|---|
CAMPR3(RF) | 433 | 381 | 31 | 1981 | 2826 | 93.32 | 83.87 | 53.19 | 88.60 |
CAMPR3(SVM) | 422 | 410 | 42 | 1952 | 2826 | 90.95 | 82.64 | 52.04 | 86.80 |
ADAM | 433 | 845 | 31 | 1517 | 2826 | 93.32 | 64.23 | 33.88 | 78.78 |
MLAMP | 338 | 481 | 126 | 1881 | 2826 | 72.84 | 79.64 | 41.27 | 76.24 |
DBAASP | 306 | 238 | 158 | 2124 | 2826 | 65.95 | 89.92 | 56.35 | 77.94 |
AMPA | 216 | 253 | 250 | 2109 | 2826 | 46.55 | 89.29 | 46.06 | 67.92 |
Table 2. Prediction results on the O-APD3 benchmark
Tool | TP | FP | FN | TN | Total | Sens (%) | Spec (%) | Prec (%) | Bal acc (%) |
---|---|---|---|---|---|---|---|---|---|
CAMPR3(RF) | 1593 | 1266 | 89 | 7143 | 10 091 | 94.71 | 84.94 | 55.72 | 89.83 |
CAMPR3(SVM) | 1525 | 1480 | 157 | 6929 | 10 091 | 90.67 | 82.40 | 50.75 | 86.54 |
ADAM | 1550 | 3270 | 132 | 5139 | 10 091 | 92.15 | 61.11 | 32.16 | 76.63 |
MLAMP | 1290 | 1900 | 392 | 6509 | 10 091 | 76.69 | 77.41 | 40.44 | 77.05 |
DBAASP | 1084 | 785 | 598 | 7624 | 10 091 | 64.44 | 90.66 | 56.00 | 77.55 |
AMPA | 654 | 806 | 1028 | 7603 | 10 091 | 38.89 | 90.42 | 44.79 | 64.66 |
The most considerable difference between the results presented in the original paper and the results on the new datasets appears for DBAASP on the O-DAMPD dataset. The main reason is that the authors miscalculated the number of correctly predicted peptides on the DAMPD dataset. In the original paper, the authors state that on the DAMPD dataset DBAASP correctly predicts 121 peptides (TP). In fact, the value of TP is 306, which can easily be checked at https://dbaasp.org/prediction. We also note that the prediction results of DBAASP can be influenced by two further requirements on input sequences: peptides should be linear, and C-terminal amidation must be taken into account. C-terminal amidation affects the charge density of the peptide and therefore influences the prediction obtained by the DBAASP tool. By our evaluation, sensitivity would then increase by ∼5% (data not shown).
On the whole, all tools show similar results on O-DAMPD and O-APD3. The CAMPR3 tools give the best predictions on both datasets, followed by DBAASP, ADAM and MLAMP (which have very similar balanced accuracy), with AMPA last. The results for AMPA can be explained by the fact that this tool is based on data for peptides tested against a particular strain of Pseudomonas aeruginosa, so it possibly cannot correctly predict antimicrobial potency against other species. To check this supposition, sets of peptides active and non-active against the most studied strain of P. aeruginosa, ATCC 27853, were selected from the DBAASP database (Pirtskhalava et al., 2016). The definitions of active and non-active peptides against a particular strain were based on minimum inhibitory concentration (MIC) data. Generally speaking, standardization of MIC assessment is problematic. The MIC data available in the literature have been obtained using different methods (broth dilution, agar dilution, etc.) and conditions (different broth, CFU, etc.). This low accuracy of estimation underlies the accepted practice of defining the agreement between results as excellent if the MIC is within ±2 doubling dilutions for ≥95% of the compared test result sets (Reynolds et al., 2003). The MIC thresholds separating the positive and negative sets were chosen according to this practice. We supposed that a rather large interval between the positive and negative sets would diminish the impact of experimental errors. Peptides were therefore defined as active against P. aeruginosa ATCC 27853 if their MIC < 25 µg/ml and non-active if MIC > 100 µg/ml. Initially, sets of 347 active and 373 non-active peptides were selected from DBAASP. After removing similar sequences using the CD-HIT web server with a 90% maximum sequence identity threshold, 235 and 195 sequences remained in the positive and negative sets, respectively (Supplementary Tables S5 and S6).
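The labelling rule can be written as a short sketch, assuming each peptide has a single MIC value in µg/ml against the strain. The function and constant names are hypothetical. Peptides that fall between the two thresholds are left unlabelled, which produces the gap between the positive and negative sets. The gap from 25 to 100 µg/ml spans two doubling dilutions.

```python
from typing import Optional

ACTIVE_BELOW = 25.0       # µg/ml; peptides with MIC below this are active
NON_ACTIVE_ABOVE = 100.0  # µg/ml; peptides with MIC above this are non-active


def label_by_mic(mic_ug_per_ml: float) -> Optional[str]:
    """Label a peptide from its MIC; return None for the excluded
    intermediate range between the two thresholds (inclusive)."""
    if mic_ug_per_ml < ACTIVE_BELOW:
        return "active"
    if mic_ug_per_ml > NON_ACTIVE_ABOVE:
        return "non-active"
    return None
```

Here a peptide with an MIC of 12.5 µg/ml is labelled active, one with 128 µg/ml is labelled non-active, and one with 50 µg/ml is left out of both sets.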
The prediction results for all tools on these sets are presented in Table 3. The sensitivity of AMPA is almost twice as high as on the O-DAMPD and O-APD3 datasets (Tables 1 and 2), although still lower than for the other tools. At the same time, its specificity is considerably higher than for the other tools, so its balanced accuracy is the best among all tools.
Table 3. Prediction results on the sets of peptides active and non-active against P. aeruginosa ATCC 27853
Tool | TP | FP | FN | TN | Total | Sens (%) | Spec (%) | Prec (%) | Bal acc (%) |
---|---|---|---|---|---|---|---|---|---|
CAMPR3(RF) | 220 | 161 | 15 | 34 | 430 | 95.74 | 15.90 | 57.74 | 55.82 |
CAMPR3(SVM) | 225 | 164 | 10 | 31 | 430 | 93.62 | 17.44 | 57.84 | 55.53 |
ADAM | 228 | 184 | 7 | 11 | 430 | 97.02 | 5.64 | 55.34 | 51.33 |
MLAMP | 193 | 149 | 42 | 46 | 430 | 82.13 | 23.59 | 56.43 | 52.86 |
DBAASP | 206 | 131 | 29 | 64 | 430 | 87.66 | 32.82 | 61.13 | 60.24 |
AMPA | 168 | 77 | 67 | 118 | 430 | 71.49 | 60.51 | 68.57 | 66.00 |
The results presented in Table 3 show that developing predictive models for particular species requires special approaches.
It is interesting to test the tools on sets suitable for all of them. Among the six tools considered, DBAASP has the most restrictions on input sequences. Taking this into account, we created a positive test set of linear peptides selected from the O-DAMPD dataset. The positive set consists of 221 peptides (L-DAMPD, Supplementary Table S7); the negative set is unchanged (O-DAMPD-N, Supplementary Table S2). The prediction results on these sets are presented in Table 4. As we can see, the balanced accuracy of DBAASP becomes closer to that of the CAMPR3 tools. We must note that we cannot take into account information about C-terminal amidation of the peptide sequences, since this information is not available from the DAMPD dataset. The real sensitivity and balanced accuracy of DBAASP will therefore be higher (Vishnepolsky and Pirtskhalava, 2014). Most other tools (except MLAMP) show slightly lower sensitivity and balanced accuracy than on the O-DAMPD dataset.
Table 4. Prediction results on the L-DAMPD (positive) and O-DAMPD-N (negative) sets
Tool | TP | FP | FN | TN | Total | Sens (%) | Spec (%) | Prec (%) | Bal acc (%) |
---|---|---|---|---|---|---|---|---|---|
CAMPR3(RF) | 202 | 381 | 18 | 1981 | 2583 | 91.40 | 83.87 | 34.65 | 87.64 |
CAMPR3(SVM) | 189 | 410 | 32 | 1952 | 2583 | 85.52 | 82.64 | 31.55 | 84.08 |
ADAM | 198 | 845 | 23 | 1517 | 2583 | 89.59 | 64.23 | 18.98 | 76.91 |
MLAMP | 164 | 481 | 57 | 1881 | 2583 | 74.21 | 79.64 | 25.43 | 76.93 |
DBAASP | 169 | 238 | 52 | 2124 | 2583 | 76.47 | 89.92 | 41.52 | 83.20 |
AMPA | 75 | 253 | 146 | 2109 | 2583 | 33.94 | 89.29 | 22.87 | 64.66 |
Thus, different tools have different areas of application, and this has to be taken into account when selecting the appropriate tool. The CAMPR3 and ADAM tools can be used for predicting a wide spectrum of antimicrobial peptides. The other tools have narrower areas of application: DBAASP can be used for the prediction of linear peptides, MLAMP is aimed at predicting antimicrobial families and AMPA works better for peptides that are active against P. aeruginosa.
Funding
This work has been supported by the International Science and Technology Center (Grant No. G-2102) and Shota Rustaveli National Science Foundation (Grant No. FR/397/7-180/14).
Conflict of Interest: none declared.
References
The Uniprot Consortium. (