DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes

Abstract

Motivation: Model quality assessment is a crucial part of protein structure prediction and a gateway to the proper use of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few exist for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately assess the accuracy of interface residues in complex structures.

Results: Here, we present DeepUMQA3, a web server that uses deep neural networks to evaluate the accuracy of interface residues in protein complex structures. For an input complex structure, features are extracted at three levels (overall complex, intra-monomer, and inter-monomer), and an improved deep residual neural network predicts per-residue lDDT and interface residue accuracy. DeepUMQA3 ranked first in the blind test of interface residue accuracy estimation in CASP15, with Pearson, Spearman, and AUC values of 0.564, 0.535, and 0.755 under the lDDT measure, which are 17.6%, 23.6%, and 10.9% higher, respectively, than those of the second-best method. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- from low-precision residues.

Availability and implementation: The DeepUMQA3 web server is freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.
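Since DeepUMQA3 predicts per-residue lDDT, the following minimal Python sketch shows how a per-residue lDDT score can be computed from a model and a reference structure. It is a simplified Cα-only variant (15 Å inclusion radius, tolerance thresholds of 0.5, 1, 2, and 4 Å) without the all-atom and stereochemistry checks of the official lDDT program; the function name `per_residue_lddt` is hypothetical, and this is not DeepUMQA3's code.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # Å, standard lDDT tolerance thresholds
INCLUSION_RADIUS = 15.0  # Å, residue pairs farther apart in the reference are ignored


def per_residue_lddt(model: Sequence[Coord], reference: Sequence[Coord]) -> List[float]:
    """Simplified Cα-only per-residue lDDT on a 0-100 scale.

    model[i] and reference[i] must be the Cα coordinates of the same residue.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must contain the same residues")
    n = len(reference)
    scores = []
    for i in range(n):
        preserved = 0.0
        total = 0
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= INCLUSION_RADIUS:
                continue
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # Fraction of the four thresholds within which this distance is preserved.
            preserved += sum(diff < t for t in THRESHOLDS) / len(THRESHOLDS)
            total += 1
        # Residues without neighbors inside the inclusion radius get NaN.
        scores.append(100.0 * preserved / total if total else float("nan"))
    return scores
```

For example, identical model and reference coordinates give a score of 100 for every residue, while a residue displaced by 3 Å keeps only the 4 Å threshold for its affected pairs.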

Performance of all methods under the lDDT measure for 5 nanobody complex targets and 3 antibody-antigen complex targets in the accuracy estimation of interface residues in CASP15.

Supplementary Figures
Figure S1. Ranking of the methods for interface residue precision estimation in CASP15 by the sum of average Z-scores for lDDT (red), CAD (blue), PatchDockQ (light red), and PatchQS (cyan). For each measure, the Z-scores of Pearson, Spearman, and AUC are combined with weights of 0.1:0.5:1. Data are from the CASP15 official website (https://predictioncenter.org/casp15/qa_local.cgi). The group name of DeepUMQA3 in CASP15 is "GuijunLab-RocketX".
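The ranking score in Figure S1 can be sketched as below under stated assumptions: Z-scores are computed across methods for each target using the population standard deviation, averaged over targets, weighted 0.1:0.5:1 for Pearson, Spearman, and AUC, and summed over the four measures. CASP's own procedure may differ in details such as handling missing predictions or clipping negative Z-scores, which this sketch omits; all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Weights for Pearson, Spearman, and AUC as stated in the Figure S1 caption.
WEIGHTS = {"pearson": 0.1, "spearman": 0.5, "auc": 1.0}

Matrix = List[List[float]]  # values[target][method]


def z_scores(values: List[float]) -> List[float]:
    """Z-scores of one target's values across methods (population std)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def average_z(values_by_target: Matrix) -> List[float]:
    """Per-method Z-score averaged over targets."""
    per_target = [z_scores(row) for row in values_by_target]
    return [mean(col) for col in zip(*per_target)]


def ranking_score(scores: Dict[str, Dict[str, Matrix]]) -> List[float]:
    """scores[measure][statistic] -> values[target][method].

    Measures are e.g. lDDT, CAD, PatchDockQ, and PatchQS; statistics are the
    keys of WEIGHTS. Returns one summed, weighted Z-score per method.
    """
    total: List[float] = []
    for per_stat in scores.values():
        for stat, weight in WEIGHTS.items():
            avg = average_z(per_stat[stat])
            if not total:
                total = [0.0] * len(avg)
            for k, z in enumerate(avg):
                total[k] += weight * z
    return total
```

Methods are then ranked by sorting this score in descending order.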

Figure S2. Performance of methods for assessing the accuracy of interface residues under the lDDT measure in CASP15. (A), (B), and (C) are pirate plots of Pearson, Spearman, and AUC, respectively, for all participating methods on 39 targets. (D) Ranking by the Z-scores of Pearson, Spearman, and AUC.
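The per-target statistics summarized in panels (A) to (C) can be computed as in the following minimal sketch for one target, given the predicted and real lDDT of its interface residues. Spearman's coefficient is taken as the Pearson correlation of tie-averaged ranks, and AUC is computed as the Mann-Whitney probability that a high-accuracy residue receives a higher prediction than a low-accuracy one. The cutoff on real lDDT that separates high- from low-accuracy residues is not given in the caption, so `threshold` is a hypothetical parameter.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def average_ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and v[order[end + 1]] == v[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def auc(pred: Sequence[float], real: Sequence[float], threshold: float) -> float:
    """Residues with real lDDT >= threshold count as high accuracy."""
    pos = [p for p, r in zip(pred, real) if r >= threshold]
    neg = [p for p, r in zip(pred, real) if r < threshold]
    if not pos or not neg:
        return float("nan")  # AUC undefined without both classes
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The O(n²) pair loop in `auc` is adequate for the interface residues of a single target; a rank-sum formulation would scale better.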

Figure S3. Pearson correlation coefficient under the lDDT measure of the top 5 methods for interface residue accuracy evaluation on the 39 targets in CASP15. Targets in gray boxes are those missed in DeepUMQA3's CASP15 submissions; we evaluated them using the programs provided by the assessor. Targets in yellow boxes are nanobody complexes, and targets in blue boxes are antibody-antigen complexes.

Figure S4. Pirate plots of different performance indicators of DeepUMQA3 for predicting the per-residue lDDT of the overall complex in CASP15. Left: Local QA; right: Global QA. Higher Pearson, Spearman, and Kendall values indicate a stronger correlation between the predicted and real lDDT. A higher AUC indicates a better ability of DeepUMQA3 to distinguish high- from low-precision residues or models. A smaller MAE indicates a smaller difference between the predicted and real lDDT.
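Kendall's tau and MAE, two of the indicators in Figure S4, can be computed as in this sketch. It uses the tau-b variant, which corrects for ties; which variant the evaluation actually used is not stated, so that choice is an assumption.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between predicted and real lDDT (O(n^2) pair count)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from numerator and denominator
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x_only) * (concordant + discordant + ties_y_only)
    )
    return (concordant - discordant) / denom if denom else float("nan")


def mae(pred: Sequence[float], real: Sequence[float]) -> float:
    """Mean absolute error between predicted and real lDDT."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(pred)
```

At the Local QA level the inputs are per-residue values; at the Global QA level they are per-model values.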

Figure S5. An example of DeepUMQA3 evaluating the structural model T1170TS494_5o of target T1170. T1170 is a homomer of 6 identical monomers, containing 1908 residues. (A) Head-to-head comparison of the predicted and real interface residue lDDT. (B) The real (left) and predicted (right) interface residue lDDT mapped onto the structural model. Colored parts are interface residues, with red to blue representing lDDT from 0 to 100. (C) Real lDDT (cyan) and predicted lDDT (other colors) for all residues in the overall complex, with the predicted lDDT of different monomers shown in different colors.
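The caption does not define which residues count as interface residues. A common convention, assumed here purely for illustration, is a distance cutoff between residues of different chains: the sketch below marks a residue as an interface residue if its representative atom (for example Cβ) lies within a hypothetical 8 Å cutoff of a residue in another chain. The function name `interface_residues` is also hypothetical.

```python
import math
from typing import List, Set, Tuple

# (chain ID, residue number, representative atom coordinates)
Residue = Tuple[str, int, Tuple[float, float, float]]


def interface_residues(residues: List[Residue], cutoff: float = 8.0) -> Set[Tuple[str, int]]:
    """Return (chain, residue number) pairs with an inter-chain contact below cutoff (Å)."""
    found: Set[Tuple[str, int]] = set()
    for i, (ci, ri, xi) in enumerate(residues):
        for cj, rj, xj in residues[i + 1:]:
            # Only contacts between different chains define the interface.
            if ci != cj and math.dist(xi, xj) < cutoff:
                found.add((ci, ri))
                found.add((cj, rj))
    return found
```

For a 1908-residue complex such as T1170, this naive pairwise loop checks about 1.8 million pairs, which is acceptable for illustration; a spatial grid or k-d tree would scale better.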

Figure S6. DeepUMQA3 server job submission. The DeepUMQA3 server is integrated into the DeepUMQA server; a complex model quality assessment job is started by clicking the "Complex assessment" button (1). Users can paste the model data of a complex structure into the text box or upload a PDB file (2). Users can add multiple complex structures for evaluation with the "Add model" and "Remove model" buttons, or upload all complex structures to be evaluated in a single zip file (3). Users can optionally provide an "Email" address to receive result notifications and a "Job name" (4). Jobs are submitted or reset with the "Submit" or "Reset" buttons.
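For the zip upload option, a set of complex models can be bundled with Python's standard `zipfile` module, as in this sketch. Storing files flat (without directory prefixes) and rejecting duplicate file names are assumptions about what makes a clean upload; the server's exact requirements for the archive layout are not described here, and `bundle_models` is a hypothetical helper.

```python
import zipfile
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def bundle_models(pdb_paths: Iterable[PathLike], zip_path: PathLike) -> Path:
    """Write the given PDB files into one zip archive for upload."""
    seen = set()
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, pdb_paths):
            if p.name in seen:
                raise ValueError(f"duplicate model file name: {p.name}")
            seen.add(p.name)
            zf.write(p, arcname=p.name)  # store flat, without directory prefixes
    return Path(zip_path)
```

For example, `bundle_models(sorted(Path("models").glob("*.pdb")), "models.zip")` packs every `.pdb` file in a hypothetical `models` directory into `models.zip`.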

Figure S7. Example of a DeepUMQA3 results page. For each evaluated complex model, the results page displays three items. (1) The model structure colored from red to blue according to residue lDDT scores from 0 to 100. Users can switch between the overall structure and the interface residues with the "Show interface"/"Show overall" button. (2) The lDDT curves of all residues in the whole complex, with a different color for each chain; residues marked with plum-colored bars are interface residues. (3) The interface residue accuracy of all chains, with a different color for each chain. Users can download each result individually or download a compressed package of all results.
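The red-to-blue coloring by lDDT can be reproduced with a sketch like the following, which maps lDDT 0 to 100 onto an HSV hue sweep from red through green to blue using the standard `colorsys` module. The palette actually used by the server may differ (for example, a direct red-to-blue blend), so this mapping is an assumption.

```python
import colorsys


def lddt_to_hex(lddt: float) -> str:
    """Map lDDT in [0, 100] to a hex color: 0 -> red, 100 -> blue."""
    t = min(max(lddt, 0.0), 100.0) / 100.0  # clamp, then normalize to [0, 1]
    hue = t * 240.0 / 360.0  # 0 deg (red) to 240 deg (blue) on the HSV wheel
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```

A hue sweep keeps intermediate scores visually distinct (green near 50), whereas a linear red-to-blue blend passes through purple.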

Figure S8. Running time of the DeepUMQA3 web server (CPU only) on the 35 CASP15 protein complexes shorter than 3,000 amino acids. For complexes shorter than 750 amino acids, the runtime is within 10 minutes; for complexes of around 1,000 amino acids, it does not exceed 15 minutes; and for complexes of around 2,000 amino acids, it is approximately 30 minutes.

Figure S9. Throughput test of the DeepUMQA3 web server using ApacheBench. We simulated 10 concurrent users sending a total of 100 requests to the DeepUMQA3 web server (10 requests per user). All 100 requests succeeded (Failed requests: 0), and the total time (Time taken for tests) was 10.597 seconds. The mean time per user request (Time per request) and the mean time per request across all concurrent requests (Time per request, across all concurrent requests) were 1059.719 ms and 105.972 ms, respectively. The throughput (Requests per second) was 9.44 requests per second.
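The three derived ApacheBench figures follow from the total time, the number of requests, and the concurrency level. A run of this shape would typically be launched with `ab -n 100 -c 10 <URL>`; the exact URL and options used for Figure S9 are not given, so this invocation is an assumption. The sketch below reproduces the reported values from the rounded total time of 10.597 s; ApacheBench uses the unrounded time, which explains the last digits of 1059.719 ms.

```python
def ab_derived_metrics(total_time_s: float, n_requests: int, concurrency: int):
    """Recompute ApacheBench's derived statistics from its raw counts.

    Returns (time per request in ms, time per request across all
    concurrent requests in ms, requests per second).
    """
    per_request_ms = total_time_s * 1000.0 * concurrency / n_requests
    across_all_ms = total_time_s * 1000.0 / n_requests
    requests_per_s = n_requests / total_time_s
    return per_request_ms, across_all_ms, requests_per_s


per_req, across, rps = ab_derived_metrics(10.597, 100, 10)
print(f"{per_req:.1f} ms, {across:.2f} ms, {rps:.2f} req/s")
# prints: 1059.7 ms, 105.97 ms, 9.44 req/s
```

The per-user figure is the across-all figure multiplied by the concurrency level, because 10 requests are in flight at any moment.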

Table S3. Performance of DeepUMQA3 for predicting the per-residue lDDT of the overall complex on different types of targets in CASP15 (Local QA level).

Table S4. Performance of DeepUMQA3 for predicting the per-residue lDDT of the overall complex on different types of targets in CASP15 (Global QA level). Global QA is computed from the predicted and real global lDDT of each model. Pearson, Spearman, and Kendall measure the correlation between the predicted and real global lDDT of the models. AUC measures the ability of the predicted global lDDT to discriminate high- from low-precision models. MAE measures the difference between the predicted and real global lDDT.

Table S5. Performance of DeepUMQA3 and other participating methods for overall complex accuracy evaluation, measured by TM-score and global lDDT in CASP15. Pearson and Spearman measure the correlation between the predicted and real global accuracy of the models. AUC reflects the ability of a model quality assessment method to distinguish high-accuracy from low-accuracy models. Loss is the difference in real accuracy between the model selected as best by the predicted scores and the truly best model, reflecting the ability of the method to select the best model.
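The Loss column can be sketched as below: the real accuracy (TM-score or global lDDT) of the truly best model minus the real accuracy of the model that the assessment method ranks first. Resolving ties in the predicted scores to the first model is an assumption, and `selection_loss` is a hypothetical name.

```python
from typing import Sequence


def selection_loss(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Real accuracy lost by picking the top-predicted model instead of the true best.

    predicted[k] and real[k] refer to the same model k of one target; ties in
    predicted scores resolve to the first model (an assumption).
    """
    if len(predicted) != len(real) or not predicted:
        raise ValueError("need one predicted and one real score per model")
    top_predicted = max(range(len(predicted)), key=predicted.__getitem__)
    return max(real) - real[top_predicted]
```

For example, if a method ranks first a model whose real TM-score is 0.6 while the best available model scores 0.8, the loss for that target is 0.2; a loss of 0 means the method picked the best model.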