Agnes Meyder, Stefanie Kampen, Jochen Sieg, Rainer Fährrolfes, Nils-Ole Friedrich, Florian Flachsenberg, Matthias Rarey, StructureProfiler: an all-in-one tool for 3D protein structure profiling, Bioinformatics, Volume 35, Issue 5, March 2019, Pages 874–876, https://doi.org/10.1093/bioinformatics/bty692
Abstract
Three-dimensional protein structures are important starting points for elucidating protein function and applications like drug design. Computational methods in this area rely on high quality validation datasets which are usually manually assembled. Due to the increase in published structures as well as the increasing demand for specially tailored validation datasets, automatic procedures should be adopted.
StructureProfiler is a new tool for automatic, objective and customizable profiling of X-ray protein structures based on the most frequently applied selection criteria currently in use to assemble benchmark datasets. As examples, four dataset configurations (Astex, Iridium, Platinum, combined), all results of the combined tests and the list of all PDB IDs passing the combined criteria set are attached in the Supplementary Material.
StructureProfiler is available as part of the ProteinsPlus web service http://proteins.plus and as a standalone tool in the NAOMI ChemBio Suite. Dataset updates together with the tool can be found on http://www.zbh.uni-hamburg.de/structureprofiler.
Supplementary data are available at Bioinformatics online.
1 Introduction
Three-dimensional structure models are the foundation of structural bioinformatics. The information content of a protein structure is tightly coupled to the richness of the supporting experimental data. Depending on the application scenario, different sets of quality criteria should be applied to select structure collections. Use cases include molecular dynamics simulations as well as docking and scoring of protein-ligand complexes. Developing methods in this field demands large datasets to allow statistically sound validation. For this purpose, the Astex Diverse Set (85 protein-ligand complexes, Hartshorn et al., 2007), Iridium HT (207 protein-ligand complexes, Warren et al., 2012) and the Platinum dataset (4548 ligands for bioactive conformation prediction, Friedrich et al., 2017) were created, spanning a decade of available structures in the Protein Data Bank (PDB, Gutmanas et al., 2014). For all three sets, the authors published extensive information about their selection criteria and multi-tool chains. The selection criteria catalog for Astex controls some model parameters such as the resolution, as well as ligand characteristics [Lipinski’s Rule of 5 by Lipinski et al. (1997)]. The Iridium dataset is based on criteria that are meant to be applied on top of the Astex criteria and that emphasize the necessity of available high quality experimental data. The Platinum dataset adds controls against a high diffraction precision index (DPI, Goto et al., 2004), Rfree and bond angle and length deviations for the ligand. Astex and Iridium needed manual curation to check the structure’s electron density support, whereas Platinum uses the newly developed electron density score for individual atoms and molecular fragments (EDIAm, Meyder et al., 2017) to objectively automate the estimation of electron density support on the atomic level. While Platinum receives frequent updates, both Astex and Iridium remain static due to the high manual workload.
Additionally, none of the tool chains of the three datasets is readily accessible or modifiable. To address these demands, we developed StructureProfiler as part of the NAOMI ChemBio Suite. We provide configurations which are highly similar to the selection criteria of Astex, Iridium and Platinum. StructureProfiler is also integrated in our free web service ProteinsPlus (Fährrolfes et al., 2017; Fig. 1).

Fig. 1. PDB ID 1nax evaluated by StructureProfiler with the combined criteria set
2 Materials and methods
All selection criteria available in StructureProfiler are listed in Supplementary Table S1. The real-space correlation coefficient (RSCC) is implemented as published by Jones et al. (1991). Because implementations of the RSCC vary [see Tickle (2012) for a discussion of this topic], we use EDIAm, or a tailored variant in the case of Iridium, for the validation as a clearly defined, reproducible way to estimate electron density support. The Iridium dataset allows up to two heavy atoms not to be supported by electron density. We therefore adjusted EDIAm to leave out the two worst-scoring atoms and refer to this variant as EDIAi in the following. We also added selection criteria such as the B factor distribution to extend the criteria catalog beyond those of the three datasets.
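The idea of leaving out the two worst-scoring atoms before aggregating per-atom scores can be illustrated with a minimal Python sketch. It is not the StructureProfiler implementation. The function names are hypothetical, and the aggregation function (a shifted power mean with exponent -2 and shift 0.1) is an assumption made for illustration; the exact definition of EDIAm should be taken from Meyder et al. (2017).

```python
from typing import Callable, Sequence


def shifted_power_mean(scores: Sequence[float],
                       exponent: float = -2.0,
                       shift: float = 0.1) -> float:
    """Aggregate per-atom scores with a shifted power mean.

    Assumption: exponent and shift are illustrative defaults and are not
    taken from this article. A negative exponent lets low per-atom scores
    dominate the aggregate.
    """
    if not scores:
        raise ValueError("need at least one per-atom score")
    if any(s < 0.0 for s in scores):
        raise ValueError("per-atom scores must be non-negative")
    mean = sum((s + shift) ** exponent for s in scores) / len(scores)
    return mean ** (1.0 / exponent) - shift


def aggregate_ignoring_worst(
    scores: Sequence[float],
    n_ignored: int = 2,
    aggregate: Callable[[Sequence[float]], float] = shifted_power_mean,
) -> float:
    """Drop the n_ignored lowest per-atom scores, then aggregate the rest."""
    kept = sorted(scores)[n_ignored:]
    if not kept:
        raise ValueError("no atoms left after ignoring the worst-scoring ones")
    return aggregate(kept)


# Hypothetical per-atom scores: three well supported atoms, two poorly supported.
atom_scores = [0.9, 0.9, 0.9, 0.1, 0.0]
all_atoms = shifted_power_mean(atom_scores)
without_two_worst = aggregate_ignoring_worst(atom_scores)
```

With these hypothetical scores, the aggregate over all atoms is pulled down by the two poorly supported atoms, while the variant that ignores them reflects only the remaining three.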
3 Results
StructureProfiler was validated against the three aforementioned datasets. In the following, the most important discrepancies per dataset are briefly discussed; full results can be found in Supplementary Section S2. Electron density maps were downloaded from PDBe (Gutmanas et al., 2014). As a final application, we profiled the PDB (downloaded on 2018-02-21, X-ray structures with a resolution value of at most 3.5 Å) with the combined criteria set. The PDB IDs annotated with ligand identifiers currently passing the combined filter criteria are given in the Supplementary Material. We plan to regularly update this list and provide all test results on http://www.zbh.uni-hamburg.de/structureprofiler.
3.1 Astex diverse set
All ligands in the Astex set need to fulfill Lipinski’s Rule of 5. In G17905 (905, 1ygc), we detected 8 Lipinski donors and 11 acceptors (Supplementary Fig. S2). Furthermore, a linker is prohibited but present in DFPP-G (HA1, 1v48, Supplementary Fig. S1). Because EDIA is more sensitive to atoms inconsistently supported by electron density, four ligands with low EDIAm values were detected (Meyder et al., 2017). Additional information can be found in Supplementary Section S2.1.
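To make the donor and acceptor counts above concrete, the following Python sketch checks a ligand against the commonly cited Rule of 5 thresholds (at most 5 hydrogen bond donors, at most 10 acceptors, molecular weight at most 500 Da, logP at most 5). The class and function names are hypothetical, and how StructureProfiler counts donors and acceptors is not described here, so the counts are simply taken as input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LigandDescriptors:
    """Precomputed ligand descriptors (hypothetical container)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: Optional[float] = None  # in Da; None means "not checked"
    log_p: Optional[float] = None             # None means "not checked"


def rule_of_five_violations(ligand: LigandDescriptors) -> List[str]:
    """Return a description of every Rule of 5 threshold the ligand exceeds."""
    violations = []
    if ligand.h_bond_donors > 5:
        violations.append("donors > 5")
    if ligand.h_bond_acceptors > 10:
        violations.append("acceptors > 10")
    if ligand.molecular_weight is not None and ligand.molecular_weight > 500.0:
        violations.append("molecular weight > 500")
    if ligand.log_p is not None and ligand.log_p > 5.0:
        violations.append("logP > 5")
    return violations


# Donor and acceptor counts as reported above for G17905 (905, 1ygc);
# molecular weight and logP are left unchecked because they are not given here.
violations = rule_of_five_violations(
    LigandDescriptors(h_bond_donors=8, h_bond_acceptors=11)
)
```

For the reported counts, both the donor and the acceptor threshold are exceeded.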
3.2 Iridium HT
Twelve ligands with more than two atoms inconsistently supported by electron density with regard to EDIAi were found (Supplementary Fig. S4). In addition, four cases with crystal symmetry contacts closer than 6 Å were detected (Supplementary Fig. S3). In the case of Alpha(2,3)-Sialyllactose in chain C, asparagine E 7 is only 2.48 Å away. Additionally, three active sites have atoms with an occupancy below 1 (Supplementary Section S2.2). One of these residues is in close proximity to the ligand and does not meet the requirements of the Iridium HT set.
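The symmetry contact criterion above amounts to a minimum-distance test between ligand atoms and atoms of symmetry-related copies. The Python sketch below shows only that distance test under stated assumptions: the function names are hypothetical, coordinates are plain Cartesian tuples in Å, and the symmetry mates are assumed to have been generated beforehand (applying the crystallographic symmetry operations is outside the scope of this sketch).

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def min_distance(atoms_a: Sequence[Coord],
                 atoms_b: Sequence[Coord]) -> Optional[float]:
    """Smallest pairwise distance between two atom sets, or None if one is empty."""
    if not atoms_a or not atoms_b:
        return None
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def has_close_symmetry_contact(ligand_atoms: Sequence[Coord],
                               symmetry_mate_atoms: Sequence[Coord],
                               cutoff: float = 6.0) -> bool:
    """True if any symmetry mate atom is closer than cutoff (in Å) to the ligand."""
    distance = min_distance(ligand_atoms, symmetry_mate_atoms)
    return distance is not None and distance < cutoff


# Hypothetical coordinates: one ligand atom and one symmetry mate atom 2.48 Å apart.
ligand = [(0.0, 0.0, 0.0)]
mate = [(0.0, 0.0, 2.48)]
flagged = has_close_symmetry_contact(ligand, mate)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single ligand; a spatial grid or k-d tree would be the usual choice when screening many structures.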
3.3 Platinum
We detected 196 ligands without full occupancies and 240 EDIAm violations. Besides an EDIAm software update, we switched from the now defunct Electron Density Server (EDS) to retrieving the maps from PDBe. In an extreme case, AO1 (1r5g), this resulted in an EDIAm score drop from 0.84 (good) to 0.54 (medium). A discussion of the bond length and angle violations can be found in Supplementary Section S2.3. We also checked the Platinum set with the combined criteria set, detecting, among others, 19 intermolecular clashes between active site and ligand. This shows that datasets suitable for one use case may not be suitable for others.
3.4 Usage
StructureProfiler is available as part of our ProteinsPlus web service. Enter the PDB ID of interest into the text field and then select the tool StructureProfiler on the right side of the web page. One of the four configurations (astex-like, iridium-like, platinum-like and combined) can be selected. Failed tests and substructures with at least one failed test are marked in red. All results and configuration files can be downloaded as INI/CSV files. The usage description of the customizable command line tool can be found in Supplementary Section S1.1.
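Downloaded CSV results can be post-processed with a few lines of Python, for example to list the substructures with at least one failed test. The sketch below is an illustration only: the layout of the actual result files is not specified here, so the column names `substructure`, `test` and `passed`, as well as the example rows, are hypothetical and need to be adapted to the real files.

```python
import csv
import io
from typing import Dict, List


def failed_tests_by_substructure(csv_text: str) -> Dict[str, List[str]]:
    """Map each substructure to the names of the tests it failed.

    Assumes hypothetical columns 'substructure', 'test' and 'passed',
    where 'passed' holds 'true' or 'false'.
    """
    failed: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["passed"].strip().lower() != "true":
            failed.setdefault(row["substructure"], []).append(row["test"])
    return failed


# Hypothetical result table, not actual StructureProfiler output.
example_csv = """substructure,test,passed
ligand_A,occupancy,true
ligand_A,electron_density_support,false
active_site_A,occupancy,true
"""

failed = failed_tests_by_substructure(example_csv)
```

Substructures that appear as keys in the returned dictionary correspond to those the web interface would mark in red.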
4 Conclusion
StructureProfiler assembles the currently relevant structure quality criteria catalog in a configurable, easy-to-use standalone tool, which is also available on the web. It allows rapid screening of in-house data as well as easy repeated screenings of public databases. Owing to the use of EDIA, it reduces human curation of electron density support to a minimum, thereby removing what has so far been the bottleneck in dataset curation. StructureProfiler serves as the next step towards the creation of large high quality datasets for docking, 3D-QSAR and the many new machine-learning-based applications currently appearing.
Conflict of Interest: none declared.
References