Reshmi Ramakrishnan, Bert Houben, Łukasz Kreft, Alexander Botzki, Joost Schymkowitz, Frederic Rousseau, Protein Homeostasis Database: protein quality control in E.coli, Bioinformatics, Volume 36, Issue 3, February 2020, Pages 948–949, https://doi.org/10.1093/bioinformatics/btz628
Abstract
In vivo protein folding is governed by molecular chaperones that escort proteins from their translational birth to their proteolytic degradation. In E.coli, the main classes of chaperones that interact with the nascent chain are trigger factor, DnaK/J and GroEL/ES, and several authors have performed whole-genome experiments to construct exhaustive client lists for each of these.
We constructed a database collecting all publicly available experimental chaperone-interaction and -dependency data for the E.coli proteome, and enriched it with an extensive set of protein-specific as well as cell context-dependent proteostatic parameters. We made this publicly accessible via a web interface that allows users to search for proteins or chaperone client lists, and also to profile user-specified datasets against all the collected parameters. We hope this will accelerate research in this field by quickly identifying differentiating features in datasets.
The Protein Homeostasis Database is freely available without any registration requirement at http://PHDB.switchlab.org/.
1 Introduction
In an effort to elucidate which features determine chaperone dependency in E.coli, we collected data from all hitherto published large-scale chaperone interaction studies into a meta-dataset (Arifuzzaman et al., 2006; Calloni et al., 2012; Chapman et al., 2006; Deuerling et al., 2003; Fan et al., 2016; Fan et al., 2017; Fujiwara et al., 2010; Houry et al., 1999; Kerner et al., 2005; Martinez-Hackert and Hendrickson, 2009; Mogk et al., 1999; Niwa et al., 2009, 2012). Interestingly, we found that the overlap between these studies was very poor. We therefore hypothesized that chaperone dependency is governed not only by protein-intrinsic parameters, but also by cellular context, which likely differs between experimental approaches. Hence, we designed an inclusive classification scheme that takes into account all the studies mentioned above, and complemented these data with a range of experimentally determined proteostatic parameters (abundance, translation rates, solubility, etc.) as well as simple primary-sequence-based calculations (net charge, amino acid composition, etc.), structural features (secondary structure content, contact order, etc.) and bioinformatics predictions (aggregation tendency from TANGO, disorder from IUPred, etc.). We are now making this dataset publicly available through a web interface that not only makes the data readily accessible and easily searchable, but also offers preliminary analysis options, including comparisons with proteome distributions and direct retrieval of significantly distinguishing features between user-defined groups of proteins.
2 Database
The full dataset, constructed as described in the introduction, contains over a hundred proteostatic parameters for 4305 E.coli proteins. The data sources used in compiling the database are listed on the website’s About page, along with a detailed overview of which study provided which parameter. This information can also be found in our earlier publication describing the original application of our database (Ramakrishnan et al., 2019).
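To give a feel for what the simple primary-sequence-based parameters (net charge, amino acid composition) involve, the following is a minimal Python sketch. It is not the procedure used to build the database, which is documented on the website's About page. The function names are hypothetical, and the charge model is an assumption: it simply counts Lys/Arg as +1 and Asp/Glu as -1, ignoring histidine, the termini and pH.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the fraction of each residue type present in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {residue: n / len(seq) for residue, n in counts.items()}


def simple_net_charge(sequence):
    """Crude net charge (assumption): +1 per K/R, -1 per D/E.

    Histidine, the N- and C-termini and pH dependence are ignored.
    """
    seq = sequence.upper()
    positive = sum(seq.count(residue) for residue in "KR")
    negative = sum(seq.count(residue) for residue in "DE")
    return positive - negative


if __name__ == "__main__":
    # Toy peptide, for illustration only.
    peptide = "MKRDEK"
    print(amino_acid_composition(peptide))
    print(simple_net_charge(peptide))
```

A more realistic charge estimate would use residue pKa values at a chosen pH; the counting model above is only meant to show the kind of per-protein scalar that such a parameter represents.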
3 Website
To provide a user-friendly interface for this complex dataset, we developed a web interface that allows users to (i) obtain the client lists of different chaperone fluxes, (ii) inspect features of individual proteins and (iii) perform group analyses on user-defined sets. An overview of the database construction methodology and the analysis functionality offered by the website is shown in Figure 1.
3.1 Technical implementation
The dataset was imported into a MySQL database, on top of which an interactive frontend was written using AngularJS. The visualizations are dynamically created with the help of D3.js and ECharts. The communication between the frontend and the data model is handled by PHP.
3.2 Chaperone client flux view
The web interface homepage contains an overview of the different chaperone fluxes followed by E.coli proteins. Each group title links to a Browse page (see Section 3.3) containing the subset of E.coli proteins in the specified chaperone flux, as determined previously (Ramakrishnan et al., 2019).
3.3 Browse
The Browse tab gives access to a page which allows for detailed filtering of the full dataset on any feature or any combination of features. Through the ‘select’ button, users can select single proteins or protein sets based on filtering options or simply on UniProt accession numbers.
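The kind of selection described here, filtering on any combination of features or on a list of accession numbers, can be sketched in a few lines of Python. This is only an illustration of the logic, not the website's implementation (which uses MySQL, PHP and AngularJS). The record layout, the feature names `abundance` and `net_charge`, and the accession strings are hypothetical placeholders.

```python
def matches(record, criteria):
    """True if the record satisfies every (low, high) range in criteria.

    criteria maps a feature name to an inclusive (low, high) pair;
    either bound may be None to leave that side open.
    """
    for feature, (low, high) in criteria.items():
        value = record.get(feature)
        if value is None:
            return False  # a missing value cannot satisfy a filter
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def filter_proteins(records, criteria=None, accessions=None):
    """Select records by feature ranges and/or an accession list."""
    wanted = set(accessions) if accessions is not None else None
    selected = []
    for record in records:
        if wanted is not None and record["accession"] not in wanted:
            continue
        if criteria and not matches(record, criteria):
            continue
        selected.append(record)
    return selected


if __name__ == "__main__":
    # Placeholder records; values are invented for illustration only.
    records = [
        {"accession": "ACC_A", "abundance": 120.0, "net_charge": -3},
        {"accession": "ACC_B", "abundance": 15.0, "net_charge": 4},
        {"accession": "ACC_C", "abundance": 300.0, "net_charge": 1},
    ]
    hits = filter_proteins(
        records, criteria={"abundance": (100.0, None), "net_charge": (None, 2)}
    )
    print([r["accession"] for r in hits])
```

Combining criteria with a logical AND, as done here, is an assumption about how multiple filters interact.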
3.4 Protein view
Quick searching using a UniProt accession number from the homepage or selecting an entry from the Browse page leads to a protein view page which shows the values of all the parameters within our database for the selected protein (Fig. 1c). Where possible, these values are plotted either as simple bar plots, or as violin plots depicting the distribution of the entire dataset together with the value of the selected protein. This page provides a convenient way of browsing through protein parameters and comparing protein characteristics with their respective proteome distributions. Upon selecting multiple proteins in the Browse tab, users are given a ‘compare’ option, which allows for a comparison of the selected proteins. Similar to the single-protein view, values for each element in the group are plotted, alongside a representation of proteome distributions (Fig. 1d). This analysis allows for rapid identification of common features within a group, as well as determination of outliers, i.e. proteins that do not follow group patterns.
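Placing a single protein's value within the proteome distribution, which the violin plots convey visually, can also be expressed as a number. The sketch below computes a percentile rank. It is an illustrative assumption, not a statistic reported by the website, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, background):
    """Percentile rank of value within background (ties count as half)."""
    ordered = sorted(background)
    if not ordered:
        raise ValueError("background must not be empty")
    below = bisect_left(ordered, value)           # values strictly smaller
    equal = bisect_right(ordered, value) - below  # values equal to `value`
    return 100.0 * (below + 0.5 * equal) / len(ordered)


if __name__ == "__main__":
    # Toy background distribution, for illustration only.
    proteome_values = [1, 2, 3, 4, 5]
    print(percentile_rank(3, proteome_values))
```

A protein whose percentile rank lies close to 0 or 100 for a given feature sits in the tail of the proteome distribution for that feature.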
3.5 Comparing saved groups
Finally, users have the option of saving selected groups with user-defined names. Through the ‘compare’ button, saved groups can then be compared with each other. This yields combined violin and box plots showing the distributions of each feature for the selected groups, as well as for the proteome background. To readily identify interesting features, a volcano plot is also generated, depicting the most extreme fold change between all groups versus the negative logarithm of the Kruskal-Wallis P-value (Fig. 1e, lower panel). This plot shows both how strongly the selected groups differ and how statistically significant these differences are. Features above specific thresholds are indicated in red, and hovering over a data point in the volcano plot shows the corresponding feature, which allows for convenient identification of significantly distinguishing characteristics.
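The two coordinates of a volcano-plot point can be sketched for a single feature as follows. This is a minimal pure-Python illustration under stated assumptions, not the website's code: the fold change is taken between group medians on a log2 scale (which requires positive values), the logarithm of the P-value is base 10, and all function names are hypothetical. The Kruskal-Wallis test uses the standard tie-corrected H statistic with a chi-square approximation on k - 1 degrees of freedom.

```python
import math
from itertools import combinations
from statistics import median


def average_ranks(values):
    """Return 1-based ranks (ties get the average rank) and tie-block sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def kruskal_wallis(groups):
    """Return (H, p) for two or more groups of observations."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        h += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:  # all observations identical
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def extreme_log2_fold_change(groups):
    """Largest-magnitude log2 ratio between any two group medians."""
    best = 0.0
    for a, b in combinations([median(g) for g in groups], 2):
        lfc = math.log2(a / b)  # assumes strictly positive medians
        if abs(lfc) > abs(best):
            best = lfc
    return best


def volcano_point(groups):
    """Return (x, y) = (extreme log2 fold change, -log10 P) for one feature."""
    _, p = kruskal_wallis(groups)
    return extreme_log2_fold_change(groups), -math.log10(p)


if __name__ == "__main__":
    # Toy groups, for illustration only.
    print(volcano_point([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Repeating this per feature gives one point per feature; those exceeding chosen thresholds on both axes would be the ones highlighted.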
Acknowledgements
We thank the following researchers for feedback on methods and calculations: Hideki Taguchi (Tokyo, Japan), Tamir Tuller (Tel Aviv, Israel), Wim Vranken (Brussels, Belgium), James McInerney (Manchester, UK) and Geert Molenberghs (K.U. Leuven, Belgium).
Funding
The Switch Laboratory was supported by grants from the European Research Council under the European Union's Horizon 2020 Framework Programme ERC Grant agreement 647458 (MANGO) to JS, the Flanders institute for biotechnology (VIB), the University of Leuven (‘Industrieel Onderzoeksfonds’), the Funds for Scientific Research Flanders (FWO), the Flanders Agency for innovation by Science and Technology (IWT, SBO grant 60839) and the Federal Office for Scientific Affairs of Belgium (Belspo), IUAP, grant number P7/16. R.R. was supported by an Erasmus Mundus fellowship.
Conflict of Interest: none declared.
Author notes
Reshmi Ramakrishnan and Bert Houben wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.