Sentra, a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components: a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of the signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of KEGG signal transduction pathways. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by users.
Recent experimental and in silico studies have greatly improved our understanding of the principles and mechanisms of prokaryotic signal transduction (1–6). The list of recognized environmental sensors has expanded dramatically and now includes, in addition to two-component histidine kinases and methyl-accepting chemotaxis proteins, Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases (2–10). Members of these protein classes are also found as (predicted) cytoplasmic proteins, which are proposed to sense intracellular parameters such as pH, osmolarity, or the levels of oxygen, CO, NO and other molecules (2,10). Accordingly, many prokaryotic genomes contain multiple copies of the respective genes, whose exact functions (i.e. the parameters sensed by their protein products) are rarely known. Detailed analyses of the protein sets involved in signal transduction in model organisms such as Escherichia coli, Bacillus subtilis, Pseudomonas aeruginosa, Synechocystis sp. PCC6803, Anabaena sp. PCC7120 and Halobacterium salinarum have yielded important results and provided much-needed insight into signal transduction mechanisms. In silico studies have contributed by highlighting phenomena such as the abundance of (predicted) diguanylate cyclases and c-di-GMP phosphodiesterases in many bacterial genomes, the importance of cross-talk between different signaling pathways and the existence of a complex system of intracellular signaling (2,3,10).
Progress in understanding prokaryotic signal transduction systems, together with the availability of a large number of newly sequenced genomes, prompted us to perform a major update of Sentra, a database of signal transduction proteins developed by the Bioinformatics group at Argonne National Laboratory (13,14). The goal of this update was to provide users with an analytical environment containing expert-curated information on prokaryotic signal transduction systems, an up-to-date knowledge base and interactive tools for further analysis of signal transduction proteins in all completely sequenced genomes as they become publicly available. Such an environment should increase the accuracy and sensitivity of sequence analysis of signal transduction proteins and aid in formulating hypotheses about the nature of the transmitted signal. The previous release of Sentra featured signal transduction proteins encoded in 43 completely sequenced genomes (14). Although it covered all complete public genomes available at the time of publication, it lacked several valuable data types and analytical capabilities. For example, it did not include diguanylate cyclases or c-di-GMP phosphodiesterases and did not support cross-genome comparative analysis of signal transduction systems (14). Furthermore, because most components of the signal transduction machinery are multi-domain proteins, they are notoriously difficult to annotate through automated sequence comparisons and are commonly misannotated in genomic databases (10,15). The discovery of new domains often renders existing annotations incomplete or even obsolete.
To address this problem, Sentra was redesigned to perform periodic (monthly) automated updates, which include pre-computed analysis of newly sequenced genomes and re-analysis of existing Sentra genomes with an array of bioinformatics tools, including InterPro (16), Blocks (17), BLAST (18), TMHMM (19) and tools developed by our group (e.g. Dremmel and Chisel). The results of these automated analyses are presented to users in Sentra's interactive environment for further updating and annotation. The most significant changes in the content, capabilities and user interface of Sentra are as follows.
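As an illustration of how the output of one of these tools could feed the classification of signaling proteins, the sketch below parses TMHMM results to separate membrane-anchored sensors from cytoplasmic ones. It assumes TMHMM's one-line-per-protein "short" output, in which whitespace-separated key=value fields include PredHel (the number of predicted transmembrane helices). The function names and the one-helix threshold are hypothetical choices for illustration, not a description of Sentra's actual pipeline.

```python
def parse_tmhmm_short(lines):
    """Map protein ID -> number of predicted TM helices (PredHel), assuming
    TMHMM short-format lines such as:
    'hk1  len=450  ExpAA=45.10  First60=21.3  PredHel=2  Topology=i10-32o190-212i'
    """
    helices = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank or comment lines
        # First field is the sequence ID; the rest are key=value pairs.
        pairs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        helices[fields[0]] = int(pairs.get("PredHel", 0))
    return helices


def localization(n_helices, min_helices=1):
    """Hypothetical rule: at least min_helices predicted helices means membrane-bound."""
    return "membrane" if n_helices >= min_helices else "cytoplasmic"


if __name__ == "__main__":
    # Made-up example lines, for illustration only.
    example = [
        "hk1\tlen=450\tExpAA=45.10\tFirst60=21.30\tPredHel=2\tTopology=i10-32o190-212i",
        "dgc1\tlen=310\tExpAA=0.05\tFirst60=0.01\tPredHel=0\tTopology=o",
    ]
    for protein, n in parse_tmhmm_short(example).items():
        print(protein, n, localization(n))
```

Run on the made-up lines, this prints "hk1 2 membrane" and "dgc1 0 cytoplasmic". A single helix threshold is crude; signal peptides, for instance, can be mispredicted as transmembrane helices, so such a rule would only be a first pass before expert review.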
Update of the Sentra database content
Sentra now consists of two principal components: (i) a manually curated list of signal transduction proteins that includes proteins derived from 202 completely sequenced prokaryotic genomes, and (ii) an automatically generated listing of predicted signaling proteins in 235 genomes that are awaiting manual curation.
The expert-curated section of the database now lists, besides two-component histidine kinases and response regulators, Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews (2,10,12).
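Because these classes are defined largely by domain content, a simple domain-to-class lookup conveys the basic idea. The sketch below is a deliberately simplified, hypothetical rule set that uses Pfam-style domain names (HisKA, Response_reg, Pkinase, PP2C, Guanylate_cyc, GGDEF, EAL, HD-GYP) as assumed labels; the curated lists in Sentra rely on expert review rather than on a lookup table like this one.

```python
# Hypothetical, simplified mapping from Pfam-style domain names to the
# signaling classes listed in the curated section. Not Sentra's actual rules.
DOMAIN_TO_CLASS = {
    "HisKA": "histidine kinase",
    "Response_reg": "response regulator",
    "Pkinase": "Ser/Thr/Tyr protein kinase",
    "PP2C": "protein phosphatase",
    "Guanylate_cyc": "adenylate cyclase",
    "GGDEF": "diguanylate cyclase",
    "EAL": "c-di-GMP phosphodiesterase",
    "HD-GYP": "c-di-GMP phosphodiesterase",
}


def classify(domains):
    """Return the sorted signaling classes implied by a protein's domain list.

    Domains without a rule (e.g. sensory PAS domains) are ignored. Multi-domain
    proteins can fall into several classes, e.g. GGDEF-EAL proteins or hybrid
    kinases that also carry a receiver domain.
    """
    return sorted({DOMAIN_TO_CLASS[d] for d in domains if d in DOMAIN_TO_CLASS})


if __name__ == "__main__":
    print(classify(["PAS", "GGDEF", "EAL"]))
    print(classify(["HisKA", "HATPase_c", "Response_reg"]))
```

Returning a set of classes rather than a single label reflects the multi-domain nature of many signaling proteins, which is one reason purely automated annotation of these proteins is error-prone.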
Support for comparative and evolutionary analysis of signal transduction proteins and signaling pathways
In the process of adapting to their environments, prokaryotic organisms have developed the ability to detect and process environmental signals that are vital for their survival. Sentra provides a unique opportunity to explore and compare the signaling apparatus of prokaryotes according to their habitat (e.g. aquatic, terrestrial), lifestyle (e.g. pathogenic) and major physiological features (e.g. energy source, motility). Users can also compare the signal transduction proteins characteristic of different taxonomic groups of organisms within the framework of the signaling pathways from the KEGG database (20). This capability allows identification of signaling pathways and mechanisms characteristic of particular taxonomic groups and habitats.
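At its simplest, a phenotype-based comparison of this kind reduces to averaging per-genome counts of each signaling class within a phenotypic group. The sketch below assumes that each genome has already been assigned a group label (e.g. a habitat) and a list of class labels for its signaling proteins; the input format, the function name and the per-genome normalization are assumptions for illustration.

```python
from collections import Counter, defaultdict


def mean_class_counts(genomes):
    """Average number of proteins per signaling class, per phenotypic group.

    genomes: iterable of (group_label, list_of_class_labels), one entry per genome.
    Returns {group_label: {class_label: mean count per genome in that group}}.
    Classes never seen in a group are simply absent (i.e. zero).
    """
    totals = defaultdict(Counter)  # group -> class -> total proteins
    n_genomes = Counter()          # group -> number of genomes
    for group, classes in genomes:
        totals[group].update(classes)
        n_genomes[group] += 1
    return {
        group: {cls: count / n_genomes[group] for cls, count in counts.items()}
        for group, counts in totals.items()
    }


if __name__ == "__main__":
    # Toy, made-up input: two "aquatic" genomes and one "pathogenic" genome.
    toy = [
        ("aquatic", ["histidine kinase"] * 3 + ["diguanylate cyclase"] * 2),
        ("aquatic", ["histidine kinase"]),
        ("pathogenic", ["histidine kinase"] * 2),
    ]
    print(mean_class_counts(toy))
```

Normalizing by genome size (e.g. per 1,000 protein-coding genes) instead of per genome would be a reasonable alternative when the compared genomes differ greatly in size.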
Sentra leverages PUMA2 (21), a system for high-throughput genome analysis being developed by the Bioinformatics group at Argonne. This connection allows Sentra to support comparative analysis of prokaryotic signal transduction systems at multiple levels of organization: users may explore the domain and feature composition of signal transduction proteins and analyze sequences interactively with over 30 bioinformatics tools. All entries in Sentra are annotated with information from the PUMA2 knowledge base, which integrates data from over 20 sequence, structural, metabolic and taxonomic databases, as well as results derived from various bioinformatics tools. Sentra also records the participation of signal transduction proteins in conserved chromosomal gene clusters (22). Such information may provide important clues about the nature of the transmitted signal.
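One simple way to approximate conserved gene-cluster information is to ask which orthologous groups repeatedly sit next to a given signaling gene on the same strand across genomes. The sketch below is a hypothetical stand-in, not the method of ref. (22): the (gene_id, ortholog_group, start, end, strand) tuple layout, the 300-bp intergenic gap threshold and the minimum number of supporting genomes are all assumptions.

```python
from collections import Counter

# Each gene is a hypothetical tuple: (gene_id, ortholog_group, start, end, strand).


def same_strand_neighbors(genes, idx, max_gap=300):
    """Ortholog groups in the same-strand run around genes[idx] whose
    intergenic gaps are <= max_gap bp (a crude operon-like criterion)."""
    strand = genes[idx][4]
    found = set()
    j = idx - 1  # walk left while genes stay on the same strand and close together
    while j >= 0 and genes[j][4] == strand and genes[j + 1][2] - genes[j][3] <= max_gap:
        found.add(genes[j][1])
        j -= 1
    k = idx + 1  # walk right under the same conditions
    while k < len(genes) and genes[k][4] == strand and genes[k][2] - genes[k - 1][3] <= max_gap:
        found.add(genes[k][1])
        k += 1
    return found


def conserved_partners(genomes, target_group, min_genomes=2, max_gap=300):
    """Ortholog groups found next to target_group in at least min_genomes genomes.

    genomes: {genome_name: list of gene tuples}. Returns {group: supporting genomes}.
    """
    support = Counter()
    for genes in genomes.values():
        genes = sorted(genes, key=lambda g: g[2])  # order genes by start coordinate
        partners = set()
        for idx, gene in enumerate(genes):
            if gene[1] == target_group:
                partners |= same_strand_neighbors(genes, idx, max_gap)
        partners.discard(target_group)
        support.update(partners)  # each genome counts at most once per group
    return {grp: n for grp, n in support.items() if n >= min_genomes}
```

For example, a histidine kinase group that sits beside the same response regulator group in several genomes would be reported with the number of supporting genomes, hinting that the two proteins form a cognate pair.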
Support for user annotation of signal transduction proteins
One important new feature of Sentra is its support for user annotation of signal transduction proteins via the PUMA2 framework. Registered users can interactively analyze sequences, correct functional assignments and provide detailed comments. This capability will allow us to draw on the extensive expert knowledge accumulated in the scientific community to annotate entries in the Sentra database. All computationally intensive operations in Sentra are performed by GADU (23), a Grid technology-based engine being developed by the Bioinformatics group at Argonne.
As new completely sequenced microbial genomes become publicly available, they will be processed through the automated pipeline and included in quarterly updates of the database. These genomes will also be subject to manual curation of the overall protein lists and orthology groupings. We also intend to provide manually curated lists of proteins containing certain signal transduction domains, such as PAS (24) and FHA (25).
This work was supported by the Office of Biological and Environmental Research, US Department of Energy, under Contract DE-AC02-06CH11357 and by the Intramural Research Program of the NIH, National Library of Medicine. N.M. and E.M.G. acknowledge membership in, and partial support from, the Region V ‘Great Lakes’ Regional Center of Excellence in Biodefense and Emerging Infectious Diseases Consortium (GLRCE, NIAID Award 1-U54-AI-057153). M.D. acknowledges membership in and support from the NMPDR Bioinformatics Resource Center, NIH/NIAID (Award NNSN 266200400042C). We are grateful to Luke Ulrich for his work on PhyloBlocks. Funding to pay the Open Access publication charges for this article was provided by the Intramural Research Program of the NIH, National Library of Medicine.
Conflict of interest statement. None declared.