Abstract

Summary: Based on SOAP (Simple Object Access Protocol) technology, the SubLoc server/client suite offers a user-friendly interface for searching and predicting protein subcellular location.

Availability: The SubLoc SOAP server is available over HTTP at . The client software can be downloaded at . More information is given at .

Contact: sunzhr@mail.tsinghua.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

INTRODUCTION

Proteins are sorted into different cellular compartments, such as the cytoplasm, nuclear region and mitochondrion, or are secreted from the cell, and their proper functioning relies on this precise process of subcellular localization. Hence, the subcellular location of a protein can give clues to its function. DBSubLoc (Guo et al., 2004) is a protein subcellular localization annotation database, which is available at . The database contains >60 000 protein sequences from viruses, bacteria, fungi, plants and animals. However, its service is delivered over the WWW through a web-browser interface, which is designed for humans rather than machines. It is therefore troublesome to use DBSubLoc in an automated manner.

We adapted SOAP (Simple Object Access Protocol, ) technology to the DBSubLoc database and developed a server/client suite to facilitate automated information retrieval. SOAP uses XML to define an extensible messaging framework, providing a message construct that can be exchanged over a variety of underlying protocols. Because of its simplicity and extensibility, it is being applied to a growing number of bioinformatics databases, such as KEGG (Kanehisa et al., 2004; ) and the NCBI Entrez Utilities Web Service (Maglott et al., 2005) (). The SOAP server enables users to write their own programs that access the DBSubLoc server automatically. A client with a graphical user interface (GUI) is also provided for users who are not familiar with programming.

OVERVIEW

The DBSubLoc SOAP server provides seven functions that remote users can call over HTTP. Three of them query and search the database by Swiss-Prot ID (Boeckmann et al., 2003), Gene Ontology (GO) ID (Ashburner et al., 2000), DBSubLoc ID, protein name or BLAST (Altschul et al., 1997); one submits users' new sequences to the DBSubLoc database; and the other three predict protein subcellular location with the support vector machine (SVM) method (Hua and Sun, 2001) or the PSORT program (Nakai and Horton, 1999). These functions are detailed in Table 1. Supplementary tables describe the objects and returned values.

Table 1

Methods implemented in the DBSubLoc server

Method | Parameters | Description
name_search | Protein name, such as p53, actin and hemoglobin | Search DBSubLoc for proteins whose names contain the queried keyword
id_search | Protein ID for Swiss-Prot, DBSubLoc or GO, for example O03478 (Swiss-Prot ID), 10021133 (DBSubLoc ID) or 0005739 (GO ID) | Search DBSubLoc for the proteins corresponding to the queried ID
blast_search | Protein sequence and E-value threshold | Search DBSubLoc by sequence homology with BLAST
eu_predict | Protein sequence | Predict eukaryotic protein subcellular location with the SVM method
pro_predict | Protein sequence | Predict prokaryotic protein subcellular location with the SVM method
psort_predict | Protein sequence | Predict protein subcellular location with PSORT
feed_entry | Protein sequence and subcellular location information | Submit new protein sequences and subcellular location information to DBSubLoc
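
Since the server speaks SOAP over HTTP, each method in Table 1 corresponds to an XML request envelope. The following Python sketch, one of many possible client languages, builds such an envelope with the standard library only. The service namespace `urn:DBSubLoc` and the parameter element names are assumptions made for illustration; the actual names are defined by the server's WSDL file.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:DBSubLoc"  # hypothetical; the real namespace comes from the WSDL

# Method names from Table 1, paired with illustrative (assumed) parameter names.
METHODS = {
    "name_search": ("name",),
    "id_search": ("id",),
    "blast_search": ("sequence", "evalue"),
    "eu_predict": ("sequence",),
    "pro_predict": ("sequence",),
    "psort_predict": ("sequence",),
    "feed_entry": ("sequence", "location"),
}


def build_request(method, *args):
    """Return a SOAP 1.1 request envelope (bytes) for one Table 1 method."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    names = METHODS[method]
    if len(args) != len(names):
        raise ValueError(f"{method} expects {len(names)} argument(s)")
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    # One child element per argument, in the assumed parameter order.
    for name, value in zip(names, args):
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_request("name_search", "p53").decode("utf-8"))
```

Keeping the method list in one dictionary lets a client reject misspelled method names or wrong argument counts before any network traffic occurs.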

The SOAP server also comes with a WSDL (Web Services Description Language, ) file at , which makes it easy to access the SOAP server from a number of programming languages. For instance, with Perl version 5.8 and the SOAP::Lite module, searching the DBSubLoc database by the protein name ‘p53’ is as simple as the following example:

  use SOAP::Lite;

  $wsdl = '';    # URL of the DBSubLoc WSDL file
  $serv = SOAP::Lite->service($wsdl);
  $res = $serv->name_search('p53');

More detailed examples in several programming languages are available at .
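
Where no SOAP toolkit is at hand, the exchange behind such a call can be approximated directly over HTTP. The Python sketch below posts a prepared request envelope and reads the reply, treating a SOAP 1.1 Fault as an error. The endpoint URL and SOAPAction value are left as arguments because they depend on the server's WSDL; the helper names `post_envelope` and `parse_response` are hypothetical and not part of the SubLoc suite.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


class SoapFault(Exception):
    """Raised when the server answers with a SOAP Fault element."""


def parse_response(xml_bytes):
    """Return the first child element of the SOAP Body, or raise SoapFault."""
    root = ET.fromstring(xml_bytes)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("response has no SOAP Body content")
    first = body[0]
    if first.tag == f"{{{SOAP_ENV}}}Fault":
        # In SOAP 1.1 the faultstring child is not namespace-qualified.
        raise SoapFault(first.findtext("faultstring", default="unknown fault"))
    return first


def post_envelope(endpoint, envelope, soap_action=""):
    """POST a request envelope (bytes) to the endpoint and parse the reply."""
    request = urllib.request.Request(
        endpoint,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{soap_action}"',
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as reply:
            payload = reply.read()
    except urllib.error.HTTPError as err:
        payload = err.read()  # SOAP 1.1 faults arrive with HTTP status 500
    return parse_response(payload)
```

The element returned by `parse_response` still has to be unpacked according to the return types described in the supplementary tables, which this sketch does not attempt.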

A statically precompiled binary of this open-source client with a graphical user interface is available for download at , together with screenshots, a manual and an FAQ. Executable binaries for Linux and Microsoft Windows 2000/XP are also provided. Besides queries by protein ID and name, the client software lets the user load files containing multiple sequences in FASTA format for homology search with BLAST or for subcellular location prediction with SVM and PSORT. Search and prediction results can be saved to a user-defined text file.
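
The batch workflow described above amounts to splitting a multi-sequence FASTA file into records and sending each sequence to a search or prediction method. A minimal Python sketch of that loop follows. The client itself relies on BioPerl for FASTA parsing, so this small parser is only a stand-in, and `predict` is a placeholder for whatever remote call (for example `eu_predict`) the user supplies. The sequences in the demo are toy data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def predict_all(lines, predict):
    """Apply a prediction callable to every sequence; return {header: result}."""
    return {header: predict(seq) for header, seq in read_fasta(lines)}


if __name__ == "__main__":
    demo = [">seq1 toy example", "MKVL", "AAGT", ">seq2", "MTTP"]
    # len() stands in for a remote prediction call in this demo.
    for name, result in predict_all(demo, len).items():
        print(name, result)
```

Passing the prediction step in as a callable keeps the parsing logic independent of how the server is reached, so the same loop can serve BLAST searches and SVM or PSORT predictions.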

IMPLEMENTATION

The SOAP server was implemented in Perl version 5.8. It uses the SOAP::Lite module to implement SOAP over HTTP. MySQL version 4.0 serves as the database back end, and Apache is the HTTP server daemon. The client depends on three Perl modules: Tk for drawing the GUI, SOAP::Lite for communicating with the SOAP server and BioPerl for parsing FASTA files.

We are grateful to the anonymous reviewers for their constructive comments. This work was supported by a Foundational Science Research Grant from the 973 project (2003CB715900), a National Nature Science Grant (No. 90303017, No. 90408019) and the 863 projects (2002AA234041, 2002AA231031).

Conflict of Interest: none declared.

REFERENCES

Altschul S.F., et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, vol. 25 (pg. 3389-3402)
Ashburner M., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 2000, vol. 25 (pg. 25-29)
Boeckmann B., et al. The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 2003, vol. 31 (pg. 365-370)
Guo T., et al. DBSubLoc: database of protein subcellular localization. Nucleic Acids Res., 2004, vol. 32 (pg. D122-D124)
Hua S., Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 2001, vol. 17 (pg. 721-728)
Kanehisa M., et al. The KEGG resource for deciphering the genome. Nucleic Acids Res., 2004, vol. 32 (pg. D277-D280)
Maglott D., et al. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 2005, vol. 33 (pg. D54-D58)
Nakai K., Horton P. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci., 1999, vol. 24 (pg. 34-36)

Author notes

Associate Editor: Alvis Brazma
