The CEP server (http://bioinfo.ernet.in/cep.htm) provides a web interface to the conformational epitope prediction algorithm developed in-house. Apart from predicting conformational epitopes, the algorithm also predicts antigenic determinants and sequential epitopes. The epitopes are predicted using 3D structure data of protein antigens and can be visualized graphically. The algorithm employs a structure-based bioinformatics approach and uses the solvent accessibility of amino acids explicitly. The accuracy of the algorithm was found to be 75% when evaluated using X-ray crystal structures of Ag–Ab complexes available in the PDB. This is the first and the only method available for the prediction of conformational epitopes, and is an attempt to map probable antibody-binding sites of protein antigens.
Antigen–antibody (Ag–Ab) complexes are non-obligatory heterocomplexes that are made and broken according to the environment and involve proteins that also exist independently. The most remarkable features of this special class of protein–protein interactions are the high affinity and strict specificity of antibodies for their respective antigens. It is known that antibodies recognize unique conformations and spatial locations on the surface of antigens. Therefore, epitopes are defined as the portions of the antigen molecules that interact with the antigen-binding site of the antibody (paratope) to which they are complementary (1). The number of epitopes of every protein is equivalent to the number of monoclonal antibodies that can be generated against the protein. Delineation of epitopes for any protein antigen corresponds to the summation of the immune repertoire specific for the antigen in various hosts.
Epitopes are of two types, namely, sequential (when the Ab binds to a contiguous stretch of amino acid residues linked by peptide bonds) and conformational (when the Ab binds to non-contiguous residues brought together by folding of the polypeptide chain). The specificity of sequential epitopes (SEs) is determined by the sequence and conformation of the constituent amino acids. However, the specificity of conformational epitopes (CEs) depends on the spatial folding and the conformation of the contributing individual SEs (2).
Various algorithms have been developed to predict SEs given a protein sequence (3–7). Most of these algorithms use the propensity values of amino acid properties, such as hydrophilicity, antigenicity, segmental mobility, flexibility and accessibility, to predict antigenicity. The accuracy of these algorithms lies in the range of 35–75%. However, no algorithm is available to predict the CEs or antibody-binding sites of antigenic proteins.
It is known from analyses of the crystal structures of Ag–Ab complexes that, in order to be recognized by antibodies, residues must be accessible for interaction and thus be present on the surface of antigens. Therefore, an algorithm has been developed to predict the SEs and CEs of protein antigens with known 3D structure using accessibility explicitly (8, 9), in contrast to the algorithms mentioned above, a few of which use accessibility only implicitly.
The predicted epitopes have applications in designing experiments for characterizing the antibody-binding sites of protein antigens. The method can be applied to engineer the ‘designer binding sites’ that mimic the interacting surface of the antigen, which is of immense use in the field of immunodiagnostics. Similarly, predicted epitopes can also be used to identify the candidate peptides for the development of peptide vaccines.
DESIGN AND IMPLEMENTATION
The CEP server is implemented on an Apache server with Linux 9.2 as the operating system. The web interface is designed using CGI Perl and JAVA scripts. The visualization package Jmol, an open source software suite (http://jmol.sourceforge.net/), has been plugged in to facilitate visualization of the predicted conformational and sequential epitopes. Results are displayed in HTML format.
The algorithm predicts epitopes of protein antigens with known structures. It uses the accessibility of residues and a spatial distance cut-off to predict antigenic determinants (ADs), CEs and SEs [(8, 9); U. Kulkarni-Kale and A. S. Kolaskar, unpublished data].
The steps are as follows:
Calculation of percentage accessibility of residues using an implementation of the Voronoi polyhedron (10).
Identification of antigenic residues with percentage accessible surface area ≥25%.
Delineation of ADs, if at least three contiguous accessible residues are present.
Extension of AD towards the N- and C-termini, allowing a grace of one inaccessible residue.
Prediction of CE by collapsing ADs that are within the spatial proximity of 6 Å.
Identification of SE that are not part of any CE.
Inclusion of individual accessible residues that are part of CE and SE.
Listing of AD, CE and SE.
Definition of subsets for representing AD, CE and SE graphically.
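The thresholding, delineation, extension and collapsing steps listed above can be illustrated with a minimal Python sketch. It is not the CEP implementation. The function names, the use of one representative coordinate per residue, the reading of the 'grace' as a single skipped residue per terminus, and the grouping of ADs by connected components are all assumptions made for illustration.

```python
import math

def delineate_ads(pct_acc, threshold=25.0, min_run=3):
    """Return ADs as (start, end) index pairs, 0-based and inclusive."""
    acc = [a >= threshold for a in pct_acc]
    n = len(acc)
    ads = []
    i = 0
    while i < n:
        if not acc[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and acc[j + 1]:   # maximal run of accessible residues
            j += 1
        if j - i + 1 >= min_run:
            start, end = i, j
            # Assumed reading of the grace rule: skip one inaccessible
            # residue per terminus if the residue beyond it is accessible.
            if start >= 2 and acc[start - 2]:
                start -= 2
                while start >= 1 and acc[start - 1]:
                    start -= 1
            if end + 2 < n and acc[end + 2]:
                end += 2
                while end + 1 < n and acc[end + 1]:
                    end += 1
            ads.append((start, end))
        i = j + 1
    merged = []                            # merge ADs that grew into each other
    for s, e in sorted(ads):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def group_ads(ads, coords, cutoff=6.0):
    """Group ADs having any residue pair within `cutoff` angstroms.

    `coords` holds one (x, y, z) point per residue (an assumption; the
    source does not say which atoms are compared). Groups with a single
    AD would correspond to SEs, larger groups to CEs.
    """
    def close(a, b):
        return any(math.dist(coords[p], coords[q]) <= cutoff
                   for p in range(a[0], a[1] + 1)
                   for q in range(b[0], b[1] + 1))

    parent = list(range(len(ads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(ads)):
        for j in range(i + 1, len(ads)):
            if close(ads[i], ads[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i, ad in enumerate(ads):
        groups.setdefault(find(i), []).append(ad)
    return list(groups.values())
```

For example, delineate_ads([30, 40, 50, 10, 60, 5, 5, 70, 80]) yields a single AD spanning indices 0 to 4: the three-residue seed is extended over one inaccessible residue, and the trailing run of two accessible residues is too short to seed an AD of its own.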
Evaluation of algorithm
The accuracy of the algorithm has been critically evaluated using 21 Ag–Ab co-crystal structures from the PDB (11). The evaluation dataset comprises two categories, namely, antigens characterized using multiple antibodies and those characterized using a single antibody. The antibody-binding site (BS) for every Ag–Ab complex was defined as the sum of the residues that interact with the antibody (IR) and those that are buried under the antibody (BR). The list of IR was generated using the CONTACSYM program (12) and the van der Waals contact distances (13). The BRs were defined as those that lose at least 1 Å² of accessible surface area upon formation of the complex with the antibody. CE/antibody-binding sites of antigens were predicted for the evaluation dataset. A prediction is said to be correct if ≥60% of the BS residues are part of a predicted CE. Of the 21 BS analysed, the CEP algorithm correctly predicted 16, giving an overall accuracy of 75% [(9); U. Kulkarni-Kale and A. S. Kolaskar, unpublished data].
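The correctness criterion can be written down as a short Python sketch. The function names and the toy residue numbers are hypothetical. Only the 60% threshold and the definition of a correct prediction come from the text.

```python
def coverage(bs_residues, ce_residues):
    """Fraction of binding-site (BS) residues that fall inside one predicted CE."""
    bs = set(bs_residues)
    return len(bs & set(ce_residues)) / len(bs)

def is_correct(bs_residues, predicted_ces, min_fraction=0.60):
    """A BS counts as correctly predicted if any single CE covers >= 60% of it."""
    return any(coverage(bs_residues, ce) >= min_fraction for ce in predicted_ces)

def overall_accuracy(outcomes):
    """Share of binding sites judged correct, given one boolean per BS."""
    return sum(outcomes) / len(outcomes)

# Toy data (hypothetical residue numbers, not taken from the evaluation set).
bs = [10, 11, 12, 40]
ces = [[10, 11, 12, 13, 14], [40, 41, 42]]
result = is_correct(bs, ces)   # the first CE covers 3 of the 4 BS residues
```

Coverage is measured against the BS and not against the CE, so a CE that is larger than the binding site is not penalised by this criterion.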
How to use the server?
A snapshot of the CEP server is shown in Figure 1. Users can predict CEs by either entering a PDB ID or uploading a coordinate file in PDB format. For proteins with more than one chain, the server prompts for selection of individual chains and/or the oligomer. The server takes ∼2 min to compute and display the results for a protein of 250 residues.
The CEP server requires 3D coordinate data in PDB format. A sample input file is provided for reference. It is recommended to submit structures with hydrogen atoms added; however, this is not mandatory.
Output is generated in HTML format. Predicted ADs and CEs are listed in separate tables. Predicted epitopes of lysozyme, PDB ID: 1FDL (14), are shown in Figure 2. As shown in Figure 2, the AD number is followed by the chain ID and the amino acid sequence of the predicted AD, along with the start and end positions. Residues that satisfy the percentage accessibility criterion are shown in uppercase, whereas those that do not are shown in lowercase; the latter are part of the AD because they fulfil the criteria for extension (see Algorithm). The reference AD for every CE is shown in green, and individual residues that are part of CEs and SEs are also listed (Figure 2). The predicted ADs, CEs and SEs can be visualized by clicking the respective radio buttons. The experimentally characterized BS for the evaluation dataset can be visualized by clicking the BS radio button.
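The uppercase/lowercase convention of the AD listing can be reproduced with a few lines of Python. This is a sketch only: the function name, the example sequence and the accessibility values are hypothetical, and the 25% threshold is taken from the algorithm steps.

```python
def render_ad(sequence, pct_acc, threshold=25.0):
    """Uppercase residues meeting the accessibility criterion, lowercase the rest."""
    return "".join(
        aa.upper() if a >= threshold else aa.lower()
        for aa, a in zip(sequence, pct_acc)
    )

# Hypothetical five-residue AD in which the third residue is inaccessible.
rendered = render_ad("KVFGR", [40.0, 30.0, 10.0, 55.0, 26.0])
```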
In order to facilitate faster access, automated predictions have been carried out. Predictions for ∼1800 proteins with resolution better than 1.5 Å (PDB release dated April 5, 2005) are available on the server and can be browsed using PDB ID.
A method to predict SEs and CEs is described above. The algorithm predicts epitopes using the 3D structure data of protein antigens. Assignment of protein function requires both structure and interaction data. The algorithm described above is a step towards the new paradigm ‘binding determines function’.
Currently, no computational approaches are available to predict antibody-binding sites of protein antigens. A solution to the problem of identifying antibody-binding sites, even if approximate, will help in designing experiments to map the residues involved at the Ag–Ab interface. The algorithm reported here predicts probable antibody-binding sites of protein antigens.
The CEP algorithm has been evaluated using a curated dataset consisting of 21 co-crystal complexes available in the PDB. The detailed evaluation of one of the PDB entries, 1FDL, is discussed here. The binding site of 1FDL (14) is made up of 18 residues, consisting of 4 SEs, namely 18–19, 21–27, 116–121 and 124–126. The predicted CE5 of 1FDL best represents the BS and contains AD: 112–122 and AD: 13–24 along with individual accessible residues numbered 33, 34, 109 and 125. Thus, the CEP server correctly predicts 13 out of the 18 BS residues, i.e. 72%. Since 1FDL satisfies the objective criterion of prediction of ≥60% of BS residues for a given structure, it contributes positively to the calculation of the overall accuracy. Similar analyses were carried out for the remaining 20 structures, and the evaluation results for all 21 PDB files are available on the CEP server (http://bioinfo.ernet.in/cep.htm).
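The 1FDL figures can be checked with simple set arithmetic. The residue ranges below are those quoted in the text; the helper name `residues` is introduced here only for illustration.

```python
def residues(*spans):
    """Expand (start, end) spans and single residue numbers into a set."""
    out = set()
    for span in spans:
        if isinstance(span, tuple):
            out.update(range(span[0], span[1] + 1))
        else:
            out.add(span)
    return out

# Experimentally characterized binding site of 1FDL (four SEs).
binding_site = residues((18, 19), (21, 27), (116, 121), (124, 126))

# Predicted CE5: two ADs plus four individual accessible residues.
predicted_ce5 = residues((112, 122), (13, 24), 33, 34, 109, 125)

shared = binding_site & predicted_ce5
fraction = len(shared) / len(binding_site)
print(len(binding_site), len(shared), round(100 * fraction))  # 18 13 72
```

The 13 shared residues are 18–19, 21–24, 116–121 and 125. Residues 25–27, 124 and 126 belong to the binding site but lie outside CE5.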
In the process of evaluation, it was observed that the algorithm predicts relatively larger binding sites for a few antigens, which may appear as false-positive predictions. An explanation for this can be drawn from analyses of the structures of antigens complexed with multiple antibodies, such as lysozyme and neuraminidase. Complexes of lysozyme with various Abs have shown that the Abs of lysozyme have overlapping binding sites. Furthermore, it has also been shown that the same residue of an antigen may be a part of different epitopes and interact in a unique manner with the respective paratopes (15). Thus, the residues that may appear ‘additional’ in the predicted CE need not be referred to as false positives, since the algorithm predicts all possible binding sites of the given antigens. Hence, the predicted CE/Ag-binding site is the sum of the binding sites of the individual antibodies.
Visualization and mapping of the predicted ADs and CEs on a given 3D structure enhance the utility of the server. It must be mentioned that the usability of this server is limited by the availability of 3D structure data of protein antigens. However, in the realm of structural genomics (16), we believe that structural information of proteins will no longer be a rate-limiting factor.
U.K.K. acknowledges Dr John Wootton, NCBI, USA for useful suggestions and comments. The project is funded by the Department of Biotechnology, Government of India. Contributions of Ms Sunitha Manjari in the evaluation of the server are acknowledged. The Open Access publication charges for this article have been waived by Oxford University Press.
Conflict of interest statement. None declared.