Improving B-cell epitope prediction and its application to global antibody-antigen docking

Motivation: Antibodies are currently the most important class of biopharmaceuticals. Development of such antibody-based drugs depends on costly and time-consuming screening campaigns. Computational techniques such as antibody–antigen docking have the potential to facilitate the screening process by rapidly providing a list of initial poses that approximate the native complex. Results: We have developed EpiPred, a new method that identifies the epitope region on the antigen given the structures of the antibody and the antigen. The method combines conformational matching of the antibody–antigen structures with an antibody–antigen specific score. We have tested the method both on a large non-redundant set of antibody–antigen complexes and on homology models of the antibodies and/or the unbound antigen structures. On a non-redundant test set, our epitope prediction method achieves 44% recall at 14% precision, compared with 23% recall at 14% precision for a random background distribution. We use our epitope predictions to rescore the global docking results of two rigid-body docking algorithms, ZDOCK and ClusPro. In both cases, including our epitope prediction increases the number of near-native poses found among the top decoys. Availability and implementation: Our software is available from http://www.stats.ox.ac.uk/research/proteins/resources. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.


Data
The datasets used in this study are presented in the sections below.

X-test
The data used in dataset X-test is presented in Table 2.
[Table residue; the flattened table body is not recoverable. Surviving entries mention Hpr, a phosphocarrier protein of the phosphoenolpyruvate:sugar phosphotransferase system of Escherichia coli; PDB entry 1jhl; and the envelope protein of the West Nile virus.]
Table 3. Summary of the homology data in dataset H-test.

H-test
A summary of the data in dataset H-test is given in Table 3.
Figure 1. A single candidate patch on the antigen surface, shown in three representations (cartoon, spheres and surface). It illustrates the depth sampling used in this work: a 4.5 Å cut-off and a depth of 3. The red residue is the central amino acid that initiates the patch (depth = 1). The green residues form the neighborhood of the first residue (depth = 2). The teal residues are those within 4.5 Å of the green residues (depth = 3).

Sampling the patches on the antigen surface
We sampled candidate surface patches by extending each surface-exposed residue on the antigen with its surface neighborhood. A surface neighborhood is created by successively adding the surface residues within a given cut-off distance of those already in the patch. The procedure therefore has two parameters: the neighbor cut-off and the number of iterations used to extend the neighborhood (depth). See Figure 1 for an example.
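The patch-growing procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the EpiPred implementation: residues are represented by hypothetical string identifiers mapped to atom coordinates, the distance between two residues is assumed to be the minimum inter-atomic distance, and the names `residue_distance`, `sample_patch` and `sample_all_patches` are our own.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
# Hypothetical input: surface residue id -> list of atom coordinates.
Surface = Dict[str, List[Coord]]


def residue_distance(a: List[Coord], b: List[Coord]) -> float:
    """Minimum inter-atomic distance between two residues (assumed definition)."""
    return min(math.dist(p, q) for p in a for q in b)


def sample_patch(center: str, surface: Surface,
                 cutoff: float = 4.5, depth: int = 3) -> Set[str]:
    """Grow a patch around `center`: depth 1 is the central residue, and each
    further level adds surface residues within `cutoff` of the previous level."""
    patch = {center}
    frontier = {center}
    for _ in range(depth - 1):
        new_layer = {
            r for r in surface
            if r not in patch
            and any(residue_distance(surface[r], surface[f]) < cutoff
                    for f in frontier)
        }
        patch |= new_layer
        frontier = new_layer
    return patch


def sample_all_patches(surface: Surface, cutoff: float = 4.5,
                       depth: int = 3) -> Dict[str, Set[str]]:
    """One candidate patch per surface-exposed residue."""
    return {r: sample_patch(r, surface, cutoff, depth) for r in surface}


# Toy example: five single-atom residues spaced 4 Å apart along a line.
toy = {name: [(4.0 * i, 0.0, 0.0)] for i, name in enumerate("ABCDE")}
print(sorted(sample_patch("A", toy)))  # ['A', 'B', 'C']
print(sorted(sample_patch("C", toy)))  # ['A', 'B', 'C', 'D', 'E']
```

Expanding only the most recently added layer gives the same patch as checking against the whole patch, because any residue within the cut-off of an earlier layer has already been added.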
We estimated the best parameter configuration for patch sampling on our training set of crystal structures (excluding the entries in X-test). For several values of these parameters, we examined the average precision and recall of the best sampled patch, and of the best among the top five and the top ten sampled patches, given in Tables 4, 5 and 6, respectively. We found that a cut-off distance of 4.5 Å and a depth of 3 achieved an average precision of around 30% and an average recall of around 80%; these are the standard parameters used throughout the manuscript.
Table 6. Best sampled patches: the best out of the top 10 on SAbDab-nr minus X-test.
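The evaluation of sampled patches can be illustrated with the sketch below. It assumes that the patches for each target are already ranked (for example by their score) and that the "best" patch among the top k is the one with the highest precision, with ties broken by recall; the text does not state this criterion, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def precision_recall(patch: Set[Hashable],
                     epitope: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of one patch against the native epitope."""
    tp = len(patch & epitope)
    precision = tp / len(patch) if patch else 0.0
    recall = tp / len(epitope) if epitope else 0.0
    return precision, recall


def best_of_top_k(ranked_patches: Sequence[Set[Hashable]],
                  epitope: Set[Hashable], k: int) -> Tuple[float, float]:
    """Best (precision, recall) among the first k patches (assumed criterion:
    highest precision, ties broken by recall)."""
    return max(precision_recall(p, epitope) for p in ranked_patches[:k])


def average_best(targets: List[Tuple[Sequence[Set[Hashable]], Set[Hashable]]],
                 k: int) -> Tuple[float, float]:
    """Average best precision and recall over (ranked_patches, epitope) targets."""
    best = [best_of_top_k(patches, epitope, k) for patches, epitope in targets]
    n = len(best)
    return sum(p for p, _ in best) / n, sum(r for _, r in best) / n


epitope = {1, 2, 3, 4}
ranked = [{1, 5}, {1, 2, 3, 6}]
print(best_of_top_k(ranked, epitope, k=1))  # (0.5, 0.25)
print(best_of_top_k(ranked, epitope, k=2))  # (0.75, 0.75)
```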

Calculating the precision score
We performed local ZDOCK runs on each of the 118 targets. The constraints for the antibody are defined as the paratope residues extended by the surface residues within 5 Å of them. The corresponding constraints for the antigen consist of the epitope, likewise extended by 5 Å. Epitope and paratope are defined as the sets of residues on the antigen and the antibody, respectively, such that each residue in one set has at least one residue in the other set at an inter-molecular distance of less than 4.5 Å.
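The epitope/paratope definition and the 5 Å extension used for the local docking constraints can be sketched as follows. Residues are hypothetical string identifiers mapped to atom coordinates; the residue-residue distance is assumed to be the minimum inter-atomic distance, contacts use the strict "less than 4.5 Å" from the text, and "within 5 Å" is assumed to be inclusive. Function names are our own.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
# Hypothetical input: residue id -> list of atom coordinates.
Structure = Dict[str, List[Coord]]


def min_dist(a: List[Coord], b: List[Coord]) -> float:
    """Minimum inter-atomic distance between two residues (assumed definition)."""
    return min(math.dist(p, q) for p in a for q in b)


def interface(antibody: Structure, antigen: Structure,
              cutoff: float = 4.5) -> Tuple[Set[str], Set[str]]:
    """Paratope and epitope: residues with at least one partner residue on the
    other molecule closer than `cutoff`."""
    paratope, epitope = set(), set()
    for ab_res, ab_atoms in antibody.items():
        for ag_res, ag_atoms in antigen.items():
            if min_dist(ab_atoms, ag_atoms) < cutoff:
                paratope.add(ab_res)
                epitope.add(ag_res)
    return paratope, epitope


def extend_with_surface(core: Set[str], structure: Structure,
                        surface: Set[str], radius: float = 5.0) -> Set[str]:
    """Core residues plus surface residues within `radius` of any core residue."""
    return core | {
        r for r in surface
        if any(min_dist(structure[r], structure[c]) <= radius for c in core)
    }


antibody = {"H1": [(0.0, 0.0, 0.0)], "H2": [(20.0, 0.0, 0.0)]}
antigen = {"A1": [(4.0, 0.0, 0.0)], "A2": [(8.5, 0.0, 0.0)], "A3": [(30.0, 0.0, 0.0)]}
paratope, epitope = interface(antibody, antigen)
print(paratope, epitope)  # {'H1'} {'A1'}
print(sorted(extend_with_surface(epitope, antigen, set(antigen))))  # ['A1', 'A2']
```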
We collected the top 200 decoys for each of the 118 targets, as ordered by ZDOCK, giving a set of 23,600 decoys. Whenever an antibody–antigen residue pair of types $(T_{ab}, T_{ag})$ was observed within 4.5 Å in any of the 23,600 decoys, we recorded whether it was a true positive (TP) or a false positive (FP) with respect to the native structure (a true positive requires the corresponding native contact, defined as the same residues being within 4.5 Å in the native structure). In this way, for each pair of residue types on the antibody and the antigen $(T_{ab}, T_{ag})$, we obtain the number of times ZDOCK matched them correctly, denoted $TP(T_{ab}, T_{ag})$, and the corresponding number of incorrect matches, $FP(T_{ab}, T_{ag})$. We use these numbers of true and false positives to define the precision score for a given antibody–antigen residue pair as:

$$\mathrm{Prec}(T_{ab}, T_{ag}) = \frac{TP(T_{ab}, T_{ag})}{TP(T_{ab}, T_{ag}) + FP(T_{ab}, T_{ag})} \qquad (1)$$

An analogous procedure was applied to train EpiPred for use with H-test, removing those PDB entries from SAbDab-nr that shared more than 90% sequence identity with the antigens or more than 99% sequence identity with the antibodies in H-test.
Figure 2. Histogram of the distribution of differences between intra-molecular distances of pairs of interacting residues. The blue points are the absolute numbers of times a given distance difference was observed, whereas the red points indicate a local mean of these points.
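The tallying of true and false positives per residue-type pair described in the "Calculating the precision score" section can be sketched as follows. The input layout is hypothetical: for each target, the contacts of every decoy (already filtered at 4.5 Å), the native contacts, and a map from residue id to residue type, with residue ids assumed unique across antibody and antigen.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

ContactPair = Tuple[str, str]  # (antibody residue id, antigen residue id)
# One target: (contacts of each decoy, native contacts, residue id -> residue type)
Target = Tuple[List[Set[ContactPair]], Set[ContactPair], Dict[str, str]]


def precision_scores(targets: Iterable[Target]) -> Dict[Tuple[str, str], float]:
    """Precision TP / (TP + FP) for each (antibody type, antigen type) pair."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    for decoys, native, res_type in targets:
        for contacts in decoys:          # e.g. the top 200 ZDOCK decoys
            for ab_res, ag_res in contacts:
                key = (res_type[ab_res], res_type[ag_res])
                if (ab_res, ag_res) in native:
                    tp[key] += 1         # contact also present in the native
                else:
                    fp[key] += 1         # contact absent from the native
    return {key: tp[key] / (tp[key] + fp[key]) for key in tp.keys() | fp.keys()}


res_type = {"h1": "TYR", "h2": "SER", "g1": "ASP", "g2": "LYS"}
native = {("h1", "g1")}
decoys = [{("h1", "g1"), ("h2", "g2")}, {("h1", "g1")}, {("h1", "g2")}]
scores = precision_scores([(decoys, native, res_type)])
print(scores[("TYR", "ASP")], scores[("SER", "LYS")])  # 1.0 0.0
```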

Inter-molecular distances
To motivate our choice of parameters for EpiPred, we estimated the distribution of differences between intra-molecular distances of pairs of interacting residues. Take node $n_1$, which stands for a contact between antibody residue $r_{ab1}$ and antigen residue $r_{ag1}$, and node $n_2$, with antibody residue $r_{ab2}$ and antigen residue $r_{ag2}$. Define $dist(r_{ab1}, r_{ab2})$ as the intra-molecular distance between the antibody residues $r_{ab1}$ and $r_{ab2}$, and analogously $dist(r_{ag1}, r_{ag2})$ as the intra-molecular distance between the antigen residues $r_{ag1}$ and $r_{ag2}$. Figure 2 gives the distribution of intra-molecular distance differences, $|dist(r_{ab1}, r_{ab2}) - dist(r_{ag1}, r_{ag2})|$, for each pair of interacting residues in the structures of SAbDab-nr that were not in X-test.
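The distance-difference statistic, and its use as a compatibility test between two contact nodes, can be sketched as below. Each residue is assumed to be represented by a single point (for example its Cα atom), the 0.5 Å histogram bin width is arbitrary, and treating the intra-molecular difference cut-off discussed in this section (1 Å by default) as the test for whether two nodes are geometrically compatible is our reading of the text. All names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Node = Tuple[str, str]  # contact (antibody residue, antigen residue)


def distance_difference(n1: Node, n2: Node, ab_xyz: Dict[str, Coord],
                        ag_xyz: Dict[str, Coord]) -> float:
    """|dist(r_ab1, r_ab2) - dist(r_ag1, r_ag2)| for two contact nodes."""
    d_ab = math.dist(ab_xyz[n1[0]], ab_xyz[n2[0]])
    d_ag = math.dist(ag_xyz[n1[1]], ag_xyz[n2[1]])
    return abs(d_ab - d_ag)


def difference_histogram(nodes: List[Node], ab_xyz: Dict[str, Coord],
                         ag_xyz: Dict[str, Coord],
                         bin_width: float = 0.5) -> Counter:
    """Counts of distance differences per bin (bin i covers [i*w, (i+1)*w))."""
    return Counter(int(distance_difference(a, b, ab_xyz, ag_xyz) // bin_width)
                   for a, b in combinations(nodes, 2))


def compatible_edges(nodes: List[Node], ab_xyz: Dict[str, Coord],
                     ag_xyz: Dict[str, Coord],
                     cutoff: float = 1.0) -> List[Tuple[Node, Node]]:
    """Pairs of nodes whose intra-molecular distances agree within `cutoff`."""
    return [(a, b) for a, b in combinations(nodes, 2)
            if distance_difference(a, b, ab_xyz, ag_xyz) <= cutoff]


ab_xyz = {"a1": (0.0, 0.0, 0.0), "a2": (3.0, 0.0, 0.0)}
ag_xyz = {"g1": (10.0, 0.0, 0.0), "g2": (10.0, 3.5, 0.0)}
nodes = [("a1", "g1"), ("a2", "g2")]
print(distance_difference(*nodes, ab_xyz, ag_xyz))        # 0.5
print(len(compatible_edges(nodes, ab_xyz, ag_xyz)))       # 1
print(len(compatible_edges(nodes, ab_xyz, ag_xyz, 0.1)))  # 0
```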
The majority of the differences fall in the range of 0 to 3 Å, beyond which the counts decrease. We therefore tested EpiPred on the structures in SAbDab-nr that were not in X-test using several intra-molecular difference cut-offs: 0.1, 0.5, 1.0, 1.5, 2.0, 2.5 and 3.0 Å. EpiPred achieved similar results for cut-offs of 1 Å and higher, and poorer results (close to random) for cut-offs of 0.1 Å and 0.5 Å. Since 1 Å, the lowest cut-off that produced satisfactory results, was also the least computationally expensive, it is used as the default cut-off in EpiPred.

5 Evaluating the performance of directionality of predictions
The PDB contains eight binding modes of antibodies to lysozyme (see Figure 3): five from full antibodies and three from camelid antibodies. The five non-camelid binding modes, which bind to three distinct epitopes, were used for the analysis presented in the manuscript.

Supplementary tables
Detailed docking results are given in Table 7 for X-test and Table 8 for H-test. Tables 9 and 10 contrast the results with the lengths of CDR-H3.
7 Difference between the performance of EpiPred on the homology model and crystal structure datasets
To evaluate how the performance of EpiPred differs between X-test and H-test, we compared their sample average precisions and recalls. To estimate these averages, we sampled precision and recall values from both datasets. For instance, one sample of the average precision and recall for X-test consists of 30 precision-recall pairs drawn at random with replacement from the precision-recall pairs available in X-test; for each such sample of 30 pairs we recorded the average precision and the average recall. The same procedure was applied to H-test. In total, we sampled $10^7$ averages from X-test and H-test. We fitted a line through the averaged precision-recall pairs for X-test and for H-test (see Figure 4). The lines obtained in this way for X-test and H-test are not statistically significantly different when their slopes and intercepts are compared.
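The resampling scheme for comparing X-test and H-test can be sketched with the Python standard library as below. The fitted line assumes that recall is regressed on precision by ordinary least squares, which the text does not specify; the statistical comparison of slopes and intercepts is not shown, the toy precision-recall values are illustrative rather than taken from the study, and far fewer samples than the ten million used in the study keep the example fast. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (precision, recall) for one target


def bootstrap_averages(pairs: Sequence[Pair], n_samples: int,
                       sample_size: int = 30, seed: int = 0) -> List[Pair]:
    """Average precision and recall of `sample_size` pairs drawn with
    replacement, repeated `n_samples` times."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_samples):
        draw = rng.choices(pairs, k=sample_size)  # sampling with replacement
        averages.append((sum(p for p, _ in draw) / sample_size,
                         sum(r for _, r in draw) / sample_size))
    return averages


def fit_line(points: Sequence[Pair]) -> Tuple[float, float]:
    """Ordinary least-squares fit of recall = slope * precision + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative values only, not results from the study.
x_test_pairs = [(0.10, 0.40), (0.20, 0.55), (0.05, 0.20), (0.30, 0.70)]
averages = bootstrap_averages(x_test_pairs, n_samples=1000)
slope, intercept = fit_line(averages)
print(f"recall = {slope:.2f} * precision + {intercept:.2f}")
```

Running the same two functions on the H-test pairs gives the second line to compare.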
8 Evaluating the performance of EpiPred and the global docking pipeline on a blind test case
Our collaborators from UCB Pharma provided us with a blind test case to evaluate our epitope prediction and global docking pipelines. We were given the sequence of the antibody and the crystal structure of the antigen with which it forms a complex. The antigen structure was an asymmetric homo-dimer. Both epitope prediction and docking require a structure of the antibody, which we modeled using PIGS [1]. We did not use RosettaAntibody [2] because it was unavailable at the time. We did not model H3 using FREAD [3], as the program's library did not contain appropriate fragments for this particular loop. We predicted the top three epitope patches using the standard EpiPred parameters used throughout this manuscript. The top epitope prediction is incorrect, placing the candidate epitope on the wrong end of the asymmetric homo-dimer structure of the antigen. Nevertheless, the second prediction overlaps greatly with the actual epitope (see Figure 5). We docked the antibody homology model to the antigen structure using ClusPro [4]. Only one of the 28 decoys returned had an I-rmsd (10.645 Å) low enough to be tentatively classified as close to native; all the other decoys had considerably higher I-rmsd values. This decoy was in seventh position as ordered by ClusPro, but was brought to third position by our re-scoring pipeline. It is shown superimposed on the native complex in Figure 5.

We present the top three epitope predictions returned by EpiPred. For smaller antigens, it was impossible to return more than one or two top epitopes because of the overlap cut-off of 30%. The best epitope field refers to the first occurrence of a suitable epitope prediction among the top three returned.
The global docking results are annotated to indicate whether the re-scored list of decoys was better, with the following meanings: (+) re-scoring improved the result, (-) re-scoring made the result worse, (0) re-scoring did not improve the result and close-to-native decoys exist, (n/a) no suitable decoys were available.
[Table columns: epitope prediction (Top 1st, Top 2nd, Top 3rd), followed by two groups of docking columns (T1, T5, T10); table body not recoverable.]

Table 9. Results of epitope prediction on the X-test set. We present the top EpiPred prediction and the corresponding results for DiscoTope 2.0 using a score threshold of -3.7. Values in bold indicate the best prediction result. Precision and recall were computed as $\mathrm{precision} = TP/(TP + FP)$ and $\mathrm{recall} = TP/(TP + FN)$, where TP stands for true positives, FP for false positives and FN for false negatives. In each case we also give the Matthews Correlation Coefficient (MCC). As a control, the corresponding result using a randomized score is given for each target. This table corresponds to Table 1 in the manuscript. Lengths of CDR-H3 are given for each structure to contrast the prediction results with the relative difficulty of modeling this loop.

Figure 5. Left: The second-ranked epitope prediction on the blind test case is shown in red. It covers the region of the actual epitope, as indicated by the native contacting antibody (in green). This epitope prediction achieved 52% precision and 69% recall. Right: The best decoy returned by ClusPro (green) contrasted with the native position of the antibody (teal). The antibody is rotated correctly; the discrepancy is due only to lateral translation.
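The evaluation metrics named in the Table 9 caption, and the selection of up to three top epitope patches under the 30% overlap cut-off, can be sketched as follows. Several details are assumptions rather than statements from the text: the set of antigen residues used to count true negatives (for example, all surface residues), the overlap measure (shared residues divided by the size of the smaller patch), and the greedy selection in rank order. Function names are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Set, Tuple


def prediction_metrics(predicted: Set[Hashable], actual: Set[Hashable],
                       antigen_residues: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and Matthews Correlation Coefficient of a prediction."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(antigen_residues - predicted - actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


def top_non_overlapping(ranked_patches: Sequence[Set[Hashable]],
                        max_overlap: float = 0.3,
                        n: int = 3) -> List[Set[Hashable]]:
    """Greedily keep ranked patches whose overlap with every kept patch is at
    most `max_overlap` (fraction of the smaller patch), up to `n` patches."""
    chosen: List[Set[Hashable]] = []
    for patch in ranked_patches:
        if not patch:
            continue
        if all(len(patch & c) / min(len(patch), len(c)) <= max_overlap
               for c in chosen):
            chosen.append(patch)
            if len(chosen) == n:
                break
    return chosen


p, r, mcc = prediction_metrics({2, 3, 4}, {0, 1, 2, 3}, set(range(10)))
print(round(p, 3), r, round(mcc, 3))  # 0.667 0.5 0.356
patches = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8}, {4, 9, 10}]
print(top_non_overlapping(patches))  # [{1, 2, 3, 4}, {6, 7, 8}]
```

The toy patch list shows how the overlap cut-off can leave fewer than three predictions when candidate patches on a small antigen largely overlap.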