mmCSM-AB: guiding rational antibody engineering through multiple point mutations

Abstract
While antibodies are becoming an increasingly important therapeutic class, especially in personalized medicine, their development and optimization have relied largely on experimental exploration. Although there have been many efforts to develop computational tools to guide rational antibody engineering, most approaches are of limited accuracy when applied to antibody design, and have largely been limited to analysing a single point mutation at a time. To address this gap, we have curated a dataset of 242 experimentally determined changes in binding affinity upon multiple point mutations in antibody-target complexes (89 increasing and 153 decreasing binding affinity). Here, we show that by using our graph-based signatures and atomic interaction information, we can accurately analyse the consequences of multi-point mutations on antigen binding affinity. Our approach outperformed other available tools across cross-validation and two independent blind tests, achieving Pearson's correlations of up to 0.95. We have implemented our new approach, mmCSM-AB, as a web-server that can help guide the process of affinity maturation in antibody design. mmCSM-AB is freely available at http://biosig.unimelb.edu.au/mmcsm_ab/.

• DFIRE / dFIRE: The changes in DFIRE/dFIRE interaction energy between antibody and antigen chains were calculated as described in Sirin et al. (8). All wild-type and mutant complexes were split into complex, antibody and antigen structures, and the energy of each was calculated to obtain the interaction energy as in equation (1); the difference in interaction energy between wild-type and mutant was then calculated as in equation (2).
• FoldX: The interaction energy between antibody and antigen groups was calculated with the FoldX AnalyseComplex module on wild-type and mutant complexes. The change in binding affinity upon mutation was determined by equation (3).
• bASA: The change in buried accessible surface area (bASA) upon mutation was computed using NACCESS as described in Sirin et al. (8). First, the bASA of the wild-type and mutant antibody-antigen complexes was calculated by equation (4); the bASA of the mutant was then subtracted from that of the wild-type (5).
• LISA: Local interaction signal analysis (LISA) uses contact information across the binding interface of antibody-antigen complexes to calculate protein-protein binding affinity. The changes in LISA binding affinity upon mutation were calculated from both wild-type and mutant antibody-antigen complexes as described by (6).
• PRODIGY: PRODIGY determines the binding affinity of a protein-protein complex using the number of interfacial contacts and the properties of the non-interacting surface of the complex. The changes in PRODIGY binding affinity were obtained as in equation (7).
• CCharPPI: CCharPPI is a web server that provides over 100 protein-protein interaction-related descriptors as scoring functions, which calculate the energy of a given complex without introducing any structural changes. We initially computed all descriptors and then retained the seven best-performing ones (SIPPER, ZRANK, ZRANK2, FIREDOCK, FIREDOCK_AB, ROSETTADOCK and INSIDE) for comparison with the other available tools.
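The interaction-energy and bASA differences described in the list above can be sketched as follows. Equations (1) to (5) are not reproduced in this text, so the forms used here are assumptions: interaction energy is taken as the energy of the complex minus the energies of the isolated antibody and antigen, the energy change is taken as mutant minus wild-type, and bASA is taken as the surface area lost on complex formation. Only the direction of the bASA difference (wild-type minus mutant) is stated in the text. All function names are hypothetical, and the input values would come from tools such as DFIRE or NACCESS.

```python
def interaction_energy(e_complex, e_antibody, e_antigen):
    """Assumed form: energy of the complex minus the isolated partners."""
    return e_complex - (e_antibody + e_antigen)


def delta_interaction_energy(wild, mutant):
    """Change in interaction energy upon mutation.

    `wild` and `mutant` are (complex, antibody, antigen) energy tuples.
    The sign convention (mutant minus wild-type) is an assumption.
    """
    return interaction_energy(*mutant) - interaction_energy(*wild)


def buried_asa(asa_antibody, asa_antigen, asa_complex):
    """Assumed form: surface area buried when the two partners bind."""
    return asa_antibody + asa_antigen - asa_complex


def delta_buried_asa(wild, mutant):
    """Wild-type bASA minus mutant bASA, as stated in the text.

    `wild` and `mutant` are (antibody, antigen, complex) ASA tuples.
    """
    return buried_asa(*wild) - buried_asa(*mutant)


if __name__ == "__main__":
    # Illustrative numbers only, not values from the study.
    print(delta_interaction_energy((-100, -60, -30), (-104, -61, -31)))
    print(delta_buried_asa((5000, 3000, 7200), (5010, 3000, 7150)))
```

The same wild-type versus mutant differencing pattern applies to any scoring function that returns one value per structure, which is why the helpers take plain numbers rather than structure files.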

Feature engineering
In the feature engineering step, we calculated different classes of features and evaluated each feature separately. Only essential or well-performing features were then used to build the final mmCSM-AB predictive model.
• Energetic terms: Interaction energy is one of the most commonly used scoring functions for assessing mutational effects. We used the FoldX AnalyseComplex module to calculate interaction energy changes between wild-type and mutant antibody-antigen complexes.
• Interatomic interactions: Atomic interactions across antibody-antigen binding interfaces, such as hydrogen bond, ionic, aromatic, covalent, van der Waals (VDW), hydrophobic, MetSulphur-Pi, amide-amide, amide-ring, Pi-Pi and carbon-Pi interactions, can have significant effects on the binding affinity and specificity of antibodies. We used Arpeggio (9) to calculate the difference in atomic interactions between wild-type and mutant complexes.
• Solvent accessible area: In a protein binding interface, the relative solvent accessible area (RSA) can be a good marker for assessing the significance of conformational changes upon mutation. Using DSSP (10), we measured the RSA changes between wild-type and mutant antibody-antigen complexes.
• Distance changes: One of the most distinctive mutational effects in binding interfaces can be measured by distance changes between the antibody and antigen chains. To avoid measuring distances within the same antibody or antigen chain, we used the change in the closest distance from the mutation site to its binding partner only.
• Evolutionary score: From an evolutionary perspective, functionally important sites tend to be conserved in order to maintain protein stability or binding affinity. Using PSI-BLAST (11), we obtained position-specific evolutionary scores from the Position Specific Scoring Matrix (PSSM) with the following parameters: evolutionary scoring matrix = PAM30, num_iterations = 3, evalue = 1E-10, seg = Yes, comp_based_stats = 1, and db = swissprot.
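Two of the features above lend themselves to a short sketch: the closest-distance change and the PSI-BLAST call. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the representation of atoms as (x, y, z) tuples, and the sign of the distance change (mutant minus wild-type) are all hypothetical. The PSI-BLAST options mirror the parameters listed in the text and use standard BLAST+ `psiblast` flags; the query and output file names are placeholders.

```python
import math


def closest_partner_distance(site_atoms, partner_atoms):
    """Smallest distance from any mutation-site atom to any partner-chain atom.

    Only atoms of the binding partner are passed in, so distances within the
    mutated chain itself are never considered.
    """
    return min(math.dist(a, b) for a in site_atoms for b in partner_atoms)


def distance_change(wt_site, wt_partner, mt_site, mt_partner):
    """Closest-distance change upon mutation (mutant minus wild-type, assumed)."""
    return (closest_partner_distance(mt_site, mt_partner)
            - closest_partner_distance(wt_site, wt_partner))


def psiblast_pssm_command(query_fasta, pssm_out, db="swissprot"):
    """Argument list for a PSI-BLAST run with the parameters given in the text."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-matrix", "PAM30",
        "-num_iterations", "3",
        "-evalue", "1e-10",
        "-seg", "yes",
        "-comp_based_stats", "1",
        "-out_ascii_pssm", pssm_out,  # write the PSSM as a text file
    ]


if __name__ == "__main__":
    # Illustrative coordinates only (in angstroms), not data from the study.
    wt_site, wt_partner = [(0, 0, 0)], [(3, 4, 0), (10, 0, 0)]
    mt_site, mt_partner = [(0, 0, 0), (1, 0, 0)], [(1, 0, 3)]
    print(distance_change(wt_site, wt_partner, mt_site, mt_partner))
    print(" ".join(psiblast_pssm_command("query.fasta", "query.pssm")))
```

The command is returned as a list so that it can be passed directly to `subprocess.run` without shell quoting, assuming BLAST+ and a local `swissprot` database are installed.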