PMIpred: a physics-informed web server for quantitative protein–membrane interaction prediction

Abstract

Motivation: Many peripheral membrane proteins have evolved to transiently interact with the surface of (curved) lipid bilayers. Currently, methods to quantitatively predict sensing and binding free energies for protein sequences or structures are lacking, and such tools could greatly benefit the discovery of membrane-interacting motifs, as well as their de novo design.

Results: Here, we trained a transformer neural network model on molecular dynamics data for >50 000 peptides that is able to accurately predict the (relative) membrane-binding free energy for any given amino acid sequence. Using this information, our physics-informed model is able to classify a peptide’s membrane-associative activity as non-binding, curvature sensing, or membrane binding. Moreover, this method can be applied to detect membrane-interacting regions in a wide variety of proteins, with predictive performance comparable to state-of-the-art data-driven tools such as DREAMM, PPM3, and MODA, but with wider applicability regarding protein diversity and the added ability to distinguish curvature sensing from general membrane binding.

Availability and implementation: We made these tools available as a web server, coined Protein-Membrane Interaction predictor (PMIpred), which can be accessed at https://pmipred.fkt.physik.tu-dortmund.de.

Data processing. Within our NN, we use a tokeniser to assign a unique integer (1-20) to each AA type (Fig. S2A). Although all training sequences are already 24 AAs long, we applied padding (adding "0" tokens at the end) up to a fixed maximal length of 24 to enable the model to also process shorter sequences (as present in the test sets).
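
As an illustration, a minimal Python sketch of this tokenisation and padding step could look as follows; the alphabetical amino acid ordering and the function name are illustrative assumptions, not the exact PMIpred implementation.

```python
# Minimal sketch (assumed ordering and names, not the exact PMIpred code):
# map the 20 amino acid types to integer tokens 1-20 and right-pad with 0 to length 24.
AA_TYPES = "ACDEFGHIKLMNPQRSTVWY"                      # the 20 standard amino acids
TOKEN = {aa: i + 1 for i, aa in enumerate(AA_TYPES)}   # tokens 1-20; 0 is the padding token
MAX_LEN = 24

def tokenise(sequence: str) -> list[int]:
    """Map an amino acid sequence to integer tokens and pad with 0 up to 24 positions."""
    tokens = [TOKEN[aa] for aa in sequence.upper()]
    return tokens + [0] * (MAX_LEN - len(tokens))

print(tokenise("MGLSDGEWQLV"))   # a shorter sequence is padded up to 24 tokens
```
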
Because the position of an amino acid in a peptide sequence is relevant for its function, we added a vector containing this positional information to the token vector (Fig. S2A). Importantly, the weights linked to this so-called positional embedding can be adjusted during training to tune the influence of the respective positions and AA types.

Model architecture. The final architecture of our transformer NN includes a series of 6 transformer blocks that use multi-head attention to learn patterns in the training data sequences (Fig. S2B). An output dense layer translates the encoded data into a single float value: the predicted sensing free energy ∆∆F.

Model validation & testing. Since the training set includes highly similar sequences from late iterations of the evolutionary algorithm, we validated the model using a more challenging target: 987 fully independent, randomly generated sequences (7-24 amino acids long), which were split into a validation set and a test set (494 and 493 sequences, respectively), keeping the latter completely separate from the training process. The ∆∆F values of these test-set sequences, predicted by the final model, strongly correlated with the MD results (R² = 0.80, MSE = 1.58 kJ² mol⁻², RMSE = 1.26 kJ mol⁻¹, Fig. S3).
Additionally, to assess the performance of our model across the entire free-energy spectrum, we evaluated 96 sequences (24 residues each) that were not part of the training data and that ranged from 0.0 to −30.0 kJ mol⁻¹. Here, again, we found a strong correlation between the ∆∆F values predicted by the transformer model and the MD-calculated free energies (R² = 0.95, MSE = 3.21 kJ² mol⁻², RMSE = 1.79 kJ mol⁻¹, Fig. 1B in the main text). For all test sets, the transformer model notably outperformed our previously trained CNN (van Hilten et al. [2023], Table S1). The validation set, test set, and full-range test set are all available at https://github.com/nvanhilten/PMIpred.
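
For clarity, the reported metrics can be computed as in the following minimal sketch; the function and variable names are illustrative rather than the actual evaluation code.

```python
# Minimal sketch of the evaluation metrics; ddF_md and ddF_pred are arrays of
# MD-calculated and transformer-predicted ∆∆F values in kJ/mol.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(ddF_md, ddF_pred):
    mse = mean_squared_error(ddF_md, ddF_pred)       # kJ^2 mol^-2
    return {"R2": r2_score(ddF_md, ddF_pred),
            "MSE": mse,
            "RMSE": float(np.sqrt(mse))}             # kJ mol^-1
```
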

Table S1: Model comparison. R², mean squared error (MSE, in kJ² mol⁻²), and root mean squared error (RMSE, in kJ mol⁻¹) between MD-calculated and NN-predicted ∆∆F for our previous CNN (van Hilten et al. [2023]) and our current transformer model on the three test sets (with n sequences).

The model was initialised using the Glorot uniform initialiser. ReLU activation was applied to every dense layer except the final one. We used the Adamax algorithm as the optimisation method during training, and data shuffling was switched off. We applied a learning rate scheduler that started at 10⁻⁴ and linearly decreased the learning rate by 1.5 · 10⁻⁶ per epoch until the 30th iteration, after which it was kept constant at 5.5 · 10⁻⁵.
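
A minimal sketch of this learning-rate schedule, assuming a Keras-style callback (the exact PMIpred training code may differ), is given below; the commented lines only illustrate how the optimiser, batch size, and shuffling settings described above would be passed.

```python
# Sketch of the described schedule: start at 1e-4, decrease linearly by 1.5e-6 per
# epoch until epoch 30, then keep the learning rate constant at 5.5e-5.
import tensorflow as tf

def lr_schedule(epoch, lr):
    if epoch < 30:
        return 1e-4 - 1.5e-6 * epoch
    return 5.5e-5                      # 1e-4 - 30 * 1.5e-6 = 5.5e-5

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# model.compile(optimizer=tf.keras.optimizers.Adamax(learning_rate=1e-4), loss="mse")
# model.fit(X_train, y_train, batch_size=64, shuffle=False, callbacks=[lr_callback])
```
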
Hyperparameter testing was performed using Optuna's (Akiba et al. [2019]) TPESampler in combination with a HyperbandPruner, which sampled hyperparameter settings from predefined ranges (see below) and pruned the training of poorly performing models after 30 iterations (based on the validation MSE); a minimal sketch of this search is given after the list. The hyperparameters we considered were:
• The embedding dimension, i.e. the length of the vector in which the tokens and positions are embedded, was chosen between 64 and 256. This also determines the number of nodes in the dense layers within the transformer blocks.
• The number of transformer blocks, i.e. the 'depth' of the model, ranged between 4 and 12.
• The number of attention heads in the multi-head attention layer ranged between 6 and 12. Each head can recognise a pattern after training.
• The dropout rate within the transformer blocks (to prevent overtraining) ranged from 0.05 to 0.30.
• The number of nodes in the final dense layer, before generating output, was varied from 8 to 64.
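
The sketch below illustrates such a search with Optuna's TPESampler and HyperbandPruner; build_model() and train_one_epoch() are hypothetical placeholders for the actual model construction and per-epoch training, and the number of trials and epochs are illustrative.

```python
# Hedged sketch of the hyperparameter search; only the sampled ranges follow the text above.
import optuna

N_EPOCHS = 100  # illustrative

def objective(trial):
    params = {
        "embed_dim":   trial.suggest_int("embed_dim", 64, 256),
        "n_blocks":    trial.suggest_int("n_blocks", 4, 12),
        "n_heads":     trial.suggest_int("n_heads", 6, 12),
        "dropout":     trial.suggest_float("dropout", 0.05, 0.30),
        "dense_nodes": trial.suggest_int("dense_nodes", 8, 64),
    }
    model = build_model(**params)                   # hypothetical model constructor
    for epoch in range(N_EPOCHS):
        val_mse = train_one_epoch(model, epoch)     # hypothetical; returns validation MSE
        trial.report(val_mse, epoch)                # lets the pruner stop poor models
        if trial.should_prune():
            raise optuna.TrialPruned()
    return val_mse

study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.HyperbandPruner(min_resource=30),
)
study.optimize(objective, n_trials=100)
```
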
Using fixed initialisation seeds and a batch size of 64, the best model (embedding dim. = 247, 6 transformer blocks, 8 attention heads, dropout = 0.052, and 38 nodes in the final dense layer, see Fig. S2B) had an MSE of 1.83 kJ² mol⁻² for the validation set and 1.58 kJ² mol⁻² for the test set (Fig. S3).
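
A minimal Keras sketch of a transformer with these reported settings (token plus positional embedding, 6 blocks with multi-head attention, dense output) is shown below; it is an illustrative reconstruction under these assumptions, not the exact PMIpred code (see https://github.com/nvanhilten/PMIpred).

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN, VOCAB = 24, 21                     # 20 AA tokens + padding token 0
EMBED_DIM, N_BLOCKS, N_HEADS = 247, 6, 8
DROPOUT, DENSE_NODES = 0.052, 38

class TokenAndPositionEmbedding(layers.Layer):
    """Sum of a learned token embedding and a learned positional embedding."""
    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.tok_emb = layers.Embedding(vocab_size, embed_dim)
        self.pos_emb = layers.Embedding(maxlen, embed_dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.tok_emb(x) + self.pos_emb(positions)

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = TokenAndPositionEmbedding(MAX_LEN, VOCAB, EMBED_DIM)(inputs)
for _ in range(N_BLOCKS):                                       # 6 transformer blocks
    attn = layers.MultiHeadAttention(num_heads=N_HEADS, key_dim=EMBED_DIM)(x, x)
    x = layers.LayerNormalization()(x + layers.Dropout(DROPOUT)(attn))
    ff = layers.Dense(EMBED_DIM, activation="relu")(x)
    x = layers.LayerNormalization()(x + layers.Dropout(DROPOUT)(ff))
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(DENSE_NODES, activation="relu")(x)
outputs = layers.Dense(1)(x)                                    # predicted ∆∆F (single float)
model = tf.keras.Model(inputs, outputs)
```
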

S2 The role of secondary structure
We assumed an α-helical secondary structure in the MD simulations for all sequences in our training set. This assumption is based on the fact that most natural peptides or protein motifs known to bind to the surface of membranes indeed fold into an α-helix to optimally orient hydrophobic amino acids toward the hydrophobic membrane core and polar residues toward the water phase. Many peptides (e.g. the ALPS motif) only fold into this amphipathic helical secondary structure when bound to curved membranes, whereas they remain unstructured in solution (Drin and Antonny [2010]). Atomistic MD simulations of the N-terminal helix of endophilin's N-BAR domain on curved and flat membranes even showed that binding and folding are cooperative processes, in the sense that α-helical folding is promoted by the presence of hydrophobic lipid packing defects and that, in turn, these defects are stabilised by the folded peptide (Cui et al. [2011]).
Beyond the notion that α-helicity and (curvature-specific) membrane binding are two intertwined phenomena, a practical reason for us to enforce helical folding is that the Martini force field (Souza et al. [2021]) used in our simulations does not model changes in secondary structure. Instead, secondary structure elements are explicitly imposed through proper dihedrals between the backbone beads (Monticelli et al. [2008]).
To address the question whether folding would affect the sensing free energy ∆∆F, we performed additional MD simulations of sequences sampled across the full applicability domain (∆∆F roughly ranging from 0 to −30 kJ mol⁻¹), with the peptides explicitly modelled either as an α-helix or as an unstructured coil. When plotting the resulting free-energy differences for the two distinct secondary structures (Fig. S4), we find that:
1. They strongly correlate (R² = 0.86), which suggests that the activity is dominated by the overall amino acid content rather than by the secondary structure (as already suggested for antimicrobial peptides (Wimley [2010])).
2. Peptides with a low |∆∆F| are more potent when modelled as a coil (left side of the dashed line in Fig. S4). We argue that this is because the unstructured peptides are sufficiently flexible to reorient their hydrophobic and hydrophilic residues to optimally match the amphipathic character of the membrane surface.
3. Peptides with a high |∆∆F| are more potent when modelled as a helix (right side of the dashed line in Fig. S4). This can be explained by a preorganisation effect: the fixed helices may penetrate the membrane slightly further (thus inducing more tension) because the entropic penalty to do so is lower than for the unstructured peptides.
4. Peptides with a high hydrophobic moment (brighter colours in Fig. S4) yield higher |∆∆F| values in a helical fold than as a coil. This is in line with the notion that there is a synergy between amphipathicity and (curvature-specific) membrane binding, as evident from the many examples in natural proteins.

S3 Protein test sets for benchmarking

The protein test set (Table S2) contains 19 proteins taken from the earlier benchmark of Chatzigoulas and Cournia [2022b]; the last 8 proteins are lipid packing defect sensors. The 27 proteins taken together are referred to as the "full test set". We used AlphaFold (AF) (Jumper et al. [2021], Varadi et al. [2022]) structures only in cases where no (complete) PDB file was available. For every protein, we used the exact same structure file (either from the PDB or AF) for the analysis.
We should note that DREAMM, PPM3, and MODA were not developed to handle unstructured/uncertain regions of AF models (pLDDT < 70), but we still left these regions in to allow comparison of the full-length proteins (see the sketch below). From left to right, the columns of Table S2 give: the protein name (PI = phosphatidylinositol, PC = phosphatidylcholine); the PDB or AF database ID; the experimentally determined membrane-interacting residues; the corresponding membrane-interacting regions in our segment-based binary classification approach; and references to the original literature.
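
For illustration only (this is not part of the PMIpred workflow), low-confidence residues of an AF model can be flagged as in the following sketch, which relies on the fact that AlphaFold deposits pLDDT values in the B-factor column; the file name is a placeholder.

```python
# List residues of an AlphaFold model with pLDDT < 70 (pLDDT stored as B-factor).
from Bio.PDB import PDBParser

structure = PDBParser(QUIET=True).get_structure("AF_model", "AF_model.pdb")
low_confidence = [
    (residue.get_parent().id, residue.id[1])          # (chain ID, residue number)
    for residue in structure.get_residues()
    if any(atom.get_bfactor() < 70.0 for atom in residue)
]
print(f"{len(low_confidence)} residues with pLDDT < 70")
```
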

S4 Benchmarking results
Table S3: Benchmarking results. Segment-based binary classification results for three state-of-the-art methods (DREAMM (Chatzigoulas and Cournia [2022b,a]), PPM3 (Lomize et al. [2022]), MODA (Kufareva et al. [2014])) and our PMIpred web server on the proteins from Table S2. TP = true positive, FP = false positive, TN = true negative, FN = false negative.

The Matthews correlation coefficient (MCC, Matthews [1975], eq. S1) is a commonly used metric for scoring binary classifier predictions that remains suitable when the classes are of different sizes (as is the case in our benchmark set, with more negatives than positives). It ranges from MCC = −1 (total disagreement between prediction and observation) to MCC = 1 (full agreement between prediction and observation), with MCC = 0 being the expected value for a random prediction. The false discovery rate (FDR, eq. S2) is the fraction of predicted positives (i.e. regions predicted to be membrane-interacting) that are not experimentally determined membrane-interacting regions. The perfect classifier would have an FDR of 0%, i.e. none of the 'hits' are false positives.
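
For completeness, eq. S1 referenced above is the standard Matthews formula computed from the confusion-matrix counts:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)) (S1)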

FDR = FP / (FP + TP) (S2)
The false omission rate (FOR, eq. S3) is the fraction of predicted negatives (i.e. regions predicted not to interact with membranes) that are in fact experimentally determined membrane-interacting regions. The perfect classifier would have a FOR of 0%, i.e. none of the experimentally determined membrane-interacting regions are missed.

FOR = FN / (FN + TN) (S3)
The MCC, FDR, and FOR values that are reported in the main text are calculated from the summed values (bottom four rows in Table S3) using eq. S1, eq. S2, and eq. S3, respectively.
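
As a minimal sketch, eqs. S1-S3 can be applied to the summed counts as follows (TP, FP, TN, FN correspond to the bottom rows of Table S3; the function name is illustrative).

```python
from math import sqrt

def score(TP, FP, TN, FN):
    mcc = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # eq. S1
    fdr = FP / (FP + TP)    # eq. S2
    f_or = FN / (FN + TN)   # eq. S3
    return {"MCC": mcc, "FDR": fdr, "FOR": f_or}
```
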
Table S4: Scoring binary predictions. MCC, FDR, and FOR values for three state-of-the-art methods (DREAMM (Chatzigoulas and Cournia [2022b,a]), PPM3 (Lomize et al. [2022]), MODA (Kufareva et al. [2014])) and our PMIpred web server on the proteins from Table S2. Values are calculated using eqs. S1-S3 on the summed values in the bottom four rows of Table S3. MCC data are also plotted in Fig. 3D in the main text.

S6 Lipid-anchored proteins
Lipid-anchored proteins are typically characterised by two features: a membrane-interacting protein domain and a covalently linked lipid anchor. For example, the GTPase KRAS features (1) a (polybasic) C-terminal region that interacts with anionic lipids in the membrane and (2) a Cys-linked farnesyl group that anchors it to the hydrophobic membrane interior.
We argue that the first of these features can be identified with PMIpred, whereas the second cannot. For KRAS in particular, the (variable) polybasic tail is correctly identified as membrane active when using a negatively charged target membrane in our model (Fig. S6), even though the model does not consider the farnesyl modification in this domain.

Fig. S6 PMIpred result for KRAS (PDB: 4dso). Orange indicates membrane-active (curvature-sensing) regions.

S7 Transmembrane proteins
For monofunctional glycosyltransferase (a single-pass transmembrane (TM) protein), both PMIpred and DREAMM correctly predict most of the N-terminal helix as 'membrane binding' (Fig. S7A). In contrast, none of the TM helices of the β2-adrenergic G protein-coupled receptor (GPCR) are identified as such by either web server (Fig. S7B), owing to the crowded protein environment and the consequently low 'solvent' accessibility. The same becomes apparent when studying the homotetramer of the influenza A virus M2 proton channel (Fig. S7C): both PMIpred and DREAMM prominently highlight the amphipathic helix (a known membrane binder and curvature inducer (Jung et al. [2019])), but not the TM part. In contrast, isolating one of the monomers renders the TM residues more 'solvent'-accessible, causing the full protein chain to be labelled membrane-interacting by both methods (Fig. S7D).

Fig. S1 ∆∆F distribution of the training data. ∆∆F distribution of all 53,940 sequences in the training data.

Fig. S2 Transformer details. A) Schematic description of how positional embedding is implemented in the transformer model. "[..X...]" represents a vector of a token or the position of that token in the sequence. B) Architecture of the final transformer model.

Fig. S3 Correlation between transformer-predicted and MD-calculated ∆∆F values. Correlation plot for 493 randomly generated peptide sequences (length 7-24, not part of the training data) in the test set. MD-calculated values are averages and standard deviations for 3 independent replicas (500 ns simulation per run, as described in ref. (van Hilten et al. [2023])).

Fig. S5 Sequence length affects classification. A) The distribution of predicted non-binders, sensors, and binders in the combined validation and test set (987 randomly generated sequences in total) per sequence length. Predictions were done for a negatively charged target membrane (i.e. using ∆∆F_adj). B) The normalised error for different sequence lengths in the combined validation and test set. Insets give the number of sequences per bin.

Table S2: The protein test set. Overview of the protein structures used for benchmarking PMIpred against DREAMM, PPM3, and MODA (Fig. 3D in the main text). The top 19 proteins are the same as used in Chatzigoulas & Cournia's earlier benchmark (Chatzigoulas and Cournia [2022b]). The bottom four rows give the summed values for the respective test sets. The raw data are available at https://github.com/nvanhilten/PMIpred.