Abstract

Motivation

We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides.

Results

Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR–peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences.

Availability and implementation

The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Predicting whether T cells recognize peptides presented on cells is a fundamental step towards the development of personalized treatments to enhance the immune response, like therapeutic cancer vaccines (Buhrman and Slansky, 2013; Corse et al., 2011; Hundal et al., 2020; McMahan et al., 2006; Meng and Butterfield, 2002; Slansky et al., 2000). In the human immune system, T cells monitor the health status of cells by identifying foreign peptides on their surface (Davis and Bjorkman, 1988; Krogsgaard and Davis, 2005). The T-cell receptors (TCRs) are able to bind to these peptides, especially if they originate from an infected or cancerous cell. The binding of TCRs—also known as TCR recognition—with peptides, presented by major histocompatibility complex (MHC) molecules in peptide-MHC (pMHC) complexes, constitutes a necessary step for immune response (Glanville et al., 2017; Rowen et al., 1996). Only if TCR recognition takes place can cytokines be released, which leads to the death of a target cell.

TCRs consist of an α- and a β-chain whose structures determine the interaction with the pMHC complex. Each chain consists of three loops, referred to as complementarity-determining regions (CDR1–3). It is believed that the CDR3 loops primarily interact with the peptide of a given pMHC complex (Feng et al., 2007; La Gruta et al., 2018; Rossjohn et al., 2015). Supplementary Material S1 depicts the 3D structure of a TCR–pMHC complex.

Recent discoveries (Dash et al., 2017; Lanzarotti et al., 2019) have demonstrated that both the CDR3α- and β-chains carry information on the specificity of the TCR toward its cognate pMHC target. Obtaining information about paired TCR α- and β-chains requires specific and expensive experiments, like single-cell (SC) sequencing, which limits its availability. Conversely, the bulk sequencing of a population of cells reactive to a peptide is cheaper, but it only provides information about either the α- or the β-chain.

In this work, we propose Attentive Variational Information Bottleneck (AVIB) to predict TCR–peptide interactions. AVIB is a multi-sequence generalization of Variational Information Bottleneck (VIB) (Alemi et al., 2016). Notably, we introduce Attention of Experts (AoE), a novel method for combining single-sequence latent distributions into a joint multi-sequence latent encoding distribution using self-attention. Owing to its design, AoE can naturally leverage the abundant available data where either the CDR3α or the CDR3β sequence is missing when estimating the multi-sequence variational posterior. The model learns to predict whether the binding between the peptide and the TCR takes place or not.

Extensive experiments demonstrate that AVIB significantly outperforms state-of-the-art methods. In addition, the probabilistic nature of the VIB framework makes it possible to estimate the uncertainty of AVIB's predictions. We empirically show that AVIB can be used for out-of-distribution (OOD) detection of amino acid sequences without supervision.

1.1 Background and related works

1.1.1 TCR–pMHC and TCR–peptide interaction prediction

Several recent works have investigated TCR–pMHC and TCR–peptide interaction prediction. Some of the proposed approaches rely on simple CDR3β sequence alignment (Chronister et al., 2021; Wong et al., 2019). TCRdist computes CDR similarity-weighted distances (Dash et al., 2017). SETE adopts k-mer feature spaces in combination with principal component analysis and decision trees (Tong et al., 2020). Various methods use Random Forests for classification (De Neuter et al., 2018; Gielis et al., 2019; Springer et al., 2020). ImRex tackles the problem with a method based on convolutional neural networks (CNNs) (Moris et al., 2021). TCRGP is a classification method which leverages a Gaussian process (Jokinen et al., 2019). ERGO is a deep learning approach which adopts long short-term memory networks and autoencoders to compute representations of peptide and CDR3β (Springer et al., 2020). ERGO II (Springer et al., 2021) is an updated version of ERGO which considers additional input data, i.e. the CDR3α sequence, V and J genes, MHC and T-cell type. NetTCR-1.0 (Jurtz et al., 2018) and NetTCR-2.0 (Montemurro et al., 2021) propose a simple 1D CNN-based model, integrating peptide and CDR3 sequence information for the prediction of TCR–peptide specificity. TITAN (Weber et al., 2021) is a bimodal neural network that explicitly encodes the β-chain and peptide; it leverages transfer learning and SMILES (Weininger et al., 1989) encoding to achieve good generalization.

1.1.2 Deep multimodal variational inference

The problem investigated in this work consists of predicting whether multiple sequences of amino acids, i.e. a peptide and the CDR3s, bind. A single sequence, observed alone, is not informative of whether binding takes place. As a consequence, binding prediction cannot be framed as a classical multimodal learning problem. Nevertheless, this work is strongly related to multimodal variational inference and takes inspiration from it. In this section, related works from both the supervised and self-supervised learning domains are presented.

Self-supervised learning. Deep neural networks have proved successful at modeling probability distributions in the context of Variational Bayes (VB) methods. The Variational Autoencoder (VAE) (Kingma and Welling, 2013) jointly trains a generative model from latent variables to observations with an inference network from observations to latent variables. Multimodal generalizations of the VAE must tackle the problem of learning a joint posterior distribution of the latent variable conditioned on multiple input modalities. The Multimodal Variational Autoencoder (MVAE) (Wu and Goodman, 2018) models the joint posterior as a Product of Experts (PoE) over the marginal posteriors, enabling cross-modal generation at test time. The Mixture-of-experts Multimodal Variational Autoencoder (MMVAE) (Shi et al., 2019) factorizes the joint variational posterior as a combination of unimodal posteriors, using a Mixture of Experts (MoE). MoE-based models have been used in the biomedical field to tackle challenges such as protein–protein interactions (Qi et al., 2007), biomolecular sequence annotation (Caragea et al., 2009) and clustering cell phenotypes from SC data (Kopf et al., 2021). Their main advantage is that they can infer global patterns in the genetic or peptide sequences in supervised and unsupervised settings (Kopf et al., 2021).

Supervised learning. The VIB (Alemi et al., 2016) is to supervised learning what the β-VAE (Higgins et al., 2017) is to unsupervised learning. VIB leverages variational inference to construct a lower bound on the Information Bottleneck (IB) objective (Tishby et al., 2000). By applying the reparameterization trick (Kingma and Welling, 2013), Monte Carlo sampling can be used to obtain an unbiased estimate of the gradient of the VIB objective, which allows optimizing it with stochastic gradient descent. Various multimodal generalizations of the VIB have recently been proposed: the Multimodal Variational Information Bottleneck (MVIB) (Grazioli et al., 2022a) and DeepIMV (Lee and Schaar, 2021). Both MVIB and DeepIMV adopt the PoE to estimate a joint multimodal latent encoding distribution from the unimodal latent encoding distributions. In contrast, our AVIB model predicts interactions among multiple input sequences. This involves modeling complex relations among different sequences (analogous to, but not the same as, modalities) with powerful and flexible multi-head self-attention, a task for which PoE is a sub-optimal choice.

2 Materials and methods

Let Y be a random variable representing a ground truth label associated with an input random variable X. Let Z be a stochastic encoding of X defined by an encoder p_θ(Z|x) parameterized by a neural network. (Notation: X, Y, Z are random variables; x, y, z are their realizations; f(·; θ) and p_θ(·) are functions and probability distributions parameterized by a vector θ; S represents a set.) Following Tishby et al. (2000), our goal is to learn an encoding Z which is (a) maximally informative about Y and (b) maximally compressive about X. Using an information-theoretic approach, we obtain the IB objective, with p(X, Y, Z) = p(Z|X) p(Y|X) p(X):
$$ R_{IB}(\theta) = I(Z, Y; \theta) - \beta\, I(Z, X; \theta) \tag{1} $$
where β ≥ 0 is a Lagrange multiplier controlling the trade-off between (a) and (b), and I(Z, Y; θ) is the mutual information between Z and Y parameterized by θ:
$$ I(Z, Y; \theta) = \int p(z, y) \log \frac{p(z, y)}{p(z)\, p(y)} \, dz \, dy \tag{2} $$
As derived in Alemi et al. (2016), assuming q_φ(y|z) and r_ω(z) are variational approximations of the true p(y|z) and p(z), respectively, Equation 1 can be rewritten as:
$$ J_{VIB} = \frac{1}{N} \sum_{n=1}^{N} \Big( \mathbb{E}_{\epsilon \sim p(\epsilon)} \big[ -\log q_{\phi}\big(y_n \mid f(x_n, \epsilon; \theta)\big) \big] + \beta\, D_{KL}\big( p_{\theta}(Z \mid x_n) \,\|\, r_{\omega}(Z) \big) \Big) \tag{3} $$
where ε ∼ N(0, I) is an auxiliary Gaussian noise variable, D_KL is the Kullback–Leibler divergence and f(·; θ) is a vector-valued parametric deterministic encoding function (here a neural network). The reparameterization trick (Kingma and Welling, 2013) introduces ε and allows writing p_θ(z|x) dz = p(ε) dε, where z = f(x, ε; θ) is now treated as a deterministic variable. This formulation makes the noise variable independent of the model parameters, so that gradients of the objective in Equation 3 can be computed and optimized via backpropagation. In this work, we let the latent encoding distribution on Z be a multivariate Gaussian with diagonal covariance, z ∼ p_θ(Z|x) = N(μ, diag(σ²)); a valid reparameterization is z = μ + σ ⊙ ε. With the variational distribution r_ω(Z) set to a standard multivariate Gaussian N(0, I), as done in practice, we can view VIB as a variational encoder–decoder analogous to the VAE (Kingma and Welling, 2013), in which the latent encoding distribution p_θ can be viewed as a latent posterior, and the variational decoding distribution q_φ can be viewed as a decoder.
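To make this concrete, the following is a minimal PyTorch sketch of the reparameterized objective in Equation 3; the function name, the decoder interface and the value of β are illustrative assumptions, not the authors' released implementation.

```python
# Minimal VIB loss sketch (Eq. 3); "decoder" is any nn.Module mapping a latent
# sample z of size d_Z to class logits. Names and the beta value are illustrative.
import torch
import torch.nn.functional as F

def vib_loss(decoder, mu, log_var, y, beta=1e-3):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # so gradients flow through mu and log_var while the noise stays external.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps
    # Expected negative log-likelihood -log q_phi(y | z), one Monte Carlo sample.
    nll = F.cross_entropy(decoder(z), y, reduction="none")
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over d_Z.
    kl = 0.5 * torch.sum(mu.pow(2) + log_var.exp() - log_var - 1.0, dim=-1)
    return (nll + beta * kl).mean()
```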

In the same spirit of extending the VAE (Kingma and Welling, 2013) to the MVAE (Wu and Goodman, 2018), the VIB objective of Equation 3 can be generalized by representing X as a collection of multiple input sequences X = {X_i | the ith sequence is present}. In light of this, in the language of a variational encoder–decoder, the posterior p_θ(Z|x) of Equation 3 is actually the joint posterior p_Θ(Z|x_1, …, x_M) := p_Θ(Z|x_{1:M}), conditioned jointly on the M available sequences. However, for predicting the interaction label Y from X, the M different sequences cannot simply be treated as M different modalities.

2.1 Attention of experts

Similar to previous works (Grazioli et al., 2022a; Wu and Goodman, 2018), the single-sequence posteriors are modeled as Gaussian distributions with diagonal covariance: q̃_{θ_i}(Z|x_i) = N(μ_i, diag(σ_i²)). By stacking the parameters (represented as column vectors) μ_0 and σ_0 of the latent prior with the μ_i and σ_i of all available sequences i = 1, …, M, we define the following two matrices M ∈ ℝ^{(M+1)×d_Z} and Σ ∈ ℝ^{(M+1)×d_Z}, where d_Z is the dimensionality of the latent single-sequence posteriors:
$$ \mathbf{M} = \begin{bmatrix} \mu_0^T \\ \mu_1^T \\ \vdots \\ \mu_M^T \end{bmatrix} \in \mathbb{R}^{(M+1) \times d_Z}, \qquad \mathbf{\Sigma} = \begin{bmatrix} \sigma_0^T \\ \sigma_1^T \\ \vdots \\ \sigma_M^T \end{bmatrix} \in \mathbb{R}^{(M+1) \times d_Z} \tag{4} $$
We propose to implicitly learn the dependencies between the M single-sequence posteriors and the multi-sequence joint posterior by means of multi-head self-attention, leveraging its power to capture multiple complex interactions in X while allowing for possibly missing sequences:
$$ \mu_{AoE} = \mathrm{Pool}\big(\mathrm{MultiHead}(\mathbf{M}, \mathbf{M}, \mathbf{M})\big), \qquad \sigma_{AoE} = \mathrm{Pool}\big(\mathrm{MultiHead}(\mathbf{\Sigma}, \mathbf{\Sigma}, \mathbf{\Sigma})\big) \tag{5} $$
Pool: ℝ^{(M+1)×d_Z} → ℝ^{d_Z} is a 1D max pooling function. MultiHead is the standard multi-head attention block defined in Vaswani et al. (2017), whose equations are provided in Supplementary Material S2. We refer to Equation 5 as AoE. Figure 1 provides a schematic depiction of AoE.
Fig. 1. Attention of experts (AoE) for Gaussian posteriors. E_i is the stochastic Gaussian encoder of the ith sequence.
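As an illustration, a minimal PyTorch sketch of Equation 5 follows; it assumes the expert parameters are stacked as rows (prior first) and, for numerical convenience, attends over log-variances rather than standard deviations. The dimensions, head count and these choices are assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class AoE(nn.Module):
    """Attention of Experts: combine (M+1) stacked Gaussian experts (Eq. 5)."""
    def __init__(self, d_z: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn_mu = nn.MultiheadAttention(d_z, n_heads, batch_first=True)
        self.attn_var = nn.MultiheadAttention(d_z, n_heads, batch_first=True)

    def forward(self, mus, log_vars):
        # mus, log_vars: (batch, M+1, d_z); row 0 holds the N(0, I) prior.
        h_mu, _ = self.attn_mu(mus, mus, mus)                   # self-attention
        h_var, _ = self.attn_var(log_vars, log_vars, log_vars)
        # Pool: 1D max pooling over the expert dimension, (M+1) rows -> 1.
        return h_mu.max(dim=1).values, h_var.max(dim=1).values
```

Because neither the self-attention nor the pooling depends on the number of rows, the same module accepts any number of experts, which is what enables inference with missing sequences (Section 3.5).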

We refer to a multi-sequence VIB which adopts AoE for modeling the multi-sequence joint posterior as AVIB. The AVIB objective is:
$$ J_{AVIB} = \frac{1}{N} \sum_{n=1}^{N} \Big( \mathbb{E}_{\epsilon \sim p(\epsilon)} \big[ -\log q_{\phi}\big(y_n \mid f(x_{1:M,n}, \epsilon; \Theta)\big) \big] + \beta\, D_{KL}\big( p_{\Theta}(Z \mid x_{1:M,n}) \,\|\, r_{\omega}(Z) \big) \Big) \tag{6} $$
where the multi-sequence posterior is modeled as p_Θ(Z|x_{1:M}) = N(μ_AoE, diag(σ_AoE²)).

Due to space limitations, we provide a detailed description of the implementation, the training setup and the choice of the hyperparameter β in Supplementary Material S3. Supplementary Material S4 describes the full AVIB architecture.

2.2 Relation to multimodal variational inference

MVAE (Wu and Goodman, 2018) and MVIB (Grazioli et al., 2022a) approximate the joint posterior p_Θ(Z|x_{1:M}) assuming that the M modalities are conditionally independent given the common latent variable Z. This allows expressing the joint posterior as a product of unimodal approximate posteriors q̃_{θ_i}(Z|x_i) and a prior p(Z), referred to as PoE: p_Θ(Z|x_{1:M}) ∝ p(Z) ∏_{i=1}^{M} q̃_{θ_i}(Z|x_i), where Θ = {θ_i}_{i=1}^{M}. MMVAE (Shi et al., 2019) factorizes the joint multimodal posterior as a mixture of Gaussian unimodal posteriors, referred to as MoE: p_Θ(Z|x_{1:M}) = (1/M) ∑_{i=1}^{M} q̃_{θ_i}(Z|x_i).
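For diagonal Gaussians, both baselines admit straightforward implementations; the sketch below is illustrative (the PoE includes the N(0, I) prior as expert row 0, and the MoE is sampled by uniformly picking one expert per example).

```python
import torch

def poe(mus, log_vars):
    # Product of Gaussian experts: precisions add, means are precision-weighted.
    # mus, log_vars: (batch, M+1, d_z), with the N(0, I) prior as row 0.
    precision = torch.exp(-log_vars)                 # 1 / sigma_i^2 per dimension
    joint_var = 1.0 / precision.sum(dim=1)
    joint_mu = (mus * precision).sum(dim=1) * joint_var
    return joint_mu, joint_var.log()

def moe_sample(mus, log_vars):
    # Uniform mixture of experts: pick one expert per example, then sample it.
    b, m, _ = mus.shape
    idx = torch.randint(m, (b,))
    mu = mus[torch.arange(b), idx]
    std = torch.exp(0.5 * log_vars[torch.arange(b), idx])
    return mu + std * torch.randn_like(mu)
```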

PoE assumes conditional independence between modalities (Hinton, 2002). Conditional dependence is likewise impossible to capture with MoE, due to its additive form. This becomes a major shortcoming when modeling TCR–peptide interaction, in which the single sequences are not predictive of the binding when observed individually. Although AoE does not explicitly parameterize conditional dependence between the sequences, it does not assume that each sequence should be individually predictive of the class label, which makes it a more suitable candidate for modeling molecular interactions.

AoE can improve on PoE and MoE on multiple levels. First, employing attention to estimate the joint multi-sequence posterior allows learning the relative importance of the various single-sequence posteriors. The model can thus dynamically increase the weight given to certain input sequences while diminishing the focus on others, without being restricted to the 'AND' and 'OR' relations of PoE and MoE, respectively (Shi et al., 2019). Second, as AoE is a parametric trainable module, it can learn to accommodate miscalibrated single-sequence posteriors, which are especially difficult for PoE to handle (Kutuzova et al., 2021).

The adoption of PoE and MoE for approximating a multimodal posterior from unimodal encoders allows inference even when certain modalities are missing (Grazioli et al., 2022a; Kutuzova et al., 2021; Shi et al., 2019; Wu and Goodman, 2018); a single encoder applied to the concatenation of all modalities would not allow that. Just like PoE and MoE, AoE allows inference with missing inputs: there is in fact no restriction on the number of rows of M and Σ (see Equations 4 and 5), which plays the same role as the number of word tokens in a natural language processing setting (see Section 3.5).

In this work, we only benchmark AoE against PoE and do not compare against MoE. We believe MoE's 'OR' nature is not suitable for modeling the chemical specificity of multiple molecules. If taken alone, the single-sequence variational posteriors are not informative of the chemical reaction. Analogously, sampling from a MoE (which resembles an OR operator) is not suitable for capturing how molecules chemically interact.

2.3 Information Bottleneck Mahalanobis distance

Although AVIB is not explicitly designed for uncertainty estimation, we propose a simple yet effective approach for OOD detection. This approach is strongly inspired by Lee et al. (2018) and leverages the Mahalanobis distance. In the following, we first summarize the method proposed by Lee et al. (2018) and then describe how to extend it to AVIB.

Mahalanobis distance. The Mahalanobis distance has proved to be an effective metric for OOD detection (Lee et al., 2018). Let f_l(x) denote the output of the lth hidden layer of a neural network, given an input x. Using the training samples, this method fits a class-conditional Gaussian distribution to the embeddings of each class, computing a per-class mean μ_{l,c} = (1/N_c) ∑_{i: y_i = c} f_l(x_i) and a shared covariance matrix Σ_l = (1/N) ∑_{c=1}^{K} ∑_{i: y_i = c} (f_l(x_i) − μ_{l,c})(f_l(x_i) − μ_{l,c})^T. Given a test sample x, the Mahalanobis score is computed as score_Maha(x) = ∑_l α_l M_l(x), where M_l(x) = min_c ½ (f_l(x) − μ_{l,c}) Σ_l^{-1} (f_l(x) − μ_{l,c})^T. Lee et al. (2018) fit the α_l coefficients by training a logistic regression on a set of samples for which knowledge of the OOD/ID label is assumed. Additionally, the authors show that adding a small (ε) controlled noise to the input can improve results, analogously to ODIN (Liang et al., 2017).

We leverage the expectation of the learned latent posterior conditioned on all input sequences and fit K class-conditional Gaussian distributions using the ID training samples, where K is the number of classes. For the TCR–peptide interaction prediction task, we have two classes: binders and non-binders. The K empirical class means are computed as:
$$ \hat{\mu}_c = \frac{1}{N_c} \sum_{n: y_n = c} \mathbb{E}_{p_{\Theta}(Z \mid x_{1:M,n})}[Z] = \frac{1}{N_c} \sum_{n: y_n = c} \mu_{AoE}(x_{1:M,n}) \tag{7} $$
A shared covariance matrix is computed as:
$$ \hat{\Sigma} = \frac{1}{N} \sum_{c=1}^{K} \sum_{n: y_n = c} \big( \mu_{AoE}(x_{1:M,n}) - \hat{\mu}_c \big) \big( \mu_{AoE}(x_{1:M,n}) - \hat{\mu}_c \big)^T \tag{8} $$
Lee et al. (2018) compute Mahalanobis distances at multiple hidden layers of a neural network and train a logistic regression to learn their relative weights, assuming knowledge of the ID/OOD label for a set of validation samples. In contrast, given a multi-sequence sample x_{1:M}, we propose to leverage the multi-sequence posterior distribution over encodings learned by AVIB, i.e. p_Θ(Z|x_{1:M}), and compute one single Mahalanobis distance, which acts as the OOD score:
$$ \mathrm{score}_{Maha}(x_{1:M}) = \min_c \big( \mu_{AoE}(x_{1:M}) - \hat{\mu}_c \big) \hat{\Sigma}^{-1} \big( \mu_{AoE}(x_{1:M}) - \hat{\mu}_c \big)^T \tag{9} $$

This approach is hyperparameter-free; hence, it does not require a validation set for tuning, and no prior knowledge of OOD validation samples is needed.
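A minimal NumPy sketch of Equations 7–9 follows, assuming the μ_AoE embeddings of the training and test samples have already been extracted; the helper names are illustrative.

```python
import numpy as np

def fit_class_gaussians(mu_train, y_train):
    # Eq. 7: per-class empirical means of the mu_AoE embeddings.
    classes = np.unique(y_train)
    means = [mu_train[y_train == c].mean(axis=0) for c in classes]
    # Eq. 8: shared covariance computed over all classes.
    centered = np.concatenate(
        [mu_train[y_train == c] - m for c, m in zip(classes, means)], axis=0)
    cov = centered.T @ centered / len(mu_train)
    return means, np.linalg.pinv(cov)

def avib_maha_score(mu_test, means, cov_inv):
    # Eq. 9: min over classes of the squared Mahalanobis distance;
    # larger scores indicate samples more likely to be OOD.
    dists = [np.einsum("nd,de,ne->n", mu_test - m, cov_inv, mu_test - m)
             for m in means]
    return np.stack(dists, axis=1).min(axis=1)
```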

3 Results and discussion

First, we provide a description of the datasets used in this work. We then apply AVIB to the TCR–peptide interaction prediction problem. Last, we demonstrate AVIB’s effectiveness in the context of OOD detection. All experiments are implemented using PyTorch (Paszke et al., 2019). Code and data are publicly available at: https://github.com/nec-research/vibtcr.

3.1 Datasets

Recent studies (De Neuter et al., 2018; Fischer et al., 2020; Gielis et al., 2019; Jokinen et al., 2019; Jurtz et al., 2018; Montemurro et al., 2021; Moris et al., 2021; Springer et al., 2020; 2021; Tong et al., 2020; Weber et al., 2021; Wong et al., 2019) investigate the prediction of TCR–peptide/–pMHC interactions. Most use data from the Immune Epitope Database (IEDB) (Vita et al., 2019), VDJdb (Bagaev et al., 2020) and McPAS-TCR (Tickotsky et al., 2017), which mainly contain CDR3β data and lack information on CDR3α. We merge human TCR–peptide data extracted from the ERGO II and NetTCR-2.0 repositories (https://github.com/IdoSpringer/ERGO-II; https://github.com/mnielLab/NetTCR-2.0). Binding (i.e. positive) samples are derived from the IEDB, VDJdb and McPAS-TCR databases. Positive data points generated by Klinger et al. (2015), referred to as MIRA set, are also considered. (The MIRA set is publicly available in the NetTCR-2.0 repository https://github.com/mnielLab/NetTCR-2.0/tree/main/data.) We employ all non-binding (i.e. negative) samples used by Springer et al. (2021) and Montemurro et al. (2021). Hence, negative samples are derived from random recombination of positive data points, as well as from the 10× Genomics assays described in Montemurro et al. (2021). Overall, 271 366 human TCR–peptide samples are available. We organize the data and create the following datasets.

α+β set. 117 753 samples out of 271 366 present peptide information, along with both CDR3α and CDR3β sequences. In this work, we refer to this subset as the α+β set. The ground truth label is a binary variable which represents whether the peptide and TCR chains interact.

β set. 153 613 samples out of 271 366 present peptide and CDR3β information (the CDR3α sequence is missing). We refer to this subset as the β set. The β set and the α+β set are disjoint.

Human TCR set. We refer to the totality of the human TCR–peptide data (i.e. β set ∪ α+β set) as the Human TCR set.

Non-human TCR set. We extract 5036 non-human TCR samples from the VDJdb database, which we use as OOD samples. These samples come from mice and macaques and present peptide and CDR3β information. We refer to these samples as Non-human TCR set.

In addition to the TCR datasets, in order to thoroughly evaluate AVIB on multiple types of molecular data, we perform experiments on the following peptide-MHC datasets.

NetMHCIIpan-4.0 set. This dataset consists of 108 959 peptide-MHC pairs and was proposed in Reynisson et al. (2020) for training the NetMHCIIpan-4.0 model. All MHC molecules are class II. A continuous binding affinity (BA) value, ranging in [0, 1], is associated with each (peptide, MHC) pair and is used to validate AVIB on a regression task.

Human MHC set. We create a second set of OOD samples composed of 463 684 peptide-MHC pairs. The peptide sequences are taken from the Human TCR set, i.e. the peptide information is shared among ID and OOD sets. The MHC molecules are represented as pseudo-sequences of amino acids. [For the MHC pseudo-sequences, we refer to the PUFFIN (Zeng and Gifford, 2019) repository: https://github.com/gifford-lab/PUFFIN/blob/master/data/pseudosequence.2016.all.X.dat.] We consider both Classes I and II MHC alleles. We refer to these samples as Human MHC set.

Supplementary Figure S7 depicts the distributions of the human TCR data for both the α+β set and the β set. The two datasets have similar peptide distributions but contain different CDR3β sequences. Supplementary Material S5.1 provides information regarding the distribution of the length of the amino acid sequences in the various datasets. Supplementary Material S5.2 provides information on the class distribution of the α+β set and the β set. Supplementary Material S5.3 describes the binding affinity distributions of the NetMHCIIpan-4.0 set.

3.2 Pre-processing

In this work, peptides, CDR3α and CDR3β are represented as sequences of amino acids. The 20 amino acids translated by the genetic code are conventionally represented by letters of the English alphabet. Analogously to Montemurro et al. (2021), we pre-process the amino acid sequences using BLOSUM50 encodings (Henikoff and Henikoff, 1992), i.e. each amino acid is represented by its vector of substitution scores in the BLOSUM50 matrix. This allows us to represent a sequence of N amino acids as a 20×N matrix, analogously to the approach proposed by Nielsen et al. (2003). After performing the BLOSUM50 encoding, we standardize the features by subtracting the mean and scaling to unit variance. As the length of the amino acid sequences is not constant, we apply zero-padding after the BLOSUM50 encoding (Mösch and Frishman, 2021). This ensures that all matrices have shape 20×N_max, where N_max is the length of the longest sequence. Information on the length distribution of the amino acid sequences can be found in Supplementary Material S5.1.
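A sketch of this pre-processing is shown below; it assumes Biopython's bundled BLOSUM50 matrix and sequences no longer than n_max, and the exact point at which standardization is applied is an assumption.

```python
import numpy as np
from Bio.Align import substitution_matrices  # assumes Biopython is installed

BLOSUM50 = substitution_matrices.load("BLOSUM50")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def blosum_encode(seq: str, n_max: int) -> np.ndarray:
    """Encode a sequence as a 20 x n_max matrix, zero-padded on the right."""
    mat = np.zeros((20, n_max), dtype=np.float32)
    for j, aa in enumerate(seq[:n_max]):
        mat[:, j] = [BLOSUM50[aa, b] for b in AMINO_ACIDS]
    return mat

def standardize(batch: np.ndarray) -> np.ndarray:
    """Subtract the mean and scale to unit variance (statistics fit on training data)."""
    mean = batch.mean(axis=0, keepdims=True)
    std = batch.std(axis=0, keepdims=True)
    return (batch - mean) / (std + 1e-8)
```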

3.3 TCR–peptide interaction prediction

In order to evaluate AVIB's performance on the TCR–peptide interaction prediction task, we perform experiments on three datasets: the α+β set, the β set and their union β set ∪ α+β set. For the β set and the union set, input samples are (x_Peptide, x_CDR3β) pairs. For the α+β set, inputs can be either (x_Peptide, x_CDR3β) pairs or (x_Peptide, x_CDR3α, x_CDR3β) triples. For all tri-sequence experiments, we adopt a full multi-sequence extension of the AVIB objective J_AVIB (see Supplementary Material S6, Equation 12).

Baselines. We benchmark AVIB against two state-of-the-art deep learning methods for TCR–peptide interaction prediction: ERGO II (Springer et al., 2021) and NetTCR-2.0 (Montemurro et al., 2021). Additionally, we benchmark AVIB against the LUPI-SVM (Abbasi et al., 2018), by leveraging the α-chain at training time as privileged information. For all benchmark methods, we adopt the original publicly available implementations (https://github.com/IdoSpringer/ERGO-II; https://github.com/mnielLab/NetTCR-2.0; https://github.com/wajidarshad/LUPI-SVM).

Evaluation metrics. Table 1 summarizes the experimental results. For evaluation, the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPR) and the F1 score (F1) are computed on the test sets. Five repeated experiments with different 80/20 training/test random splits are performed for robust evaluation.
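As an indication, the three metrics can be computed per split with scikit-learn as below; the 0.5 decision threshold used for F1 is an assumption.

```python
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

def classification_metrics(y_true, y_score, threshold=0.5):
    # y_score: predicted binding probabilities on one held-out test split.
    return {
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPR": average_precision_score(y_true, y_score),
        "F1": f1_score(y_true, y_score >= threshold),
    }

# Repeat over five independent 80/20 splits and report mean +/- standard error.
```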

Table 1.

TCR–peptide interaction prediction results

Dataset          Inputs       Method       AUROC          AUPR           F1
β set            Pep + β      NetTCR-2.0   0.755 ± 0.001  0.395 ± 0.002  0.349 ± 0.002
                              ERGO II      0.761 ± 0.011  0.415 ± 0.020  0.412 ± 0.010
                              AVIB (ours)  0.804 ± 0.001  0.494 ± 0.001  0.477 ± 0.001
α+β set          Pep + β      LUPI-SVM     0.770 ± 0.001  0.212 ± 0.001  0.218 ± 0.001
                              NetTCR-2.0   0.846 ± 0.002  0.396 ± 0.003  0.413 ± 0.001
                              ERGO II      0.894 ± 0.001  0.538 ± 0.004  0.498 ± 0.003
                              AVIB (ours)  0.895 ± 0.001  0.534 ± 0.004  0.515 ± 0.002
                 Pep + α + β  NetTCR-2.0   0.862 ± 0.002  0.477 ± 0.003  0.472 ± 0.002
                              ERGO II      0.903 ± 0.002  0.578 ± 0.004  0.528 ± 0.002
                              AVIB (ours)  0.913 ± 0.001  0.614 ± 0.002  0.586 ± 0.001
β set ∪ α+β set  Pep + β      NetTCR-2.0   0.727 ± 0.001  0.342 ± 0.001  0.276 ± 0.002
                              ERGO II      0.748 ± 0.015  0.379 ± 0.022  0.381 ± 0.014
                              AVIB (ours)  0.773 ± 0.001  0.414 ± 0.002  0.396 ± 0.003

Note: The reported confidence intervals are standard errors over five repeated experiments with different independent training/test random splits. Reported scores are computed on the test sets. Baselines: NetTCR-2.0 (Montemurro et al., 2021), ERGO II (Springer et al., 2021) and LUPI-SVM (Abbasi et al., 2018). Best results are in bold.


Peptide+CDR3β. On the β set, AVIB obtains ∼4% higher AUROC and ∼8% higher AUPR than the best baseline, ERGO II. On the β set ∪ α+β set, AVIB outperforms ERGO II with ∼3% higher AUROC and ∼4% higher AUPR. On the α+β set, in the peptide+CDR3β setting, AVIB is on par with ERGO II.

Peptide+CDR3α. Peptide+CDR3α results on the α+β set are reported in Supplementary Material S7.

Peptide+CDR3α+CDR3β. In the tri-sequence setting, when considering the peptide and both the CDR3α and CDR3β sequences, AVIB obtains ∼1% higher AUROC, ∼4% higher AUPR and ∼6% higher F1 score compared to the best baseline, ERGO II.

These experimental results demonstrate that AVIB is a competitive method for TCR–peptide interaction prediction. On the α+β set, AVIB's tri-sequence (peptide+CDR3α+CDR3β) results outperform those obtained in both bi-sequence (peptide+CDR3α and peptide+CDR3β) settings (see Table 1 and Supplementary Material S7). This shows that AVIB is an effective multi-sequence learning method which can learn richer representations from the joint analysis of multiple data sequences.

3.3.1 Cross-dataset experiments

In Supplementary Material S8, we present cross-dataset experiments in which we train AVIB and the baseline models on the α+β set and test on the β set. As shown in Supplementary Figure S7, the α+β set and the β set present similar peptide distributions but contain different CDR3β sequences. Our cross-dataset results show that all models fail to generalize to unseen CDR3β sequences. These results are in line with Grazioli et al. (2022b), which analogously shows that state-of-the-art models fail to generalize to unseen peptides.

3.3.2 Visualization of the attention weights

One of the advantages of using AoE for estimating the multi-sequence posterior is the dynamic weighting of the multiple single-sequence posteriors, which makes it possible to capture relationships between the input sequences. In Supplementary Material S9, we show how the attention weights derived from the μ_AoE self-attention block change while gradually mutating the peptide sequence. We notice that, as the peptide sequence disruption increases, the peptide-to-CDR3β attention weight drops, while the CDR3β-to-peptide weight increases.

3.4 Multi-sequence posterior approximation

In this section, we compare various techniques for approximating Gaussian joint posteriors. We perform experiments and benchmarks on two datasets: the α+β set and the NetMHCIIpan-4.0 set. Experiments on the α+β set employ either (x_Peptide, x_CDR3β) pairs or (x_Peptide, x_CDR3α, x_CDR3β) triples as inputs. Experiments on the NetMHCIIpan-4.0 set take (x_Peptide, x_MHC) pairs as input.

The ground truth labels of the NetMHCIIpan-4.0 set are continuous BA scores. For BA regression, we train models by substituting the log-likelihood of Equation 6 with a mean squared error (MSE) loss. BA prediction of pMHC complexes is—just like TCR–peptide interaction prediction—a fundamental problem in computational immuno-oncology (Cheng et al., 2021; O’Donnell et al., 2018, 2020; Reynisson et al., 2020) and is a key step in the development of vaccines against cancer (Buhrman and Slansky, 2013; Corse et al., 2011; Hundal et al., 2020; McMahan et al., 2006; Meng and Butterfield, 2002; Slansky et al., 2000) and infectious diseases (Malone et al., 2020). Peptides can only be presented on the surface of cells if they bind to MHC molecules. This mechanism allows the immune system to gain knowledge about in-cell anomalies such as cancerous mutations or viral infections.

Baseline and ablation methods. We benchmark AVIB, which employs AoE, against MVIB (Grazioli et al., 2022a), which employs PoE. Additionally, we perform an ablation study meant to investigate the influence of multi-head self-attention in AoE. For the ablation, we remove the multi-head self-attention module from AoE (see Equation 5) and only apply a simple pooling of the various single-sequence posteriors. We define two ablation methods: Max Pooling of Experts (MaxPOOLoE), which adopts a 1D max pooling function, and Average Pooling of Experts (AvgPOOLoE), which adopts 1D average pooling.

Evaluation metrics. For the evaluation of classification results on the α+β set, we adopt AUROC, AUPR, F1 and accuracy. For evaluating regression on the NetMHCIIpan-4.0 set, we employ MSE, the root mean squared error (RMSE) and the R² coefficient (Wright, 1921).

Table 2 presents classification and regression results on the α+β set and the NetMHCIIpan-4.0 set. AoE achieves the best results in all settings and on both datasets. Interestingly, the ablation methods AvgPOOLoE and MaxPOOLoE (Supplementary Material S10) achieve worse performance than PoE.

Table 2.

Multi-sequence posterior approximation—benchmark and ablation

Inputs        Metric        MVIB (PoE)       AvgPOOLoE        AVIB (AoE)
Pep + β       AUROC ↑       0.889 ± 0.001    0.883 ± 0.002    0.895 ± 0.001
              AUPR ↑        0.512 ± 0.003    0.502 ± 0.002    0.535 ± 0.004
              F1 ↑          0.498 ± 0.002    0.484 ± 0.003    0.515 ± 0.002
              Accuracy ↑    0.860 ± 0.003    0.852 ± 0.002    0.873 ± 0.001
Pep + α + β   AUROC ↑       0.910 ± 0.001    0.905 ± 0.002    0.913 ± 0.001
              AUPR ↑        0.595 ± 0.004    0.589 ± 0.002    0.614 ± 0.002
              F1 ↑          0.575 ± 0.002    0.555 ± 0.007    0.587 ± 0.001
              Accuracy ↑    0.907 ± 0.001    0.898 ± 0.002    0.916 ± 0.001
Pep + MHC II  MSE ↓         0.0313 ± 0.0001  0.0329 ± 0.0002  0.0299 ± 0.0001
              RMSE ↓        0.137 ± 0.001    0.140 ± 0.001    0.133 ± 0.003
              R² ↑          0.538 ± 0.001    0.514 ± 0.004    0.559 ± 0.001

Note: TCR–peptide binding prediction experiments are performed on the α+β set. Peptide-MHC BA regression experiments are performed on the NetMHCIIpan-4.0 set. Confidence intervals are standard errors over five repeated experiments with different training/test random splits. Best results are in bold. ↑: larger value is better. ↓: lower value is better.

Pep, peptide; α, CDR3α sequence; β, CDR3β sequence; MHC II, MHC Class II pseudo-sequence. Baseline: MVIB (PoE). Ablation without multi-head self-attention: AvgPOOLoE, Average Pooling of Experts.


3.5 Missing input sequences

In this section, we study AVIB's performance when certain data sequences are available at training time but missing at test time. We train AVIB on (x_Peptide, x_CDR3α, x_CDR3β) triples from the α+β set. At test time, we omit one of the two CDR3 sequences. In real-world settings, it is in fact common to have batches of data where only CDR3α or CDR3β information is available. It is therefore efficient to use one single model which can operate even when a CDR3 sequence is missing; this avoids training separate models on the various sequence subsets.
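Continuing the AoE sketch from Section 2.1, testing with a missing CDR3α amounts to stacking one fewer expert row; the tensors below are random placeholders and the dimensions are illustrative.

```python
import torch

# Reuses the AoE module from the Section 2.1 sketch.
d_z = 128
aoe = AoE(d_z=d_z, n_heads=4)
prior = torch.zeros(1, 1, d_z)                 # N(0, I) prior expert
pep, cdr3b = torch.randn(1, 1, d_z), torch.randn(1, 1, d_z)

# Trained on (peptide, CDR3a, CDR3b), tested with CDR3a missing: M = 2 experts.
mus = torch.cat([prior, pep, cdr3b], dim=1)    # shape (1, 2+1, d_z)
mu_joint, log_var_joint = aoe(mus, torch.zeros_like(mus))
```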

Figure 2 presents the experimental results. As expected, AVIB's performance decreases when a CDR3 sequence is missing at test time. However, the performance achieved by AVIB when trained in the tri-sequence setting and tested with missing sequences is not consistently different from the performance obtained with bi-sequence training. We only observe a significant difference in the AUPR score when the CDR3α sequence is missing: AVIB trained on peptide+CDR3β achieves ∼3% higher AUPR than AVIB trained on peptide+CDR3α+CDR3β and tested with a missing CDR3α.

Fig. 2. AVIB performance with missing input sequences. Confidence intervals are standard deviations over five repeated experiments on the α+β set with different independent training/test random splits. Train, training-time sequences; Test, test-time sequences.

3.6 OOD detection

Alemi et al. (2018) show that VIB has the ability to detect OOD samples. In this section, the OOD detection capabilities of AVIB are investigated. We assume an in-distribution (ID) dataset D_ID of (x_1^ID, …, x_M^ID; y^ID) tuples, where the x_i denote input data sequences and y the class label. D_OOD denotes an OOD dataset of (x_1^OOD, …, x_M^OOD; y^OOD) tuples. We study the scenario in which the model only has access to ID samples D_ID^train at training time. The test set consists of D_ID^test ∪ D_OOD^test. We adopt the Human TCR set as the ID dataset D_ID, and perform experiments using the Non-human TCR set and the Human MHC set as the OOD dataset D_OOD.

We leverage the expectation of the learned latent posterior conditioned on all input sequences and fit two class-conditional Gaussian distributions using the ID training samples, one for the binding samples and one for the non-binding ones (Equation 7). The class-conditional Gaussian distributions share the same covariance matrix (Equation 8). Analogously to Lee et al. (2018), we discriminate whether test samples are ID or OOD using the Mahalanobis distance score (AVIB-Maha) (Equation 9).

Training and test sets for OOD detection. Given a pair (D_ID, D_OOD), we operate a random 80/20 training/test split of D_ID into D_ID^train and D_ID^test. We train AVIB on D_ID^train for TCR–peptide interaction prediction. No OOD samples are available at training time. We ensure that the number of ID and OOD samples in the test set is balanced by applying the procedure described in Supplementary Material S11.1. Experiments are repeated five times with different random training/test splits.

Baselines. As benchmarks, we compare our results with several OOD detection methods: MSP (Hendrycks and Gimpel, 2016), ODIN (Liang et al., 2017) and the AVIB rate (AVIB-R), i.e. D_KL(p_Θ(Z|x_{1:M,n}) || r_ω(Z)) (Alemi et al., 2018). See Supplementary Material S11.2 for further details about the baseline methods.

Evaluation metrics. In addition to AUROC and AUPR, we adopt the false positive rate at 95% true positive rate (FPR @ 95% TPR) and the detection error (see Supplementary Material S11.3).
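A sketch of the two additional metrics is given below; it assumes OOD is the positive class and the common convention detection error = 0.5(1 − TPR) + 0.5 FPR, evaluated at 95% TPR (Liang et al., 2017).

```python
import numpy as np
from sklearn.metrics import roc_curve

def ood_metrics(is_ood, scores):
    # is_ood: 1 for OOD test samples, 0 for ID; scores: larger => more OOD.
    fpr, tpr, _ = roc_curve(is_ood, scores)
    # First operating point whose TPR reaches 95% (tpr is non-decreasing).
    idx = min(int(np.searchsorted(tpr, 0.95)), len(tpr) - 1)
    detection_error = 0.5 * (1.0 - tpr[idx]) + 0.5 * fpr[idx]
    return {"FPR@95%TPR": fpr[idx], "DetectionError": detection_error}
```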

Table 3 summarizes the OOD detection results for AVIB trained on the Human TCR set for TCR–peptide interaction prediction, using the Non-human TCR set and the Human MHC set as OOD datasets. Figure 3 shows the ROC and PR curves. AVIB-Maha achieves the best results on all investigated metrics on both OOD datasets. On the Non-human TCR set, AVIB-Maha outperforms AVIB-R by ∼9% AUROC and >15% AUPR. On the Human MHC set, AVIB-Maha improves on AVIB-R by ∼29% in FPR @ 95% TPR and ∼15% in detection error.

Fig. 3. OOD detection: ROC and PR curves.

Table 3.

OOD detection results

D_ID / D_OOD               Method            FPR at 95% TPR ↓  Detection error ↓  AUROC ↑        AUPR ↑
Human TCR / non-human TCR  MSP               0.962 ± 0.001     0.505 ± 0.001      0.540 ± 0.003  0.627 ± 0.003
                           ODIN              0.962 ± 0.002     0.506 ± 0.001      0.425 ± 0.008  0.559 ± 0.014
                           AVIB-R            0.719 ± 0.018     0.384 ± 0.009      0.768 ± 0.010  0.714 ± 0.011
                           AVIB-Maha (ours)  0.699 ± 0.011     0.374 ± 0.006      0.850 ± 0.002  0.871 ± 0.001
Human TCR / human MHC      MSP               0.955 ± 0.002     0.503 ± 0.001      0.491 ± 0.007  0.550 ± 0.005
                           ODIN              0.714 ± 0.047     0.382 ± 0.024      0.701 ± 0.029  0.763 ± 0.027
                           AVIB-R            0.297 ± 0.027     0.174 ± 0.014      0.955 ± 0.004  0.964 ± 0.003
                           AVIB-Maha (ours)  0.006 ± 0.002     0.028 ± 0.001      0.994 ± 0.001  0.995 ± 0.001

Note: Distinguishing in-distribution and out-of-distribution test samples. D_ID is the in-distribution set; D_OOD is the out-of-distribution set, not available at training time. The reported confidence intervals are standard errors over five repeated experiments with different independent D_ID^train/D_ID^test random splits. ↑ indicates larger value is better; ↓ indicates lower value is better. Best results are in bold. Baselines: MSP, ODIN and AVIB-R. For ODIN, the hyperparameters are ε = 0.001 and T = 1000, tuned on a validation set.


4 Conclusion

In this article, we propose AVIB, a multi-sequence generalization of the Variational Information Bottleneck (Alemi et al., 2016), which uses AoE to implicitly approximate the posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to the TCR–peptide interaction prediction problem, a fundamental challenge in immuno-oncology. We show that our method significantly improves on the state-of-the-art baselines ERGO II (Springer et al., 2021) and NetTCR-2.0 (Montemurro et al., 2021). We demonstrate the effectiveness of AoE with a benchmark against PoE, as well as with an ablation study. We also show that AoE achieves the best results on peptide-MHC binding affinity regression. Furthermore, we demonstrate that AVIB can handle missing data sequences at test time. We then leverage the bottleneck posterior distribution learned by AVIB and demonstrate that it can be used to effectively detect OOD amino acid sequences. Our method significantly outperforms the baselines MSP (Hendrycks and Gimpel, 2016), ODIN (Liang et al., 2017) and AVIB-R (Alemi et al., 2018). Interestingly, we observe that generalization to unseen sequences remains a challenging problem for all investigated models. These results are analogous to those of Grazioli et al. (2022b). We believe this drop in performance is due to the sparsity of the observed training sequences. Future work should focus on tackling the problem of generalization by, for example, simulating or approximating the chemical interactions of TCR and peptides (or pMHCs), as well as their 3D structures.

Financial Support: none declared.

Conflict of Interest: none declared.

References

Abbasi, W.A. et al. (2018) Learning protein binding affinity using privileged information. BMC Bioinformatics, 19, 1–12.
Alemi, A.A. et al. (2016) Deep variational information bottleneck. In: International Conference on Learning Representations, Toulon, France.
Alemi, A.A. et al. (2018) Uncertainty in the variational information bottleneck. In: Conference on Uncertainty in Artificial Intelligence, Workshop on Uncertainty in Deep Learning, Monterey, California, USA.
Bagaev, D.V. et al. (2020) VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium. Nucleic Acids Res., 48, D1057–D1062.
Buhrman, J.D. and Slansky, J.E. (2013) Improving T cell responses to modified peptides in tumor vaccines. Immunol. Res., 55, 34–47.
Caragea, C. et al. (2009) Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling. BMC Bioinformatics, 10, 1–14.
Cheng, J. et al. (2021) BERTMHC: improved MHC–peptide class II interaction prediction with transformer and multiple instance learning. Bioinformatics, 37, 4172–4179.
Chronister, W.D. et al. (2021) TCRMatch: predicting T-cell receptor specificity based on sequence similarity to previously characterized receptors. Front. Immunol., 12, 640725.
Corse, E. et al. (2011) Strength of TCR–peptide/MHC interactions and in vivo T cell responses. J. Immunol., 186, 5039–5045.
Dash, P. et al. (2017) Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547, 89–93.
Davis, M.M. and Bjorkman, P.J. (1988) T-cell antigen receptor genes and T-cell recognition. Nature, 334, 395–402.
De Neuter, N. et al. (2018) On the feasibility of mining CD8+ T cell receptor patterns underlying immunogenic peptide recognition. Immunogenetics, 70, 159–168.
Feng, D. et al. (2007) Structural evidence for a germline-encoded T cell receptor–major histocompatibility complex interaction 'codon'. Nat. Immunol., 8, 975–983.
Fischer, D.S. et al. (2020) Predicting antigen specificity of single T cells based on TCR CDR3 regions. Mol. Syst. Biol., 16, e9416.
Gielis, S. et al. (2019) Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires. Front. Immunol., 10, 2820.
Glanville, J. et al. (2017) Identifying specificity groups in the T cell receptor repertoire. Nature, 547, 94–98.
Grazioli, F. et al. (2022a) Microbiome-based disease prediction with multimodal variational information bottlenecks. PLoS Comput. Biol., 18, e1010050.
Grazioli, F. et al. (2022b) On TCR binding predictors failing to generalize to unseen peptides. Front. Immunol., 13, 1014256.
Hendrycks, D. and Gimpel, K. (2016) A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations, Toulon, France.
Henikoff, S. and Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89, 10915–10919.
Higgins, I. et al. (2017) beta-VAE: learning basic visual concepts with a constrained variational framework. In: International Conference on Learning Representations, Toulon, France.
Hinton, G.E. (2002) Training products of experts by minimizing contrastive divergence. Neural Comput., 14, 1771–1800.
Hundal, J. et al. (2020) pVACtools: a computational toolkit to identify and visualize cancer neoantigens. Cancer Immunol. Res., 8, 409–420.
Jokinen, E. et al. (2019) Determining epitope specificity of T cell receptors with TCRGP. bioRxiv, 542332.
Jurtz, V.I. et al. (2018) NetTCR: sequence-based prediction of TCR binding to peptide–MHC complexes using convolutional neural networks. bioRxiv, 433706.
Kingma, D.P. and Welling, M. (2013) Auto-encoding variational Bayes. In: International Conference on Learning Representations, Banff, Canada.
Klinger, M. et al. (2015) Multiplex identification of antigen-specific T cell receptors using a combination of immune assays and immune receptor sequencing. PLoS One, 10, e0141561.
Kopf, A. et al. (2021) Mixture-of-experts variational autoencoder for clustering and generating from similarity-based representations on single cell data. PLoS Comput. Biol., 17, e1009086.
Krogsgaard, M. and Davis, M.M. (2005) How T cells 'see' antigen. Nat. Immunol., 6, 239–245.
Kutuzova, S. et al. (2021) Multimodal variational autoencoders for semi-supervised learning: in defense of product-of-experts. arXiv preprint arXiv:2101.07240.
La Gruta, N.L. et al. (2018) Understanding the drivers of MHC restriction of T cell receptors. Nat. Rev. Immunol., 18, 467–478.
Lanzarotti, E. et al. (2019) T-cell receptor cognate target prediction based on paired α and β chain sequence and structural CDR loop similarities. Front. Immunol., 10, 2080.
Lee, C. and Schaar, M. (2021) A variational information bottleneck approach to multi-omics data integration. In: 24th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1513–1521. PMLR.
Lee, K. et al. (2018) A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Advances in Neural Information Processing Systems, Vol. 31.
Liang, S. et al. (2017) Enhancing the reliability of out-of-distribution image detection in neural networks. In: International Conference on Learning Representations, Vancouver, Canada.
Malone, B. et al. (2020) Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2 leading to universal blueprints for vaccine designs. Sci. Rep., 10, 1–14.
McMahan, R.H. et al. (2006) Relating TCR–peptide–MHC affinity to immunogenicity for the design of tumor vaccines. J. Clin. Invest., 116, 2543–2551.
Meng, W.S. and Butterfield, L.H. (2002) Rational design of peptide-based tumor vaccines. Pharm. Res., 19, 926–932.
Montemurro, A. et al. (2021) NetTCR-2.0 enables accurate prediction of TCR–peptide binding by using paired TCRα and β sequence data. Commun. Biol., 4, 1–13.
Moris, P. et al. (2021) Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Brief. Bioinform., 22(4).
Mösch, A. and Frishman, D. (2021) TCRpair: prediction of functional pairing between HLA-A*02:01-restricted T-cell receptor α and β chains. Bioinformatics, 37, 3938–3940.
Nielsen, M. et al. (2003) Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci., 12, 1007–1017.
O'Donnell, T.J. et al. (2018) MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst., 7, 129–132.e4.
O'Donnell, T.J. et al. (2020) MHCflurry 2.0: improved pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Syst., 11, 42–48.e7.
Paszke, A. et al. (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H. et al. (eds) Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., pp. 8024–8035.
Qi, Y. et al. (2007) A mixture of feature experts approach for protein–protein interaction prediction. BMC Bioinformatics, 8, 1–14.
Reynisson, B. et al. (2020) NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res., 48, W449–W454.
Rossjohn, J. et al. (2015) T cell antigen receptor recognition of antigen-presenting molecules. Annu. Rev. Immunol., 33, 169–200.
Rowen, L. et al. (1996) The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science, 272, 1755–1762.
Shi, Y. et al. (2019) Variational mixture-of-experts autoencoders for multi-modal deep generative models. In: Advances in Neural Information Processing Systems, Vancouver, Canada.
Slansky, J.E. et al. (2000) Enhanced antigen-specific antitumor immunity with altered peptide ligands that stabilize the MHC–peptide–TCR complex. Immunity, 13, 529–538.
Springer, I. et al. (2020) Prediction of specific TCR–peptide binding from large dictionaries of TCR–peptide pairs. Front. Immunol., 11, 1803.
Springer, I. et al. (2021) Contribution of T cell receptor alpha and beta CDR3, MHC typing, V and J genes to peptide binding prediction. Front. Immunol., 12. https://www.frontiersin.org/articles/10.3389/fimmu.2021.664514.
Tickotsky, N. et al. (2017) McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics, 33, 2924–2929.
Tishby, N. et al. (2000) The information bottleneck method. In: 37th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, pp. 368–377.
Tong, Y. et al. (2020) SETE: sequence-based ensemble learning approach for TCR epitope binding prediction. Comput. Biol. Chem., 87, 107281.
Vaswani, A. et al. (2017) Attention is all you need. In: Advances in Neural Information Processing Systems, Long Beach, California, pp. 5998–6008.
Vita, R. et al. (2019) The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res., 47, D339–D343.
Weber, A. et al. (2021) TITAN: T-cell receptor specificity prediction with bimodal attention networks. Bioinformatics, 37, i237–i244.
Weininger, D. et al. (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci., 29, 97–101.
Wong, E. et al. (2019) TRAV1-2+ CD8+ T-cells including oligoclonal expansions of MAIT cells are enriched in the airways in human tuberculosis. Commun. Biol., 2, 203.
Wright, S. (1921) Correlation and causation. J. Agric. Res., 20, 557–585.
Wu, M. and Goodman, N. (2018) Multimodal generative models for scalable weakly-supervised learning. In: Advances in Neural Information Processing Systems, Montreal, Canada.
Zeng, H. and Gifford, D.K. (2019) Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design. Cell Syst., 9, 159–166.e3.
