Anja Mösch, Dmitrij Frishman, TCRpair: prediction of functional pairing between HLA-A*02:01-restricted T-cell receptor α and β chains, Bioinformatics, Volume 37, Issue 21, November 2021, Pages 3938–3940, https://doi.org/10.1093/bioinformatics/btab573
Abstract
The ability of a T cell to recognize foreign peptides is defined by a single α and a single β hypervariable complementarity determining region (CDR3), which together form the T-cell receptor (TCR) heterodimer. In ∼30–35% of T cells, two α chains are expressed at the mRNA level but only one α chain is part of the functional TCR. This effect can also be observed for β chains, although it is less common. The identification of functional α/β chain pairs is instrumental in high-throughput characterization of therapeutic TCRs. TCRpair is the first method that predicts whether an α and β chain pair forms a functional, HLA-A*02:01 specific TCR without requiring the sequence of a recognized peptide. By taking additional amino acids flanking the CDR3 regions into account, TCRpair achieves an AUC of 0.71.
TCRpair is implemented in Python using TensorFlow 2.0 and is freely available at https://www.github.com/amoesch/TCRpair.
Supplementary data are available at Bioinformatics online.
1 Introduction
T cells are a key element of the adaptive immune system because they can detect infected or aberrant cells through their T-cell receptors (TCRs). The TCR is a heterodimer composed of one α and one β chain. Each chain contains a hypervariable complementarity determining region (CDR3), which interacts with a peptide bound to the human leukocyte antigen (HLA), the human version of the major histocompatibility complex, expressed on the surface of an antigen presenting cell. Since the CDR3α and CDR3β regions are highly variable due to V(D)J recombination, peptide recognition is very specific and each TCR only binds to one or just a few peptides presented by an HLA allele (Hughes et al., 2003; Lu et al., 2019). The peptide specificity is controlled by the process of thymic selection, which only allows T cells that do not recognize peptides of the healthy peptide repertoire to circulate in the body. Most of the positively selected T cells express a single unique TCR on the cell surface, for which only one transcript of an α and one of a β chain is present. However, it has been shown that ∼30–35% of T cells express two α chains at the mRNA level and some T cells also express two β chains, although their number is significantly lower due to transcriptional allelic exclusion and other mechanisms (Dupic et al., 2019; Redmond et al., 2016; Schuldt and Binstadt, 2019; Stubbington et al., 2016). If two α or two β chains can be detected by RNA sequencing of clones or single cells, two surface TCRs might be present but more often only one of the two chains from the same locus is part of the functional TCR (Schuldt and Binstadt, 2019). Identifying the functional α/β TCR combination is crucial for the assessment of suitable TCRs for cancer immunotherapy (Parkhurst et al., 2017; Shitaoka et al., 2018).
Current methods to identify α/β pairing require specific experimental setups and are more geared toward the identification of α/β chain pairs in T-cell repertoires (Egorov et al., 2015; Holec et al., 2019; Howie et al., 2015; Lee et al., 2017). Here, we present TCRpair, a deep learning algorithm to predict functional pairs of α/β TCRs recognizing HLA-A*02:01 restricted peptides. TCRs are reconstructed from the CDR3 sequence and the V/J gene annotation, which represents the minimum annotation of a TCR in publicly available databases (Bagaev et al., 2020; Dhanda et al., 2019; Shugay et al., 2018; Vita et al., 2019). TCRpair can be instrumental in speeding up TCR sequence verification if RNA sequencing data does not yield unequivocal results. Additionally, TCRpair supports input from MiXCR (Bolotin et al., 2015), including filtering for possible α/β combinations by clonotype frequency.
2 Materials and methods
Pairs of CDR3α and CDR3β sequences with their respective V and J allele annotation as well as information on the recognized peptide and the HLA-A allele were downloaded from IEDB (Dhanda et al., 2019; Vita et al., 2019) and VDJdb (Bagaev et al., 2020; Shugay et al., 2018), which predominantly consists of single cell sequencing data. In total we obtained 21 715 unique TCRs, of which 3250 HLA-A*02:01 restricted TCRs were used for model training/testing and validation (Supplementary Table S1). A negative dataset (n = 2209) was generated by randomly combining CDR3α and CDR3β chains and then selecting only those chain pairs for which the CDR3α chain originates from a TCR recognizing a different peptide than the TCR from which the CDR3β chain originates (Supplementary Fig. S1). For each TCR, the full TCR sequence was reconstructed by aligning CDR3 sequences to the sequences of their respective V and J alleles from the IMGT/LIGM database (Giudicelli, 2006). Nine different sequence types were used as model inputs: CDR3 region only, CDR3 region with 3, 5, 7, 9, 11, 13 or 15 flanking amino acids and the full TCR sequence (Fig. 1A). For each TCR, α and β chain sequences were concatenated to be used as single sequence input and BLOSUM62 encoded (Henikoff and Henikoff, 1992; Nielsen et al., 2003) (Fig. 1B and Supplementary Information S1). The dataset was randomly split into 80% training and 20% validation data. For each input type, a model was trained for 20 epochs with batch size 50 using the Adam optimization algorithm (Shao et al., 2020).
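The negative sampling and the flanking-window inputs described above can be illustrated with a minimal sketch. This is not the authors' code: the tuple layout `(cdr3_alpha, cdr3_beta, peptide)`, the function names, the retry limit and the extra check that a shuffled pair is not itself an observed TCR are all assumptions made for illustration.

```python
import random


def make_negative_pairs(tcrs, n_pairs, seed=0):
    """Randomly recombine alpha and beta chains into non-functional pairs.

    tcrs: list of (cdr3_alpha, cdr3_beta, peptide) tuples (hypothetical layout).
    A pair is kept only if its alpha and beta chains come from TCRs that
    recognize different peptides.
    """
    rng = random.Random(seed)
    # Assumption: also reject shuffled pairs that happen to be observed TCRs.
    observed = {(alpha, beta) for alpha, beta, _ in tcrs}
    negatives = set()
    attempts = 0
    # The retry limit (hypothetical) avoids looping forever on tiny datasets.
    while len(negatives) < n_pairs and attempts < 100 * n_pairs:
        attempts += 1
        alpha, _, peptide_a = rng.choice(tcrs)
        _, beta, peptide_b = rng.choice(tcrs)
        if peptide_a != peptide_b and (alpha, beta) not in observed:
            negatives.add((alpha, beta))
    return sorted(negatives)


def cdr3_with_flanks(full_chain, cdr3, n_flank):
    """Return the CDR3 plus up to n_flank residues on each side.

    full_chain: reconstructed chain sequence containing the CDR3.
    The window is clipped at the ends of the chain.
    """
    start = full_chain.find(cdr3)
    if start < 0:
        raise ValueError("CDR3 not found in the reconstructed chain")
    end = start + len(cdr3)
    return full_chain[max(0, start - n_flank):end + n_flank]
```

With `n_flank=0` the second function returns the CDR3 alone, and a sufficiently large `n_flank` returns the full chain, so the nine input types differ only in this one parameter. Concatenating the α and β windows and applying BLOSUM62 encoding would follow as separate steps that are not shown here.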
An independent dataset of 11 HLA-A*02:01-restricted TCRs with two α or two β chains detected at the RNA level was used to test whether TCRpair can identify the functional chain by comparing likelihood scores. RNA sequencing data of T-cell clones was processed by MiXCR (Bolotin et al., 2015); for 10 clones two CDR3α chains and for 1 clone two CDR3β chains showed a clone fraction of at least 0.35. The functional chain for each TCR was experimentally identified by expressing the β chain in combination with both α chains (or in one case the α chain in combination with both β chains) and comparing their cytotoxicity in vitro by coculturing with peptide-presenting cells. All 11 TCRs recognize peptides for which no TCRs are present in the training data. The differences between validation dataset and independent dataset are described in Supplementary Information S2.
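The selection procedure on the independent dataset (filter chains by clone fraction, then compare likelihood scores for each candidate paired with the partner chain) can be sketched as follows. The dictionary layout, the function names and especially `toy_score` are hypothetical; `toy_score` is a placeholder that only stands in for a trained model's likelihood output and has no biological meaning.

```python
def filter_by_clone_fraction(chains, min_fraction=0.35):
    """Keep candidate chains whose clone fraction reaches the threshold.

    chains: list of dicts with keys "sequence" and "clone_fraction"
    (a hypothetical layout, not the MiXCR export format).
    """
    return [c for c in chains if c["clone_fraction"] >= min_fraction]


def rank_candidates(partner, candidates, score_fn):
    """Score each candidate chain with the partner chain, best first.

    score_fn(partner, candidate) must return a number where a higher
    value means a more likely functional pair.
    """
    scored = [(score_fn(partner, c["sequence"]), c["sequence"])
              for c in candidates]
    return sorted(scored, reverse=True)


def toy_score(partner, candidate):
    """Placeholder scorer (NOT the TCRpair model): residue-type overlap."""
    shared = set(partner) & set(candidate)
    return len(shared) / len(set(partner) | set(candidate))


# Illustrative, made-up sequences: one beta chain and three alpha candidates.
beta = "CASSLGQAYEQYF"
alphas = [
    {"sequence": "CAVRDSNYQLIW", "clone_fraction": 0.48},
    {"sequence": "CAASGGSYIPTF", "clone_fraction": 0.41},
    {"sequence": "CALSDRGSTLGRLYF", "clone_fraction": 0.02},
]
kept = filter_by_clone_fraction(alphas)
ranking = rank_candidates(beta, kept, toy_score)
predicted_functional_alpha = ranking[0][1]
```

Passing the scorer as an argument keeps the ranking logic independent of the model, so the same function covers the single clone with two β candidates by swapping which chain is the partner.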
3 Results and discussion
TCRpair can predict whether a pair of α and β chains has the tendency to form a functional TCR and assists with the identification of the chain that is part of the functional TCR if two α or two β chains are detected at the RNA level for HLA-A*02:01 restricted TCRs (Supplementary Fig. S2). The models using flanking amino acid sequences performed better than models using only the CDR3 sequence or the full TCR sequence, which includes CDR1 and CDR2 sequences that show a limited diversity compared to the highly variable peptide binding CDR3 sequence (Arden, 1998). On the validation dataset, the models with 5 and 7 flanking amino acids both achieved an area under the receiver operating characteristic curve (AUC) of 0.71 and an average precision of 0.80 (Fig. 1C). The model with 7 flanking amino acids correctly identified 7 out of 11 TCRs from the independent dataset (Fig. 1D and Supplementary Table S2). The models with 9 and 11 flanking amino acids performed comparably well. All four of these models showed improved prediction performance compared to the model using only the CDR3 sequences and the model using the full TCR sequence. These observations also hold true when comparing AUCs for individual peptides (Supplementary Table S3). Furthermore, we observed larger differences between the likelihood scores of real and perturbed amino acid input vectors for regions with a higher amino acid variation such as the V region compared to more conserved positions such as the first two positions of the CDR3 regions (Supplementary Information S3 and Table S5) (Yu et al., 2019). These results demonstrate that TCRpair learned to identify the features of the TCR’s α and β chain sequences that ultimately determine functional pairing and thus TCR specificity, without the need to know the sequence of the recognized peptide.
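For readers less familiar with the two metrics reported above, the following sketch computes AUC and average precision from binary labels and likelihood scores in plain Python. It is an illustration of the standard definitions, not the authors' evaluation code; the function names are hypothetical, and the average-precision routine assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.

    Assumes untied scores; tied scores would make the ranking ambiguous.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits
```

An AUC of 0.5 corresponds to random ranking, whereas the baseline for average precision is the fraction of positives in the dataset, so the two values are not directly comparable on an imbalanced set.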
TCRpair performs comparably to NetTCR 2.0 (https://services.healthtech.dtu.dk/service.php?NetTCR-2.0; Jurtz et al., 2018), which in contrast requires one of three possible peptides as additional input (Supplementary Table S4). Additionally, TCRpair demonstrates that sequence context can improve performance for sequence-based machine learning algorithms using LSTM layers, which might apply to similar prediction problems.
The current version of TCRpair is limited to TCRs recognizing peptides presented by HLA-A*02:01, which is the most common allele in Caucasian populations (Gonzalez-Galarza et al., 2015). It does not work for other HLA restrictions (see Supplementary Table S2) or naïve T-cell repertoires (see Supplementary Information S1), for which frequency-based methods relying on the distribution of T-cell clones over multiple samples of the same repertoire are more suitable (Holec et al., 2019; Howie et al., 2015; Lee et al., 2017). However, the growing amount and quality of TCR sequencing data, especially from single cells, will allow the addition of further HLA alleles and the training of a general HLA-independent model in the future.
Acknowledgements
The authors wish to thank Dr Silke Raffegerst for carefully reading the manuscript and for many useful comments.
Financial Support: none declared.
Conflict of Interest: A.M. is an employee of Medigene Immunotherapies GmbH, a subsidiary of Medigene AG, Planegg, Germany.