LENS: Landscape of Effective Neoantigens Software

Abstract Motivation Elimination of cancer cells by T cells is a critical mechanism of anti-tumor immunity and cancer immunotherapy response. T cells recognize cancer cells by engagement of T cell receptors with peptide epitopes presented by major histocompatibility complex molecules on the cancer cell surface. Peptide epitopes can be derived from antigen proteins coded for by multiple genomic sources. Bioinformatics tools used to identify tumor-specific epitopes via analysis of DNA and RNA-sequencing data have largely focused on epitopes derived from somatic variants, though a smaller number have evaluated potential antigens from other genomic sources. Results We report here an open-source workflow utilizing the Nextflow DSL2 workflow manager, Landscape of Effective Neoantigens Software (LENS), which predicts tumor-specific and tumor-associated antigens from single nucleotide variants, insertions and deletions, fusion events, splice variants, cancer-testis antigens, overexpressed self-antigens, viruses, and endogenous retroviruses. The primary advantage of LENS is that it expands the breadth of genomic sources of discoverable tumor antigens using genomics data. Other advantages include modularity, extensibility, ease of use, and harmonization of relative expression level and immunogenicity prediction across multiple genomic sources. We present an analysis of 115 acute myeloid leukemia samples to demonstrate the utility of LENS. We expect LENS will be a valuable platform and resource for T cell epitope discovery bioinformatics, especially in cancers with few somatic variants where tumor-specific epitopes from alternative genomic sources are an elevated priority. Availability and implementation More information about LENS, including code, workflow documentation, and instructions, can be found at (https://gitlab.com/landscape-of-effective-neoantigens-software).


Introduction
Tumor-specific and tumor-associated antigens are of great interest for understanding cancer immunobiology and developing personalized immunotherapy approaches including vaccination or adoptive T cell therapy. Unfortunately, predicting which tumor antigens are immunogenic in vivo is challenging (Smith et al. 2019b, Wells et al. 2020. These predictions are complicated by several factors influencing the appropriateness of candidate tumor antigens. Specifically, only a subset of tumor-specific or tumor-associated variants or sequences within a patient will be transcribed, translated, and processed by the proteosome. Only a subset of peptides generated through protein degradation will be presented on the cell surface by major histocompatibility complex (MHC) and only a further subset of those will result in T-cell recognition, activation, and cytotoxicity. Despite recent advances in understanding peptide attributes associated with in vivo immunogenicity, other factors including RNA editing, post-translational modifications, and peptide splicing may influence the effectiveness of computational predictions (Liepe et al. 2018, Zhang et al. 2018, Rolfs et al. 2019. Improved cancer antigen prediction using genomics data could empower more detailed studies of anti-tumor T cell responses as well as better personalized combination immunotherapy strategies (Hu et al. 2021).
Multiple workflows exist to predict tumor antigens from high-throughput sequencing data. These include OpenVax, pVACTools, and nextNEOpi among others (Hundal et al. 2020, Kodysh and Rubinsteyn 2020, Rieder et al. 2022). These tools allow users to study associations between tumor antigen burden and immunotherapy response, evaluate tumor antigen-specific T cell responses, and support clinical trials of personalized immunotherapy targeting predicted tumor antigens (Ott et al. 2017, Sahin et al. 2017, Kodysh and Rubinsteyn 2020, Caushi et al. 2021, Hu et al. 2021, Litchfield et al. 2021, Lowery et al. 2022). Here, we present an extensible, modular, and open-source workflow, Landscape of Effective Neoantigens Software (LENS), coupled with a Nextflow-based analysis platform, Reproducible Analyses Framework and Tools (RAFT, see Section 2.1), which addresses shortcomings of current neoantigen workflows, expands the repertoire of predicted tumor antigens, and serves as a springboard toward community-driven advances in neoantigen prediction.

Landscape of Effective Neoantigens Software
LENS is a modular, extensible workflow, which predicts tumor antigens from an array of sources including somatic single nucleotide variants (SNVs), conservative, disruptive, and frameshift insertions and deletions (InDels), splice variants, in-frame and frameshift fusion events, viruses, endogenous retroviruses (ERVs), and cancer-testis antigens (CTAs)/selfantigens ( Fig. 1). LENS includes phasing and germline variant information in epitope identification, and it harmonizes variant RNA expression across genomic sources to provide a more usable relative expression ranking each peptide epitope. More information about LENS, including workflow documentation and instructions for running LENS can be found at (https://gitlab.com/landscape-of-effective-neoanti gens-software).
Here, we provide an overview of the LENS workflow, describe its advantages over currently available workflows, describe its technical aspects, and discuss the results of running LENS on 115 acute myeloid leukemia (AML) patients.

Implementation
The framework running LENS is a Nextflow wrapper called RAFT, which enables our goals of modularity, extensibility, and improved support for collaboration (Di Tommaso et al. 2017). RAFT supplements the Nextflow DSL2 with a collection of tool-level modules, workflow modules, and automation to ease workflow creation and running (e.g. module dependency resolution, parameter centralization, etc.). A full description of RAFT's capabilities can be found at RAFT's GitHub: (https://gitlab.com/landscape-of-effective-neoanti gens-software/raft).

Flexibility
The modular design of LENS allows high flexibility in its execution. This presents itself in many forms including user-definable references files (FASTAs, GTF/GFFs, BED files, etc.) and parameters on either global or workflow-specific scopes. As a result, users maintain fine-grain control over workflow and tool-level behaviors. This allows manipulation of workflows to account for variability in sample type (e.g. fresh frozen vs. formalin fixed) or changing filtering criteria. Perhaps most importantly, the modular nature of LENS allows users to introduce novel tools or replace current tools with a better suited one. There may be other scenarios where swapping tools may be desirable or necessary as there are a variety of variant callers, fusion callers, and copy number inference tools available with their individual benefits and disadvantages (Haas et al. 2019). Replacing the LENS default tools with these alternatives or defining new ensemble approaches may produce richer and more meaningful results. Additionally, users may want to change tools to be within compliance of any license requirements encompassing their usage.

Extensibility
Properly leveraging novel technologies, protocols, and public datasets is crucial for progress toward meaningful and impactful insights in tumor antigen discovery. In light of this, we designed LENS such that it can be augmented through creation of new modules and workflows. LENS reports can further be supplemented by including published pMHC-specific empirical immunogenicity data as more become available. Inclusion of single-cell sequencing modules would help disentangle the genetic and transcriptional heterogeneity within a patient's tumor that cannot be observed in bulk sequencing data. Long read sequencing allows for detection of large structural variants and provides improved resolutions of haplotypes near tumor peptide-generating sequences of interest. Other technologies, such as ribosome profiling (Ribo-seq), will not only improve current tumor antigen predictions by confirming translation reading frames, but can open new tumor antigen sources, such as "genomic dark matter" arising from non-canonical reading frames (Delgado et al. 2014). We consider the version of LENS presented here to be a snapshot of the continuously improving workflow that will aid the immuno-oncology community toward improved therapeutics.

Comparison to other workflows
Neoantigen workflows have become increasingly popular as the value of personalized immunotherapy has been realized. There are currently dozens of neoantigen workflows publicly available including pVACtools, the OpenVax workflow, nextNEOpi, and MuPeXi (Bjerregaard et al. 2017, Hundal et al. 2020, Kodysh and Rubinsteyn 2020, Rieder et al. 2022. A comprehensive comparison among all available workflows is available in the literature (Rieder et al. 2022). LENS improves upon previous offerings by providing an endto-end solution utilizing containerized tools and modular workflows, expanding the types of tumor antigens predicted (to include splice variants, CTAs/self-antigens, oncogenic viruses, and ERVs), providing a harmonized tumor antigen abundance quantifier, and allowing customization of the workflow. A feature-level comparison against competing workflows that have been used in clinical trials can be found in Supplementary Table S1. A high-level comparative  Vensko et al.
summary between LENS and 26 other workflows can be found in Supplementary Table S4.

Workflow overview
The LENS workflow orchestrates over two dozen separate tools to generate tumor antigen predictions. LENS currently supports tumor antigen detection from the following tumor antigen sources: SNVs, InDels, fusion events, splice variants, viruses, ERVs, CTAs, and self-antigens. LENS will be expanded to include support for single-cell data, long read data, external reference data, and additional bioinformatics tools. A brief summary on peptide generation approaches for each workflow can be found in Supplementary Table S4. 2.6 Data pre-processing, read alignment, and transcript quantification LENS processes and aligns input FASTQS against a reference genome with an annotation file. Alignment sanitization is performed to ensure high quality inputs for downstream tumor antigen source-specific workflows. Transcripts defined within the GTF annotation file are then quantified. The preprocessing workflow results in numerous intermediate files required for each tumor antigen workflow and is visualized in Supplementary Fig. S1.

SNVs and InDels
Somatic SNVs and InDel variants may result in targetable tumor-specific peptides. These peptides may be a direct result of a somatic variant or may be a downstream coding consequence of the variant. They should not exist in non-tumor tissues, so T cells that target them could escape negative selection in the thymus, making them attractive targets for antigen-specific immunotherapy (Smith et al. 2019c).
Vaccines targeting SNV-derived tumor-specific antigens have been used in clinical trials, with strategies including peptide vaccines (Lilleby et al. 2017, Obara et al. 2017, dendritic cell vaccines (Carreno et al. 2015, Filley andDey 2017), DNA vaccines (Li et al. 2021), and RNA vaccines (Sahin et al. 2017). The LENS SNV and InDel workflows filter somatic variants using a consensus approach. By default, somatic variants are called using three variant callers (Mutect2, Strelka2, and ABRA2), and variants within the intersection of the appropriate callers (MuTect2 and Strelka2 for SNVs; all three callers for InDels) are considered (McKenna et al. 2010, Kim et al. 2018, Mose et al. 2019). Users may also use a union of variants as well as other variant callers, such as VarScan2 (Lang et al. 2022). Requiring consistent variant detection among callers culls potential false positive variants leaving a set of high confidence calls. The intersected VCF is filtered for coding variants (missense SNVs, conservative and disruptive inframe InDels, and frameshift InDels) for further processing. Variants are filtered by two criteria at different points in the workflow: (i) their relative transcription abundance must exceed an abundance percentile (e.g. 75%) and (ii) sequence capable of coding variant-harboring peptides must be detectable in the patient's tumor RNA-sequencing data. Expression filtering creates a stable set of SNVs/InDels that are likely transcribed and translated and may be presented by the MHC on the tumor cell surface. Accurately generating tumor peptides from the patient's somatic variants requires phasing candidate somatic variants with neighboring germline variants ( Supplementary Fig. S2). LENS calls germline variants with DeepVariant and performs phasing using the patient's DNAand RNA-sequencing data through WhatsHap (Martin et al. 2016). Phased heterozygous germline variants within the same haplotype block as a candidate somatic variant are incorporated into the resulting peptide where appropriate. This step ensures the predicted peptides more accurately reflect those potentially presented by the patient's tumor.
The report generated by the SNV workflow includes several metrics relevant to peptide prioritization (described in Section 2.14), the gene and transcript harboring the expressed variant, the transcript's relative abundance [in transcripts per million (TPM)], as well as the number of RNA-sequencing reads from the peptide's genomic origin containing a sequence, which translates to the peptide (Rasmussen et al. 2016, Jurtz et al. 2017, Richman et al. 2019). The last metric will be discussed in more detail later, but serves as an estimated proxy for peptide abundance harmonized across tumor antigen workflows. Furthermore, LENS estimates variant cancer cell fraction with PyClone-VI by using copy number alteration data from Sequenza along with MuTect2 allele frequencies (Favero et al. 2015, Gillis andRoth 2020). Understanding the resulting distribution of tumor sub-populations and estimating the clonality of variants will allow for improved prioritization of predicted neoantigens. The SNV/InDel workflow is visualized in Supplementary Fig. S3.
Many SNV and in-frame InDel neoantigens have shown limited utility for inducing a strong immune response (Wells et al. 2020). SNV/InDel neoantigen therapies have focused on tumor types known to have high tumor mutational burden (TMB), such as melanoma. These therapeutic approaches may not be appropriate for patients with lower TMB tumors, such as AML. Furthermore, SNVs and InDel neoantigens tend to be private to individuals which limits the possibility of "off the shelf" neoantigen vaccines. These considerations suggest other tumor antigen sources beyond somatic variants are worthy of further consideration.

Splice variants
Aberrant splicing events create tumor-specific transcripts through intron retention, exon skipping, or altered splice targeting ( Supplementary Fig. S4). These transcripts may be translated and processed into targetable tumor-specific peptides. Ninety-four percent of human genes have intronic regions ) and most undergo alternative splicing to generate transcriptomic and proteomic diversity in order to fulfill a wide variety of protein functions (Frankiw et al. 2019). Alternative splicing is influenced by RNA structure, chromatin structure, and transcription rate as well as splice site generating or disrupting cis-acting mutations and transacting mutations that alter splicing factors to cause aberrant splice variants in other genes (Smith et al. 2019c).
LENS utilizes NeoSplice, a k-mer searching and splice graph traversal algorithm, to detect tumor-specific splice variants and neoantigens derived from them (Chai et al. 2022). NeoSplice considers patient-specific splice variants derived from their tumor RNA sample relative to splice variants from a tissue-matched normal RNA sample. Differences in k-mer distributions between the tumor and normal splice variants allow for high confidence detection of tumor-specific splice variants. Peptide abundance is estimated by counting the number of reads containing the complete peptide's coding sequence mapping to the splice variant genomic origin. The splice variant workflow is visualized in Supplementary Fig. S5.

Fusion events
Scenarios in which a translocation, deletion, or inversion causes two previously genomically distant loci to become neighboring sequences or "fuse" within the genome are also of interest as sources of tumor antigens. This combination of two naturally occurring intragenic coding sequences can create novel peptides that may serve as immunogenic epitopes. This effect may be amplified when fusion events result in translational frameshifts that are as potentially immunogenic as splice variants , theoretically moreso than SNV-derived neoantigens (Yang et al. 2019).
LENS utilizes STARFusion for detection of fusion events using recommended default parameters (Haas et al. 2017). Similar to the SNV and InDel workflows, homozygous germline variants are incorporated into the fused coding sequence prior to translation using phased germline VCFs. Proteins derived from in-frame fusion events are truncated upstream to a user-specified length and downstream of the fusion junction and processed through a suite of tools to characterize potential pMHCs (see Section 2.14). Frameshift-derived proteins are truncated upstream of the fusion junction and include all downstream sequence until the first stop codon. Peptides are filtered and quantified by checking for their coding sequence within the RNA reads that either map across a junction point or are mapped to both sides of the junction point. The fusion workflow is visualized in Supplementary Fig. S6.

Tumor-specific viruses
Some viruses, such as human papillomavirus, Epstein-Barr virus, human T cell leukemia/lymphoma virus type 1, and hepatitis C virus, are associated with development of cancers in the tissues they infect earning them the label of oncogenic viruses (Perz et al. 2006, Mesri et al. 2010, Zhao et al. 2021. Viruses can drive oncogenesis through disruption of host cell growth and survival, induction of DNA damage response causing host genome instability, by causing chronic inflammation and tissue damage, or by causing immune dysregulation creating a more permissive immune environment for tumorigenesis (Tashiro and Brenner 2017). Antigens derived from the virus are regarded as tumor-associated antigens, yet are immunologically more foreign compared to self-derived peptides, making them potential immunotherapeutic targets.
LENS detects viral-derived tumor antigens with VirDetect, a previously developed workflow designed to detect viral contamination within RNA-sequencing data (Selitsky et al. 2020). Specifically, RNA reads are mapped against the reference genome and any reads that cannot be mapped to the reference are diverted to the VirDetect workflow. VirDetect aligns these unmapped reads to coding sequences of over 1900 vertebrate viruses. Homozygous germline variants detected within the RNA-sequencing data through BCFtools are incorporated into the viral sequence prior to translation. Translated viral peptide sequences are considered as potential tumor antigens for downstream processing. Peptides are quantified by counting occurrences of peptide-associated coding sequences within the reads that map to express viral coding sequences. The viral workflow is visualized in Supplementary Fig. S7.

Endogenous retroviral elements
ERVs are, unlike typical viruses, integral to the human genome and make up roughly 8% of the genome (Grandi and Tramontano 2018). Some retroviral elements have retained intact open reading frames (ORFs) that may be transcribed, translated, processed, and presented by MHCs on the cell surface under abnormal transcriptional regulation within a tumor. Their lack of central tolerance, elevated abundance, and ability to be expressed under chaotic tumor conditions make ERVs intriguing potential sources of tumor antigens. Unsurprisingly, ERVs have been a focal point of tumor antigen research due to evidence suggesting CD8þ T cell recognition of ERV-derived peptides results in an immunogenic response (Smith et al. 2019a, Bonaventura et al. 2022. LENS detects ERV-derived candidate peptides through use of the genome-based Endogenous Viral Element database, which includes computationally predicted retroviral element ORFs from which antigens may be generated (Nakagawa and Takahashi 2016). Expressed ORFs have homozygous germline variants integrated into their sequences prior to translation. We assume some ERVs have natural low levels of expression in some normal tissues, so differential expression filtering is used to narrow the considered pool ( Supplementary Fig. S8). Specifically, tissue-matched normal control samples are processed through an ERV quantification workflow and the resulting raw counts are normalized to patient-specific ERV counts through EdgeR and ERV ORFs, which exceed a fold-change threshold [logðCPM tumor þ 1ÞÀ logðCPM normal þ 1Þ > 1, see Supplementary Fig. S12], are considered for downstream processing (Robinson et al. 2010). ERV peptides are quantified by counting peptidecoding sequences among reads that map to express ORFs. The ERV workflow is visualized in Supplementary Fig. S9.

Cancer testis antigens and self-antigens
Tumor self-antigens are an additional potential target for immunotherapy. These antigens are non-mutated self-antigens that may be overexpressed within a tumor. These antigens are traditionally viewed as suboptimal therapeutic targets for two reasons: (i) T cells recognizing self-antigens would be expected to be deleted in the thymus and (ii) autoimmunity is difficult to avoid when targeting antigens expressed in both normal and tumor tissues. However, work has been done to identify methods of self-antigen targeting while minimizing autoimmunity. The pool of self-antigens to select from for targeting is also large, as any given tumor will present many selfantigens, making them potential vaccine targets (Cheever et al. 2009, Bright et al. 2014. Genes normally expressed in the immune privileged tissues, such as testis tissue, may be transcriptionally active in tumors. Antigens derived from these transcripts are commonly referred to as CTAs and are of particular interest due to their potential immunogenicity. These antigens are derived from normal genes that are usually expressed either during early development or in adulthood only within immune privileged tissues, but may become overexpressed within tumors, such as melanoma, breast cancer, or bladder cancer. This abnormal expression makes them promising targets due to their lack of expression in normal tissues undergoing immune surveillance (Mitchell et al. 2021).
LENS accepts a user-provided list of CTA and self-antigen gene identifiers, but defaults to a set of cancer-testis genes from CTDatabase (Almeida et al. 2009). This list includes genes that are germline-biased and highly expressed in cancers, but have not necessarily been shown to initiate an immune response when targeted. The CTA and self-antigen transcripts are first filtered for transcription abundance exceeding a user-specified percentile threshold. Transcripts passing the filter then have homozygous germline variants incorporated and the resulting predicted peptides are run through a suite of tools to characterize candidate pMHCs. pMHCs with high binding affinity and high pMHC stability are quantified by counting occurrences of peptide-coding sequence observed within reads mapping to the peptide's genomic origin. The CTA/self-antigen workflow is visualized in Supplementary Fig. S10.

Tumor antigen peptide characterization
Peptides are generated by antigen source-specific workflows and then combined into a single set. These peptides are combined with the patient's HLA alleles to characterize aspects of the pMHC. Calculated peptide features include estimates of binding affinity, binding stability, proteosomal processing score, presentation score, dissimilarity, foreignness, and agretopicity. Supported tools are currently NetMHCpan (binding affinity), NetMHCstabpan (binding stability), MHCFlurry (binding affinity, antigen processing, and presentation), DeepHLAPan (binding affinity and immunogenicity), and antigen.garnish (dissimilarity and foreignness).

Harmonization of peptide abundance estimates
The tumor antigen workflows within LENS have a variety of quantification metrics generated by the individual workflows. These metrics include TPM for transcript quantification in the SNV and InDel workflows, fusion fragments per million for fusion quantification in the fusion workflow, minimum observed expression for the splice variant quantification in the splice workflow, and read count for viral expression in the viral workflow. Each metric described may be broadly relevant to tumor antigen quantification, but most have at least two issues: (i) they conflate transcript abundance of the peptidegenerating coding sequence with the transcript abundance of its wild-type counterpart and (ii) they do not allow meaningful comparisons of peptide abundance among workflows. As an example of the first issue, consider TPM, a metric commonly used to represent relative transcript abundance. TPMs have been used as a filtering criterion for SNV and InDel neoantigens, but they represent the abundance of the transcript (rather than just the peptide-generating sub-sequence) and may include alleles that do not code for the peptide of interest. To address this, we developed a novel quantification strategy utilizing the observed occurrences of the nucleotide sequence responsible for coding the peptide from the peptide's genomic origin from the patient's tumor RNA-sequencing data (Supplementary Section S4.1). This metric resolves both of the limitations of currently used metrics and allows for prioritization by abundance across multiple tumor antigen types.

Visualization
Neoeptitope confirmation and prioritization are key to selecting potentially therapeutically useful targets for vaccine development. To address this need, we developed LENS Viz, a Shiny web app available at https://lens.shinyapps.io/lensviz [or run locally using code at (https://gitlab.com/landscape-ofeffective-neoantigens-software/lensviz)] (Supplementary Fig.  S11). LENS Viz allows interactive exploration of the data. This results in an improved understanding of binding affinity, binding stability, and RNA read support on a per-peptide and per-antigen source basis and how modifying thresholds affect the set of potential tumor antigens of interest. It also includes an interactive IGV widget to allow users to view read-based evidence of tumor antigen sources in the tumor samples (and the lack of evidence in the normal sample).

Murine support
Research using murine models of cancer to better understand anti-tumor immune mechanisms has been crucial to improving human health through immuno-oncology. To this end, LENS was designed to support usage of mouse samples and references to predict potential pMHCs for common research strains. Instructions for running LENS on mouse samples can be found at (https://gitlab.com/landscape-of-effective-neoanti gens-software/nextflow/modules/tools/lens/-/wikis/Running-LENS). We have tested LENS on the BBN963 strain, a mouse model of bladder cancer, and discovered potential therapeutic targets that are being followed up with experimental testing.

Applying LENS to TCGA-LAML
The design principles of LENS allow it to avoid constraints around specific tumor types or tumor antigen sources. We demonstrate LENS by processing data from several patients of the widely available TCGA-LAML (AML) dataset. AML is suitable for demonstrating LENS due to its relatively low TMB compared to other tumor types. This low TMB presents a situation where non-SNV-and InDel-derived neoantigens are crucial. The TCGA-LAML dataset also contains sufficient patient-level data including the three sample types [normal exome sequencing, abnormal (tumor) exome sequencing, and abnormal (tumor) RNA sequencing] for 115 patients. Here, we discuss the results for each tumor antigen workflow currently available from LENS. This is not intended to be an exhaustive analysis of LENS outputs, but rather to provide an example of its utility in tumor antigen prediction using genomics data.

General observations among tumor antigen sources
TCGA-LAML patients show a range of predicted tumor antigen counts from 47 to 1789 (median: 366) ( Supplementary  Fig. S13). Predicted antigen counts among patients vary by genomic source. Specifically, CTA/self-antigen and ERV peptides made up over 37 000 total predicted peptides among patients [37 349/41 395 (90.2%)] while the remaining sources (SNVs, InDels, fusion events, splice variants, and viruses) consisted of 4046 peptides (9.8%). This disparity is likely explained by two factors: (i) ERVs are highly abundant within the genome and (ii) the entirety of each CTA/self-antigen's coding sequence and ERV's ORF sequence is used for peptide generation. This increase of "raw material" compared to point neoantigen targets, like SNVs and InDels, results in more potential peptides (even after binding affinity filtering). Binding affinities among antigen sources had medians between 100 and 250 nM with the exception of CTA/selfantigens due to the manual filtering to peptides with binding affinities under 50 nM (Fig. 2a). Interestingly, binding affinity medians vary among patients within tumor antigen source and binding affinity variance is especially high in ERV and virus-derived peptides (Fig. 2b). This high variance is likely explained again by the large number of peptides generated from these workflows resulting in more opportunities for binding affinity variability.

LENS: Landscape of Effective Neoantigens Software
Binding stability distributions within tumor antigen sources also show the expected pattern of an elevated number of low stability pMHC with a long tail of rarer, but more stable pMHCs. Binding stabilities were limited to a maximum of 5 h for interpretative and illustrative purposes, but some pMHCs, primarily CTA/self-antigen-derived and ERV-derived, have estimated binding stability half-lives of up to 66.6 h. Binding stabilities appear to follow similar trends among patients across tumor antigen sources (Fig. 3a). Notably, several splice variant-derived pMHCs show identical distributions of pMHC binding affinity and stability. These patients all appear to have an expressed and detectable retained intron in the PRTN3 gene.
RNA-based peptide support (calculated as the frequency of peptide-coding sequences occurring within tumor-derived RNA-sequencing reads) likewise shows variability among antigen sources and among patients (Fig. 4). Peptides derived from somatic mutations (e.g. SNVs, InDels, and fusions) show generally lower read support than peptides generated through aberrantly expressed sources (ERVs, CTAs/selfantigens, and splice variants). Both CTA/self-antigens and ERVs show more normal distributions compared to the rightskewed distributions resulting from gene fusions and SNVs. This is likely due to a combination of factors including the averaging effect of having multiple peptides from a single transcript for ERVs and CTA/self-antigens as well as possible technical artifacts driving SNVs and gene fusions with low read support. Future work into the optimization of each tumor antigen workflow will allow for an improved interpretation of these distributions. More detail on antigen classspecific results are available in the Supplementary Material.

Benchmarking with TESLA
The dataset of experimentally validated immunogenic class I peptides from non-small cell lung cancer (NSCLC) and melanoma patients generated by the TESLA Consortium has allowed for an improved understanding of the factors driving neoantigen immunogenicity (Wells et al. 2020). These data also provide opportunities to benchmark the performance of neoantigen workflows (Rieder et al. 2022). The set of validated immunogenic peptides was derived from somatic variants, so it does not include the non-somatic variant tumor antigens that LENS predicts. Nonetheless, ensuring a workflow can detect immunogenic peptides serves as a check for correctness of the SNV and InDel neoantigen prediction steps.
LENS was run on the eight available TESLA patients (two NSCLC and six melanoma) using a union of somatic variant calls from MuTect2, Abra2, Strelka2, and VarScan2. LENS detected 36 of 38 immunogenic peptides from TESLA with 33   Vensko et al.
peptides passing filtering for reporting in the LENS report. Notably, LENS discovered hundreds of non-SNV/Indel tumor antigens for each patient. More detail can be found in the Supplementary Material.

Discussion
Predicting the full suite of tumor antigens from genomics data remains a formidable challenge. We developed LENS, coupled with the RAFT framework, to improve upon current workflow offerings in a variety of ways: (i) LENS allows for modularity and extendibility through the Nextflow DSL2 language, (ii) LENS explores more tumor antigen sources than previously considered by other workflows, (iii) LENS harmonizes peptide abundance estimates among various tumor antigen sources, and (iv) LENS includes an application for visualization and interactive exploration to support prioritization of predicted neoantigens for therapeutic applications. LENS is open source, and we expect it to expand through both introduction of new modules, tools, and reference datasets, along with refinement of its tumor antigen prioritization capabilities. There are several objectives discussed below concerning these goals. A primary objective for the improvement of LENS is expanding the set of supporting sequencing technologies. LENS currently supports data generated through short-read technologies, such as the Illumina platform. Illumina's sequencing-by-synthesis chemistry sees widespread usage and support, both within the immuno-oncology community and more broadly throughout the life sciences research community. Despite its popularity, the short read and bulk cell approaches commonly used result in reduced information compared to long read and single-cell approaches. We aim to address these limitations within LENS by supporting long read and single-cell sequencing technologies. Long read sequencing technologies allow improved resolution of structural variation that may be relevant to neoantigen prediction. Additionally, single-cell sequencing circumvents the confounding effects of bulk sequencing which allows improved understanding of tumor heterogeneity. Empirically deciphering this heterogeneity is crucial to properly prioritizing clonal tumor antigens while also mapping co-occurring subclonal neoantigens to optimize therapeutic targeting.
We plan to further expand LENS through inclusion of third-party reference data and additional bioinformatics tools to provide information about the support (or lack thereof) of a peptide's immunogenicity. The most important and immediate addition will be the inclusion of Class II binding affinity prediction. We also plan to extract relevant summarized data from large datasets, additional tools, or smaller scale datasetspecific observations in the literature.
Beyond the inclusion of technologies and data, there is also room for improvements within LENS in its current form. LENS supports a variety of tumor antigen sources, but currently it effectively treats these workflows independently. This independence among tumor antigen workflows in LENS also does not allow potentially useful "crosstalk" between workflows. For example, LENS may be able to provide higher confidence for splice antigen targets if there is evidence of somatic splice site variant in the corresponding variant data. While many challenges remain in tumor antigen prediction, we expect that the breadth of features, flexibility, modularity, and ease of use of LENS will support wide adoption as a springboard toward iterative improvements as more data, tools, and an improved understanding of antigen presentation and immunogenicity become available.

Supplementary data
Supplementary data is available at Bioinformatics online.