TIminer: NGS data mining pipeline for cancer immunology and immunotherapy

Abstract
Summary: Recently, several powerful computational tools for dissecting tumor-immune cell interactions from next-generation sequencing (NGS) data have been developed. However, assembling analytical pipelines and executing multi-step workflows is laborious and involves many intermediate steps with numerous dependencies and parameter settings. Here we present TIminer, an easy-to-use computational pipeline for mining tumor-immune cell interactions from NGS data. TIminer enables integrative immunogenomic analyses, including human leukocyte antigen (HLA) typing, neoantigen prediction, characterization of immune infiltrates and quantification of tumor immunogenicity.
Availability and implementation: TIminer is freely available at http://icbi.i-med.ac.at/software/timiner/timiner.shtml.
Supplementary information: Supplementary data are available at Bioinformatics online.

The TIminer pipeline generates several output files, which are fully described in the TIminer online documentation (http://icbi.i-med.ac.at/software/timiner/doc/index.html#output-files). Below we discuss how to interpret the major results obtained from the mock example data with respect to tumor-immune cell interactions.
• Immunophenoscore. The immunophenoscore plot (Supplementary Figure S2) shows that HLA-related genes are strongly up-regulated (top-left outer sector, in dark red), while genes related to both effector immune cells (EC, top-right sector) and suppressive immune cells (SC, bottom-right sector) are slightly up-regulated. Most of the checkpoint molecules (CP) with immunoinhibitory effects (marked with a "-" sign) are down-regulated, whereas the co-stimulators ("+" sign) are either up-regulated (CD27) or down-regulated (ICOS). Taken together, these positive and negative contributions are summarized in an immunophenoscore of 10 (on a 0-10 scale), indicating a favorable immunophenotype, i.e. a tumor that is likely to elicit an effective immune response.
• Neoantigens. The expressed neoantigens consist of seven mutated peptides arising from two genes: NCOA6 and TP53. The original pool of mutated peptides comprised eight peptides from three genes, but the single peptide from the third gene, TP53TG3D, was discarded because that gene was not expressed. Among the expressed neoantigens, the MNRRPILTI peptide, arising from an R>G missense mutation in TP53, was predicted to bind HLA-C12:03 with higher affinity (407.5 nM) than its wild-type version MNRGPILTI (1059.7 nM); lower nM values indicate stronger predicted binding.
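The neoantigen interpretation above can be illustrated with a small sketch in plain Python (not TIminer code): peptides from non-expressed genes are dropped, and the predicted affinities of the mutated and wild-type peptides are compared, keeping in mind that a lower nM value means tighter binding. The PeptidePair class, the helper functions and the 50/500 nM binder cut-offs are hypothetical; the cut-offs follow a convention commonly used with NetMHCpan and are not a documented TIminer default. The TP53 values are those reported above.

```python
from dataclasses import dataclass

# Hypothetical binder cut-offs in nM (a common NetMHCpan convention, not a TIminer default).
STRONG_BINDER_NM = 50.0
WEAK_BINDER_NM = 500.0


@dataclass
class PeptidePair:
    """Hypothetical record pairing a mutated peptide with its wild-type counterpart."""
    gene: str
    mutant: str
    wild_type: str
    mutant_nm: float      # predicted affinity of the mutated peptide (nM)
    wild_type_nm: float   # predicted affinity of the wild-type peptide (nM)


def binder_class(affinity_nm: float) -> str:
    """Classify a predicted affinity; lower nM means tighter predicted binding."""
    if affinity_nm < STRONG_BINDER_NM:
        return "strong"
    if affinity_nm < WEAK_BINDER_NM:
        return "weak"
    return "non-binder"


def affinity_gain(pair: PeptidePair) -> float:
    """Fold improvement of mutant over wild type (> 1: the mutant binds tighter)."""
    return pair.wild_type_nm / pair.mutant_nm


def keep_expressed(pairs, expressed_genes):
    """Drop peptides whose source gene is not in the set of expressed genes."""
    return [p for p in pairs if p.gene in expressed_genes]


tp53 = PeptidePair("TP53", "MNRRPILTI", "MNRGPILTI", 407.5, 1059.7)
for p in keep_expressed([tp53], expressed_genes={"NCOA6", "TP53"}):
    print(p.gene, p.mutant, binder_class(p.mutant_nm), round(affinity_gain(p), 2))
```

Ranking by the wild-type/mutant affinity ratio is one simple way to highlight peptides whose mutation improves predicted presentation; it is shown here only as an illustration.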

Advanced filtering
The TIminer function filterNeoantigenDir (or filterNeoantigenFile for single-subject analysis) selects neoantigens that arise from expressed genes, which are identified from the transcripts per million (TPM) files generated by Kallisto (Bray et al., 2016) (see Figure 1 in the main text). Alternatively, a more sensitive filtering scheme, implemented in the sensitiveFilterNeoantigenDir function (or sensitiveFilterNeoantigenFile for single-subject analysis), can be selected (Supplementary Figure S3). This function performs sensitive mapping of the RNA-seq reads with HISAT2 (Kim et al., 2015) and then calculates the read coverage of each mutation with the ASEReadCounter tool from the Genome Analysis Toolkit (McKenna et al., 2010). Finally, from the list of mutated peptides predicted by NetMHCpan (Nielsen and Andreatta, 2016) to be binders, only those arising from mutations with an RNA-seq read coverage ≥5 are retained (the default threshold, which can be changed with the countThresh parameter).
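The final, coverage-based selection step can be sketched as follows. This is a hypothetical illustration, not the code of sensitiveFilterNeoantigenDir: the function names, the keys of each binder record, and the assumption that the ASEReadCounter table is tab-separated with contig, position and totalCount columns are ours. Only the default threshold of 5 reads (countThresh) comes from the text; the coordinates and peptide names below are placeholders.

```python
import csv
import io


def read_coverage(tsv_text: str, column: str = "totalCount") -> dict:
    """Map (contig, position) -> read count from an ASEReadCounter-style table.

    The column names are an assumption about the table layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["contig"], int(row["position"])): int(row[column]) for row in reader}


def sensitive_filter(binders, coverage, count_thresh=5):
    """Keep binding peptides whose mutation has RNA-seq coverage >= count_thresh.

    binders: iterable of dicts with hypothetical keys 'peptide', 'contig', 'position'.
    Mutations missing from the coverage table are treated as having zero reads.
    """
    return [b for b in binders
            if coverage.get((b["contig"], b["position"]), 0) >= count_thresh]


# Placeholder coverage table and binders (hypothetical coordinates and peptides).
table = "contig\tposition\ttotalCount\nchr17\t1000\t12\nchr20\t2000\t3\n"
binders = [
    {"peptide": "PEPTIDE_A", "contig": "chr17", "position": 1000},
    {"peptide": "PEPTIDE_B", "contig": "chr20", "position": 2000},
    {"peptide": "PEPTIDE_C", "contig": "chr1", "position": 5},
]
kept = sensitive_filter(binders, read_coverage(table))
print([b["peptide"] for b in kept])
```

Lowering count_thresh trades specificity for sensitivity, in the same spirit as the countThresh parameter described above.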
The sensitive filtering can be activated in the TIminer pipeline by specifying the --sensitiveFiltering parameter:
python TIminerPipeline.py --input INPUT --out OUT --sensitiveFiltering
Note that this option is more computationally demanding than the default filtering scheme.
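When TIminer is launched from a larger Python workflow, the command line above can be assembled programmatically. This sketch uses only the flags shown above (--input, --out, --sensitiveFiltering); the helper names and the interpreter argument are hypothetical, and the pipeline is only executed if run_timiner is called.

```python
import shlex
import subprocess


def timiner_command(input_path, out_path, sensitive=False,
                    script="TIminerPipeline.py", interpreter="python"):
    """Build the TIminer command line as a list of arguments."""
    cmd = [interpreter, script, "--input", input_path, "--out", out_path]
    if sensitive:
        # Opt in to the slower, coverage-based neoantigen filtering.
        cmd.append("--sensitiveFiltering")
    return cmd


def run_timiner(input_path, out_path, sensitive=False):
    """Run the pipeline; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(timiner_command(input_path, out_path, sensitive), check=True)


# Show the command that would be executed, without running it.
print(shlex.join(timiner_command("INPUT", "OUT", sensitive=True)))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when input or output paths contain spaces.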
Supplementary Figure S2. The immunophenogram obtained from the mock example data. The outer sectors represent the z-scored expression of the genes or immune cell types determining tumor immunogenicity, together with their weight (either positive "+" or negative "-"); for immune cell types, the z-scores of the corresponding genes are averaged. The immune cell types are: activated CD4+ or CD8+ T cells (Act CD4 or Act CD8), effector memory CD4+ or CD8+ T cells (Tem CD4 or Tem CD8), central memory CD4+ or CD8+ T cells (Tcm CD4 or Tcm CD8), regulatory CD4+ T cells (Treg), and myeloid-derived suppressor cells (MDSC). The inner sectors summarize the z-scores into four scores, one for each major determinant of tumor immunogenicity: genes related to antigen processing and presentation (MHC), checkpoint molecules and immunomodulators (CP), effector T cells (EC), and suppressive immune cells (SC). The immunophenoscore (IPS) is an aggregated score representing the overall tumor immunogenicity on an arbitrary scale from 0 to 10.
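The aggregation described in the caption can be sketched as below. This is a simplified, hypothetical reading of the IPS computation, not TIminer's implementation: it averages weight × z-score within each of the four categories (MHC, CP, EC, SC), sums the category scores, and maps the sum linearly onto the 0-10 scale with clipping. The saturation value of 3 is an assumption based on our understanding of the published IPS reference code, and TIminer's exact gene sets, weights and mapping may differ. The z-scores in the usage example are made up.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score one gene's expression across samples (all 0.0 if the gene is constant)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def cell_type_z(gene_z):
    """A cell type's z-score is the mean of the z-scores of its genes."""
    return mean(gene_z)


def category_score(weighted_z):
    """Mean of weight * z over the features (genes or cell types) of one category."""
    return mean(w * z for w, z in weighted_z)


def ips_map(raw, saturation=3.0):
    """Assumed mapping of the summed score onto 0..10, clipped at both ends."""
    if raw <= 0:
        return 0
    if raw >= saturation:
        return 10
    return round(raw * 10 / saturation)


def immunophenoscore(categories):
    """categories: name -> list of (weight, z) pairs, with weight +1 or -1."""
    raw = sum(category_score(pairs) for pairs in categories.values())
    return ips_map(raw)


# Hypothetical z-scores for one sample (not the mock example data).
categories = {
    "MHC": [(+1, 1.5), (+1, 2.0)],
    "CP": [(-1, -0.5), (+1, 0.4)],   # a down-regulated inhibitor counts positively
    "EC": [(+1, cell_type_z([0.2, 0.4]))],
    "SC": [(-1, 0.2)],               # up-regulated suppressive cells count negatively
}
print(immunophenoscore(categories))
```

The negative weights are what let down-regulated immunoinhibitory checkpoints and scarce suppressive cells raise the score, matching the sign convention shown in the immunophenogram.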