Bioinformatics for precision oncology

Abstract
Molecular profiling of tumor biopsies plays an increasingly important role not only in cancer research, but also in the clinical management of cancer patients. Multi-omics approaches hold the promise of improving diagnostics, prognostics and personalized treatment. To deliver on this promise of precision oncology, appropriate bioinformatics methods for managing, integrating and analyzing large and complex data are necessary. Here, we discuss the specific requirements of bioinformatics methods and software that arise in the setting of clinical oncology, owing to a stricter regulatory environment and the need for rapid, highly reproducible and robust procedures. We describe the workflow of a molecular tumor board and the specific bioinformatics support that it requires, from the primary analysis of raw molecular profiling data to the automatic generation of a clinical report and its delivery to decision-making clinical oncologists. Such workflows have to various degrees been implemented in many clinical trials, as well as in molecular tumor boards at specialized cancer centers and university hospitals worldwide. We review these and more recent efforts to include other high-dimensional multi-omics patient profiles into the tumor board, as well as the state of clinical decision support software to translate molecular findings into treatment recommendations.


Introduction
The continuous improvement, greater availability and decreasing cost of next-generation sequencing (NGS) have allowed major cancer centers worldwide to offer NGS-based personalized oncology for clinical practice. The goal is to profile the genetic aberrations of tumors such as single-nucleotide variants (SNVs), copy number variants (CNVs), insertions and deletions (indels), structural variants (SVs) and gene fusions, and to suggest potential treatments based on the molecular lesions that are observed.
These approaches can be organized either as a single institutional molecular tumor board (MTB), where detected genetic aberrations will be evaluated for any potential matching treatments, or as a basket trial, in which predefined genetic alterations are assigned to matching treatment arms (baskets). Both approaches typically include patients who are progressive on all conventional treatment options and those with rare cancers for which limited treatments exist, such as many pediatric tumors [1].
MTBs are now widespread in the USA, Europe and Australia with reported patient numbers to date ranging up to 2000 patients per cancer center [2]. Ideally, a biopsy is taken on tumor progression from the last therapy to resemble the current genetic state of the evolved tumor [3,4]. However, some MTB approaches also profile biopsies sampled at diagnosis, especially for high-risk tumors with few treatment options [5][6][7], or biopsies of patients currently responding to therapy but without further therapeutic options [8][9][10]. Typically, biopsies with a tumor content of at least 20% are analyzed by cancer-specific gene panels, such as FoundationOne [11], or whole-exome sequencing (WES) [12] (Figure 1). Some centers include additional measurements, such as profiling of the transcriptome, methylome or copy number alterations [5,13,14]. Whereas profiling by WES usually includes a germ line control [5,12], this control is missing in most panel sequencing approaches [2,11]. In an ideal setup, matched tumor-normal DNA and RNA sequencing samples are processed under the same conditions, including in the same lane of the sequencer. The resulting NGS data are analyzed for genetic aberrations and potential drug interactions. Specific treatment suggestions are, after careful consideration of available preclinical and clinical evidence, incorporated into a clinical report, which together with the patient's clinical data, such as treatment history, comorbidities and radiology scans, forms the basis for therapeutic decision-making in an interdisciplinary MTB. The molecular report may suggest tumor genotype-matched clinical trials and targeted therapies, such as kinase inhibitors, or recommend the avoidance of drugs, for example, in cases where mutations that potentially confer treatment resistance have been detected.
A number of challenges exist for current precision oncology approaches during all the steps of the process, starting from clinical sampling up to bioinformatics analysis, reporting and patient treatment. In addition to difficulties in obtaining a tumor biopsy and a sufficient quantity and quality of tumor DNA and RNA for molecular profiling, Massard et al. [14] reported that in less than half of 843 patients with advanced solid tumors, an actionable mutation was found. In the largest basket trial approach to date, the MATCH trial of the US National Cancer Institute (NCI), the restricted number of drug arms resulted in even fewer gene-drug matches. Only 9% of the patients could be assigned to a genetics-based treatment [15].
The development of more selective drugs over time is expected to increase these numbers. A further challenge is to translate an MTB suggestion into patient treatment. Beltran et al. [12] reported that although 94% of solid cancer patients in their MTB had an actionable alteration, only 5% were treated based on their genotype. The main reasons were rapid decline of the patient's condition and, more importantly, the lack of access to clinical trials or off-label drugs. Finally, the costs of molecular profiling can be challenging as well. Although it has been shown that panel sequencing is financially feasible [16], the costs of more comprehensive approaches such as WES, whole-genome sequencing (WGS) and RNA sequencing can be prohibitive for reimbursement. Nevertheless, it is to be expected that comprehensive sequencing will become cheaper, and therefore financially feasible.
The final outcome of cancer genotype-matched patient treatment, namely, patient response to treatment, varies widely in the published literature. Schwaederle et al. [11] report a partial response in 36% of patients, whereas the MOSCATO trial [14] reports objective responses in 11% of patients receiving matched treatments. The currently ongoing basket trials such as NCI MATCH, which aims to include 6000 patients, will provide more conclusive data, owing to larger cohort sizes and well-defined genotype-matched treatment arms. Nonetheless, a single-gene aberration is not always predictive of treatment response, as has been observed for the oncogenic BRAF mutations, which predict BRAF inhibitor response in melanoma [17], but not necessarily in non-melanoma cancers [18]. Furthermore, molecular tumor approaches reported to date are based on profiling of single biopsies. Large-scale sequencing studies have shown extensive intra-patient heterogeneity between different metastases and even within individual tumors [3,4], indicating that a single biopsy will not necessarily capture the ubiquitous alterations and may miss relevant ones.
In this review, we discuss bioinformatics approaches to NGS-based precision oncology, including variant calling, annotation, interpretation, drug matching and reporting in an MTB setting. We have set up a bioinformatics analysis pipeline and reporting workflow for WES and WGS at the MTB of the University Hospital Zurich and will base this review on our experiences with this ongoing effort. For guidelines on the analysis of NGS-based oncology panels, please refer to [19].

Requirements on bioinformatics solutions for clinical oncology
High-throughput NGS allows for time- and cost-effective molecular probing of tumors. However, the resulting sequencing data are challenging to analyze because of their large size and various confounding sources of variation, most notably amplification and sequencing errors. Careful analysis of NGS data is particularly important in the context of MTBs, where treatment suggestions based on mutation calls may have dramatic effects, ranging from recovery to death of a patient. Therefore, strict standards with respect to several aspects described below need to be followed. First and foremost, experimental noise needs to be distinguished from true biological signals. Treatment decisions have to be based only on validated, real biological alterations and should not be misled by technical artifacts. Toward this end, appropriate computational data analysis pipelines have to be used that cover the entire process from primary analysis of the read data to clinical reporting. To understand the limitations of an implemented pipeline, it needs to be evaluated under defined conditions reflecting realistic use cases [20,21]. Pipelines need to be robust with respect to new sequencing data that may differ in some aspects from previously analyzed samples. In addition, mutation calls should be reported with a confidence estimate. Although some mutation callers report, for example, P-values or posterior probabilities, it remains a major challenge to provide a meaningful notion of confidence for the results of an entire pipeline. This is particularly important, as the overlap of different approaches is often limited, as mentioned in [22][23][24][25].
The results produced by a bioinformatics pipeline have to be reproducible. This requirement entails several technical prerequisites discussed below and includes controlling random seeds for all steps that involve randomization. Another important aspect of reproducibility is a rigorous documentation of each step of the pipeline, including complete documentation of the used tools, their version and parameter settings. This also holds for databases and ensures complete transparency [20]. For instance, in the past, most genomic studies have used as a reference genome GRCh37 from the Genome Reference Consortium or its equivalent from the University of California Santa Cruz, version hg19. Even though there are only minor differences in their genetic information, the naming scheme is different, which can lead to confusion. Moreover, the new human genome assembly GRCh38 not only updated the main chromosomes, and therefore changed their coordinates, but also included new contigs to represent population haplotypes, further complicating reproducibility. Therefore, it is necessary that for each file used in the pipeline, its generation and dependencies are clearly described. Such a setup also guarantees the traceability of all results. For example, it should be possible to trace back the call of a treatment-critical mutation, to assess the call manually and to validate it before recommending the treatment. In addition, genomic alterations in the patient which are not directly linked to cancer, known as incidental variants, may be discovered. As these variants may be reported in various ways with potential ethical implications, a clear strategy needs to be defined, for example, reporting all relevant incidental findings [26].
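To make the documentation requirement concrete, the following minimal Python sketch records, for a single pipeline step, the tool, its version, the parameter settings, the random seed and a checksum of every input file. It is an illustration under our own assumptions, not the implementation of any cited tool; the function names `sha256_of`, `provenance_record` and `to_json` and all field names are hypothetical.

```python
import hashlib
import json


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(step, tool, version, params, inputs, seed=None):
    """Collect what is needed to re-run and trace one pipeline step.

    All field names are hypothetical and chosen for illustration only.
    """
    return {
        "step": step,
        "tool": tool,
        "version": version,
        "parameters": dict(sorted(params.items())),
        "random_seed": seed,
        # Checksums tie the record to the exact input files, including
        # reference genome and database files.
        "inputs": {path: sha256_of(path) for path in inputs},
    }


def to_json(record):
    """Serialize deterministically so identical runs give identical logs."""
    return json.dumps(record, sort_keys=True, indent=2)
```

Storing one such record next to every output file would allow a treatment-critical call to be traced back to the exact reference genome build and tool versions that produced it.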
In addition to these requirements on stability, robustness, reproducibility and traceability of the computational pipeline, the size, sensitivity and complexity of comprehensive clinical data sets combined with the urgency caused by the often critical state of the respective patient result in a set of challenging technical prerequisites for the computational infrastructure and the implemented data analysis software of an MTB.

Technical prerequisites
Medical data require secure data storage and distributed computing. Secure storage of sensitive data calls for restrictive authorization and authentication schemes that limit data access to those who hold valid credentials. These schemes have to be implemented and reviewed on a regular basis, in particular in a clinical setting in which data might have to be stored for many years. As data sets grow and the analysis becomes increasingly complex, the computation time of even single data sets outgrows the capacity of individual computers. Distributed computing, such as high-performance clusters or cloud engines, allows for efficient execution of data analysis workflows. The drawback is that these instances do not natively comply with the strict security requirements of medical data, as resources are shared among users with and without sufficient permissions.
To address the strong requirement for speed, accuracy and reproducibility, the use of a workflow manager can help with standardization and automation of the analysis. Multiple workflow managers are available such as Snakemake [27], Nextflow [28], Toil [29], Bpipe [30] and to some extent also the Galaxy framework [31]. Although they differ in features such as cluster support and programming language, they have all been implemented with the same rationale: the scientist defines the order, the parameters and the input data for a chain of tools, and the workflow manager takes care of the correct execution and documentation of the intermediate steps.
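The shared rationale can be sketched in a few lines of Python. The toy function below is not the mechanism of any of the workflow managers named above; it only illustrates the idea that the user declares steps and their dependencies, and the manager derives the execution order and keeps a log. The names `run_workflow`, `steps` and `dependencies` are hypothetical, and the standard-library `graphlib` module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run callables in dependency order and log what was executed.

    steps: dict mapping step name -> function taking a dict of upstream results
    dependencies: dict mapping step name -> set of step names it depends on
    """
    # Steps without declared dependencies still have to be part of the graph.
    graph = {name: set(dependencies.get(name, ())) for name in steps}
    results, log = {}, []
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](upstream)
        log.append(name)
    return results, log


# Hypothetical three-step chain mirroring trimming, mapping and calling.
# The lambdas only derive file names; real steps would invoke the tools.
steps = {
    "trim": lambda up: "trimmed.fastq",
    "map": lambda up: up["trim"].replace(".fastq", ".bam"),
    "call": lambda up: up["map"].replace(".bam", ".vcf"),
}
dependencies = {"map": {"trim"}, "call": {"map"}}
results, log = run_workflow(steps, dependencies)
```

Production workflow managers add what this sketch omits, such as cluster submission, re-use of existing intermediate files and failure handling.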

Primary analysis of DNA data
The primary analysis of genomic data sets typically starts with the raw sequencing data and finishes with a list of mutations. The different steps of this analysis are conducted in complex pipelines that differ according to the sequencing method used. Even for the same type of sequencing method, many pipelines are available and it has been observed repeatedly that the results can be different [24,25][32][33][34][35]. The primary analysis can be subdivided into (i) raw sequencing file processing, (ii) read mapping, (iii) alignment post-processing and (iv) variant calling (Figure 2). These steps are implemented to different extents in most pipelines. In the following, we will describe each of them briefly.

Raw sequencing file processing
The genomic sequencing data are provided in the form of reads, amplified DNA sequences of tens to hundreds of base pairs, in so-called FASTQ files. In addition to the sequencing information, for each nucleotide, the FASTQ file contains quality scores provided by the sequencing machine. These quantities represent the probability of the reported nucleotide to be a sequencing error, as estimated by the sequencer. Quality scores can be used to trim reads such that the FASTQ files only contain high-confidence nucleotides, and the number of false-positive calls owing to sequencing errors is kept at a minimum [36]. Sequencing adapters are another source of artifacts. Adapters are short nucleotide sequences attached to the genomic DNA fragment and used for amplification and sequencing. Sometimes these adapters are contained within the nucleotide sequence of a read and may lead to false-positive mutation calls. Therefore, many pipelines include tools such as Cutadapt [37], Trimmomatic [38], SeqPurge [39] or Flexbar [40] to remove low-quality bases and artifacts in the raw sequencing data.
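As a minimal sketch of how quality scores translate into error probabilities and trimming decisions, the Python functions below decode Phred-scaled qualities and remove low-quality bases from the 3' end of a read. We assume the Phred+33 character encoding and a threshold of Q20; both are assumptions of this example, and the dedicated trimming tools cited above use more elaborate strategies.

```python
def phred_to_error_prob(quality_char, offset=33):
    """Convert one FASTQ quality character to an error probability.

    A Phred score Q corresponds to an error probability of 10^(-Q/10).
    The offset of 33 (Phred+33 encoding) is an assumption of this sketch.
    """
    q = ord(quality_char) - offset
    return 10 ** (-q / 10)


def trim_3prime(sequence, qualities, min_q=20, offset=33):
    """Remove trailing bases until one with quality >= min_q is reached."""
    end = len(sequence)
    while end > 0 and ord(qualities[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], qualities[:end]
```

For example, the quality character `I` encodes Q40 under Phred+33, i.e. an estimated error probability of 1 in 10 000, whereas `#` encodes Q2 and would be trimmed.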

Read mapping
Owing to the sequencing protocols, the reads do not contain any information about their origin in the genome. This information is inferred by using read mappers, which align, or map, all reads to a given reference sequence. The importance of this time-critical step has led to the development of >60 different read mappers [41], with BWA [42] and Bowtie2 [43] being popular examples. They usually provide their results in Sequence Alignment/Mapping format (SAM, binary version BAM) files, which undergo different modifications during the alignment post-processing step.

Alignment post-processing
This phase typically starts with sorting the SAM/BAM files according to their genomic coordinates. Afterward, polymerase chain reaction (PCR) duplicates are often removed using, for example, Picard tools (http://broadinstitute.github.io/picard) or SAMtools [44]. These duplicates are copies of the same genomic fragment and indicate selective PCR amplification, which can bias the analysis. However, duplicated reads can also be biological copies originating from the same genomic location of chromosomes of different cells. The probability of a duplicate read to be a biological copy increases with coverage [45], such that this step is typically not performed for deep-coverage targeted sequencing approaches.
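The principle of duplicate detection can be sketched as follows. This is a deliberate simplification and not the algorithm of Picard or SAMtools: reads are grouped by chromosome, start position and strand, and all but the read with the highest summed base quality are flagged. The dictionary keys and the function name `mark_duplicates` are hypothetical.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Flag all but the best-quality read per (chrom, start, strand) group.

    reads: list of dicts with keys 'name', 'chrom', 'start', 'strand'
    and 'qual_sum' (hypothetical field names).
    Returns the set of read names flagged as duplicates.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["start"], read["strand"])].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the read with the highest summed base quality;
        # in case of ties, the first read encountered is kept.
        best = max(group, key=lambda r: r["qual_sum"])
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates
```

Because the grouping relies on coordinates alone, this sketch cannot distinguish PCR duplicates from biological copies, which is exactly the limitation discussed above for deep-coverage data.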
Another post-processing step is the re-alignment of reads around indels. As read mappers rely on heuristics to deal with the large amount of data, the resulting alignments can be suboptimal. This is especially true for sites harboring indels because here the difference between the reference genome and the patient reads is more pronounced. To reduce this bias, many pipelines perform re-alignments around these positions, for example, using the Genome Analysis Toolkit (GATK) [46][47][48]. For Illumina data, GATK also provides a tool to correct for biases in the sequencing process, which uses a machine learning approach to re-compute the quality scores of the nucleotides. The use of the re-alignment and quality score recalibration is generally recommended [47,49], but they are not always performed in practice, as they are time-intensive and the impact is sometimes not obvious [50,51].

Variant calling
Variant calling in the context of oncology refers to the identification of somatic variants in the cancer genome. These variants have occurred during the development of the tumor and they need to be separated from germ line variants of the patient. Targeted cancer therapy aims to selectively inhibit cells with specific somatic mutations, such as SNVs, indels and SVs. There are two conceptually different approaches to identify somatic variants, namely, (i) filtering for somatic variants using existing variant databases and (ii) using a normal control sample to distinguish somatic from germ line variants.
The first approach identifies variants in the genome by analyzing only the tumor sample, using tools such as VarScan2 [52], SiNVICT [53] or GATK HaplotypeCaller [46][47][48]. The identified mutations are then compared with existing databases, such as dbSNP [54,55], ExAC [56], ClinVar [57] or COSMIC [58], to assess whether a given variant has previously been reported as a germ line variant or a cancer-associated change in the genome. The major advantage of such approaches is independence from a control tissue sample, while major drawbacks are dependence on quality and completeness of the databases as well as limited sensitivity because low-frequency variants are difficult to distinguish from sequencing noise.
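A minimal sketch of the database-filtering step is given below. It assumes that tumor-only calls and a catalogue of known germ line variants are already available in memory; the field names, the function name `filter_tumor_only` and the variant allele frequency (VAF) threshold of 5% are assumptions made for illustration, not settings of any cited tool.

```python
def filter_tumor_only(calls, known_germline, min_vaf=0.05):
    """Keep calls absent from a germ line catalogue and above a VAF threshold.

    calls: list of dicts with 'chrom', 'pos', 'ref', 'alt', 'alt_reads', 'depth'
    known_germline: set of (chrom, pos, ref, alt) tuples, e.g. derived from a
    population database (hypothetical in-memory representation)
    """
    kept = []
    for call in calls:
        key = (call["chrom"], call["pos"], call["ref"], call["alt"])
        vaf = call["alt_reads"] / call["depth"] if call["depth"] else 0.0
        # The VAF threshold trades sensitivity for specificity: low-frequency
        # variants are hard to separate from sequencing noise.
        if key not in known_germline and vaf >= min_vaf:
            kept.append({**call, "vaf": vaf})
    return kept
```

Both drawbacks named above are visible in the sketch: a germ line variant missing from the catalogue passes as somatic, and true subclonal variants below the threshold are lost.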
The second approach uses an additional non-cancerous sample from the same individual as a germ line control. This approach can further be subdivided into methods that (a) apply variant calling to the tumor and control sample independently (using tools of approach (i)) or (b) use the genomic information of the two samples jointly. Approaches in the first category subtract from the tumor sample all mutations in the control sample, i.e. the germ line variants. Methods of the second category directly call somatic mutations by comparing variants between tumor and control sample for each position, which increases the power for calling true mutations at a given false-positive rate [59]. The idea is to model the control and tumor sample jointly to transfer noise patterns learned from the control sample to handle confounding factors appropriately. The results of approaches in (b) are usually superior to results from approaches in category (a), especially with regard to specificity [60]. Examples are MuTect [61], Strelka [62], VarScan2 [52], JointSNVMix [60] and deepSNV [59,63].
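The difference between the two categories can be illustrated with a short Python sketch. Category (a) reduces to a set difference of independently obtained call sets. For category (b), we use a one-sided Fisher's exact test on the read counts of both samples at one position as a stand-in; this is a simple illustrative choice and not the statistical model of the callers cited above. All function names are hypothetical.

```python
from math import comb


def subtract_germline(tumor_calls, normal_calls):
    """Category (a): report tumor variants that were not called in the control."""
    return set(tumor_calls) - set(normal_calls)


def fisher_one_sided(t_alt, t_ref, n_alt, n_ref):
    """Category (b), simplified: joint test of tumor and control read counts.

    Returns the probability of observing at least t_alt alternative reads in
    the tumor under the null hypothesis that tumor and control share the same
    allele frequency (hypergeometric upper tail).
    """
    alt_total = t_alt + n_alt
    tumor_depth = t_alt + t_ref
    total = tumor_depth + n_alt + n_ref
    upper = min(alt_total, tumor_depth)
    tail = sum(
        comb(alt_total, k) * comb(total - alt_total, tumor_depth - k)
        for k in range(t_alt, upper + 1)
    )
    return tail / comb(total, tumor_depth)
```

With 10 of 100 tumor reads and 0 of 100 control reads supporting the variant, the test yields a small P-value, whereas 5 of 100 supporting reads in both samples, as expected for noise or a germ line variant, do not. The set difference, in contrast, depends entirely on whether each sample passed the calling threshold on its own.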
For the identification of SVs, there are four commonly used techniques, namely, clustering, split-read mapping, contig assembly and statistical testing, as described in more detail in [22]. SV detection can be divided into CNV detection and identification of other SVs such as translocations and inversions. CNV calling is performed not only on WGS, but also on WES and even amplicon sequencing data. Numerous methods for CNV calling exist [64], including EXCAVATOR [65], BIC-seq2 [66] and CopywriteR [67]. In contrast, SVs like translocations and inversions are usually called based on WGS to determine the actual breakpoints of the genomic rearrangement. Popular methods include Pindel [68], SVDetect [69], Delly [70] and Lumpy [71]. As mentioned in [22], sensitive and specific SV calling remains a challenge, and choosing the appropriate approach greatly depends on the type of SV and NGS protocol features, such as the library size. For a more comprehensive review of CNV and SV calling, we refer to [22,64,72,73].

Primary analysis of RNA data
While variant calling is typically based on DNA data, differential expression analysis uses RNA sequencing data. Alignment and read pre- and post-processing are generally similar for DNA and RNA sequencing, with some key differences. For example, read mappers have to perform a special gapped alignment because RNA reads do not always align continuously to the reference sequence: owing to splicing events, they can map to different exons with large gaps in between. Popular RNA aligners are STAR [74] and TopHat [75].
In contrast to DNA alignments, the coverage of RNA alignments varies between regions in the genome owing to different gene expression levels. Thus, the coverage of RNA alignments can be used to infer gene expression levels after normalization with respect to total read count, gene length and possibly other confounding factors such as GC content. Here, commonly used tools include HTSeq [76] and featureCounts [77].
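One common way to normalize read counts for gene length and total read count is the transcripts per million (TPM) measure, sketched below. TPM is our choice for illustration and is not prescribed by the counting tools named above; the function name `tpm` and the input dictionaries are hypothetical, and corrections for further confounders such as GC content are omitted.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in bp.

    counts and lengths_bp are dicts keyed by gene identifier. At least one
    gene must have a non-zero count, otherwise the scaling is undefined.
    """
    # Length normalization first: reads per kilobase of gene.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values()) / 1_000_000
    # Depth normalization second: rescale so that all values sum to one million.
    return {gene: value / scale for gene, value in rpk.items()}
```

A gene with twice the length and twice the read count of another gene receives the same TPM value, which is the intended effect of the length correction.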
If matching control tissue is available, differential gene expression compared with normal can also be assessed, albeit with reduced statistical power owing to the lack of replicates. Typically, however, no adequate normal tissue is available. Popular tools for differential gene expression analysis include DESeq2 [78] and EdgeR [79,80], which model read counts directly, account for various sources of confounding and provide robust statistical procedures for parameter estimation.
An alternative, albeit imperfect, approach to detecting over- or under-expressed genes is the comparison of tumor gene expression levels to publicly available data sets of suitable tumor or normal cohorts, such as TCGA (https://cancergenome.nih.gov/) or GTEx [81]. For example, Oberg et al. used 124 transcriptomes from various normal tissues as a reference data set in a pediatric hematology-oncology setting [5]. Batch effects have to be taken into account when comparing separately generated RNA sequencing data sets. Multiple tools for batch effect removal are available, e.g. the R package SVA [82].
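The cohort comparison can be sketched as a per-gene z-score of the log-transformed tumor expression relative to the reference cohort. This is a minimal illustration under the assumption that tumor and cohort values are already normalized in the same way and free of batch effects; the function name `expression_z_score` and the pseudocount of 1 are our own choices.

```python
from math import log2
from statistics import mean, stdev


def expression_z_score(tumor_value, cohort_values, pseudocount=1.0):
    """Z-score of log2 tumor expression relative to a reference cohort.

    cohort_values needs at least two samples; the pseudocount avoids
    taking the logarithm of zero for unexpressed genes.
    """
    logged = [log2(v + pseudocount) for v in cohort_values]
    spread = stdev(logged)
    if spread == 0:
        raise ValueError("reference cohort shows no variation for this gene")
    return (log2(tumor_value + pseudocount) - mean(logged)) / spread
```

A large positive or negative score flags a candidate for over- or under-expression, but without replicates of the tumor sample it cannot replace a formal differential expression test.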
However, it remains a challenge to integrate transcriptome data in a clinical tumor board setting, where the task typically is to compare an individual tumor sample with a separate healthy reference or tumor cohort. Eventually, the goal is to use the RNA sequencing data in at least three ways: (1) to validate the expression of SNVs, CNVs or SVs, (2) to identify misregulated pathways that could potentially be targetable and (3) to determine the proportion of immune cell infiltration based on immune signatures. For each of these aims, different references might be necessary. As healthy tissue from individual cancer patients is not always available, public transcriptome databases may be used as a comparison. However, the transcriptional changes between healthy controls and cancer cells may be less revealing than a comparison with similar cohorts of cancer biopsies. For instance, different subtypes of melanoma (i.e. mucosal versus uveal or cutaneous) have some similarities, but differences might reveal informative vulnerabilities that could be targeted in an MTB setting. Lastly, the ability to infer tumor infiltration of immune cells based on RNA expression could be a powerful means to complement traditional immunohistochemistry approaches that are still relevant for predicting response to immunotherapies.

Variant annotation
The process of variant annotation aims at assembling as much relevant information as necessary to select or discard a given variant while at the same time keeping the amount of information that needs to be parsed manually as small as possible. Possible annotations range from basic attributes like affected gene, coding or noncoding, synonymous or nonsynonymous to complex classifications like clinical significance.
Clinical significance is the most relevant piece of information for a clinician about any variant. Typically, variants are categorized as pathogenic, likely pathogenic, of unknown significance, likely benign, or benign. However, the classification of specific variants is not consistent across available databases such as ClinVar [57], CIViC [83], COSMIC [58] and dbSNP [54,55]. For instance, algorithms such as SnpEff [84] categorize variants based on the predicted impact on protein function, whereas ClinVar [57] links particular variants to known functional or clinical features.
Additionally, the vast majority of detected variants have not yet been assigned a level of functional relevance or clinical significance. Thus, focussing only on variants annotated as (likely) pathogenic will often result in no variants at all being reported. This is unsatisfactory and potentially misleading. Annotation tools such as SnpEff [84] and ANNOVAR [85] can be applied to help extract interesting variants for the clinical report. Furthermore, a useful database for the identification of potentially deleterious SNVs is dbNSFP [86]. It contains predictions from a large set of functional prediction tools for all possible nonsynonymous SNVs and splice variants in the human genome. Among others, annotations include deleteriousness and affected protein domains. Both can be very useful for variant prioritization. For example, a deleterious variant ranks higher than a non-deleterious variant and a nonsynonymous coding variant within a protein domain ranks higher than a non-protein-truncating variant outside of a protein domain. For functional effect prediction of indels, PROVEAN [87] can be used. It predicts the functional effects of single and also multiple amino acid substitutions, in-frame insertions and deletions.
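The ranking rule in the example above can be expressed as a sort key. The sketch below assumes that each variant has already been annotated with three Boolean flags; the field names and the tie-breaking order (deleteriousness first, then protein domain, then coding effect) are our assumptions for illustration.

```python
def priority_key(variant):
    """Sort key: deleterious first, then in-domain, then nonsynonymous.

    False sorts before True in Python, hence the negations. Missing
    annotations are treated as absent (hypothetical field names).
    """
    return (
        not variant.get("deleterious", False),
        not variant.get("in_protein_domain", False),
        not variant.get("nonsynonymous", False),
    )


def prioritize(variants):
    """Return variants ordered from highest to lowest priority (stable sort)."""
    return sorted(variants, key=priority_key)
```

Because the sort is stable, variants with identical annotations keep their input order, so a secondary criterion such as allele frequency can be applied by pre-sorting the list.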
Another helpful annotation when it comes to variant prioritization is whether a variant affects a potential cancer driver gene. Information on genes that have been reported as driver genes can be obtained from the literature [88,89] and databases such as UniProt [90], IntOGen [91,92] and COSMIC [58].
With the goal of recommending drugs, it is useful to annotate genes with drugs that target them. Popular online resources to query drug-gene interactions are DGIdb [93,94], OncoKB [95] or CIViC [83]. It would be desirable to also annotate genes with indirectly interacting drugs, i.e. drugs that target proteins up- or downstream of the gene within the relevant pathway. Such annotation methods are currently being developed, e.g. [96], but no easy-to-use tool or API has yet been established.

Interpretation of molecular profiles and clinical reporting
Interpreting the clinical significance of genomic variants and transcriptional changes, i.e. the synthesis of all available information about an event and its relevance to clinical action [97], is a daunting and laborious task. It constitutes the bottleneck of the whole process from biopsy collection to reporting to the MTB [97] because it cannot be fully automated in a reliable way. Nevertheless, a properly curated list of evidence-based therapy recommendations forms the basis for the MTB to decide on the treatment of a patient. Thus, the ultimate goal of clinical reporting is to apply clinical interpretation to select relevant variants and to recommend targeted, personalized therapies [98].
The best case scenario for reporting is a single pathogenic mutation with an associated, clearly defined and clinically verified therapy, such as BRAF V600E and vemurafenib [17]. However, more often, several damaging mutations of unknown significance are identified and it is unclear which, if any, have functional or clinical relevance. This is especially true in the case of comprehensive sequencing. Consequently, the potentially long list of mutations and their associated drugs needs to be filtered automatically to obtain a relevant but manageable selection of drug-gene interactions that can then be further curated manually. Examples of such filters are exclusion of non-cancer drugs or of drugs with a nonsensical mode of action for their associated mutation, such as an inhibitor for a deleted gene.
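The two example filters can be sketched as follows. The representation of a drug-gene association as a dictionary, its field names and the function name `filter_drug_gene_pairs` are hypothetical; in practice, the required annotations would come from the drug-gene interaction resources discussed in the previous section.

```python
def filter_drug_gene_pairs(pairs):
    """Drop non-cancer drugs and inhibitors paired with a deleted gene.

    pairs: list of dicts with 'gene', 'alteration', 'drug', 'drug_action'
    and 'is_cancer_drug' (hypothetical field names).
    """
    kept = []
    for pair in pairs:
        if not pair["is_cancer_drug"]:
            continue  # filter 1: exclude drugs without an oncology indication
        if pair["alteration"] == "deletion" and pair["drug_action"] == "inhibitor":
            continue  # filter 2: inhibiting a gene that is already lost is nonsensical
        kept.append(pair)
    return kept
```

The remaining associations form the shortlist that is passed on to manual curation.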
For the report, each listed drug-gene association has to be assigned a level of confidence. In 2017, the Association for Molecular Pathology, the American College of Medical Genetics and Genomics, the American Society of Clinical Oncology and the College of American Pathologists established four evidence levels based on professional guidelines as well as size and number of studies supporting a mutation and its associated drug [99]. While these categories may or may not fit the local or national situation of a reporting facility, the adherence to a joint consensus is favorable, as it facilitates the comparison with other resources, like OncoKB [95] and PharmGKB [100], and also the longitudinal use of findings in the clinic.
As mentioned above, it is not unusual for variants to be assigned contradicting levels of clinical significance across and even within individual databases. Therefore, preparation of a meaningful tumor board report often needs to include a manual investigation of the associated literature to properly annotate and clinically interpret the identified variant. To determine the clinical actionability of a variant, one can consider, for example, the cell type content of the biopsy, the tissue-specificity of gene expression alterations and, when not using germ line controls, potential germ line variants. Alternatively, all findings, even contradictory ones, can be reported, thereby leaving the entire interpretation up to the MTB. However, it is questionable whether the latter approach is a practical solution given the often very short time frame that is available in the MTB to discuss particular cases. This trade-off between comprehensiveness and conciseness is a common theme in clinical reporting.

Molecular Tumor Board Zurich
In early 2015, we started the Molecular Tumor Board Zurich (MTBZ) to comprehensively profile and report on end-of-treatment-line melanoma patients [101]. An important prerequisite for the success of this endeavor was to bridge the gap between the medical and technical disciplines and establish a common language to better understand the needs for efficient and effective reporting to the tumor board.
The goal of this project was to overcome certain shortcomings in the standard of care. We address these issues by (i) comprehensive sequencing, (ii) automated and comprehensive annotation, (iii) investigation beyond disease-specific therapies and (iv) identification of therapies with lacking or reduced efficacy. For patients without any traditional treatment options remaining, comprehensive profiling of the tumor might offer new treatment options. Therefore, we established a protocol based on WES and WGS of tumor and matched normal samples, specifically WES for SNV and small indel calling and low-pass WGS for CNV calling. In addition to the identification of somatic variants, WES allows us to provide more information potentially relevant to the clinician, namely, mutational burden and the patient's HLA type. We report the mutational burden of a tumor, which is especially useful for the decision on using immunotherapies, for instance, in the case of CTLA-4 blockade in melanoma [102]. Further, we put it into context by comparing it with the distribution of mutational load within publicly available samples from the same and other cancer types [103]. The HLA-I type of a patient, which can be inferred from WES data using, for example, OptiType [104], provides information on eligibility for certain cancer vaccination trials [105]. Another important difference to standard procedures is the implementation of an automated and comprehensive annotation pipeline querying multiple databases for clinical significance, finding clinical trial opportunities worldwide and putting observed variants into the context of large studies like TCGA using the cBio Cancer Genomics Portal [106]. 
The use of the latter is twofold: we can assess (i) whether a variant is typical for the cancer type, which improves confidence in the call, and (ii) whether a variant that is uncommon in the given type of cancer is commonly observed in another type of cancer and could explain why previous standard treatments had not been successful.
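The comparison of a tumor's mutational burden with published cohorts, as described above, can be illustrated with a minimal sketch. It assumes that the burden is expressed as somatic mutations per megabase of callable sequence and that the reference cohort is available as a simple list of such values. The function names, the callable-region size and the cohort values are hypothetical and are not taken from the MTBZ pipeline.

```python
from bisect import bisect_right
from typing import Sequence


def mutations_per_mb(n_somatic_mutations: int, callable_bases: int) -> float:
    """Somatic mutations per megabase of callable target sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic_mutations / (callable_bases / 1e6)


def percentile_in_cohort(value: float, cohort: Sequence[float]) -> float:
    """Percentage of cohort samples whose burden is at or below `value`."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    ranked = sorted(cohort)
    return 100.0 * bisect_right(ranked, value) / len(ranked)


# Hypothetical numbers, for illustration only.
burden = mutations_per_mb(n_somatic_mutations=450, callable_bases=30_000_000)
reference_cohort = [2.0, 5.5, 9.0, 14.0, 21.0, 40.0]  # mutations/Mb
percentile = percentile_in_cohort(burden, reference_cohort)
print(f"{burden:.1f} mutations/Mb, percentile in cohort: {percentile:.0f}")
```

Reporting the percentile next to the raw value lets the reader judge at a glance whether a burden is high for the cancer type in question, rather than in absolute terms.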
We group therapies associated with detected somatic mutations into (i) cancer-type-specific therapies, (ii) non-cancer-type-specific therapies, (iii) investigational therapies and (iv) therapies potentially lacking benefit (Figure 3). The first category represents all suggested therapies which have been approved for the given cancer type by the local regulatory body, i.e. Swissmedic. The second group consists of therapies that are approved but not for the cancer type under consideration. This group is especially relevant owing to the increasing understanding that the genomic profile of a tumor is a better predictor for response than the tissue of origin alone [107]. By limiting this group to approved drugs only, it constitutes a source of available options to clinicians in Switzerland, where health insurances often approve the use of off-label treatments. The third group contains therapies which are not approved, but have been shown to be effective in preclinical studies and are currently in clinical trials, either open or ongoing. Although this group is usually based on low or insufficient levels of evidence, owing to singleton studies or only pre-clinical evidence, it frequently contains references to open clinical trials that the patient might be eligible for.
The final group includes therapies for which the genetic profile might cause reduced efficacy. In the fast-moving process of understanding the efficacy of novel therapeutics and their range of effects on different targets, a single trial showing lack of efficacy may be sufficient to exclude a therapy. For example, in a patient with neuroendocrine carcinoma, paclitaxel was a candidate drug for non-cancer-type-specific therapies. However, a clinical phase II study [108] showed that high-dose paclitaxel lacked antitumor activity and displayed significant hematologic toxicity in patients with advanced neuroendocrine tumors. Therefore, paclitaxel was listed as potentially lacking benefit.
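The four therapy groups described above can be expressed as a small decision rule. The following is a sketch under stated assumptions, not the MTBZ implementation: the `Therapy` record, its fields and the order of the checks (evidence of reduced efficacy taking precedence over approval status, as in the paclitaxel example) are hypothetical, and the drug and cancer-type names are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class TherapyGroup(Enum):
    CANCER_TYPE_SPECIFIC = "cancer-type-specific"
    NON_CANCER_TYPE_SPECIFIC = "non-cancer-type-specific"
    INVESTIGATIONAL = "investigational"
    POTENTIALLY_LACKING_BENEFIT = "potentially lacking benefit"


@dataclass(frozen=True)
class Therapy:
    name: str
    # Cancer types for which the local regulatory body has approved the drug.
    approved_indications: FrozenSet[str] = frozenset()
    # True if a study reports lacking or reduced efficacy for this profile.
    evidence_of_reduced_efficacy: bool = False


def classify(therapy: Therapy, cancer_type: str) -> TherapyGroup:
    """Assign a therapy to one of the four report groups (assumed rule order)."""
    if therapy.evidence_of_reduced_efficacy:
        return TherapyGroup.POTENTIALLY_LACKING_BENEFIT
    if cancer_type in therapy.approved_indications:
        return TherapyGroup.CANCER_TYPE_SPECIFIC
    if therapy.approved_indications:
        return TherapyGroup.NON_CANCER_TYPE_SPECIFIC
    return TherapyGroup.INVESTIGATIONAL


# Placeholder drugs and indications, for illustration only.
therapies = [
    Therapy("drug_A", frozenset({"melanoma"})),
    Therapy("drug_B", frozenset({"type_X"})),
    Therapy("drug_C"),
    Therapy("drug_D", frozenset({"type_X"}), evidence_of_reduced_efficacy=True),
]
groups = {t.name: classify(t, "melanoma") for t in therapies}
for name, group in groups.items():
    print(f"{name}: {group.value}")
```

Checking for reduced efficacy first mirrors the case above, in which a drug that would otherwise qualify as a non-cancer-type-specific option is instead listed as potentially lacking benefit.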
In a first pilot study, we analyzed tumor biopsies and matched germ line samples from five metastatic melanoma patients with progressive disease on standard treatment and produced reports within a clinically relevant time period of 4-12 weeks from tumor biopsy.
Briefly, we performed WES and WGS on tumor biopsy samples together with a blood sample as matched normal control. Based on the pipeline framework described in [109], we use Trimmomatic [38] to remove adapters and quality trim the raw read sequences. We apply BWA [42] for read mapping and subsequently remove PCR duplicates using Picard tools (http://broadinstitute.github.io/picard). Following the GATK best practices [46][47][48], we perform indel realignment and base recalibration before variant calling. SNVs are called based on a combination of Mutect [61], Strelka [62] and VarScan2 [52] and further annotated based on various databases including dbSNP [54,55], COSMIC [58] and ClinVar [57], and functional annotation based on dbNSFP [86,110]. CNVs are called based on WGS, using BIC-seq2 [66]. All variants are compared against DGIdb [93] to select the first set of candidates for possible targeted treatments based on reported drug-gene interactions. Candidate treatments are further prioritized, for instance based on the Swissmedic approval of the therapy, availability of clinical trials and treatment success in existing clinical studies. Finally, selected variants and respective treatment options are reported in the clinical report and discussed with the treating clinician.
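The text does not specify how the calls of the three SNV callers are combined. One common option, shown here purely as an assumption, is a majority vote that keeps a variant if at least two callers report it. The sketch below represents each variant as a (chromosome, position, reference, alternative) tuple; the function name, the `min_support` threshold and the toy variants are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (chromosome, position, reference allele, alternative allele)
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Set[Variant]], min_support: int = 2
) -> Dict[Variant, List[str]]:
    """Keep variants reported by at least `min_support` callers.

    Returns a mapping from each retained variant to the sorted list of
    callers that support it, so the report can state the evidence.
    """
    support: Dict[Variant, List[str]] = {}
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support.setdefault(variant, []).append(caller)
    return {
        variant: sorted(callers)
        for variant, callers in support.items()
        if len(callers) >= min_support
    }


# Toy call sets, for illustration only (not real coordinates).
calls = {
    "mutect": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "strelka": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "varscan2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
consensus = consensus_calls(calls)
for variant, callers in sorted(consensus.items()):
    print(variant, "supported by", ", ".join(callers))
```

A stricter threshold (`min_support=3`) trades sensitivity for specificity. Which setting is appropriate depends on the downstream validation available and is not something this sketch can decide.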
In the five melanoma patients, we detected between 3 and 11 actionable aberrations per patient, most commonly in genes of the PI3K, cell cycle checkpoint and MAPK pathways. In two cases, the MTB recommended therapy based on our results: in one case, immunotherapy based on high mutational load, and in the other a chemotherapeutic drug based on a loss of a receptor activating the detoxification pathway of the drug. We observed a near-complete durable response in the first patient and a progression of disease in the second. The reasons for not following the report recommendations for the other patients were rapid decline of one patient's condition and treatment with a newly approved immunotherapy regimen in two others.
Together with our clinical collaborators, we were able to draft a set of best practices on what to include in the report. These best practices are also viable for other disciplines outside of oncology. First of all, the report should begin with a concise summary of the most important findings. In our report, we focus on mutational load, the state of genes commonly mutated in the specific cancer type, a therapy summary and HLA-I type (Figure 3). Starting on page 2, the report should increase in depth such that the reader who would like to know more details can simply read on. Given the limited time to discuss a case in the MTB meeting, it is key that the most important facts can be grasped quickly from scanning the first page. Nevertheless, ideally, the report provides all information obtained from processing of the patient samples.
A selected list of clinical trial opportunities based on the molecular profile of the tumor is an important part of our report. Here, we refer to trials which are currently recruiting, thus offering a chance for the patient to get access to a potentially beneficial therapy, which might otherwise not be available. To allow the clinician to quickly assess the suitability of the trial, our report includes drug name, trial phase and title, as well as trial locations.
Given the rapid developments in molecular profiling technologies as well as in variant calling and annotation algorithms and databases, the MTBZ workflow is naturally a constant work in progress. In our most recent reports, for example, we started to incorporate transcriptomics data, allowing us to detect up- and downregulation of genes and transcripts, gene fusions, alternative splicing events, as well as expression status of somatic mutations.

Future directions
Bioinformatics workflows for the analysis and clinical interpretation of tumor molecular profiles have to various degrees been implemented in clinical trials and MTBs at specialized cancer centers and university hospitals worldwide. The initial results of these efforts are promising, but it has also become clear that exploiting the full potential of precision oncology faces many challenges.
One current bottleneck is efficient and precise annotation of variants. This step requires databases containing well-curated variants as well as their interactions with potential drugs. Text mining is a promising approach to accelerate and improve the process of not only curating variants across the globe, but also finding evidence in literature for interaction between drugs and genes as well as the effect of drug combinations [111]. Stronger proof for annotation in the form of globally curated variants and better literature evidence will ultimately speed up the process of interpreting results from molecular diagnostic testing, and thus overcome the bottleneck of precision oncology.
The rapid development of molecular profiling techniques will continue to provide new opportunities for precision oncology. For example, single-cell sequencing [112,113], which allows for processing the DNA of hundreds and the gene expression levels of thousands of cells independently at the same time, will lead to increasing sensitivity levels with respect to mutation identification and the detection of tumor subclones, both of which are likely to affect treatment outcome. Further, multi-omics approaches will provide more insight into dysregulated pathways and increase the level of confidence in reporting an actionable variant when it can be confirmed by RNA, protein or epigenetic profiling. At the same time, multi-omics data will pose new bioinformatics challenges to integrate multiple data types and identify potentially efficacious treatments.
Moreover, powerful predictions of patient response to a personalized treatment strategy will come from functionally testing the suggested therapies on ex vivo tumor slices [114], in 2D or 3D cultures of the patient's tumor or in patient-derived xenograft models [115]. This approach, although still in its infancy, will provide another level of therapeutic decision support for the MTB by allowing for the exclusion or confirmation of therapeutic efficacy and choice of the most efficacious drug combinations.

Key Points
• Robust, reproducible, transparent and comprehensive bioinformatics pipelines are required for precision oncology, including molecular tumor boards and cancer basket trials.
• Variant calling, interpretation and annotation are at the core of improving cancer treatment by providing timely and reliable therapy recommendations.
• Clinical reporting of molecular findings is an important step that requires close interactions between bioinformaticians and clinicians.