clevRvis: visualization techniques for clonal evolution

Abstract

Background: A thorough analysis of clonal evolution commonly requires integration of diverse sources of data (e.g., karyotyping, next-generation sequencing, and clinical information). Subsequent to actual reconstruction of clonal evolution, detailed analysis and interpretation of the results are essential. Often, however, only a few tumor samples per patient are available. Thus, information on clonal development and therapy effect may be incomplete. Furthermore, analysis of biallelic events, which are considered highly relevant to the disease course, can commonly only be realized by time-consuming analysis of the raw results and even raw sequencing data.

Results: We developed clevRvis, an R/Bioconductor package providing an extensive set of visualization techniques for clonal evolution. In addition to common approaches for visualization, clevRvis offers a unique option for allele-aware representation: plaice plots. Biallelic events may be visualized and inspected at a glance. Analyzing 4 public datasets, we show that plaice plots help to gain new insights into tumor development and investigate hypotheses on disease progression and therapy resistance. In addition to a graphical user interface, automatic phylogeny-aware color coding of the plots, and an approach to explore alternative trees, clevRvis provides 2 algorithms for fully automatic time point interpolation and therapy effect estimation. Analyzing 2 public datasets, we show that both approaches allow for valid approximation of a tumor's development in between measured time points.

Conclusions: clevRvis represents a novel option for user-friendly analysis of clonal evolution, contributing to gaining new insights into tumor development.


Background
For many types of cancer, determining their mutational profile is a crucial step, having an impact on diagnosis as well as treatment. In multiple myeloma (MM), for example, cytogenetic aberrations have been found to have significant impact on prognosis and are thus considered in the Revised International Staging System (R-ISS) for risk stratification [1,2]. Similarly, in myelodysplastic syndromes (MDS), patients are commonly stratified according to the Revised International Prognostic Scoring System (IPSS-R), evaluating, among others, cytogenetic abnormalities [3]. Recently, it has been shown that the analysis of point mutations and small insertions/deletions (indels) even allows for identification of clinically relevant subgroups within low-risk MDS patients [4].
In addition to information on the bare presence or absence of variants, their development over time, the clonal evolution [5], is of major importance in several diseases, e.g. MDS, acute myeloid leukemia (AML) or Burkitt lymphoma (BL) [6,7,8]. For a thorough analysis of clonal evolution, all variants ranging from point mutations to aberrations affecting whole chromosomes should be taken into account. Commonly, various sources of data are considered to allow for valid detection of these variants, e.g. karyotyping, fluorescence in situ hybridization (FISH), microarrays like SNP-arrays or array-CGH (aCGH), next-generation sequencing (NGS) and Sanger sequencing. Integrating all of these data, clonal evolution may be reconstructed [7].
Studying clonal evolution in more detail, a diverse set of analyses can be performed: The model of clonal evolution (linear vs branched dependent vs branched independent; of note, neutral and punctuated evolution will be considered special cases of branched and linear evolution) [8,9] can be determined, correlation to blood parameters, therapy resistance and disease progression investigated [10] and patterns characterizing subgroups of patients explored. For example, by detailed analysis of clonal evolution, evidence was found that relapsing BL is associated with the presence of clones, featuring double-hit events in TP53 [8]. However, these analyses may be hampered by different aspects: 1) Variants are only detected at few time points, providing an incomplete representation of the disease course. 2) No information on a therapy's effect on clonal evolution is available. 3) Evaluation of bi-allelic events requires tedious manual work, analyzing raw sequencing data and detailed variant calling information.
To overcome these obstacles, we developed clevRvis, an R/Bioconductor package providing innovative visualization techniques for clonal evolution. In addition to common R functions, clevRvis offers a web-based graphical user interface, allowing usage not just by computer scientists, but also by physicians and biologists. Our approach contains fully automatic algorithms for interpolating additional time points as well as estimating therapy effect. Evaluating two real, publicly available data sets from different disease entities, characterized by a high number of measured time points, we show that both estimation approaches generate valid results.
clevRvis generates three different types of plots: 1) shark plots (a graph-based representation of clonal evolution), 2) dolphin plots (a fish plot-like representation, optionally also considering interpolated time points and estimated therapy effect), 3) plaice plots (a novel type of plots allowing for detection of bi-allelic events at a glance). All plots generated by clevRvis are highly customizable. Following recommendations for graphical strategies in clonal evolution [11], we implemented an algorithm for phylogeny-aware color coding of the clones to obtain optimal visualization. In addition, alternative phylogenetic trees can be determined and explored interactively. By analysis of four public data sets, we show that visualization with clevRvis outperforms common alternative approaches. Additionally, we show the added value of plaice plots.

Data sets analyzed
We analyze four real data sets, containing detailed information on the clonal evolution of n = 31 patients. Table 1 provides an overview of the data sets and their main characteristics.
The first set covers data from 11 patients with MDS. Clonal evolution was reconstructed based on karyotyping, FISH, SNP-array, whole-exome sequencing (WES) and ultra-deep targeted next-generation sequencing (tNGS) data [6]. The data set is characterized by a high number of time points (up to 30) and a relatively low number of clones. All major models of clonal evolution are present. Six patients received supportive care only, while five additionally received lenalidomide, which is expected to impact clonal evolution. The data set serves as a test case, exploring options for visualizing clonal evolution and validating our approaches for automatic time point interpolation and therapy effect estimation (information on how input data for the analysis with clevRvis was derived from the publication by da Silva-Coelho et al. [6] is available in Additional file 1, section 1.1).
The second set covers data from 2 patients with chronic lymphatic leukemia (CLL). Clonal evolution was reconstructed based on FISH, WES and tNGS data [12]. The data set provides detailed information on follow-up, especially on therapies applied, for up to 12 years, as well as regular variant detection throughout the whole time of follow-up. Thereby, this data set serves as a second test case for the analysis with clevRvis, considering time point interpolation and therapy effect estimation in a different disease entity and a different source of data (information on how input data for the analysis with clevRvis was derived from the publication by González-Rincón et al. [12] is available in Additional file 1, section 1.2).
The third set covers data from 8 patients with myeloid neoplasia (MPN). Clonal evolution was reconstructed based on karyotyping, FISH, aCGH and tNGS data [7]. The presence of 1 to 5 time points and an increased number of clones (one case of branched dependent evolution can be specified more precisely as neutral evolution [9]) pose a challenge for visualization. In some cases, no samples were collected towards the end of therapy, requiring interpolation of additional time points and estimation of the effect of therapy (information on how input data for the analysis with clevRvis was derived from the publication by Sandmann et al. [7] is available in Additional file 1, section 1.3).
The fourth set covers data from 10 patients with BL. Clonal evolution was reconstructed based on FISH, SNP-array, WES, tNGS and Sanger sequencing data [8]. The data set is characterized by a low number of time points (1 for non-relapsing patients, 2 for relapsing patients). While therapy is known to have been applied, no data showing its effect on clonal evolution is available. Thus, circumstances require interpolation of time points and estimation of therapy effect for all patients. Additionally, samples feature a high number of clones (up to 17), which is expected to be a major challenge for visualization (information on how input data for the analysis with clevRvis was derived from the publication by Reutter et al. [8] is available in Additional file 1, section 1.4).
All data sets contain information on detected small variants (SNVs and indels) as well as large variants (structural variants, SVs, and copy number variants, CNVs) for every patient. Analysis of clonal evolution always involves integration of various data sources and partly overlapping variants. Thereby, the detection of bi-allelic variants and their evaluation by plaice plots is key for all data sets.

clevRvis
clevRvis provides an extensive set of visualization techniques for clonal evolution. An overview of the analysis pipeline is provided in Figure 1 (screenshots of the software are available in Additional file 1, Figures S1-S8).
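For the subsequent description of the analyses performed by clevRvis, we use the following definitions: $\mathrm{ccf}(c, t_i)$ is defined as the cancer cell fraction (CCF) of a clone $c$ at time point $t_i$. The children-clones of a clone $c$ are denoted by $\mathrm{children}(c)$; clones developing from normal cells are defined by $\mathrm{children}(0)$. $\mathrm{ccf}'(c, t_i)$ is defined as the difference in CCFs of a clone $c$ at time point $t_i$, i.e. the CCF of $c$ minus the CCFs of its children-clones. Thus,

$$\mathrm{ccf}'(c, t_i) = \mathrm{ccf}(c, t_i) - \sum_{j \in \mathrm{children}(c)} \mathrm{ccf}(j, t_i).$$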
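To make the definition of the CCF difference concrete, the following minimal Python sketch evaluates it on toy data. It is an illustration only and not part of clevRvis (which is an R package); the function name `ccf_difference`, the dictionary layout and the example values are assumptions made for this example.

```python
def ccf_difference(ccf, parents, clone, t):
    """ccf'(c, t_i): CCF of `clone` at time index t minus the CCFs of its children."""
    children = [j for j, parent in parents.items() if parent == clone]
    return ccf[clone][t] - sum(ccf[j][t] for j in children)

# Hypothetical toy data: clone -> CCF (%) per time point; parent 0 denotes normal cells.
ccf = {"A": [60, 90], "B": [20, 50], "C": [10, 30]}
parents = {"A": 0, "B": "A", "C": "A"}

print(ccf_difference(ccf, parents, "A", 0))  # 60 - (20 + 10) = 30
print(ccf_difference(ccf, parents, "A", 1))  # 90 - (50 + 30) = 10
```

A clone without children has a CCF difference equal to its CCF, as for clones B and C in this example.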

Validity check
clevRvis requires the upload of a CCF table, containing information on the CCF of every clone at every time point, optionally also including information on parental relations. If this information has not been uploaded, the user can subsequently define the parental relations for every clone listed in the CCF table in an interactive dialogue. Normal cells as well as clones different from the considered one may be selected. Upon submitting this information to generate the initial seaObject, a validity check is performed, adapting and extending the check performed by fishplot [13] (see Algorithm 1).

Algorithm 1 validity check
Summing up, 1) children-clones cannot exceed their parents, 2) clones developing from normal cells cannot add up to more than 100%, 3) a clone cannot reappear, 4) a parent-clone being thoroughly replaced by its children-clone(s) cannot reappear.
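The four rules summarized above can be sketched in Python as follows. This is a simplified illustration under our own reading of the rules, not the Algorithm 1 implemented in clevRvis; in particular, rule 4 is interpreted here as "a CCF difference that dropped to zero must not become positive again", and all names (`reappears`, `is_valid`) are hypothetical.

```python
def reappears(series, eps=1e-9):
    """True if a series is positive, drops to zero and later becomes positive again."""
    seen_positive = dropped = False
    for value in series:
        if value > eps:
            if dropped:
                return True
            seen_positive = True
        elif seen_positive:
            dropped = True
    return False

def is_valid(ccf, parents, eps=1e-9):
    """ccf: clone -> list of CCFs (%) per time point; parents: clone -> parent (0 = normal cells)."""
    n = len(next(iter(ccf.values())))
    kids = {c: [j for j, p in parents.items() if p == c] for c in list(ccf) + [0]}
    for t in range(n):
        # Rule 2: clones developing from normal cells cannot add up to more than 100%.
        if sum(ccf[j][t] for j in kids[0]) > 100 + eps:
            return False
        # Rule 1: children-clones cannot exceed their parent.
        for c in ccf:
            if sum(ccf[j][t] for j in kids[c]) > ccf[c][t] + eps:
                return False
    for c in ccf:
        diff = [ccf[c][t] - sum(ccf[j][t] for j in kids[c]) for t in range(n)]
        # Rule 3 (clone reappears) and rule 4 (fully replaced parent reappears).
        if reappears(ccf[c], eps) or reappears(diff, eps):
            return False
    return True

print(is_valid({"A": [60, 90], "B": [20, 50]}, {"A": 0, "B": "A"}))  # True
print(is_valid({"A": [30], "B": [40]}, {"A": 0, "B": "A"}))          # False (rule 1)
```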

Exploring alternative trees
Alternative parental relations, resulting in alternative clonal evolution trees, can be explored interactively using clevRvis. Considering $n$ clones, a parental relations vector of length $n$ has to be defined. For every clone, 1 out of $n$ options can be chosen ($n - 1$ different clones + normal cells), which results in $n^n$ permutations. To optimize run-time, filtration of clearly invalid options is performed prior to clevRvis' thorough validity check (see Algorithm 2).
On the basis of the remaining filtered options, permutations are determined and a thorough validity check (see Algorithm 1) is performed (maximum: 20,000 permutations). All valid parental relations are reported and alternative trees can be explored subsequently.
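The two-stage search (cheap pre-filtering, then a full check on at most 20,000 candidate vectors) can be sketched in Python. The pre-filter used here is an assumption on our side, since Algorithm 2 is not reproduced in this text: a clone is only kept as a candidate parent if its CCF is never below the CCF of the child. The final check is likewise reduced to tree structure and CCF sums; all function names are hypothetical.

```python
from itertools import islice, product

def candidate_parents(ccf, clone):
    """Normal cells (0) plus every other clone whose CCF is never below the clone's CCF."""
    others = [p for p in ccf if p != clone
              and all(cp >= cc for cp, cc in zip(ccf[p], ccf[clone]))]
    return [0] + others

def is_tree(parents):
    """Every clone must reach normal cells (0) without running into a cycle."""
    for clone in parents:
        seen, node = set(), clone
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

def sums_ok(ccf, parents):
    """Children never exceed their parent; clones from normal cells never exceed 100%."""
    n = len(next(iter(ccf.values())))
    for t in range(n):
        for parent in list(ccf) + [0]:
            limit = 100 if parent == 0 else ccf[parent][t]
            if sum(ccf[j][t] for j, p in parents.items() if p == parent) > limit:
                return False
    return True

def alternative_trees(ccf, max_permutations=20_000):
    clones = list(ccf)
    options = [candidate_parents(ccf, c) for c in clones]  # filtered options per clone
    trees = []
    for combo in islice(product(*options), max_permutations):
        parents = dict(zip(clones, combo))
        if is_tree(parents) and sums_ok(ccf, parents):
            trees.append(parents)
    return trees

# Hypothetical CCF table with three clones and two time points.
ccf = {"A": [80, 90], "B": [30, 40], "C": [20, 30]}
trees = alternative_trees(ccf)
for tree in trees:
    print(tree)
```

In this toy example the pre-filter reduces the $3^3 = 27$ possible vectors to 6 candidates, of which two remain valid: clone C may descend either from clone A (branched dependent evolution) or from clone B (linear evolution).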

Time point interpolation
The initially generated seaObject may be extended by additional interpolated time points. The general idea of estimating auxiliary time points to improve visualization of clonal evolution was first outlined by Reutter et al. [8]. However, this initial approach mainly focused on improving figures generated by fishplot [13], considering only the difference in CCFs at a later measured time point. Furthermore, it did not contain any algorithm for automatic time point interpolation. Evaluating different scenarios of clonal evolution, we implemented an improved, fully automatic approach in clevRvis (see Algorithm 3). The number of time points interpolated by clevRvis depends on the clonal evolution being analyzed and cannot be defined by a user. We differentiate between interpolating development of a tumor prior to the first measured time point, and interpolating development between two measured time points. By default, all interpolated time points are evenly distributed (a detailed description on how to implement skewed events is available in Additional file 1, section 1.6).
We assume that clones of the same nested level developed at approximately the same time. A higher nested level indicates development at a later time. For example, in a linear evolution with clone B developing from clone A, it is sensible to assume that clone A developed first and expanded. Over time, a cell of clone A acquired additional mutations, finally resulting in the formation of clone B.
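The nested level used in this assumption can be computed by walking from a clone up to normal cells. The following Python sketch is an illustration with hypothetical names; it follows the convention used later in the text that a clone developing from normal cells has nested level zero.

```python
def nested_level(parents, clone):
    """Number of ancestor clones between `clone` and normal cells (parent 0)."""
    level = 0
    while parents[clone] != 0:
        clone = parents[clone]
        level += 1
    return level

# Linear evolution A -> B -> C plus an independent clone D (hypothetical example).
parents = {"A": 0, "B": "A", "C": "B", "D": 0}
levels = {clone: nested_level(parents, clone) for clone in parents}
print(levels)  # {'A': 0, 'B': 1, 'C': 2, 'D': 0}
```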
Interpolating time points prior to $t_1$, we focus on the difference in CCFs $\mathrm{ccf}'(c, t_1)$ for all clones present at $t_1$ (initial.clones). The number of interpolated time points is defined by the maximum nested level of the initially present clones. As the clone(s) with the highest nested level are assumed to have developed last, $\mathrm{ccf}'$ as well as $\mathrm{ccf}$ are set to zero for the first (and all subsequent) interpolated time points. The CCFs of the remaining initial clones are updated. The procedure is repeated with the second highest nested level etc. until only clones with nested level 0 remain.
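One possible reading of this backward procedure is sketched below in Python. It is not the Algorithm 3 of clevRvis: we assume that every remaining clone keeps its CCF difference from the first measured time point and that CCFs are rebuilt as "own difference plus CCFs of the remaining children". Function names and example values are hypothetical.

```python
def nested_level(parents, clone):
    level = 0
    while parents[clone] != 0:
        clone, level = parents[clone], level + 1
    return level

def interpolate_before_first(ccf_t1, parents):
    """Return interpolated CCF tables prior to t_1, ordered from earliest to latest."""
    present = [c for c, value in ccf_t1.items() if value > 0]
    level = {c: nested_level(parents, c) for c in present}
    # CCF differences ccf'(c, t_1) of all initially present clones.
    diff = {c: ccf_t1[c] - sum(ccf_t1[j] for j in present if parents[j] == c)
            for c in present}
    max_level = max(level.values(), default=0)
    steps = []
    for cutoff in range(max_level):  # one interpolated time point per nested level
        kept = [c for c in present if level[c] <= cutoff]
        ccf = {c: 0.0 for c in ccf_t1}  # clones of higher nested level are set to zero
        for c in sorted(kept, key=lambda c: -level[c]):  # children before parents
            ccf[c] = diff[c] + sum(ccf[j] for j in kept if parents[j] == c)
        steps.append(ccf)
    return steps

# Hypothetical linear evolution A -> B -> C measured at t_1.
steps = interpolate_before_first({"A": 80, "B": 50, "C": 20},
                                 {"A": 0, "B": "A", "C": "B"})
print(steps)
```

For this example, two time points are interpolated: the earlier one contains only clone A (CCF 30), the later one contains clones A (CCF 60) and B (CCF 30), before all three clones are observed at $t_1$.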
Interpolating time points between $t_i$ and $t_{i+1}$, we assume linear development of CCF for all clones already present at $t_i$ (old.clones). For newly developing clones (new.clones), we determine their nested levels. The number of unique nested levels (new.unique.nested.levels) minus 1 defines the number of interpolated time points. We stick to our assumption that clones with a higher nested level developed at a later time point. Thus, newly developing clone(s) with the lowest nested level are considered first. Linear development between $t_i$ and $t_{i+1}$ is assumed. For all clones with the second lowest nested level, CCF at the first interpolated time point is set to zero. Linear development is assumed for the remaining interpolated time points up to $t_{i+1}$. The procedure is repeated with the third lowest nested level etc. until only clones with the highest new.unique.nested.levels remain.
Of note, to generate smoother dolphin and plaice plots in R, for all clones 0.1 is added to the (interpolated) time point prior to their first appearance.
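The staggered linear interpolation between two measured time points can be illustrated by the following Python sketch. It is a simplified illustration rather than the clevRvis implementation: interpolated time points are assumed to be evenly spaced, and a new clone of rank $r$ (0 for the lowest new nested level) is assumed to be zero up to the $r$-th interpolated time point and to grow linearly afterwards. All names are hypothetical.

```python
def nested_level(parents, clone):
    level = 0
    while parents[clone] != 0:
        clone, level = parents[clone], level + 1
    return level

def interpolate_between(ccf_i, ccf_next, parents):
    """Return CCF tables for the time points interpolated between t_i and t_{i+1}."""
    new = [c for c in ccf_next if ccf_i.get(c, 0) == 0 and ccf_next[c] > 0]
    levels = sorted({nested_level(parents, c) for c in new})
    rank = {c: levels.index(nested_level(parents, c)) for c in new}
    m = max(len(levels) - 1, 0)  # number of unique new nested levels minus 1
    points = []
    for k in range(1, m + 1):
        point = {}
        for c in ccf_next:
            if c in rank:
                r = rank[c]  # clone stays at zero up to interpolated time point r
                point[c] = ccf_next[c] * max(0, k - r) / (m + 1 - r)
            else:
                start = ccf_i.get(c, 0)  # old clones: plain linear development
                point[c] = start + (ccf_next[c] - start) * k / (m + 1)
        points.append(point)
    return points

# Hypothetical example: A is present at t_i; B (child of A) and C (child of B) are new.
points = interpolate_between({"A": 50}, {"A": 90, "B": 60, "C": 30},
                             {"A": 0, "B": "A", "C": "B"})
print(points)  # [{'A': 70.0, 'B': 30.0, 'C': 0.0}]
```

Because every delayed clone stays at or below its plain linear interpolation, children cannot exceed their parents at the interpolated time points as long as the two measured time points are valid.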

Therapy effect estimation
If a therapy is applied, interpolating time points is assumed to be insufficient to approach the development of CCFs over time properly. In the absence of therapy, the overall tumor load is not expected to decrease. Any decrease in CCF differences over time is assumed to be caused by the expansion of superior children-clones. In the presence of therapy, however, we assume that any observed decrease in CCF differences is due to therapy. This general idea was first outlined by Reutter et al. [8] as well. In clevRvis, we implemented an updated, fully automatic approach for therapy effect estimation (see Algorithm 4). To estimate the effect of a therapy that has been applied between $t_i$ and $t_{i+1}$, we focus on the difference in CCFs $\mathrm{ccf}'(c, t_i)$ for all clones present at $t_i$ (old.clones). Assuming that no increase in tumor load is observed during therapy, the minimum of $\mathrm{ccf}'(c, t_i)$ and $\mathrm{ccf}'(c, t_{i+1})$ is determined. For all newly developing clones (new.clones), we assume that they developed after the application of therapy. Thus, CCFs are set to zero for the estimated time point (therapy.time.point).
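The estimation of the therapy time point can be sketched in Python as follows. This is an illustration of the idea described above and not the Algorithm 4 of clevRvis; we assume that both CCF tables list the same clones and that CCFs at the therapy time point are rebuilt from the estimated CCF differences. All names and values are hypothetical.

```python
def ccf_diff(ccf, parents):
    """CCF difference ccf'(c) for every clone at a single time point."""
    return {c: ccf[c] - sum(ccf[j] for j in ccf if parents[j] == c) for c in ccf}

def therapy_time_point(ccf_i, ccf_next, parents):
    """Estimate CCFs at therapy.time.point between t_i and t_{i+1}."""
    d_i, d_next = ccf_diff(ccf_i, parents), ccf_diff(ccf_next, parents)
    old = {c for c in ccf_i if ccf_i[c] > 0}
    # Old clones: minimum of both CCF differences; new clones: zero.
    diff = {c: min(d_i[c], d_next[c]) if c in old else 0 for c in ccf_i}

    def total(c):
        # CCF of a clone = its own difference plus the CCFs of its children.
        return diff[c] + sum(total(j) for j in ccf_i if parents[j] == c)

    return {c: total(c) for c in ccf_i}

# Hypothetical example: B shrinks under therapy, C only emerges at t_{i+1}.
parents = {"A": 0, "B": "A", "C": "A"}
estimate = therapy_time_point({"A": 90, "B": 40, "C": 0},
                              {"A": 80, "B": 10, "C": 60}, parents)
print(estimate)  # {'A': 20, 'B': 10, 'C': 0}
```

In this example, the overall tumor load is estimated to drop from 90% to 20% under therapy before the newly developing clone C expands.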

Algorithm 4 therapy effect estimation
Interpolating additional time points, located between therapy.time.point and $t_{i+1}$, we focus on newly developing clones. For all clones only present at $t_{i+1}$ (new.final.clones), we recalculate the nested levels, ignoring clones already present at therapy.time.point. As an example, we consider three clones A, B and C that develop linearly. While A and B are already present at $t_i$, clone C only emerges at $t_{i+1}$. Another clone D develops from normal cells and, just like clone C, only emerges at $t_{i+1}$. Nested levels of clones C and D are two and zero respectively. However, development of both clones is just 'one step' compared to the starting position at $t_i$. Therefore, the recalculated nested levels, ignoring clones A and B, are zero for both C and D.
Sticking to our main assumption that clones with a higher (recalculated) nested level develop at a later time, additional time points are interpolated. The number of time points added is defined by the maximum recalculated nested level. Newly developing clone(s) with the lowest recalculated nested level are considered first. If a clone $c$ has no children, linear development between therapy.time.point and $t_{i+1}$ is assumed. Otherwise, we assume that the CCF for clone $c$ at all interpolated time points is defined by $\mathrm{ccf}'(c, t_{i+1})$. For all clones with the second lowest recalculated nested level, CCF at the first interpolated time point is set to zero. Subsequently, CCFs for the remaining time points are interpolated considering either linear development or $\mathrm{ccf}'(c, t_{i+1})$. The procedure is repeated with the third lowest nested level etc. until only clones with the highest recalculated.nested.level remain.
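The example with clones A to D can be reproduced with a short Python sketch. It is an illustration only; the function name `recalculated_level` and the data layout are assumptions, and the recalculated level is taken to be the number of ancestors that were not yet present at the therapy time point.

```python
def recalculated_level(parents, clone, present):
    """Nested level of `clone`, ignoring ancestors already present at therapy.time.point."""
    level, node = 0, parents[clone]
    while node != 0:
        if node not in present:
            level += 1
        node = parents[node]
    return level

# A -> B -> C develop linearly, D develops from normal cells; A and B are already present.
parents = {"A": 0, "B": "A", "C": "B", "D": 0}
present = {"A", "B"}
print(recalculated_level(parents, "C", present))  # 0
print(recalculated_level(parents, "D", present))  # 0
```

Without ignoring the clones already present, clone C would have nested level two; the recalculation treats C and D as equally "new" relative to the starting position.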

Phylogeny-aware color coding
Based on an evaluation of graphical strategies for visualizing clonal evolution published by Krzywinski [11], we developed an approach for automatic phylogeny-aware color coding. Our approach sticks to the following rules: i. The higher the nested level of a clone, the darker the hue. ii. Clones of the same branch are colored by a similar hue. iii. Clones on two (or more) branches with a common ancestor (branched dependent evolution) are colored by similar, but diverging colors. iv. Clones on two (or more) branches without a common ancestor (branched independent evolution) are colored by colors of maximum difference.
The exact range of the color palette is dynamically determined based on the clonal evolution provided as input data (number of clones, maximum nested level). Thereby, options for differentiating between closely related clones are optimized. clevRvis supports a maximum of 25 independent clones, developing from normal cells, and an unlimited number of related clones.
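Rules i to iv can be illustrated by a small Python sketch based on the HSV color model. It is not the palette algorithm of clevRvis: the spacing of hues, the fixed saturation of 0.7, the brightness range and the `spread` parameter are arbitrary assumptions chosen for this example, and the function names are hypothetical.

```python
import colorsys

def phylo_hsv(parents, spread=0.08):
    """Assign a (hue, saturation, value) triple to every clone."""
    children = {}
    for clone, parent in parents.items():
        children.setdefault(parent, []).append(clone)

    def level(clone):
        n = 0
        while parents[clone] != 0:
            clone, n = parents[clone], n + 1
        return n

    max_level = max((level(c) for c in parents), default=0)
    colors = {}

    def assign(clone, hue, depth, width):
        # Rule i: the higher the nested level, the darker the color.
        value = 1.0 - 0.6 * depth / max(max_level, 1)
        colors[clone] = (hue % 1.0, 0.7, value)
        kids = children.get(clone, [])
        for i, kid in enumerate(kids):
            # Rule ii: a single child keeps the hue; rule iii: siblings diverge slightly.
            offset = 0.0 if len(kids) == 1 else width * (i / (len(kids) - 1) - 0.5)
            assign(kid, hue + offset, depth + 1, width / 2)

    roots = children.get(0, [])
    for i, root in enumerate(roots):
        # Rule iv: independent branches are spread evenly over the hue circle.
        assign(root, i / len(roots), 0, spread)
    return colors

def to_hex(hsv):
    r, g, b = colorsys.hsv_to_rgb(*hsv)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

# Hypothetical example: linear branch A -> B -> C and an independent clone D.
hsv = phylo_hsv({"A": 0, "B": "A", "C": "B", "D": 0})
for clone, color in hsv.items():
    print(clone, to_hex(color))
```

Working in HSV keeps the rules separable: relatedness is encoded in the hue, the nested level in the brightness.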

Shark plots
Shark plots serve as a basic, raw visualization of clonal evolution (see Figure 1). Using a classical graph approach, clones are represented by nodes, parental relations by edges. Thus, the phylogeny can be directly deduced from shark plots.
Optionally, shark plots can be extended to provide information on CCFs as well. Clones are additionally visualized next to the actual shark plot. The size of each clone is correlated with its CCF. Time points are plotted next to each other.
If shark plots are chosen to be plotted along with dolphin plots, both plots are connected interactively. When hovering over one of the clones, it is automatically highlighted in both the shark and the dolphin plot.

Dolphin plots
An advanced visualization of clonal evolution is realized by dolphin plots, mainly corresponding to well-established fish plots. The development of each clone over time is displayed on the x-axis, the CCFs on the y-axis. Thereby, information on phylogeny, CCFs and time course characterizing a clonal evolution are jointly visualized in a single plot. Several basic options for customizing dolphin plots are available, e.g. switching between spline and polygon shape or separating independent clones. Additionally, a user may choose between standard centered visualization of clonal evolution and bottom layout. This causes the clones to develop as "lying" on the x-axis (similar to visualization of clonal evolution by timescape [14]). Automatically, the longest branch is chosen to be plotted on the bottom, while the remaining branches are added on top.
If a seaObject has been extended by additional interpolated time points and/or estimated therapy effect, both can be visualized by dolphin plots. Customizable labels can be added to distinguish between measured vs estimated time points.

Plaice plots
Plaice plots represent a derivative of dolphin plots (and thus of fish plots), developed to improve visualization of bi-allelic events. Instead of one, we consider two "flatfish" (= plaice) that are mirrored above and below the y-axis.
Common clonal evolution is visualized only in the upper plot in bottom layout. Similar to dolphin plots, a user may choose between spline vs polygon shape and separating independent clones (recommended). The fraction of remaining healthy alleles is visualized in the lower plot. For this purpose, a mirrored presentation of clonal evolution in bottom layout is plotted. By default, clones in this part of the plot are not colored, representing a starting position of 100% healthy alleles.
As an example, we consider linear clonal evolution of 2 clones, A and B. Clone A is characterized by a mutation in TP53, clone B by a deletion 17p. The two variants are overlapping. If they affect different alleles, no healthy allele of TP53 remains in all cells belonging to clone B. Thus, a user may choose to color clone B in the lower plot to indicate a decrease in healthy alleles of TP53 as the CCF of clone B increases. The clone should, however, be colored in the hue of clone A. Thereby, the bi-allelic event leading to TP53 deficiency is linked to the clone that is originally characterized by a mutation in this gene. Instead, if both variants affect the same allele, one healthy copy of TP53 remains, independent of the CCF of clones A and B. Thus, no clone should be colored in the lower plot. In addition to bi-allelic events, variants affecting the only available X or Y chromosome in male subjects can equally be visualized using plaice plots. Detailed information on the recommended color coding of plaice plots is provided in Additional file 1, section 1.7.
Just like dolphin plots, plaice plots provide all options for visualizing data on additional interpolated time points (recommended) and estimated therapy effect.

Comparison to common approaches
Several approaches exist for visualizing clonal evolution. A majority of tools performing clonal evolution tree reconstruction, e.g. PhyloWGS [15], Canopy [16] or TRaP [17], provide a basic graph-based visualization that is automatically generated when performing the analysis. These plots, however, are commonly not customizable and will, thus, not be further considered in this work. Additionally, tools like BubbleTree [18] and AbsCN-seq [19], estimating and visualizing tumor purity, ploidy and copy numbers, are not considered due to different scope. The tool MapScape [20] is not considered as visualization focuses on spatial clonal evolution, linking anatomical images to tumor samples.
A commonly used approach for visualizing clonal evolution by means of fish plots is the R package fishplot [13]. The R/Bioconductor package timescape [14] provides an alternative approach, visualizing clonal evolution by interactive fish plots linked to standard graphs. A detailed evaluation of both approaches in comparison to our novel approach clevRvis is performed.
Footnotes: 1) We consider a validity check to be able to detect errors in the logic of clonal evolution, e.g. the CCF of a children-clone exceeding the CCF of its parent-clone, or CCFs at one time point summing up to > 100%. 2) We consider phylogeny-aware color coding to indicate the degree of relatedness of two clones. This includes clones developing in a linear, branched dependent and branched independent manner.
We could not identify any tool visualizing bi-allelic events in clonal evolution for comparison with our plaice plot module. Therefore, representation by plaice plots is compared to dolphin/fish plots as well as manual evaluation of bi-allelic events.

Results
We apply clevRvis to four real, publicly available data sets and compare performance of our approach to the commonly used R packages fishplot [13] and timescape [14]. Results for three exemplary samples are visualized in Figure 2. Detailed results, considering all 31 samples, are available in Additional file 1 (data set 1: Figures S11-S21; data set 2: Figures S22, S23; data set 3: Figures S24-S34; data set 4: Figures S35-S45).
Main analysis features of all three approaches are summed up in Table 2. All three approaches are able to generate fish plots for visualization of clonal evolution (called "dolphin plots" in clevRvis). Additionally, timescape and clevRvis generate graphs ("shark plots" in clevRvis), representing the underlying phylogeny. However, clevRvis is the only approach providing an option to visualize information on healthy alleles and their development over time in terms of clonal evolution ("plaice plots").
It can be observed that the three main models of clonal evolution (linear, branched dependent and branched independent) can generally be considered by all three tools. The only exception is branched independent evolution, which cannot be visualized using timescape (Figure 2B; data set 1: patients UPN08, 09, 10, Figures S18-S20; data set 3: patient 4, Figures S27-S29). The tool mandatorily requires all clones to be present in a single tree. Additionally, timescape is not capable of visualizing clonal evolution if only a single clone is present (data set 1: patient UPN04, Figure S14). In contrast, fishplot struggles with visualizing data available at a single time point (Figure 2C; data set 3: patients 1-3, Figures S24-S26; data set 4: patients 6-10, Figures S40-S45).
Furthermore, the visualization with fishplot partly suggests a wrong starting point of the clone, despite correct definition of the input (Figure 2A and 2B; data set 1: patients UPN02, 03, 07, 09, 10, Figures S12, S13, S17, S19, S20; data set 4: patients 2 and 5, Figures S36, S39).
clevRvis is the only approach providing a graphical user interface, allowing for user-friendly analysis of clonal evolution. Plots can be easily customized, e.g. interactively moving labels of the clones along the x- and y-coordinates, or picking colors and transparency levels for the clones' borders from a wide palette. By default, all plots are interactive. When hovering over a clone, its CCF is displayed and the clone is highlighted. As shark and dolphin plots are interactively connected, the clone is highlighted in both plots. Moreover, clevRvis is the only tool providing advanced features, exceeding the basic visualization of clonal evolution. These include fully automatic algorithms for time point interpolation and therapy effect estimation. Phylogeny-aware color coding is implemented as well as an algorithm for exploring alternative trees.
A detailed description on the usage of clevRvis, including exemplary input files and executable examples, is provided along with the package (manuals and vignette). A tutorial, including a complete walk-through, is additionally provided in the shiny-app.

Time point interpolation and therapy effect estimation
clevRvis provides algorithms for approximating the development of clonal evolution in between two measured time points, by interpolating additional time points as well as estimating the effect of a therapy applied. To investigate performance of our algorithms, we consider data sets 1 and 2. Results considering 4 exemplary patients are summed up in Figure 3.
Patients in data set 1 are characterized by a high number of measured time points (up to 30). Analysis with clevRvis is performed twice: 1) evaluating all measured time points; 2) evaluating the first and last measured time point only. Six patients (UPN03, 04, 05, 06, 07 and 11) received supportive care only. Thus, development of clonal evolution is best approximated by time point interpolation. For patient UPN07 (Figure 3A) it can be observed that despite a certain simplification in the development, clonal evolution estimated by clevRvis is highly comparable to the original course. Similar results can be observed for patient UPN08. As the patient was treated with lenalidomide, we compare clonal evolution based on 12 measured time points to the estimated development, based on only 2 measured time points + interpolated time points + estimated therapy effect (Figure 3B).
For patient UPN02, also receiving treatment with lenalidomide, certain differences can be observed (Figure 3C). The magenta clone can barely be observed in the development estimated by clevRvis (measured with CCF = 22% at time point $t_{53}$). However, as the clone is present with CCF < 1% at both $t_0$ and $t_{60}$, it appears basically impossible to predict its unexpected rise and subsequent fall at an intervening time point. Despite this apparent difference, the estimated clonal evolution reflects the main characteristics of the true clonal development of this patient (results for all patients available in Additional file 1, Figures S11-S21).
Data set 2 contains information on 2 patients: patient 1 characterized by 15 and patient 2 characterized by 4 time points. Clonal evolution of patient 2, considering all available time points, is displayed in Figure 3D. González-Rincón et al. [12] report that the sample at $t_{408}$ was taken before treatment. Subsequently, the patient received treatment with FCR (fludarabine, cyclophosphamide, rituximab) followed by maintenance therapy with rituximab. The patient was reported to achieve complete response. However, clonal evolution only based on the measured time points does not reflect this response. From the results published, we estimate that FCR was given for roughly one year. Therefore, we estimate therapy effect for $t_{700}$. The resulting plot in Figure 3E shows a considerable effect of therapy on clonal evolution, matching the described course of disease (results for both patients available in Additional file 1, Figures S22, S23).

Detecting bi-allelic events
clevRvis contains a novel plotting option for clonal evolution: plaice plots. These plots allow for identification of bi-allelic events. Considering data sets 1 to 4, we investigate applicability and added value of an analysis by plaice plots. Results considering 8 exemplary patients are summed up in Figure 4.
Patient UPN05 in data set 1 is characterized by branched dependent evolution, with a total of 7 clones. The first clone features, among others, a point mutation in BCOR. The gene is located on the X-chromosome. As the patient is male, the hemizygous variant leads to a loss of the only available copy of BCOR. In the plaice plot, the first and all subsequent clones are marked, indicating a missing healthy allele of BCOR (Figure 4A) (plaice plots for all patients in data set 1 available in Additional file 1, Figures S11-S21).
Patient 1 in data set 2 features a splicing variant in TP53 (Figure 4B; light blue clone). Subsequently, the patient acquires a deletion in chromosome 17 (17p13.1 del), overlapping TP53. The CNV is clustered in clone 2 (intermediate blue). As this event leads to a loss of the only available allele of TP53, clone 2 is marked in the plaice plot. Light blue, the color of clone 1, which is characterized by the initial variant in TP53, is chosen for coloring. Additionally, clone 3 (dark blue) features deficient TBC1D4 (13q14.3 del + point mutation) and UBA1 (X-chromosomal variant in a male patient) (plaice plots for both patients in data set 2 available in Additional file 1, Figures S22, S23).
For patient UPN06 in data set 3, clonal evolution cannot be reconstructed uniquely ( Figure 4C). The patient features two variants affecting TP53: a point mutation (p.Val272Met) and a derivative chromosome 17 (der(17)t(13;17)(q21;p12)). Data does not allow for deciphering, which of the two variants developed first. Branched dependent (version 1) as well as linear (version 2) evolution can be reconstructed. In addition to the difference in clonal evolution model, the effect on TP53 differs considerably: in version 1, plaice plots show TP53 deficiency in 22-25% of the cells. In version 2, on the contrary, all cells contain at least one healthy copy of TP53 throughout the entire period of follow-up (plaice plots for all patients in data set 3 available in Additional file 1, Figures S24-S34).
Relapsing patients in data set 4 (patients 1 to 5) are, unlike non-relapse patients, characterized by ≥ 2 variants affecting TP53. Analysis and visualization with plaice plots show that patients 2, 3 and 5 are characterized by deficient TP53 in a majority of cells (Figure 4D). The stem line of patient 1 is characterized by a point mutation in TP53 (p.Arg248Gln) and an overlapping CNV. The duplication affects the mutated allele of TP53; however, a healthy:mutated ratio of 1:2 remains. Thus, no clone is marked in the lower plaice plot. For patient 4, a point mutation in TP53 (p.Arg248Gln) is detected in the second to last clone. A CNV below detection thresholds is assumed to be additionally present. However, the data do not allow deciding whether it is a deletion or a duplication. Thus, it is unclear whether a healthy copy of TP53 remains (plaice plots for all patients in data set 4 are available in Additional file 1, Figures S35-S45).
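The marking rule that runs through the examples above (a clone is marked when no healthy allele of a gene remains) can be illustrated with a minimal Python sketch. This is not clevRvis code: the function name `is_deficient` and the simple copy-count model (total copies after CNVs versus copies carrying a variant) are assumptions made purely for illustration.

```python
def is_deficient(total_copies: int, mutated_copies: int) -> bool:
    """Return True if no healthy copy of a gene remains in a clone.

    Hypothetical helper, not part of clevRvis.
    total_copies:   number of gene copies present after all CNVs
    mutated_copies: number of those copies that carry a variant
    """
    if total_copies < 0 or not 0 <= mutated_copies <= total_copies:
        raise ValueError("copy counts are inconsistent")
    return total_copies - mutated_copies == 0


# X-chromosomal variant in a male patient: the single copy is mutated.
x_linked_male = is_deficient(total_copies=1, mutated_copies=1)

# Variant on one allele, deletion of the other allele.
variant_plus_deletion = is_deficient(total_copies=1, mutated_copies=1)

# Duplication of the mutated allele: healthy:mutated = 1:2.
variant_plus_duplication = is_deficient(total_copies=3, mutated_copies=2)

print(x_linked_male, variant_plus_deletion, variant_plus_duplication)
```

Under this simplified model, the first two cases would be marked in the lower plaice plot, whereas the duplication case would not, since one healthy copy remains. A case with an undetermined CNV type (deletion or duplication) cannot be decided by such a rule without further data.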

Discussion
clevRvis is an R/Bioconductor package providing innovative visualization techniques for clonal evolution. The optimized, highly customizable implementation of established visualization approaches (shark plots, dolphin plots) is complemented by a unique allele-aware representation of clonal evolution, allowing for analysis of biallelic events at a glance: plaice plots. In addition, the tool contains fully automatic algorithms for time point interpolation and therapy effect estimation, phylogeny-aware color coding, an approach for exploring alternative trees, and a graphical user interface for intuitive usage not just by computer scientists, but also by biologists and physicians. To our knowledge, only two alternative approaches for visualizing clonal evolution exist: fishplot [13] and timescape [14]. With respect to functionalities, our novel approach unites all options of currently available tools and provides a wide set of additional features. Analyzing four publicly available data sets, we observed that plots generated with clevRvis allow for an improved visualization of clonal evolution, outperforming both fishplot and timescape. Furthermore, new insights into disease course can be gained and reasons for therapy failure and relapse can be explored.
As regards suitable input data, the analysis of clonal evolution faces a major challenge: commonly, the number of tumor samples available heavily depends on the tumor itself. For non-solid tumors, bone marrow or peripheral blood are commonly analyzed (e.g. [6], [8]). Taking into account a patient's burden, performing regular bone marrow biopsies is ethically difficult to justify, especially towards the end of a therapy that is expected to lead to remission. For solid tumors, e.g. brain tumors, collecting samples after therapy is practically impossible as long as no relapse is observed. Thus, valid approaches estimating the development of a tumor and its response to therapy are of high relevance.
The two algorithms for time point interpolation and therapy effect estimation, inspired by the general idea outlined by Reutter et al. [8], mark a central element of clevRvis. However, the assumptions on which these algorithms are based are open to discussion.
Interpolating the initial development of a tumor towards the first measured time point t_1, we focus on the difference in CCFs (ccf′). If ccf′ of clone A (the stem line) is 20% at t_1, we assume that it is also 20% at every interpolated initial time point. It is, of course, possible that clone A temporarily reaches values > 20% and is pushed away by clone B towards t_1, resulting in a decrease of ccf′. Conversely, it is also possible that clone A expands only slowly and values < 20% are observed prior to t_1. As we could not find evidence that either scenario is generally more likely, we decided, as a compromise, to focus only on ccf′ at t_1, which may result in an underestimation in some cases and an overestimation in others.
In the absence of therapy, it appears sensible to assume that the overall tumor load never decreases. Interpolating the development of a tumor between two measured time points with no therapy being applied, we therefore assume linear development of CCFs. While focusing on the difference in CCFs would be a valid alternative approach for clones showing an increase in ccf′, it can lead to a violation of our main assumption in case of decreasing ccf′ (new, quickly expanding clones pushing away existing clones). As the overall tumor load is not expected to decrease at any (interpolated) time point, we decided to stick to linear development for all clones. New clones are assumed to develop successively, based on their nested level.
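The linear assumption for untreated intervals can be sketched as follows. This is a minimal Python illustration, not the clevRvis implementation: the function name `interpolate_linear`, the list-based input, and the evenly spaced intermediate time points are assumptions. The sketch also deliberately omits the successive emergence of new clones by nested level.

```python
def interpolate_linear(ccf_start, ccf_end, n_points):
    """Linearly interpolate per-clone CCFs between two measured time points.

    Hypothetical helper, not part of clevRvis.
    ccf_start, ccf_end: CCFs (in %) of the same clones at the two time points
    n_points:           number of evenly spaced time points to insert
    Returns a list with one CCF vector per inserted time point.
    """
    if len(ccf_start) != len(ccf_end):
        raise ValueError("both time points must list the same clones")
    steps = n_points + 1  # number of intervals between the two measurements
    return [
        [s + (e - s) * k / steps for s, e in zip(ccf_start, ccf_end)]
        for k in range(1, steps)
    ]


# Three clones: stable stem line, one shrinking clone, one new clone.
measured_first = [100, 40, 0]
measured_second = [100, 20, 60]
midpoint = interpolate_linear(measured_first, measured_second, n_points=1)
print(midpoint)  # [[100.0, 30.0, 30.0]]
```

Because every CCF moves on a straight line between its two measured values, no clone can overshoot or undershoot its measurements at an interpolated time point.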
In the presence of therapy, we assume that every observable decrease in CCF is related to therapy. Focusing again on the difference in CCFs, the minimum ccf′ of the measured time points prior to (t_i) and after (t_{i+1}) the estimated therapy effect is considered. If a decrease in ccf′ can be observed, we assume that it is caused by therapy. If an increase is observed, we assume that the clone is resistant to therapy and was able to expand after the end of therapy. Similar to interpolation of the initial development, it is possible that this approach overestimates therapy effect in some cases, while underestimating it in others. We consider our approach a compromise, approximating true development in a majority of cases.
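The minimum rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the clevRvis implementation: the function name `estimate_therapy_effect` and the dictionary input are hypothetical, the ccf′ values are taken as given rather than derived from CCFs, and a clone missing at one time point is assumed to have ccf′ = 0 there.

```python
def estimate_therapy_effect(ccf_diff_before, ccf_diff_after):
    """Estimate per-clone ccf' at the time point of estimated therapy effect.

    Hypothetical helper, not part of clevRvis.
    ccf_diff_before: clone -> ccf' (in %) at the measured time point t_i
    ccf_diff_after:  clone -> ccf' (in %) at the measured time point t_{i+1}
    A clone absent from one dictionary is assumed to have ccf' = 0 there.
    """
    clones = sorted(set(ccf_diff_before) | set(ccf_diff_after))
    return {
        clone: min(ccf_diff_before.get(clone, 0.0), ccf_diff_after.get(clone, 0.0))
        for clone in clones
    }


before_therapy = {"A": 20.0, "B": 50.0}
after_therapy = {"A": 5.0, "B": 70.0, "C": 10.0}
estimated = estimate_therapy_effect(before_therapy, after_therapy)
print(estimated)  # {'A': 5.0, 'B': 50.0, 'C': 0.0}
```

In this example, the decrease of clone A is attributed entirely to therapy, whereas clones B and C are treated as resistant: they keep their earlier (lower) value at the estimated time point and expand only afterwards.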
Due to limited data being available, e.g. for data set 4, we could not prove the validity of our algorithms for all patients. However, evaluating two public data sets, we could show that our algorithms indeed provide valid solutions for interpolating time points and estimating therapy effect. In some cases, they even revealed new insights into tumor development (data set 2, patient 2; Figure 3D vs 3E). While our analysis also showed that extreme development of a clone in between two measured time points, e.g. a considerable increase followed by a subsequent decrease, cannot be approximated, it remains questionable whether any algorithm would be capable of predicting such unexpected behaviour in the absence of sufficient data.
In addition to approximating a tumor's development, the analysis of clonal evolution on allele-level represents another central aspect of clevRvis. While bi-allelic events are considered of high relevance, their analysis commonly requires tedious manual inspection of the variants characterizing each clone. Considering CNVs, genetic expert knowledge is often required to decipher partly complex karyotypes. Additionally, it may be necessary to consider raw sequencing data. While specific clones featuring bi-allelic events could also be highlighted in a common fish plot, our newly developed plaice plots provide a unique option to 1) easily convey information on bi-allelic events and 2) link this information to characteristic clones by suitable color-coding. As shown in Figure 4B (data set 2, patient 1), this does not necessarily refer to the same clone. It may of course be argued that it is still necessary to manually define the clones to color in the lower plaice plot once. Subsequently, however, this information can be evaluated at a glance by physicians, biologists and even computer scientists.
Compared to dolphin plots, it may be considered a disadvantage that the actual clonal evolution is only displayed in the upper half of a plaice plot, making complex clonal evolution patterns potentially difficult to see. However, real examples of complex clonal evolution (e.g. data set 1, UPN10: 8 clones, dependent and independent branches, Additional file 1, Figure S20C vs S20D; data set 4, patient 5: 17 clones, linear development, Additional file 1, Figure S39D vs S39E) show that all clones can still be distinguished clearly. Simulated data, considering even more complex clonal evolution (100 clones and 100 time points; Additional file 1, section 2.5), show that all plot types implemented in clevRvis, including plaice plots, allow for visualization of a high number of clones as well as a high number of time points.

Conclusion
clevRvis provides an extensive set of visualization techniques for clonal evolution. Exceeding currently available approaches, clevRvis allows for approximating a tumor's development in between measured time points as well as analyzing bi-allelic events. Our future work will include an extension of the clevRvis package, including approaches for considering multiple spatial locations per time point, as well as visualizing changes in gene expression and methylation along common clonal evolution, based on a tumor's mutational profile.