Sen Zhao, Sigve Nakken, Daniel Vodak, Eivind Hovig, FuSViz—visualization and interpretation of structural variation using cancer genomics and transcriptomics data, Nucleic Acids Research, Volume 53, Issue 4, 28 February 2025, gkaf078, https://doi.org/10.1093/nar/gkaf078
Abstract
Structural variation (SV) is a frequent category of genetic alterations important for understanding cancer genome evolution and revealing key cancer driver events. With the development of high-throughput sequencing technologies, the ability to detect SVs of various sizes and types has improved at both the DNA and RNA levels. However, SV calls still contain a considerable fraction of false positives, which necessitates visual inspection and manual curation as part of the quality control process. Identifying reliable and recurrent SVs in larger cohorts helps reveal the driving roles of SVs in cancer development and aids the discovery of potential diagnostic and prognostic biomarkers. Here, we present FuSViz, an application for visualization, interpretation, and prioritization of SVs. The tool provides multiple data views in a user-friendly interface, allowing investigation of the prevalence and recurrence of SVs and relevant partner genes in a sample cohort. It integrates SV calls from DNA and RNA sequencing datasets to comprehensively illustrate the biological impact of SVs on the implicated genes and associated genomic regions. FuSViz is intended for interrogating both recurrent and private SVs, effectively assisting with pathogenicity evaluation and biomarker discovery in cancer sequencing projects.

Introduction
Structural variation (SV) is a class of common genetic events in human cancers [1, 2], covering genomic rearrangements that amplify, delete, or reorder chromosomal material at scales ranging from single genes to entire chromosomes. Appearing as junctions between two breakpoints in the genome, SVs include deletions (DELs), insertions (INSs), duplications (DUPs), inversions (INVs) and translocations (TRAs), each representing a different combination of gains, losses and rearrangements of DNA [3–5]. Deletions and duplications are referred to as copy number variations (CNVs), while inversions and translocations are typically balanced alterations with no net change in genome size. When both breakpoints of an event fall within genes, INVs, DELs and TRAs can result in fusion genes, i.e. hybrid sequences joining two originally independent genes [6, 7]. Transcription of a fusion gene into a functional chimeric RNA is an important consequence of such an SV event. Complicating this picture further, RNA polymerase read-through and trans-splicing may give rise to fusion transcripts independently of structural rearrangements; these alternative gene products arise only at the levels of RNA transcription, processing and regulation, not from aberrant DNA double-strand breakage [8–10].
SVs are ubiquitous in cancer and influence tumor development, progression, and resistance to therapy [1, 11]. SVs can form in the context of genomic instability, in which numerous SVs accumulate within a short period, resulting in extensive intratumor heterogeneity [12, 13]. The genomic burden of somatic SVs is estimated to be 3–10 times higher than that of somatic single nucleotide variants (SNVs) and short indels [1, 14]. Consequently, SVs are key to understanding cancer clonal evolution and uncovering the key cancer drivers in a given tumor [13, 14]. Their importance in molecular oncology is highlighted by multiple studies, with common consequences including amplification or disruption of cancer-related genes, or repurposing of noncoding DNA regulatory elements leading to dysregulated gene expression [15, 16]. In particular, fusion genes/transcripts are frequent biomarkers and drivers in certain cancer types, and many of these fusion events are clinically actionable as targets for multiple molecular inhibitors [6, 7, 16]. With the advent of high-throughput DNA (e.g. WGS and WES) and RNA sequencing, methods for de novo detection of somatic SVs of various sizes and types have advanced considerably in cancer genomics. Long-read sequencing, which generates reads of 5000–30 000 bp (compared with the ∼150 bp typical of current short-read sequencing), benefits SV calling and allows researchers to identify complex SVs that may be difficult to detect with short reads [17, 18]. Despite shortcomings such as low yield and lower per-read accuracy, it has been successful in detecting novel structural variants [19–21]. Many computational algorithms and tools have been developed over the years to improve the sensitivity and specificity of SV detection from sequencing data [7, 22–25].
However, even when short- and long-read sequencing are combined, SV calling still suffers from false positives, primarily due to the intrinsically complex nature of SV formation. The underlying evidence for SV events is often blurred by common sequencing errors and alignment artifacts, especially within repetitive and homologous DNA regions [26, 27]. Manual inspection of SVs using read alignments and genomic annotation is therefore crucial for identifying erroneous calls. Flexible data visualization tools are also important for interpreting the functional impact of potentially oncogenic SVs, and for gaining insight into expression patterns, dosage changes and the repositioning of regulatory elements (e.g. enhancer hijacking) [15, 28, 29].
The visualization of SVs is considerably more challenging than that of SNVs and indels, because the investigated genome (i.e. a genome harboring SVs) may differ substantially from the reference genome. Considering both design concepts and research goals, visualization methods can be divided into several categories: (i) linear genome browsers (e.g. Integrative Genomics Viewer (IGV) and JBrowse2), which provide a coordinate system for displaying genomic deletions and amplifications [30, 31]; (ii) scatter plots, displaying the copy number of genomic segments across chromosomes [32]; (iii) circos plots, useful for viewing breakpoint junctions and large SVs (e.g. translocations) together with multiple layers of other aberrations [33, 34]; (iv) two-way plots, illustrating fusion genes/transcripts [35, 36]; (v) simple tables, listing SVs together with their properties in text form [37]; and (vi) network views, presenting gene interactions arising from SV events as a graph [38]. Different visualization modalities suit different types of SVs, but few currently available tools support switching between such views [39]. Moreover, most SV visualization tools focus on one or a few samples, with the aim of reviewing the sequence evidence for private SV events or comparing several genomes with each other (e.g. tumor–normal pairs). Few tools are designed to show the distribution of SVs across samples, and they therefore cannot address the prevalence and recurrence of SVs in a sample cohort.
Here, we present FuSViz, a web browser-based application for the visualization and interpretation of structural variants in cancer. It supports SVs called from both DNA and RNA sequencing data, allowing evidence from both sources to be combined. FuSViz implements multiple visualization solutions and provides an interactive user interface for investigating both recurrent and private SVs and their partner genes in a cohort of samples. In addition, it allows convenient quality control of individual samples by enabling visual inspection of sequence alignments. Together, these features assist in detecting tumorigenic events and biomarkers for precision oncology.
Materials and methods
Database resources
FuSViz is a web browser-based platform intended for the analysis of structural variants in human cancer; it can also analyze SVs detected in other species, currently limited to mouse. The platform integrates a genome assembly-specific data bundle (hg19 or hg38 for human; GRCm39 for mouse) with multiple annotation resources. Specifically, it includes (i) gene, transcript and exon annotation from Ensembl [40]; (ii) protein domains and motifs from InterPro [41]; (iii) a literature-mined dataset of tumor suppressor genes/proto-oncogenes from CancerMine [42]; (iv) gene symbols and synonyms from HGNC [43]; and (v) drug target entries from the Open Targets Platform [44]. The release version of each data resource is listed in the online documentation of FuSViz.
SV input data
The input to FuSViz is a tab-separated file of aggregated DNA and/or RNA sequencing SV calls from one or multiple samples (see an example at https://fusviz.s3.eu-north-1.amazonaws.com/RNA_SV_example.txt). We provide a dedicated script, “SV_standard” (https://github.com/senzhaocode/SV_standard), that converts the output of multiple common SV callers (Manta, Svaba, Delly and Lumpy for DNA sequencing [23, 24, 45, 46]; Dragen, STAR-fusion, Arriba, FusionCatcher and deFuse for RNA sequencing [7, 22, 25, 47]) into the required FuSViz format. In addition, FuSViz accepts aggregated short variants (SNVs and indels) in MAF format, which is compatible with data sources such as the TCGA data portal (https://portal.gdc.cancer.gov/). The short variant profiles in the example dataset were obtained with the R package TCGAbiolinks (https://github.com/BioinformaticsFMRP/TCGAbiolinks), which accesses the Genomic Data Commons through its application programming interface (API). An example R script for retrieving a MAF file is available in the “inst/example” folder of the FuSViz source package. Providing short variant profiles is optional, but it is highly recommended for investigating the relationships between SVs and short variants.
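To make the input requirement concrete, the following minimal Python sketch reads an aggregated, tab-separated SV table and groups the calls by sample. It is an illustration only, not part of FuSViz: the column names (`chrom1`, `pos1`, `gene1`, `chrom2`, `pos2`, `gene2`, `name`) and the helper `read_sv_calls` are hypothetical placeholders, and the authoritative header is the one in the example file and the FuSViz documentation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names for illustration; the real FuSViz header is
# documented in the example file and the online manual.
REQUIRED_COLUMNS = ("chrom1", "pos1", "gene1", "chrom2", "pos2", "gene2", "name")


def read_sv_calls(handle):
    """Parse a tab-separated SV table and group the calls by sample name."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    by_sample = defaultdict(list)
    for row in reader:
        # Breakpoint positions arrive as strings; convert them to integers.
        row["pos1"] = int(row["pos1"])
        row["pos2"] = int(row["pos2"])
        by_sample[row["name"]].append(row)
    return dict(by_sample)


# Toy input with two samples sharing one TMPRSS2-ERG breakpoint pair.
example = (
    "chrom1\tpos1\tgene1\tchrom2\tpos2\tgene2\tname\n"
    "chr21\t41507950\tTMPRSS2\tchr21\t38445621\tERG\tS1\n"
    "chr21\t41507950\tTMPRSS2\tchr21\t38445621\tERG\tS2\n"
)
calls = read_sv_calls(io.StringIO(example))
print(sorted(calls))
```

Grouping by sample up front is a convenient starting point for the cohort-level summaries (SV load, recurrence) that the table and circular views report.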
Implementation
FuSViz is implemented using the Shiny web application framework in R. The front-end web user interface is built with the R package shinydashboard. For data processing, analysis and visualization, multiple R packages are used for enhanced interactivity, including DT (https://rstudio.github.io/DT), BioCircos [34], wordcloud2 (https://github.com/Lchiffon/wordcloud2), visNetwork (https://github.com/datastorm-open/visNetwork) and Gviz [48]. A new htmlwidget was developed for communication between R and igv.js [31]. Briefly, it provides R bindings to the igv.js library and exposes igv.js functionality within the FuSViz application, e.g. browser creation, loading and removal of genomic tracks, upload of alignment and annotation files, and configuration of offline mode. For read alignment visualization, FuSViz offers two solutions: (i) direct upload of BAM/CRAM files from a local file system; and (ii) remote access to hosted BAM/CRAM files via URL. The source code is available at https://github.com/senzhaocode/FuSViz. Docker/Podman and Singularity/Apptainer images are also provided for FuSViz deployment.
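igv.js loads tracks from plain configuration objects, so any binding to it, including an htmlwidget, ultimately hands igv.js an object of this kind for a remotely hosted alignment (option ii above). The Python sketch below builds such an object; `name`, `type`, `format`, `url` and `indexURL` are igv.js alignment-track options, whereas the helper names and the assumption that the index sits next to the file with a `.bai`/`.crai` suffix are hypothetical and not taken from FuSViz.

```python
import json
from typing import Optional


def alignment_track(name: str, url: str, index_url: Optional[str] = None) -> dict:
    """Return an igv.js-style alignment track configuration for a BAM/CRAM URL."""
    # Infer the format from the file extension (assumes URLs without query strings).
    fmt = "cram" if url.lower().endswith(".cram") else "bam"
    if index_url is None:
        # Hypothetical convention: the index is stored alongside the file.
        index_url = url + (".crai" if fmt == "cram" else ".bai")
    return {"name": name, "type": "alignment", "format": fmt,
            "url": url, "indexURL": index_url}


def to_json(track: dict) -> str:
    """Serialize the configuration, e.g. for passing it from a server to igv.js."""
    return json.dumps(track, sort_keys=True)


track = alignment_track("tumor_RNA", "https://example.org/data/sample1.bam")
print(to_json(track))
```

Passing an explicit `index_url` is safer in practice, because hosted files do not always follow the side-by-side index convention assumed here.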
Offline mode
By default, initializing FuSViz requires an internet connection. However, with some additional configuration, FuSViz can run on an offline system. In this case, users need to manually upload a predefined genome reference file and its index, as well as a matching cytoband annotation file (see the online manual for details).
Application content and usage
The FuSViz app is launched and initialized after importing a version of the genomic annotation resource (Fig. 1A). The dashboard consists of five functional modules: (i) a table overview; (ii) a circular module; (iii) a linear module; (iv) a two-way module; and (v) a network module.

Figure 1. Table overview of SV calls. (A) Data import and file upload interface. (B) A list of DNA SVs is displayed in an interactive table that allows custom searching, filtering and prioritization. If an SV breakpoint falls within an intergenic region, it is labeled with “*” in the respective gene column (“gene1” or “gene2”). (C) The prevalence of SV-related genes is shown in a word-cloud plot. Three display shapes are available: square (default), circle, and cardioid. (D) The SV load (i.e. the number of SVs) per sample is plotted in a histogram with colored bars representing different SV types (i.e. DUP, DEL, INS, INV and TRA). (E) The correlation of the DNA and RNA SV loads with the short variant load (if available) is plotted pairwise after a log2 transformation. The numbers of SVs and short variants for selected samples are listed in a table overview. (F) Drug target information (drug type, chemical identifier, and drug name) for the SV-related genes.
Overview of SV data in a table
RNA and DNA SV data are displayed in two separate interactive tables built with the DataTables JavaScript library. Users can search, sort, and filter the uploaded SV events (Fig. 1B). An interactive word-cloud plot illustrates the prevalence of SV-related genes (Fig. 1C). SV counts per sample are compared in a histogram (Fig. 1D). The correlation of SV counts at the DNA and RNA levels, as well as the correlation of SV and short variant counts, is displayed in a scatter plot (Fig. 1E). Drug target information for SV-related genes is also listed (Fig. 1F).
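The per-sample SV load and the log2-scale correlation can be sketched as follows. This is an illustration, not FuSViz code: the input is assumed to be a list of (sample, SV type) pairs, and both the pseudocount of 1 (to avoid log2(0) for samples without calls) and the use of Pearson's coefficient are assumptions of this sketch, since the text does not specify them.

```python
import math
from collections import Counter, defaultdict


def sv_load_by_type(calls):
    """Count SVs per sample, split by SV type, as in a stacked histogram."""
    load = defaultdict(Counter)
    for sample, sv_type in calls:
        load[sample][sv_type] += 1
    return dict(load)


def log2_pearson(xs, ys, pseudocount=1.0):
    """Pearson correlation of two count vectors after a log2 transform.

    The pseudocount is an assumption of this sketch; it keeps samples
    with zero calls from producing log2(0).
    """
    lx = [math.log2(x + pseudocount) for x in xs]
    ly = [math.log2(y + pseudocount) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in lx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ly))
    return cov / (sd_x * sd_y)


load = sv_load_by_type([("S1", "DEL"), ("S1", "TRA"), ("S1", "DEL"), ("S2", "INV")])
dna_counts = [sum(load[s].values()) for s in ("S1", "S2")]
print(load["S1"]["DEL"], dna_counts)
```

The same `log2_pearson` call applies to any pair of per-sample load vectors (DNA SV vs RNA SV, or SV vs short variant), provided both are ordered by the same sample list.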
Circular module
The R BioCircos package binds the BioCircos.js library to an htmlwidget to display SVs as curves linking chromosomes arranged as arcs of a circle [34]. SVs from DNA and RNA sequencing data are shown in the “DNA_SV_cirular_plot” and “RNA_SV_cirular_plot” panels, respectively. Both plots have the same track composition, ordered from the outermost to the innermost track: gene annotation, cytoband, short variant profile (optional) and SV links (Fig. 2A). SV events can be filtered by recurrence by setting the “Num of samples” slider bar to a cutoff value N, so that only events reported in at least N samples are displayed (Fig. 2B). If a short variant profile is available, variants with non-silent consequences can be prioritized for analysis using the “Mutation type” select box (Fig. 2C). Hovering the mouse pointer over a mutation or an SV event opens a pop-up window displaying the genomic coordinate, the frequency in the cohort (highlighted in brackets, Fig. 2D), and the names/IDs of all samples that harbor the given event.
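The “Num of samples” cutoff amounts to counting distinct supporting samples per event and keeping events with at least N. The sketch below illustrates this under two assumptions of our own, which FuSViz may handle differently: an event is identified by its exact breakpoint pair, and A–B and B–A calls are merged (appropriate for undirected DNA SVs).

```python
from collections import defaultdict


def recurrent_events(calls, min_samples):
    """Return SV events supported by at least `min_samples` distinct samples.

    calls: iterable of (sample, chrom1, pos1, chrom2, pos2). An event is
    identified by its exact breakpoint pair, and A-B / B-A are merged;
    both are assumptions of this sketch.
    """
    supporters = defaultdict(set)
    for sample, chrom1, pos1, chrom2, pos2 in calls:
        event = tuple(sorted([(chrom1, pos1), (chrom2, pos2)]))
        supporters[event].add(sample)  # a set counts each sample once
    return {event: sorted(samples)
            for event, samples in supporters.items()
            if len(samples) >= min_samples}


calls = [
    ("S1", "chr21", 41507950, "chr21", 38445621),
    ("S2", "chr21", 38445621, "chr21", 41507950),  # same event, reversed order
    ("S2", "chr21", 41507950, "chr21", 38445621),  # duplicate call in S2
    ("S3", "chr1", 1000, "chr1", 5000),            # toy event in one sample
]
print(recurrent_events(calls, min_samples=2))
```

Returning the sorted sample names alongside each event mirrors the information shown in the pop-up window (recurrence plus the samples carrying the event).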

Figure 2. Circular overview of SV calls. (A) DNA SV links are displayed in a circular track together with cytoband and gene annotations. (B) SV events with high recurrence are highlighted according to the setting of the “Num of samples” slider bar (e.g. filtering out SVs supported by <35 samples). After zooming in, hovering the mouse pointer over a link opens a window showing that the SV between TMPRSS2 (chr21:41508081) and ERG (chr21:38445621) recurs in 44 samples. (C) A short variant profile track is added between the cytoband annotation and the SV track by clicking the checkbox “Load mutation data.” By default, coding variants with non-silent consequences (see available tags within the “Mutation type” select box) are shown in the circular plot. (D) Double-clicking the corresponding hotspot dot zooms in on a mutation hotspot locus (chr17:49619063) in the SPOP gene, showing one change (A>C) present in three samples. The SPOP gene harbors one SV event at the chr17:49652451 breakpoint.
Linear module
The linear visual panel is built on igv.js, an embeddable, interactive JavaScript library for genome visualization [31]. By default, the IGV genome browser interface is launched automatically after a genome assembly version is chosen on the "Home" page (Fig. 1A). In short, the linear module displays a genomic interval of the reference genome horizontally, with custom tracks of various types stacked in parallel. Four track formats (bedpe, segment, bed and bedgraph) are implemented in FuSViz to illustrate SVs.
“bedpe” plot: Intrachromosomal SVs are shown as curves that link breakpoints. Sample support is displayed in a pop-up window after clicking on a curve (Fig. 3A).
“segment” plot (only for DNA SVs): This plot is limited to SVs that represent copy number aberrations (Fig. 3B).
“bed” and “bedgraph” plots: The “bed” track (in green) visualizes the distribution of SV breakpoints, whereas the “bedgraph” track (in blue) shows the frequency of recurrent breakpoints across samples; the count is displayed after clicking on a selected peak (Fig. 3A). Users can also upload custom annotation files in VCF, BED and GTF formats to explore additional SV patterns in the linear module.
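The bedpe and bedgraph tracks above can be derived from breakpoint coordinates with a few lines of code. The sketch below is illustrative, not FuSViz's implementation; it assumes 1-based input positions, follows the BED convention of 0-based, half-open intervals, leaves strands as “.”, and counts each sample at most once per breakpoint.

```python
from collections import Counter


def to_bedpe(chrom1, pos1, chrom2, pos2, name=".", score=0):
    """Format one breakpoint pair as a BEDPE line.

    Positions are assumed to be 1-based, so each breakpoint becomes the
    0-based, half-open interval [pos - 1, pos). Strands are left as ".".
    """
    fields = [chrom1, pos1 - 1, pos1, chrom2, pos2 - 1, pos2, name, score, ".", "."]
    return "\t".join(str(f) for f in fields)


def breakpoint_bedgraph(breakpoints):
    """Build bedGraph lines counting the samples that share each breakpoint.

    breakpoints: iterable of (sample, chrom, pos) with 1-based positions.
    Duplicate (sample, chrom, pos) entries are counted once.
    """
    counts = Counter((chrom, pos) for _sample, chrom, pos in set(breakpoints))
    return [f"{chrom}\t{pos - 1}\t{pos}\t{n}"
            for (chrom, pos), n in sorted(counts.items())]


print(to_bedpe("chr21", 41507950, "chr21", 38445621, "TMPRSS2-ERG", 44))
for line in breakpoint_bedgraph([("S1", "chr21", 100), ("S2", "chr21", 100),
                                 ("S2", "chr21", 100), ("S1", "chr21", 200)]):
    print(line)
```

A peak in the resulting bedgraph corresponds to a position shared by many samples, which is how recurrent (for example, splice-site-aligned RNA) breakpoints become visible.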
Two-way module
The two-way view is designed for analysis of RNA-seq based gene/transcript fusion calls within a single window, where two distant genomic intervals involved in fusion events are shown together with gene annotation. Three functional plots are provided for visualization in different dimensions.
Overview_plot: All fusion events involving two partner genes and their recurrence in a sample cohort are plotted in the context of exon annotations of the different transcript isoforms (UTR region: narrow box; coding region: wide box). Prevalent fusion events can be prioritized by choosing the relevant breakpoints, and users can select the most relevant transcript isoform for a custom analysis (Fig. 4A). A vertical baseline can be added to the plot by clicking the “Rule line” box, which helps determine the exact position of breakpoints.
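The recurrence figures shown in the overview plot can be thought of as per-breakpoint-pair sample counts for one ordered gene pair, optionally restricted to a chosen upstream breakpoint (as with the “Breakpoint A” select box in Fig. 4A). The sketch below illustrates that bookkeeping; the tuple layout and the function name are assumptions for illustration, not FuSViz internals.

```python
from collections import Counter


def breakpoint_pair_counts(fusions, gene_a, gene_b, breakpoint_a=None):
    """Count samples per (breakpoint A, breakpoint B) for one fusion gene pair.

    fusions: iterable of (sample, upstream_gene, pos_a, downstream_gene, pos_b).
    Only calls with gene_a upstream and gene_b downstream are kept, and each
    sample is counted once per breakpoint pair. If breakpoint_a is given,
    the result is restricted to pairs starting at that upstream breakpoint.
    """
    observed = {(sample, pos_a, pos_b)
                for sample, up, pos_a, down, pos_b in fusions
                if up == gene_a and down == gene_b}
    counts = Counter((pos_a, pos_b) for _sample, pos_a, pos_b in observed)
    if breakpoint_a is not None:
        counts = Counter({pair: n for pair, n in counts.items()
                          if pair[0] == breakpoint_a})
    return counts


# Toy calls; the last one has the partners in the reverse (ERG upstream) order.
fusions = [
    ("S1", "TMPRSS2", 41507950, "ERG", 38445621),
    ("S2", "TMPRSS2", 41507950, "ERG", 38445621),
    ("S3", "TMPRSS2", 41508081, "ERG", 38445621),
    ("S1", "ERG", 38445621, "TMPRSS2", 41507950),
]
print(breakpoint_pair_counts(fusions, "TMPRSS2", "ERG", breakpoint_a=41507950))
```

Keeping the gene order significant reflects the directionality of RNA fusions, where the upstream (5') partner is distinct from the downstream (3') partner.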
Sample_plot: A plot displaying a private fusion event of a single sample, similar to the overview plot in the context of annotation tracks. In addition, the upstream and downstream parts of a putative chimeric transcript are highlighted by gray boxes. Read support evidence, i.e. the number of split reads and discordant read pairs, is displayed in parentheses above the curve and serves as an indicator of the expression level of the fusion RNA (Fig. 4B).
Domain_plot: The biological consequence of a fusion transcript is predicted in the context of protein domain and motif annotations. Upstream and downstream partners are colored green and orange, respectively (Fig. 5C). When a gene has multiple transcript isoforms, those with domain and motif annotations are shown in bold in the select boxes “TranscriptA” and “TranscriptB”, and canonical transcripts are underlined. The arrow line indicates the transcription direction from the upstream to the downstream sequence. The reading-frame outcome of the putative fusion protein is color-coded (red: out-of-frame; blue: in-frame; black: unknown).
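One common way to classify the frame of a putative fusion protein is sketched below. It is a generic rule, not necessarily the procedure FuSViz uses, and it assumes the breakpoints have already been mapped onto the coding sequences (CDS) of the selected isoforms. Reporting “unknown” whenever either breakpoint falls outside a CDS is also an assumption of this sketch; only the color mapping follows the scheme described above.

```python
# Color scheme described for the domain plot.
FRAME_COLORS = {"in-frame": "blue", "out-of-frame": "red", "unknown": "black"}


def fusion_frame(up_cds_len, down_cds_pos):
    """Classify the reading frame of a putative fusion protein.

    up_cds_len:   coding nucleotides kept from the 5' partner, counted from its
                  start codon (None if the 5' breakpoint is outside the CDS).
    down_cds_pos: 1-based CDS position of the first 3' partner nucleotide kept
                  (None if the 3' breakpoint is outside the CDS).
    """
    if up_cds_len is None or down_cds_pos is None:
        return "unknown"
    # The 3' partner stays in frame only if its first retained base falls at
    # the same codon phase in the fusion as in its original transcript.
    if up_cds_len % 3 == (down_cds_pos - 1) % 3:
        return "in-frame"
    return "out-of-frame"


for args in [(300, 1), (301, 1), (301, 2), (None, 40)]:
    frame = fusion_frame(*args)
    print(args, frame, FRAME_COLORS[frame])
```

The check compares codon phases: the first retained 3' base sits at fusion CDS position `up_cds_len + 1` (phase `up_cds_len % 3`) and originally at phase `(down_cds_pos - 1) % 3`.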
Network module
The network module visualizes interactions among SV genes, identifies gene interaction hubs, and highlights functional subgraphs in the network. Specifically, it is intended to (i) identify subgraphs in which multiple oncogenes/tumor suppressor genes are clustered together by SV events, (ii) illustrate how oncogenes are linked to each other through SV events, and (iii) show whether tumor suppressor genes are disrupted by SV events and how frequent these events are in a sample cohort. To reduce background noise, only SVs with at least one partner gene in a predefined cancer-related gene set are included in the network. The DNA SV network is an undirected graph (i.e. A–B and B–A links are equivalent), whereas the RNA SV network is directed, with edge arrows showing the transcription direction from upstream to downstream partners (Fig. 6A and B). Each node represents either a gene or an intergenic interval that harbors SV breakpoints, and each edge denotes SV events between the connected nodes. A loop represents an interaction of a node with itself, e.g. an intragenic deletion or duplication. Node and edge sizes indicate, respectively, the degree of connection (i.e. the number of distinct events) and the number of samples that contain SVs between the given partner genes (Fig. 6A). In the “DNA_SV_network_hub” and “RNA_SV_network_hub” panels, network centrality and hub scores are summarized in a table (Fig. 6A and B). The two index values, degree and score, represent the number of edges linked to a node and the number of samples with corresponding SV events, respectively. Several settings are available for customizing the display: gravitational constant, spring length, and central gravity (see the control panel of the network module for details).
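The degree and score columns can be illustrated with a small, dependency-free sketch that applies the cancer-gene filter, treats DNA edges as undirected and RNA edges as directed, and then counts distinct edges and distinct samples per node. The input layout and function name are hypothetical, and counting a self-loop once toward the degree is a choice of this sketch (some graph libraries count it twice).

```python
from collections import defaultdict


def hub_table(events, cancer_genes, directed=False):
    """Summarize each node's degree (distinct edges) and score (distinct samples).

    events: iterable of (sample, gene_a, gene_b); with directed=True, gene_a is
    taken as the upstream (5') partner, as in an RNA SV network.
    """
    edge_samples = defaultdict(set)
    for sample, a, b in events:
        # Noise filter: keep events with at least one cancer-related partner.
        if a not in cancer_genes and b not in cancer_genes:
            continue
        edge = (a, b) if directed else tuple(sorted((a, b)))
        edge_samples[edge].add(sample)

    degree = defaultdict(int)
    node_samples = defaultdict(set)
    for (a, b), samples in edge_samples.items():
        for node in {a, b}:  # a self-loop (a == b) adds 1 to the degree here
            degree[node] += 1
            node_samples[node] |= samples
    return {node: (degree[node], len(node_samples[node])) for node in degree}


# Toy events; GENEX/GENEY are hypothetical non-cancer genes removed by the filter.
events = [("S1", "TMPRSS2", "ERG"), ("S2", "ERG", "TMPRSS2"),
          ("S3", "TMPRSS2", "SLC45A3"), ("S4", "GENEX", "GENEY"),
          ("S5", "TP53", "TP53")]
cancer = {"TMPRSS2", "ERG", "SLC45A3", "TP53"}
print(hub_table(events, cancer))
print(hub_table(events, cancer, directed=True))
```

In the undirected (DNA) case the two TMPRSS2/ERG calls collapse into one edge, whereas in the directed (RNA) case they remain separate edges because the partner order differs.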
Results
Case study 1: visualization and interpretation of SV calls with a combination of short variant profiles from WGS and RNA-seq prostate cancer data
Performance and utility of FuSViz are demonstrated using WGS and RNA-seq prostate cancer data (see the input example in Fig. 1A) [49, 50]. SVs are listed in the overview tables of the “SV from RNA-seq” and “SV from DNA-seq” panels, with SV genes highlighted in three different colors representing oncogenes, tumor suppressor genes and cancer-related genes (inconclusive evidence from the literature), respectively. As expected, common driver genes of prostate cancer (e.g. TMPRSS2, ERG, ETV1, and SLC45A3) are highly enriched (Fig. 1C). SV-hypermutated samples (i.e. samples with an exceptionally high SV load) are identified in the histogram of the SV load distribution (Fig. 1D). Given the relatively low short variant load in prostate cancer, no significant relationship between short variants and SVs is observed in the example data (Fig. 1E). However, one outlier sample (TCGA-EJ-7782) has a high short variant load accompanied by a moderate SV load (Fig. 1E).
In the circular plot, the most prevalent and recurrent SV (TMPRSS2–ERG) is prioritized by adjusting the “Num of samples” slider bar. A pop-up window with the names of samples harboring the TMPRSS2–ERG fusion is shown above the corresponding SV curve (Fig. 2B). SPOP, the most commonly mutated gene, is overrepresented in the short variant profile track. In total, 39 samples contain 40 short variants distributed among six different loci within the gene, but none of these 39 samples overlaps with the two samples that harbor SPOP-related SV events (Fig. 2C and D).
While the circular plot can rapidly identify recurrent SVs, a linear genome view is better suited for exploring SV breakpoints at high resolution. After the “Load and refresh DNA/RNA SV track in bedpe” and “Load and refresh DNA/RNA breakpoint in bed” buttons are pressed, the SV breakpoints of TMPRSS2 and ERG at the DNA level appear as a scattered distribution. Low event recurrence is indicated by the absence of peaks in the “DNA SV breakpoint freq” track (Fig. 3A). In contrast, the breakpoint coordinates observed in RNA-seq data suggest recurrence at a specific exon–intron boundary (Fig. 3A). Although most DNA breakpoints are randomly distributed within intronic regions, RNA splicing places the transcribed breakpoints at splice sites, which results in recurrent breakpoints at the RNA level. The frequency of breakpoint combinations between TMPRSS2 and ERG is reported in a pop-up window after clicking on the TMPRSS2–ERG curve. Loading the “SV track in seg” track highlights additional features in the example data: a prominent peak in the CNV-DUP track is observed in a region upstream of the AR gene that exactly matches the genomic location of the GHOX166900 enhancer (see the dashed box in Fig. 3B). The peak illustrates a highly recurrent duplication of GHOX166900 in the sample cohort. Note that the linear module has a limited capacity to characterize interchromosomal translocation events; we recommend investigating their frequencies using the two-way module instead (Fig. 4A and B).

Figure 3. The linear genome browser visualizes SV calls. (A) A comparison of TMPRSS2–ERG fusion breakpoints at the DNA and RNA levels. The RNA and DNA SV events are plotted at the top and the bottom, respectively. Both event types have three tracks: breakpoints in bed format, breakpoint frequencies in bedgraph format, and SV distribution in bedpe format. (B) Enrichment analysis of the AR gene enhancer GHOX166900. From the top, the figure shows the gene annotation track, SV segment tracks (duplications and deletions), SV bedpe track, and the enhancer annotation bed track. A copy number duplication event of enhancer GHOX166900 is visible in the loaded cohort. The event is supported by a high density of curves in the bedpe track.

Figure 4. Visualization of fusion genes/transcripts in the two-way module. (A) From top to bottom, the overview plot shows all fusion events of the TMPRSS2–ERG gene pair in a cohort together with recurrence information, exon annotation of TMPRSS2 and ERG transcript isoforms, genomic coordinates of partner genes, and gene loci in a chromosome ideogram. By choosing 41507950 in the select box “Breakpoint A,” three fusion events (chr21:41507950–chr21:38423561, chr21:41507950–chr21:38445621, and chr21:41507950–chr21:38584945) are highlighted with corresponding occurrences of 3, 42 and 17. (B) The sample plot illustrates a specific fusion event (chr21:41507950–chr21:38445621) of the TMPRSS2–ERG gene pair for the TCGA-CH-5737 sample. A curve links the fused sequences (colored gray) of the two partner genes, and its “(63, 16)” label denotes the number of supporting split reads and discordant read pairs, respectively. As breakpoints have a variable consequence (e.g. “at exon boundary”, “within exon” or “within intron”) in the different available transcript annotations, users are recommended to choose the relevant transcript isoforms in the select boxes “GeneA transcript” and “GeneB transcript” (e.g. ENST00000679054 and ENST00000417133) for data interpretation.
Once prevalent SV genes are identified in the circular and linear plots, it may be informative to check whether any interactions exist between them in the SV gene network module. For example, the TMPRSS2–ERG–SLC45A3–ETV1 subgraph in the DNA SV interaction network shows four hubs (TMPRSS2, ERG, SLC45A3, and ETV1) with degrees of 10, 5, 6, and 8, respectively (Fig. 6A). They interact extensively with each other and form a functionally compact module. Importantly, all four are colored red, indicating that they are oncogenes. A subgraph with the same hub composition recurs in the RNA SV interaction network (Fig. 6B), where it is directed, with arrows denoting the transcription direction from upstream partners (TMPRSS2 and SLC45A3) to downstream partners (ERG and ETV1). In comparison, the TP53 subgraph illustrates a less complex example, in which the hub TP53 is surrounded by gray (genes with unknown cancer relevance) and black (intergenic regions) nodes (Fig. 6A). DNA SVs involving TP53 may disrupt this tumor suppressor gene and cause loss of its function.
Case study 2: fusion RNA quality control, domain plot and read coverage visualization using brain tumor RNA-seq data
Raw RNA-seq SV calls from a diffuse leptomeningeal glioneuronal tumor sample are imported into FuSViz and shown in the “SV from RNA-seq” panel of the table view module [51]. Clicking on the row index number of the fusion HNRNPDL–BRAF (Fig. 5A) switches the FuSViz client to the linear module and displays the SV breakpoints (chr4:83348343 and chr7:140482957) in an IGV session with two split windows. After the BAM file and its index are uploaded, read alignments are displayed in the alignment track of the IGV session. The aim of this quality control is to find soft-clipped read sequences that exactly match the fusion sequences at the breakpoints of the partner genes (Fig. 5B). As shown, split reads of the HNRNPDL–BRAF event pass the quality control, supporting this fusion as a true positive. The underlying SV data can be edited in the table view when necessary (e.g. gene symbols can be updated and breakpoint coordinates adjusted if the provisional ones prove inaccurate).
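The visual check described above (a soft-clipped read tail that exactly matches the partner gene's sequence at its breakpoint) can also be expressed programmatically. The sketch below is a simplified illustration, not part of FuSViz: it parses a SAM-style CIGAR string, handles only the case of a read aligned to the 5' partner whose trailing soft clip should extend into the 3' partner, ignores strand and orientation, and uses a hypothetical `min_clip` threshold.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar, seq):
    """Return the (leading, trailing) soft-clipped parts of a read sequence."""
    # Hard-clipped bases are absent from SEQ, so H operations are ignored.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    lead = seq[:ops[0][0]] if ops and ops[0][1] == "S" else ""
    trail = seq[len(seq) - ops[-1][0]:] if len(ops) > 1 and ops[-1][1] == "S" else ""
    return lead, trail


def supports_junction(cigar, seq, partner_seq, min_clip=10):
    """Check whether a read's trailing soft clip matches the partner gene.

    partner_seq: reference sequence of the 3' partner starting immediately at
    its breakpoint (same strand as the read). A clip of at least min_clip
    bases that is an exact prefix of partner_seq counts as support.
    """
    _lead, trail = soft_clips(cigar, seq)
    return len(trail) >= min_clip and partner_seq.startswith(trail)


# Toy read: 10 bases aligned to the 5' partner, 12 soft-clipped bases after it.
read = "ACGTACGTAC" + "GGGCCCAAATTT"
print(soft_clips("10M12S", read))
print(supports_junction("10M12S", read, "GGGCCCAAATTTACGT"))
```

Requiring an exact prefix match is deliberately strict; real reads may carry sequencing errors, so a practical check would likely tolerate a small number of mismatches.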

Figure 5. An example analysis of RNA fusion HNRNPDL–BRAF. (A) An overview of fusion candidates in the table. By clicking on the index number of the sixth row, the FuSViz client switches to the linear module and visualizes SV breakpoints of HNRNPDL–BRAF in an IGV session. (B) Quality control of fusion breakpoint sequences. After uploading an RNA-seq alignment file into the IGV session, the genome reference sequence view, together with associated gene annotations and read alignments, is split into two panels, in which central lines are aligned to breakpoints within HNRNPDL and BRAF. (C) In the domain plot panel, motif and domain annotations are shown together with the selected isoforms of the genes involved in a given event. Colored arrow lines denote different biological consequences of putative fusion proteins. (D) The read coverage plot shows loci of partner genes in a chromosome ideogram, the fusion events (an arrow line linking the 5th exon of HNRNPDL with the 10th exon of BRAF), read support values (140 split reads and 61 discordant read pairs), exon annotations of selected transcript isoforms for upstream and downstream partners, and RNA expression level measured in read counts.

Figure 6. An overview of the SV gene interaction network module. (A) A landscape of DNA SV interaction subgraphs. Two example subgraphs (one for the TP53 gene and one for the TMPRSS2–ERG–SLC45A3–ETV1 gene group) are highlighted with boxes in dashed lines. All nodes are marked by one of five different colors (red: oncogenes; blue: tumor suppressor genes; orange: cancer-related genes; gray: any other genes; and black: intergenic regions). The size of a node indicates an event frequency in the sample cohort and the thickness of an edge is determined by the number of samples containing SVs between the connected nodes. (B) A landscape of RNA SV interaction subgraphs. The TMPRSS2–ERG–SLC45A3–ETV1 subgraph example is highlighted with a box in dashed lines. An edge with an arrow indicates transcription direction from the upstream partner to the downstream partner. In the “DNA/RNA_SV_network_hub” tab, a table summarizes network centrality and hub score. The node column utilizes a three-color scheme (red: oncogenes; blue: tumor suppressor genes; and orange: other cancer-related genes). Columns labeled “degree” and “score” represent the number of edges linking to a node and the number of samples containing SV events associated with a given node, respectively.
The domain plot in the two-way module displays the biological consequence of a chimeric transcript in the context of protein domain and motif annotations (Fig. 5C). After prioritization of the relevant transcript isoforms, ENST00000295470 and ENST00000288602 are chosen to represent the HNRNPDL and BRAF genes in the select boxes “TranscriptA” and “TranscriptB.” The 4th exon of HNRNPDL is shown to be fused with the 10th exon of BRAF, and the putative fusion protein is predicted to be in-frame (arrow line colored blue; Fig. 5C). Both the RRM domain of HNRNPDL and the kinase domain of BRAF are retained completely in the fusion protein. In Fig. 5D, the read coverage plot displays the expression levels of the two partner genes measured in read counts, as well as the read support for the fusion event (a red curved line labeled with its read support of 140 split reads and 61 discordant read pairs, with an arrow indicating the transcription direction of the fusion).
Discussion
FuSViz integrates multiple approaches to SV visualization. Compared with other relevant tools (Table 1), FuSViz shares many visualization strategies with JBrowse2, which can view synteny and structural variation (e.g. linear, circular and two-way plots for fusions, with the ability to inspect breakpoints). Unlike JBrowse2, a generic genome browser designed for viewing genome assemblies, displaying population-level and interspecies variation and visualizing evolutionary relationships among genes and genomes, FuSViz focuses on the interpretation of human SVs in the context of cancer, with an additional emphasis on identifying recurrent events and the prevalence of alterations in SV-related cancer genes in a sample cohort. In comparison to the WashU Epigenome Browser, which specializes in visualizing and analyzing multidimensional epigenomic datasets in the linear view (i.e. 1D: genomic features; 2D: Hi-C data; and 3D: chromatin modeling) [52], FuSViz visualizes SVs only in one dimension, but integrates multiple view solutions. Moreover, FuSViz is equipped with several data resources that highlight the functional implications of SV events with regard to oncogenic or tumor suppressive roles, including protein domain and motif annotations, targets for cancer drug compounds, and literature-based classification of cancer-related genes. Although the view, session, and track features implemented in FuSViz build on user interface concepts previously pioneered by IGV, Circos and Ribbon [31, 33, 53], FuSViz ties them together in a single web browser application framework that allows researchers to conveniently navigate and switch between views, combining the strengths of multiple solutions.
In addition, FuSViz can run as a desktop tool in environments without an internet connection or behind a firewall, which overcomes a general weakness of web browser-based applications for the analysis of sensitive human sequencing data.
Tool name | Programming language | Input sample(s) | SV table | Linear plot | Circle plot | Two-way plot | Network plot | BAM support | VCF support | User interface |
---|---|---|---|---|---|---|---|---|---|---|
Ribbon | JavaScript | Single | Y | Y | – | Y* | – | Y | Y | Web app |
Svviz | Python | Single | – | Y | – | Y* | – | Y | Y | Web app/command line |
Samplot | Python | Single | Y | Y | – | Y | – | Y | Y | Web app/command line |
SVCurator | Python, JavaScript | Single | – | Y | – | Y | – | Y | Y | Web app |
Circos | Perl | Single | – | – | Y | – | – | – | – | Command line |
CIRCUS | R | Single | – | – | Y | – | – | Y | – | Library |
Loupe | JavaScript | Single | Y | Y | – | Y | – | – | – | Stand-alone |
SV-pop | Python, R | Multiple | – | – | – | – | – | Y | – | Stand-alone |
UCSC Xena | JavaScript | Multiple | Y | Y | – | – | – | – | – | Web app |
MAVIS | Python | Single | – | – | – | Y | – | Y | Y | Command line |
AGFusion | Python | Single | – | – | – | Y | – | – | – | Command line |
JBrowser2 | JavaScript, TypeScript | Single | Y | Y | Y | Y | – | Y | Y | Web app/desktop |
Chromoscope | TypeScript, JavaScript | Single/multiple | – | Y | Y | – | – | Y | Y | Web app |
WashU Epigenome | JavaScript, TypeScript | Single | – | Y | Y | – | – | Y | Y | Web app |
FuSViz | R, JavaScript | Single/multiple | Y | Y | Y | Y | Y | Y | Y | Web browser app/command line |
* Partial functionality is provided
FuSViz also serves as a manual review tool for single-sample analysis and supports SV quality control through inspection of read alignments. Whereas SVCurator and VIPER rely on the IGV desktop application to check SVs and manually filter false positives via a local file system [26, 54], FuSViz uses igv.js for this purpose, which supports a variety of data sources (e.g. the standard file formats FASTA, BAM, CRAM, VCF and GTF) hosted either on web servers or on a local file system. The web browser application interacts with the embedded igv.js through its API, which provides functions to configure the initial view, define track types, navigate to specific loci, listen for events, and customize visual properties such as color, height and feature layout. Chromoscope [55], a stand-alone interactive tool for analyzing structural variants with multiscale and multiform visualizations, offers many features similar to those of FuSViz (e.g. its circular and linear plots display multiple layers of functional tracks supporting different data formats). However, Chromoscope loads data only through a JSON configuration that specifies URLs of files hosted in a public cloud or on a local server; it does not support data access via a local file system, which limits its functionality in an offline setting. For RNA or DNA fusion analysis of a single sample, FuSViz illustrates a fusion event and its partner genes with a diagram of exon–intron structures, and predicts the outcome of the event by considering the sequence context of the selected transcripts in which the breakpoints occur (e.g. 5′-UTR, coding, 3′-UTR or intronic region). This is similar to the functionality of MAVIS and AGFusion, which plot fusion protein domains from the command line [35, 56]. In contrast, FuSViz performs this operation interactively.
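Predicting a fusion outcome starts with locating each breakpoint within the selected transcript. The following is a minimal sketch of that classification step, assuming a simplified transcript model with sorted genomic exon intervals, CDS bounds and a strand. The `Transcript` class, the `breakpoint_region` function and the region labels are hypothetical illustrations, not FuSViz's data model or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (genomic, 1-based, inclusive)."""
    exons: List[Tuple[int, int]]  # sorted by start coordinate
    cds_start: int                # lowest genomic coordinate of the CDS
    cds_end: int                  # highest genomic coordinate of the CDS
    strand: str                   # "+" or "-"


def breakpoint_region(tx: Transcript, pos: int) -> str:
    """Classify a breakpoint as 5'-UTR, coding, 3'-UTR, intronic or outside."""
    in_exon = any(start <= pos <= end for start, end in tx.exons)
    if not in_exon:
        # Between the first and last exon means intronic; otherwise outside the transcript.
        return "intronic" if tx.exons[0][0] <= pos <= tx.exons[-1][1] else "outside"
    # On the minus strand, the 5' end of the transcript has the highest coordinates.
    if tx.strand == "+":
        before_cds, after_cds = pos < tx.cds_start, pos > tx.cds_end
    else:
        before_cds, after_cds = pos > tx.cds_end, pos < tx.cds_start
    if before_cds:
        return "5'-UTR"
    if after_cds:
        return "3'-UTR"
    return "coding"


tx = Transcript(exons=[(100, 200), (300, 400), (500, 600)],
                cds_start=150, cds_end=550, strand="+")
print([breakpoint_region(tx, p) for p in (120, 250, 350, 580, 50)])
# ["5'-UTR", 'intronic', 'coding', "3'-UTR", 'outside']
```

Handling the strand explicitly matters because the same genomic coordinate is 5′-UTR on one strand and 3′-UTR on the other.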
In the interactive view, transcript isoforms with domain/motif annotations are visually highlighted, which supports informed decisions when selecting the most relevant isoform for predicting the translation of a fusion RNA. Furthermore, FuSViz can display multiple breakpoint pairs and fusion events involving the same two partner genes simultaneously. In the FuSViz fusion protein domain plot, motif and domain annotations are shown in two separate tracks: if a domain is only partially included in the chimeric protein, it is informative to see whether any of its key motifs are fully retained in the fusion product. The domains/motifs included in a putative fusion protein are shown together with those excluded from it, as both the loss and gain of function caused by a fusion event are important. Additionally, FuSViz provides a command line solution that extends Arriba's visualization of partner-gene read coverage from RNA-seq alignments to DNA sequencing alignments as well, enabling a more customized selection of relevant transcript isoforms.
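The retained, partial and excluded distinction described above reduces to interval overlap in protein coordinates. The following is a minimal sketch, assuming each partner's retained amino-acid range is already known: (1, junction) for the 5′ partner, or (junction, protein length) for the 3′ partner. The domain names and coordinates are hypothetical, and running the same function on motif intervals shows whether key motifs of a partially included domain survive.

```python
from typing import List, Tuple


def classify_domains(domains: List[Tuple[str, int, int]],
                     retained: Tuple[int, int]) -> List[Tuple[str, str]]:
    """Label each (name, start, end) domain of one partner protein as 'retained',
    'partial' or 'lost', given the retained amino-acid interval (1-based, inclusive)."""
    lo, hi = retained
    labels = []
    for name, start, end in domains:
        if lo <= start and end <= hi:
            labels.append((name, "retained"))   # fully inside the fusion product
        elif end < lo or start > hi:
            labels.append((name, "lost"))       # no overlap: excluded from the product
        else:
            labels.append((name, "partial"))    # overlaps the junction
    return labels


# Hypothetical 5' partner whose residues 1..200 are kept up to the junction.
domains = [("domain_1", 80, 150), ("domain_2", 170, 240), ("motif_1", 300, 310)]
print(classify_domains(domains, (1, 200)))
# [('domain_1', 'retained'), ('domain_2', 'partial'), ('motif_1', 'lost')]
```

Keeping the "lost" entries in the output, rather than filtering them out, mirrors the plot's choice to show excluded domains alongside retained ones.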
FuSViz flexibly accepts data from different SV callers through a standardization script, which collects SV events per sample and then aggregates them across a sample cohort. Although most SV callers report their results in VCF format, FuSViz does not use the genotype (i.e. the GT field), because accurate genotype information is not available in raw calls for all SV categories. For example, genotypes of insertions and inversions are less accurate than those of duplications and deletions. Notably, most tools cannot genotype translocations at all. Although FuSViz was designed for visualizing and interpreting aggregated SVs from short-read sequencing data, it can also work with SV calls from long-read sequencing data (e.g. Nanopore or PacBio technology), as long as the input is appropriately formatted. Unlike DNA-based calls, raw SVs called from RNA-seq data are formatted in various ways, with no established standard. Nevertheless, the output of most callers shares a few common features (e.g. gene name, breakpoint coordinates, read support, and fusion strand direction), as these are basic requirements for data interpretation. Support for additional input evidence features may be provided in future releases, depending on user requests that fit the scope of FuSViz functionality.
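The per-sample collection and cohort aggregation described above can be sketched as follows. This is not the FuSViz standardization script. It assumes plain-text VCF lines with caller-specific INFO keys `SVTYPE`, `END` and `CHR2` (used by some callers; breakends encoded only in BND ALT notation are not parsed) and groups events by exact breakpoint coordinates. Real pipelines usually allow a tolerance window. The record layout and function names are hypothetical, and the FORMAT/sample columns, including GT, are deliberately ignored, as in FuSViz.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string; flag keys (no '=') map to an empty string."""
    out = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")
        out[key] = value
    return out


def standardize(sample: str, vcf_lines: Iterable[str]) -> List[dict]:
    """Convert VCF data lines into hypothetical per-sample SV records."""
    records = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, info = cols[0], int(cols[1]), parse_info(cols[7])
        # FORMAT/sample columns (cols[8:], including GT) are intentionally ignored.
        records.append({
            "sample": sample,
            "svtype": info.get("SVTYPE", "NA"),
            "chrom1": chrom,
            "pos1": pos,
            "chrom2": info.get("CHR2", chrom),
            "pos2": int(info.get("END", pos)),
        })
    return records


def cohort_recurrence(records: Iterable[dict]) -> Dict[Tuple, Set[str]]:
    """Group records with identical type and breakpoints; collect carrier samples."""
    carriers = defaultdict(set)
    for r in records:
        key = (r["svtype"], r["chrom1"], r["pos1"], r["chrom2"], r["pos2"])
        carriers[key].add(r["sample"])
    return dict(carriers)


# Usage: the same inversion in two samples with different GT values.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS"
sv = "chr7\t140000\tsv1\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=150000\tGT\t{gt}"
records = (standardize("S1", [header, sv.format(gt="0/1")])
           + standardize("S2", [header, sv.format(gt="1/1")]))
print(cohort_recurrence(records))
```

Because GT is never read, the heterozygous and homozygous calls above count as one recurrent event carried by both samples.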
To our knowledge, very few visualization tools can combine and jointly analyze SV calls from both DNA and RNA sequencing data. When both breakpoints of an SV fall within genes, a fusion gene may be formed and transcribed into chimeric RNA, which RNA-seq data capture well if the fusion is expressed. Because of RNA splicing, most observed breakpoints of fusion RNAs align to known exon boundaries. The prevalence of certain breakpoint types or breakpoint combinations in a cohort may be biologically and clinically relevant [56]. Some fusion RNAs arise from RNA regulatory mechanisms (e.g. RNA polymerase read-through or RNA trans-splicing) and are detected exclusively at the RNA level, without any evidence of DNA strand breakage. These can occasionally be observed in benign and normal tissues as well, and do not always indicate a pathological consequence or an influence on tumorigenesis [8, 57]. SVs detected only at the RNA level can therefore differ from those detected at the DNA level. In short, combining SV calls from the DNA and RNA levels yields more confident results and gives a more detailed view of SV complexity in the context of genomic and transcriptomic annotations.
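Matching RNA fusions to DNA SV calls needs a positional tolerance, because RNA breakpoints usually sit at exon boundaries while the underlying DNA breakpoints typically lie somewhere in the flanking introns. The following is a minimal sketch of such a match, not the FuSViz algorithm. The `Breakpoints` class, the function names and the 100 kb default window are hypothetical choices. An "RNA-only" label may reflect read-through, trans-splicing or simply a missed DNA call, so it flags an event for review rather than explaining it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Breakpoints:
    """Hypothetical breakpoint pair (chromosome, position) for an SV or fusion call."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def _near(chrom_a: str, pos_a: int, chrom_b: str, pos_b: int, window: int) -> bool:
    return chrom_a == chrom_b and abs(pos_a - pos_b) <= window


def supporting_dna_sv(rna: Breakpoints, dna_svs: List[Breakpoints],
                      window: int = 100_000) -> Optional[Breakpoints]:
    """Return the first DNA SV whose two breakpoints lie within `window` bp of the
    RNA fusion breakpoints (in either order), or None if the fusion is RNA-only."""
    for sv in dna_svs:
        same_order = (_near(rna.chrom1, rna.pos1, sv.chrom1, sv.pos1, window)
                      and _near(rna.chrom2, rna.pos2, sv.chrom2, sv.pos2, window))
        swapped = (_near(rna.chrom1, rna.pos1, sv.chrom2, sv.pos2, window)
                   and _near(rna.chrom2, rna.pos2, sv.chrom1, sv.pos1, window))
        if same_order or swapped:
            return sv
    return None


dna_calls = [Breakpoints("chr1", 2_000_500, "chr1", 1_003_000)]
fusions = [Breakpoints("chr1", 1_000_000, "chr1", 2_000_000),
           Breakpoints("chr2", 500_000, "chr2", 900_000)]
for f in fusions:
    label = "DNA-supported" if supporting_dna_sv(f, dna_calls) is not None else "RNA-only"
    print(f, label)  # first fusion: DNA-supported; second: RNA-only
```

Checking both breakpoint orders matters because DNA and RNA callers may report the two partners in opposite order.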
Acknowledgements
We acknowledge Services for Sensitive Data (TSD) at the University of Oslo for infrastructure support.
Author contributions: S.Z.: Conceptualization, Software, Data curation, Methodology, Formal analysis and Writing—original draft. S.N.: Software and Writing—review & editing. D.V.: Writing—review & editing. E.H.: Funding acquisition, Supervision and Writing—review & editing.
Funding
This work was supported by the regional health authorities for South-Eastern Norway through its project Infrastructure for Precision Diagnostics (InPred). S.N. was supported by the Research Council of Norway through its Center of Excellence funding scheme (grant number 262652). Funding to pay the Open Access publication charges for this article was provided by Oslo University Hospital.
Conflict of interest
The authors have no conflicts of interest.
Data availability
The source code for FuSViz and ready-to-use Docker/Singularity images are publicly available on GitHub (https://github.com/senzhaocode/FuSViz) and Zenodo (https://zenodo.org/records/14748124). User documentation is provided at https://fusviz-docs.readthedocs.io/en/master/index.html. Online tutorials and demo videos can be accessed at https://www.youtube.com/watch?v=GAZInLru4y4&list=PLSOEIThRcIXUnPedL9VZWxuOJy3vU9y6a. The SV and short variant datasets used in our case studies, based on the human hg38 reference genome, are available at https://fusviz.s3.eu-north-1.amazonaws.com/RNA_SV_example.txt, https://fusviz.s3.eu-north-1.amazonaws.com/DNA_SV_example.txt, and https://fusviz.s3.eu-north-1.amazonaws.com/TCGA.PRAD.mutect.somatic.maf.