TSIS: an R package to infer alternative splicing isoform switches for time-series data

Abstract
Summary: An alternative splicing isoform switch is where a pair of transcript isoforms reverse their relative expression abundances in response to external or internal stimuli. Although computational methods are available to study differential alternative splicing, few tools for detection of isoform switches exist and these are based on pairwise comparisons. Here, we provide the TSIS R package, which is the first tool for detecting significant transcript isoform switches in time-series data. The main steps of TSIS are to search for the isoform switch points in the time-series, characterize the switches and filter the results with user input parameters. All the functions are integrated into a Shiny App for ease of implementation of the analysis.
Availability and implementation: The TSIS package is available on GitHub: https://github.com/wyguo/TSIS.


Introduction
Regulation of gene expression by alternative splicing (AS) generates changes in abundance of different transcript isoforms. One particular splicing phenotype is isoform switching, where the relative abundance of different isoforms of the same gene is reversed in different cell types or in response to stimuli. Isoform switches often play pivotal roles in re-programming of gene expression, and switches between functionally different transcript isoforms in normal and tumor tissues provide signatures for cancer diagnostics and prognostics (Sebestyen et al., 2015).
Few tools are designed for inference of isoform switches, and currently no software is available for detecting alternative splicing isoform switches in time-series data. Isoform switch detection tools, such as iso-kTSP (Sebestyen et al., 2015), spliceR (Vitting-Seerup et al., 2014) and SwitchSeq (González-Porta and Brazma, 2014), only perform pairwise comparisons (Fig. 1a). Time-series RNA-seq data greatly enhance the resolution of changes in expression and AS during development or in response to external or internal cues. Identification of isoform switches in time-series data presents specific challenges in that (i) switch points can happen between any time-points, and (ii) the isoform pairs may undergo a number of switches during the time course (Fig. 1b). To detect and characterize temporal and complex isoform switches, we developed the time-series isoform switch (TSIS) R package, which incorporates scoring schemes from current methods and includes a number of new metrics that capture the characteristics of the isoform switches.

Methods and application
TSIS detects pairs of AS transcripts with one or more isoform switches and genes with multiple pairs of transcripts which show isoform switches. By defining five metrics of the isoform switch, the method comprehensively captures and describes the isoform switches occurring at different points in time-series data. TSIS analysis can be carried out from the command line as well as through a graphical interface using a Shiny App (https://CRAN.R-project.org/package=shiny), where the analysis can be implemented easily.

Determine the switch points
TSIS offers two approaches to search for the switch points. The first approach takes the average expression values of the replicates at each time-point for each isoform and searches for the cross points. The second approach uses natural spline curves to fit the time-series data for each transcript isoform using the R package 'splines' (version 3.3.2) and finds the cross points of the fitted curves for each pair of isoforms. The spline method is useful for finding global trends in time-series data when the data are noisy. However, it may miss details of isoform switches in local regions. We recommend that users search for switch points with both the average and spline methods and examine the results manually when the two methods disagree.
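As an illustration of the first (average) approach, the following is a minimal sketch in Python rather than the TSIS R code. The function names are hypothetical, and joining adjacent time-point means with straight lines to locate the cross point is an assumption made for the example. The abundance values are invented.

```python
from statistics import mean

def mean_profile(replicates_by_time):
    """Average the replicate values at each time-point."""
    return [mean(reps) for reps in replicates_by_time]

def find_switch_points(times, iso_a, iso_b):
    """Return (time, abundance) cross points of two mean profiles.

    Assumption: adjacent means are joined by straight lines, so a cross
    point lies wherever the difference iso_a - iso_b changes sign.
    Profiles that only touch (difference exactly zero) are ignored.
    """
    points = []
    for k in range(len(times) - 1):
        d0 = iso_a[k] - iso_b[k]
        d1 = iso_a[k + 1] - iso_b[k + 1]
        if d0 * d1 < 0:  # strict sign change between time-points k and k+1
            frac = d0 / (d0 - d1)  # position of the crossing within the segment
            t = times[k] + frac * (times[k + 1] - times[k])
            y = iso_a[k] + frac * (iso_a[k + 1] - iso_a[k])
            points.append((t, y))
    return points

# Invented data: two replicates per time-point for each isoform.
times = [0, 1, 2, 3]
iso_a = mean_profile([[10, 12], [8, 8], [4, 6], [2, 2]])   # means 11, 8, 5, 2
iso_b = mean_profile([[2, 2], [4, 6], [8, 8], [10, 12]])   # means 2, 5, 8, 11
switch_points = find_switch_points(times, iso_a, iso_b)
print(switch_points)  # [(1.5, 6.5)]
```

The same sign-change search could be applied to values sampled from fitted spline curves instead of the replicate means.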

Define the switch metrics
The intersection points determined in Section 2.1 divide the time-series frame into intervals, and each switch point is flanked by one interval before the switch and one after it (Fig. 1b). We define the switch of two isoforms $iso_i$ and $iso_j$ by (i) the switch point $P_i$, (ii) the time-points between switch points $P_{i-1}$ and $P_i$ as interval $I_1$ before switch $P_i$ and (iii) the time-points between switch points $P_i$ and $P_{i+1}$ as interval $I_2$ after switch $P_i$ (Fig. 1b). Each isoform switch is described by five metrics.
Metric 1: $S_1$ represents the probability of the abundance switch and is calculated as the sum of the frequencies of two possible scenarios that one isoform is more or less abundant than the other in the two intervals adjacent to a switch point, as used in iso-kTSP (Sebestyen et al., 2015).
$$S_1(iso_i, iso_j \mid I_1, I_2) = \left| p(iso_i > iso_j \mid I_1) + p(iso_i < iso_j \mid I_2) - 1 \right|,$$
where $p(iso_i > iso_j \mid I_1)$ and $p(iso_i < iso_j \mid I_2)$ are the frequencies (probabilities) with which the abundance of one isoform is greater or less than that of the other across the samples in the corresponding interval.
Metric 2: $S_2$ is the sum of the average abundance differences of the two isoforms in both intervals.
$$S_2(iso_i, iso_j \mid I_1, I_2) = d(iso_i, iso_j \mid I_1) + d(iso_i, iso_j \mid I_2),$$
where $d(iso_i, iso_j \mid I_k)$ is the average difference of abundances between $iso_i$ and $iso_j$ in interval $I_k$, $k = 1, 2$, defined as
$$d(iso_i, iso_j \mid I_k) = \frac{1}{|I_k|} \sum_{m_{I_k}} \left[ \mathrm{exp}(iso_i \mid s_{m_{I_k}}, I_k) - \mathrm{exp}(iso_j \mid s_{m_{I_k}}, I_k) \right],$$
$|I_k|$ is the number of samples in interval $I_k$ and $\mathrm{exp}(iso_i \mid s_{m_{I_k}}, I_k)$ is the expression of $iso_i$ in sample $s_{m_{I_k}}$ in interval $I_k$. Metric 2 indicates the magnitude of the switch. Higher values mean larger changes in abundances before and after the switch.
Metric 3 measures the significance of the differences between the isoform abundances before and after the switch using paired t-tests to generate P-values for each interval.
Metric 4 is a measure of whether the effect of the switch is transient or long lived (reflecting the number of time-points in the flanking intervals).
Metric 5: Isoforms with high negative correlations across the time-points may identify important regulation in alternative splicing. Thus we also calculated the Pearson correlation of two isoforms across the whole time-series.
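To make Metrics 1, 2 and 5 concrete, here is a minimal Python sketch of the formulas above. It is an illustration, not the TSIS implementation. The function names and abundance values are invented. For Metric 2 the sketch adds the magnitudes of the two interval differences, which is an assumption made here so that the opposite-signed differences on either side of a switch do not cancel. The paired t-tests of Metric 3 are omitted.

```python
from math import sqrt

def switch_probability(before, after):
    """Metric 1: S1 = |p(iso_i > iso_j | I1) + p(iso_i < iso_j | I2) - 1|.

    before, after: lists of (iso_i, iso_j) abundance pairs, one per sample.
    """
    p_greater_before = sum(a > b for a, b in before) / len(before)
    p_less_after = sum(a < b for a, b in after) / len(after)
    return abs(p_greater_before + p_less_after - 1)

def mean_difference(pairs):
    """d(iso_i, iso_j | I_k): average of iso_i - iso_j over the samples."""
    return sum(a - b for a, b in pairs) / len(pairs)

def switch_magnitude(before, after):
    """Metric 2, assuming the magnitude of each interval difference is summed."""
    return abs(mean_difference(before)) + abs(mean_difference(after))

def pearson(x, y):
    """Metric 5: Pearson correlation of two abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented (iso_i, iso_j) abundances for the samples flanking one switch point.
before = [(11, 2), (12, 3), (8, 5), (7, 6)]   # interval I1
after = [(5, 8), (4, 9), (2, 11), (6, 5)]     # interval I2

s1 = switch_probability(before, after)   # 0.75
s2 = switch_magnitude(before, after)     # 9.5
iso_i = [a for a, _ in before + after]
iso_j = [b for _, b in before + after]
r = pearson(iso_i, iso_j)                # negative for this example

print(s1, s2, r)
```

In this example $iso_i$ is more abundant in every sample before the switch and less abundant in three of the four samples after it, so $S_1 = |1 + 0.75 - 1| = 0.75$. A pair that never switches gives $S_1 = 0$, and a clean switch in either direction gives $S_1 = 1$.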

Filter and visualize the results
TSIS provides histograms that show the number of switches happening at each time-point as well as interactive visualizations of the isoform switch profiles (Fig. 1c, d). TSIS also allows regions of interest to be defined (Fig. 1d) or switches involving the most abundant isoforms or any predefined list of isoforms to be selected as outputs. Known isoform switches in the Arabidopsis circadian clock genes AT1G01060 (G2), AT5G37260 (G29) and AT3G09600 (G12) (Fig. 1c) (Filichkin et al., 2015; James et al., 2012a, 2012b) were successfully detected by TSIS. The example dataset (used in Fig. 1c, d) and details on running the tool are provided in the user manual on the GitHub page.
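The filtering step can be pictured as applying user-chosen cut-offs to the metrics of each candidate switch. The Python sketch below is only an illustration: the record fields, parameter names, threshold values and isoform labels are all hypothetical and do not correspond to the arguments of the TSIS functions.

```python
def filter_switches(switches, min_prob, min_diff, max_pval, time_window=None):
    """Keep switches that pass every user-supplied cut-off.

    switches: list of dicts with hypothetical keys 'pair', 'time', 'prob'
    (Metric 1), 'diff' (Metric 2), 'pval_before' and 'pval_after' (Metric 3).
    time_window: optional (start, end) region of interest for the switch time.
    """
    kept = []
    for s in switches:
        if s["prob"] < min_prob or s["diff"] < min_diff:
            continue
        # Require significance in both flanking intervals.
        if max(s["pval_before"], s["pval_after"]) > max_pval:
            continue
        if time_window is not None:
            start, end = time_window
            if not start <= s["time"] <= end:
                continue
        kept.append(s)
    return kept

# Invented candidate switches.
candidates = [
    {"pair": "isoA_vs_isoB", "time": 1.5, "prob": 0.75, "diff": 9.5,
     "pval_before": 0.01, "pval_after": 0.02},
    {"pair": "isoC_vs_isoD", "time": 6.0, "prob": 0.90, "diff": 4.0,
     "pval_before": 0.001, "pval_after": 0.003},
    {"pair": "isoE_vs_isoF", "time": 2.0, "prob": 0.30, "diff": 0.5,
     "pval_before": 0.40, "pval_after": 0.20},
]

selected = filter_switches(candidates, min_prob=0.5, min_diff=1.0,
                           max_pval=0.05, time_window=(0, 4))
print([s["pair"] for s in selected])  # ['isoA_vs_isoB']
```

Here the second candidate passes the metric cut-offs but falls outside the region of interest, and the third fails the probability cut-off.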