Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation

Motivation: Gene expression profiling using RNA-seq is a powerful technique for screening RNA species’ landscapes and their dynamics in an unbiased way. While several advanced methods exist for differential expression analysis of RNA-seq data, proper tools to anal.yze RNA-seq time-course have not been proposed. Results: In this study, we use RNA-seq to measure gene expression during the early human T helper 17 (Th17) cell differentiation and T-cell activation (Th0). To quantify Th17-specific gene expression dynamics, we present a novel statistical methodology, DyNB, for analyzing time-course RNA-seq data. We use non-parametric Gaussian processes to model temporal correlation in gene expression and combine that with negative binomial likelihood for the count data. To account for experiment-specific biases in gene expression dynamics, such as differences in cell differentiation efficiencies, we propose a method to rescale the dynamics between replicated measurements. We develop an MCMC sampling method to make inference of differential expression dynamics between conditions. DyNB identifies several known and novel genes involved in Th17 differentiation. Analysis of differentiation efficiencies revealed consistent patterns in gene expression dynamics between different cultures. We use qRT-PCR to validate differential expression and differentiation efficiencies for selected genes. Comparison of the results with those obtained via traditional timepoint-wise analysis shows that time-course analysis together with time rescaling between cultures identifies differentially expressed genes which would not otherwise be detected. Availability: An implementation of the proposed computational methods will be available at http://research.ics.aalto.fi/csb/software/ Contact: tarmo.aijo@aalto.fi or harri.lahdesmaki@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.


INTRODUCTION
A RNA-seq experiment provides a snapshot of RNA content within a cell population. The observed data is in a form of millions of short nucleotide sequences, which can be used to construct a de novo transcriptome or aligned against known reference genome and transcriptome. To quantify expressions of known genes, a common approach is to count the reads which are aligned to different genes. The discrete nature of count data led researchers to model the sequencing data using Poisson distribution (see e.g. Marioni et al. 2008). Recently, it has been shown that the Poisson distribution is insufficient for modeling sequencing data because it tends to underestimate the variance for highly expressed genes. An extension of the Poisson distribution, the negative binomial distribution, has gained popularity in modeling gene expression data from RNA-seq (or other sequencing-based count data) because it can account for this over-dispersion. Two commonly used approaches which use the negative binomial distribution to detect differential expression are DESeq (Anders and Huber, 2010) and edgeR (Robinson et al., 2010). Another method called baySeq uses an empirical Bayesian method to estimate the posterior probabilities that a gene is, or is not, differentially expressed (Hardcastle and Kelly, 2010).
Profiling gene expression over time provides information about the dynamical behavior of the genes. Storey et al. (2005) presented a method that can analyze time series microarray data in order to assess the differential expression from whole time series as opposed to the traditional methods, which analyze timepoints independently. More recently, Stegle et al. (2010) presented a methodology that uses Gaussian processes (GPs) to model gene expression over time and to identify the time intervals when each gene is differentially expressed. We have further extended the GP approach to quantify condition-specific differential expression among multiple time-course experiments ( € Aij€ o et al., 2012). These methodologies are not optimal for analyzing count data due to the different statistical characteristics and, to our knowledge, next-maSigPro (Conesa and Nueda, 2013) is the only methodology capable of taking into account the temporal dimension of RNA-seq time series. In addition, by taking into account temporal correlation makes it possible to carry out more detailed analysis of the observed dynamics, e.g. to quantify similarities and differences between the observed kinetics. To that end, GPs have been used for modeling temporally or spatially varying likelihood parameters in other fields, e.g. to model the rate parameter of the Poisson distribution temporally and the stochastic process that is produced is called as the Gaussian Cox process (Adams et al., 2009). Similar approaches have also been popular in geostatistics (Diggle et al., 1998).
Since the discovery of an interleukin 17 producing T-cell subset, this T helper 17 (Th17) cell lineage has been a focus of great research interest (Dong, 2008;Park et al., 2005). Th17 cells have been shown to play an important role in autoimmune diseases and inflammation. Recent studies have identified transcription factor genes Rorc and Stat3 as the key regulators of the early Th17 differentiation in murine (see a review in Ivanov et al., 2007). Na€ ıve human T cells are activated through the T-cell receptor (TCR) by CD3 and CD28 and Th17 cells are polarized from the activated T cells by exposing the cells to TGF-, IL-1 and IL-23. The goal of gene expression profiling in the early phase of Th17 differentiation is to gain insight into the process of differentiation by unraveling dependencies between key factors and to understand how the differentiation signal propagates through various pathways and gene regulatory networks. This knowledge could potentially prove useful in identifying biomarkers for immune-related diseases and in design of therapeutic interventions.
We present a methodology, DyNB that is built on the negative binomial likelihood and GPs. Non-parametric GP regression is used to model gene expression over time and the model inference is carried using the Bayesian reasoning. We demonstrate the applicability of DyNB by analyzing RNA-seq time-series datasets. We also show how DyNB can be used to study relative differentiation efficiencies between biological samples. The differentially expressed genes detected by DyNB as well as estimated differences in differentiation efficiencies for selected genes are validated using qRT-PCR.

GPs
The GP prior for functions is a collection of random variables such that distribution for any finite subset (index set) X is defined as where F represents the process at X; is the set of hyperparameters, m is the mean of the process, and K is the covariance matrix. In our application, the index set X of the random variables is time. We define the covariances between pairs of random variables as follows where kðÁ; ÁÞ is the squared exponential covariance function and = 1 ; 2 ð Þ T . The ði; jÞ th element in the matrix K is given by kðt i ; t j Þ.

A time-varying negative binomial distribution
Read count data are commonly modeled using the negative binomial distribution (Anders and Huber, 2010;Robinson et al., 2010) Y $ NBðr; pÞ; where r is a predefined number of failures and the probability of success is p. We will parameterize the negative binomial distribution with mean =EfYg=pr=ð1 À pÞ and variance 2 =VarfYg=pr=ð1 À pÞ 2 . Thus, we solve p as a function of and 2 as and similarly r hence we can write Y $ NB ; 2 À Á . We assume to have M replicates (j=1; . . .
Let us write the mean of the negative binomial distribution as a function of a random process F, i.e. Y $ NB g 1 ðfÞ; 2 À Á , where f is a realization of a GP. In the case of a GP, we define g 1 ðfðt i ÞÞ=fðt i Þ, where fðt i Þ is a value of the random process at the i-th timepoint. where and 2.3 A time-varying negative binomial distribution with time scaling We also consider a situation where we possess a priori knowledge that the biological replicates are differentiating in different time scales. In this study, we assume that the different time scales between biological replicates can be modeled as t j =t=k j .
where k=ðk 1 ; . . . ; k M Þ are the replicate-specific time-scaling factors. Often one wants to analyze time scaling with respect to one of the replicates, e.g., i-th replicate, which can be achieved by constraining k i =1. This also makes the model identifiable.
The statistical dependencies of the variables in our model are depicted in Supplementary Figure S1 using the plate notation.

Variance estimation and normalization
The variance for the negative binomial distribution is estimated using the approach described in Anders and Huber (2010), i.e. we model the variance as a function of the read count using a smooth function. The idea behind the variance estimation is that genes expressed in a similar level have a similar variance and sharing information between genes improves variance estimation (Anders and Huber, 2010). In other words, ðt i Þ 2 in Equations (7) and (8) are obtained from a polynomial of degree 2 giving a robust variance estimate as a function of the read count. The second order polynomial is fitted to the observed read counts and variances across timepoints and genes.
To account for different sequence depths of the samples (over all the replicates and timepoints), we make the read counts between different RNA-seq runs comparable by scaling factors, which are estimated using the procedure presented in Anders and Huber (2010). Instead of scaling the discrete read counts, the scaling is performed on the GPs samples f.

Bayesian inference for transcriptome dynamics
By using the likelihood in Equation (9), we can write the marginal likelihood where we have marginalized over the possible realizations of F. In this case, the integral in Equation (10) is not analytically tractable and we resort to Markov chain Monte Carlo (MCMC) methods. A common practice is to marginalize over all the parameters, which in our case include f, k and For the integration in Equation (11), we construct a Metropolis-Hasting algorithm presented in Algorithm 1.
Algorithm 1 A Metropolis-Hastings algorithm for posterior sampling of parameters ; k and f.
We assign the uniform prior distribution for the hyperparameter 2 , i.e. p ðÞ=U½0:5; 1, to favor smooth GP realizations and a symmetric prior p k ðÁÞ for time-scaling factors k j ; j=1; . . . ; M, which is centered around identity scaling (Fig. 5A). The parameter 1 is set empirically to account for large differences in gene expression counts y between low-and highexpressed genes (from 1 to approx. 5 Â 10 5 in our case). Thus, 1 is fixed to a gene-specific and data-dependent value 10 Stdevfyg. The GP prior per gene (p f ) is defined by the mean vector and covariance matrix, which is parameterized by the parameters 1 and 2 (which have a similar role as 1 and 2 ). Again, in defining the mean m and 1 , we should take into account the large range of different read count magnitudes; thus they are defined separately for each of the genes. The mean vector is defined as m= Maxfyg+Minfyg 2 1 and 1 =500 Maxfyg+Minfyg 2 and 2 =0:75. In our implementation, we use a truncated normal distribution as the proposal distribution q , where the last accepted sample ðiÞ 2 is the mean and the variance and the boundaries are predefined, i.e. N ½0:5;1 ð ðiÞ 2 ; 0:01 2 Þ. Our choice of the proposal distribution for the time-scaling factors k is an uniform distribution, where the probabilities for the three allowed transitions, i.e. +4 h, -4 h and 0 h, are 1/3. For the proposal distribution q f we use the GP prior whose mean is the last accepted sample f ðiÞ and the covariance matrix K is defined by the inputs X and the hyperparameters Ã .
Using the accepted samples f ðiÞ we estimate the posterior mean and (co)variance of the distribution using the standard sample estimators. The marginal likelihood is estimated using the harmonic mean of the likelihoods of the samples from the posterior distribution as presented in Newton and Raftery (1994), where the idea is to use the parameter posterior as the importance sampling function where f ðiÞ ; ðiÞ ; k ðiÞ $ pðf; ; kjyÞ: Another variable whose posterior distribution we are interested in is k whose posterior we also get directly from the MCMC chain. Moreover, the estimated marginal likelihoods are used for model selection purposes as we will see in next section. The convergence of the chains was assessed using the potential scale reduction factors as described in Gelman and Rubin (1992), and the results confirming the convergence are depicted in Supplementary Figure S2.

Quantification of differential dynamics
In this study we want to answer the question whether a gene is differentially expressed between different conditions, namely Th0 and Th17 lineages, and we assume to have replicated time series measurements from these two lineages. From now on we consider only two conditions but the same methodology can be easily generalized for any number of conditions (Hardcastle and Kelly, 2010;€ Aij€ o et al., 2012). The null model, M 0 , denotes that the Th0 and Th17 lineages behave similarly, which we implement by fitting a single DyNB model to Th0 and Th17 measurements. The alternative model, M 1 , denotes that the two lineages behave differently, which we implement by fitting one DyNB model to Th0 and another DyNB model to Th17. Assuming equal prior probabilities for both models, the evidence for the alternative model is quantified by the Bayes factor (BF) By following the BF interpretation chart described in Jeffreys (1998), a BF !10 should be thought as strong evidence for the model M 1 over the model M 0 . BFs were recently used for model selection purposes in the context of identifying alternative splicing events between biological samples (Katz et al., 2010).

Human-activated T and Th17 cells
CD4 + T cells were isolated from the umbilical cord blood collected from healthy neonates born in Turku University Hospital; Hospital District of Southwest Finland with approval from the Finnish Ethics Committee. CD4 + T cells were isolated from umbilical cord blood samples using Ficoll-Paque and anti-CD4 magnetic beads. For activating CD4 + T cells and inducing polarization of Th17 phenotype the cells were activated and stimulated as indicated in Figure 1A and as previously described (Tuomela et al., 2012). The polarization was confirmed as described by Tuomela et al. (2012). Strand-specific RNA-seq libraries were prepared from 2-5 mg of total RNA (Parkhomchuk et al., 2009), bar-coded and multiplexed (3 to 4 samples per lane) and 40-nt paired-end reads were obtained on an Illumina HiSeq2000. The gene expressions were profiled from Th0 and Th17 cells at the five timepoints, 0, 12, 24, 48 and 72 h with three biological replicates. The Ensembl gene models were used in the gene expression estimation.

Temporal modeling of RNA-seq data
Using the model described in Section 2, our first goal is to estimate a smooth representation of gene expression dynamics based on the measured read counts. Smoothness of expression dynamics is enforced by the GP prior, and agreement of expression dynamics with the read count data is quantified using the negative binomial likelihood. To avoid overfitting, the inference is done using the Bayesian analysis, and thus the final model fitting estimate is obtained by integrating over parameters using an MCMC sampling technique.
Applying the aforementioned methodology without the timescaling option to RNA-seq data, we estimated the smooth representations of the underlying gene expression in Th0 and Th17 lineages. The posterior means (solid curves) of the specific Th0 i115 and Th17 models (M 1 ) together with corresponding 95% CIs (shaded areas around means) for IL17A, IL17F and RORC are depicted in Figures 1B-D. For example, the cytokine IL17A is known to be highly expressed in Th17 cells and its expression is commonly used to assess the Th17 polarization efficiency (Brucklacher-Waldert et al., 2009). The strong induction of IL17A and IL17F in the Th17 differentiation is apparent by the data. Based on visual assessment, however, the induction of IL17A and IL17F behaves differently among the replicates.

Modeling of variable differentiation efficiency
To study variable differentiation efficiencies in IL17 genes in an unbiased manner, we repeated the analysis but now taking into account the possibility of different time scales between the replicates. The model with time scaling allows the samples to be decelerated/accelerated relatively to each other, so that their scaled behavior is similar. We fixed the time scale of the second sample and allowed the other two samples to be accelerated or decelerated independently of each other using the transformation t=k j . Another choice could have been a time shift, t+s j , which moves linearly the whole time series together with the start point. Because in our case the cells are activated and polarized exactly at the same time, we wished to keep the start point fixed across the samples. The transformation is illustrated in Figure 2A, where the axis in the center corresponds to the case without time scaling and the top and bottom axes correspond the cases of 32 and -32 h time differences at 72 h due to the time scaling (corresponding to k = 5/9 and k = 13/9), respectively. We constrained the effects of scaling to be discrete, i.e. from -32 to +32 h at the end of the time series (72 h) in 4 h steps. To demonstrate the methods applicability for estimating differentiation efficiencies, we carried out a simulation study. Using IL17A as a template profile, we generated two time series (2nd and 3rd replicate) with a similar behavior and third one (the first replicate) which is a delayed version of the two, i.e. the timepoint 72 h corresponds to 48 h. The method correctly inferred that the first replicate is delayed compared with the other two replicates as depicted in Figure 2B. Finally, the estimated posterior distributions of time differences depicted in Figure 2C demonstrated the method's accuracy in estimating differences in differentiation efficiency.
The results with time scaling for the marker genes IL17A, IL17F and RORC are depicted in Figures 3B-D, respectively. The effect of time scaling is visualized by transforming the measurements based on the time-scaling parameter posterior mean: e.g. IL17A is delayed over 24 h at 72 h in the first Th17 sample. As expected, uncertainty of the estimates, especially at the end of time series, increases due to the time scaling. For the marker genes IL17A and IL17F, however, we notice that the time scaling is able to improve the model fit. To validate our observation of different time scaling, we performed a kinetic assay of IL17A, IL17F and RORC mRNA levels throughout the early Th17 differentiation using qRT-PCR in the same biological samples as the RNA-seq ( Fig. 4; Supplementary Table S1). Note that because time scaling (i.e. differentiation efficiency) is a replicate-specific random effect we need to use the same samples for qRT-PCR validation. These confirmed our conclusions: expression of IL17F and IL17A was delayed in the first and third series, while expression of RORC behaved similarly across the samples.
Next we wanted to confirm the presence of different time scaling by studying the posterior distribution of the time-scaling genome wide by repeating the analysis for all expressed genes A B C D Fig. 1. Transcriptome dynamics of Th17 marker genes. (A) T helper precursor cells isolated from cord blood are activated using platebound CD3 and soluble CD28 in the presence of IFN-and IL-4 yielding the cells to follow the Th0 lineage. Th17 commitment is achieved by activation and polarization condition, including IL-6, IL-1 and TGF-. Cells were harvested at 0, 12, 24, 48, and 72 h. From the harvested cells the RNA was extracted and used for preparation an RNA-seq library. (B) The estimated smooth representation of IL17A dynamics without time scaling. The read counts are on the y-axis. Circles and diamonds mark the measurements from Th0 and Th17 cells, respectively, and the replicates are distinguish with different colors. The solid curves are the posterior means of the specific Th0 and Th17 models (M 1 ) with corresponding 95% CIs (shaded areas around means). (C and D) Same as (B), but the depicted results are for the IL17F and RORC genes i116 (i.e. at least one read in Th0 and Th17 samples). To detect differentially expressed genes between the Th0 and Th17 lineages, we used the following criteria: (i) BF410, i.e. strong evidence for M 1 over M 0 , and (ii) fold-change42 in at least one timepoint. These criteria gave us 698 differentially expressed genes. Then we studied how presence and absence of estimated time-scaling parameters differ between the Th0 and Th17 lineages for each of the differentially expressed genes. The results are depicted as 2D histograms in Figure 5A where the first (top panel) and third replicate (bottom panel) are analyzed separately. In both replicates, there are many genes with no time scaling effect, and thus they behave similarly to the second replicate. In the first replicate, the probability mass is partly distributed to the left lower quadrant, which corresponds to cases where a gene is decelerated in both lineages in the first replicate relative to the second replicate. We can conclude that in terms of genome-wide expression dynamics the first and third replicates are different from each other and that the third and second replicates are similar to each other since the mass in Figure 5A (bottom panel) is centered strongly around the point (0, 0). Figure 5B and C shows the distributions of time differences between the replicates over all the differentially expressed genes for Th0 and Th17 lineages, respectively. Histograms in Figure 5B and C suggests that both the activation (Th0) and differentiation (Th17) are delayed in the first replicate. We did not observe a difference in differentiation efficiencies for all differentially expressed genes but there is clear shift of the probability mass towards deceleration. Whereas, for the third replicate the posterior distribution is centered around the region corresponding to no time scaling. We conclude that the first replicate differs from the other replicates in its differentiation kinetics.

Comparison of temporal and timepoint-wise analysis
In order to study advantages and disadvantages of our temporal analysis, we carried out a differential expression analysis at the individual timepoints using DESeq tool for comparison purposes. For each timepoint we call a gene differentially expressed if multiple testing corrected (Benjamini-Hochberg method) p adj 50:01 and the absolute value of the log 2 foldchange is41. Combining differentially expressed genes from different timepoints, timepoint-wise analysis gives a total of 823 genes, which is in agreement with the number detected by DyNB. Comparing directly the numbers of genes detected by the frequentist DESeq and our Bayesian DyNB may not be exactly meaningful due to differences in defining the detection thresholds, and simply because timepoint-wise analysis has four times more differential expression tests. Instead, results from the two methods need more careful investigation. Overlap of the differentially expressed genes identified by the two approaches, DyNB and DESeq, are depicted in Figure 6A (top panel). Out of 698 differentially expressed genes identified by DyNB, 546 are also detected by the DESeq. Figure 6A (bottom panel) shows a similar Venn diagram but now using only the top 698 genes from the timepoint-wise analysis (ranked according to the adjusted P-values). In this case, 500 genes overlap between temporal and timepoint-wise analysis. The overall agreement between the two methods is demonstrated by the hypergeometric test of gene set overlap (P51 e-16).
Next we wanted to see how the overlap between temporal and timepoint-wise analysis changes when we consider separately the top 698 genes that are identified by DESeq exactly at one, two, three, or four timepoints. The number of genes belonging to each class is shown in Figure 6B. The agreement between the two methods for different gene classes was quantified using the precision-recall metric as a function of the statistical significance from DyNB analysis (Fig. 6C). As expected, the level of agreement between the presented method and DESeq correlates with the number of timepoints where DESeq identified genes to be differentially expressed. For example, the genes differentially expressed in all four timepoints based on the DESeq analysis are all detected by DyNB as well. We conclude that, on average, both the temporal and timepoint-wise analysis detect largely the same genes, which have a strong differential expression, as expected. However, i118 the overlap is not perfect and different results are reported for genes whose differential expression is weaker or noise level higher and for genes which are affected by variable differentiation efficiency. Additionally, DyNB provides insights into differentiation efficiencies between biological replicates, which is not possible with timepoint-wise or traditional temporal methodologies. DyNB allows each gene to have its own time scalings between replicates. Thus, we studied the effect of the assumption that all genes would be affected similarly by the differential differentiation efficiency. This was done by introducing informative delay priors ( Supplementary Fig. S3A), which closely resembles the posterior distribution of time-scaling parameters obtained from the application of DyNB (Fig. 6C). After applying DyNB with the strong time-scaling prior, we noticed that the distributions of the estimated time differences of the differentially expressed genes (the same criteria as before) resembled the informative prior distributions as depicted in Supplementary Figure S3B, indicating that the time differences can be estimated even without strong regularization. Consequently, we believe that it is more beneficial to apply DyNB without the informative prior distribution because, e.g. in the context of Th17 differentiation only a fraction of genes respond to the differentiation.
We also compared DyNB (with and without the informative delay prior) with the next-maSigPro (Conesa and Nueda, 2013). Interestingly, next-maSigPro (Benjamini-Hochberg-corrected P-value50.01 with the negative binomial model) showed the weakest level of agreement with the other methods as depicted in Supplementary Figure S3C.
Three representative examples detected by DyNB, but not identified by DESeq from timepoint-wise analysis with the aforementioned criteria, are shown in Figure 7. These genes illustrate the benefits of the time-scaling parameter. The gene ISG20 has similar behavior as the IL17A gene, i.e. it is induced between the last two timepoints (48 and 72 h) but the activation is delayed in the first replicate. ISG20 has been reported to have a role in Th17 cells (Pan et al., 2013). The members of the RAB protein family, e.g. RAB3, are known to play a major role in protein-mediated transport and in fusion of intracellular structures and are highly expressed in various cells of immune system, especially after activation (Pei et al., 2012). TIAM1 (T lymphoma invasion and metastasis protein 1) has shown to have a role in T-cell trafficking through Rac activation (G erard et al., 2009). On the contrary, Supplementary Figure S4 shows two representative genes, KIF11 and MAP1B, which are detected by the timepoint-wise analysis, but not by the temporal analysis implemented in DyNB. Temporal analysis together with the possibility to account for variable differentiation efficiencies can filter out those genes for which the replicated Th0 and Th17 profiles are seemingly similar and thus likely false positives.

DISCUSSION AND CONCLUSIONS
We presented the first statistical method, DyNB, to study RNAseq dynamics together with a method to correct for, or detect, DyNB is compared with a commonly used method, DESeq that relies on the same statistical assumptions but analyzes data from each timepoints separately and, therefore, ignores correlations between timepoints. As expected, the comparison showed that the agreement between the methods is high but at the same time temporal modeling approach has some benefits. The most notable advantage is the possibility to take into account different differentiation efficiencies between biological replicates. Indeed, many experimental systems in cell development and differentiation display subtle kinetic differences between replicates, which are not necessarily apparent until large-scale transcriptomics data are obtained. This method might critically help improve the interpretation of such experiments. Concerning future improvements, the proposed straightforward MCMC sampling scheme might lead to inefficient sampling if more parameters are marginalized. In those cases, sampling could be improved by using more elegant samplers, such as elliptical sampling (Murray et al., 2009).
Our results show that a temporal analysis can bring insights into analysis of differentiation processes and help in the analysis of time-series datasets. We demonstrated applicability of DyNB by applying it to time series RNA-seq data from Th17 and Th0 lineages and identified novel Th17-specific genes. We used qRT-PCR to validate our computational predictions of sample-specific time scales. For example, by taking into account differences in differentiation efficiencies, we can identify a more complete set of differentially expressed genes. In turn, this improves our ability to discern subtle changes in regulatory pathways and broaden the scope of targets available for intervention.