-
PDF
- Split View
-
Views
-
Cite
Cite
Alexey Uvarovskii, Christoph Dieterich, pulseR: Versatile computational analysis of RNA turnover from metabolic labeling experiments, Bioinformatics, Volume 33, Issue 20, October 2017, Pages 3305–3307, https://doi.org/10.1093/bioinformatics/btx368
- Share Icon Share
Abstract
Metabolic labelling of RNA is a well-established and powerful method to estimate RNA synthesis and decay rates. The pulseR R package simplifies the analysis of RNA-seq count data that emerge from corresponding pulse-chase experiments.
The pulseR package provides a flexible interface and readily accommodates numerous different experimental designs. To our knowledge, it is the first publicly available software solution that models count data with the more appropriate negative-binomial model. Moreover, pulseR handles labelled and unlabelled spike-in sets in its workflow and accounts for potential labeling biases (e.g. number of uridine residues).
The pulseR package is freely available at https://github.com/dieterich-lab/pulseR under the GPLv3.0 licence.
Supplementary data are available at Bioinformatics online.
1 Introduction
Gene expression abundance levels are defined by rates of RNA synthesis and degradation. Understanding how certain gene levels are regulated by these processes across different experimental conditions helps to gain deeper mechanistic insights into RNA metabolism. Pulse-chase experiments facilitate measurement of such kinetics (Wachutka and Gagneur, 2016). Generally, ‘tagged’ nucleoside analogs are introduced to the medium, taken up by cells and incorporated into nascent RNA molecules. For example, 4-thiouridine labelling (4sU), which was pioneered by Dölken et al. (2008) in this context, is used to estimate kinetic rates of RNA metabolism in an increasing number of studies [see Wachutka and Gagneur (2016) for review]. Briefly, RNA labelling facilitates to separate newly synthesized from pre-existing RNA. RNA-seq data, which are generated from sequencing these RNA pools, have a discrete nature. To date, there is no publicly available software for kinetic parameter estimation of gene expression, which is specifically designed to handle fragment count data. Here we present the pulseR package, which allows to process RNA-seq data from 4sU-labelling or similar experiments with complex designs.
2 Implementation
2.1 Parameter definition
RNA dynamics can be described by ordinary differential equations, which have a simple analytic solution if the degradation and synthesis rates are assumed to be constant. In the pulseR package, users need to specify the expressions for the mean RNA abundances. Alternatively, formulas can be generated using package functions for the most frequent use cases, see package documentation.
Although most interest is focused on the gene-specific parameters, pulseR allows to introduce shared parameters. This can be useful for taking into account the difference in the uridine content, since it can introduce a bias in the estimations (Miller et al., 2011; Schwalb et al., 2012). In this case, the RNA abundances are multiplied by a probability that at least one uridine in the molecule is substituted by 4sU. The shared parameter then is the probability for a single base to be substituted by a 4sU.
2.2 Normalization
In case spike-ins are present, can be directly estimated from spike-in read counts. To this end, the user provides lists of spike-ins, which represent unlabelled or labelled RNA.
2.3 Parameter estimation
We assume r to be shared between all samples and genes. Otherwise it would not be possible to infer all parameters from a small number of replicates (usually, only two or three points are available).
We separated the fitting procedure into several simpler steps:
fitting of gene-specific parameters (e.g. degradation rate)
fitting of shared parameters
fitting of the normalization factors (for a spike-in-free design)
estimation of the dispersion parameter
We repeat the steps 1–4 until user-specified convergence criteria are met. We do not consider gene–gene interactions in this model, which allows us to fit gene-specific parameters independently for every individual gene.
We optimize the likelihood functions by using the L-BFGS-U method (Byrd et al., 1995), which is available in the stats R package (R Core Team, 2017).
3 Discussion
3.1 Comparison with existing approaches
All published approaches differ in terms of data normalization, statistical model and level of detail with regard to RNA metabolism. We have compared the following software packages with pulseR: DRiLL (Rabani et al., 2014), INSPEcT (De Pretis et al., 2015), DTA (Schwalb et al., 2012), HALO (Friedel et al., 2010). In most cases, samples are normalized by utilizing overdetermination of the system. The normalization coefficients are either estimated via regression (DTA, HALO) or during the MLE procedure together with other parameters (INSPEcT, DRiLL). Additionally, pulseR allows to use spike-in counts as an alternative normalization strategy. HALO and DTA estimate degradation rates simply as a ratio of labelled and total RNA fractions. In DRiLL, expression levels are fitted to the binomial distribution. However, the kinetic rates are estimated via optimization of residual sum of squares in both, DRiLL and INSPEcT. In contrast, pulseR assumes the NB distribution in MLE of all parameters, which allows to work directly with count data. Experiments may vary in design and the number of conditions and pulseR offers unprecedented flexibility. While DTA and HALO estimate rates on the basis of a single time point only (using all three fractions), pulseR, DRiLL and INSPEcT can infer rates from several time points. Although pulseR can handle different designs including pulse, chase- or combined experiments, all other packages work only with pulse experiments. However, the DRiLL and INSPEcT packages can model time-dependent rates out of the box. Moreover, the DRiLL software is able to model the gene-transcript dependence structure. Table 1 summarizes all relevant software features. We simulated count data for a pulse experiment in order to evaluate pulseR performance. The detailed protocol of this procedure is described in the Supplementary Material.
Feature comparison of software packages for parameter estimation in pulse-chase experiments
. | pulseR . | DRiLL . | INSPEcT . | DTA . | HALO . |
---|---|---|---|---|---|
Statistical model | NB | N, BIN | N | − | − |
Spikestic | + | − | − | − | − |
Several time points | + | + | + | − | − |
Variable designa | + | − | − | − | − |
Non-constant rates | − | + | + | − | − |
Uridine bias | * | − | − | + | + |
RNA processing | * | + | + | − | − |
Language | R | MATLAB | R | R | Java |
. | pulseR . | DRiLL . | INSPEcT . | DTA . | HALO . |
---|---|---|---|---|---|
Statistical model | NB | N, BIN | N | − | − |
Spikestic | + | − | − | − | − |
Several time points | + | + | + | − | − |
Variable designa | + | − | − | − | − |
Non-constant rates | − | + | + | − | − |
Uridine bias | * | − | − | + | + |
RNA processing | * | + | + | − | − |
Language | R | MATLAB | R | R | Java |
Note: +, available; –, not implemented; *, must be defined by user; N, normal; NB, negative binomial; BIN, binomial.
Pulse, chase or combination thereof experiments.
Feature comparison of software packages for parameter estimation in pulse-chase experiments
. | pulseR . | DRiLL . | INSPEcT . | DTA . | HALO . |
---|---|---|---|---|---|
Statistical model | NB | N, BIN | N | − | − |
Spikestic | + | − | − | − | − |
Several time points | + | + | + | − | − |
Variable designa | + | − | − | − | − |
Non-constant rates | − | + | + | − | − |
Uridine bias | * | − | − | + | + |
RNA processing | * | + | + | − | − |
Language | R | MATLAB | R | R | Java |
. | pulseR . | DRiLL . | INSPEcT . | DTA . | HALO . |
---|---|---|---|---|---|
Statistical model | NB | N, BIN | N | − | − |
Spikestic | + | − | − | − | − |
Several time points | + | + | + | − | − |
Variable designa | + | − | − | − | − |
Non-constant rates | − | + | + | − | − |
Uridine bias | * | − | − | + | + |
RNA processing | * | + | + | − | − |
Language | R | MATLAB | R | R | Java |
Note: +, available; –, not implemented; *, must be defined by user; N, normal; NB, negative binomial; BIN, binomial.
Pulse, chase or combination thereof experiments.
Acknowledgements
We would like to thank all members from the Dieterich Lab for their great input. Special thanks go to Isabel Naarman-de Vries for all experimental support and insights.
Funding
The work of A.U. and C.M. was kindly supported by the Klaus Tschira Stiftung gGmbH and the German Center for Cardiovascular Research (DZHK).
Conflict of Interest: none declared.
References