Abstract

Summary

Measurement of single-cell gene expression at different timepoints enables the study of cell development. However, due to the resource constraints and technical challenges associated with the single-cell experiments, researchers can only profile gene expression at discrete and sparsely sampled timepoints. This missing timepoint information impedes downstream cell developmental analyses. We propose scNODE, an end-to-end deep learning model that can predict in silico single-cell gene expression at unobserved timepoints. scNODE integrates a variational autoencoder with neural ordinary differential equations to predict gene expression using a continuous and nonlinear latent space. Importantly, we incorporate a dynamic regularization term to learn a latent space that is robust against distribution shifts when predicting single-cell gene expression at unobserved timepoints. Our evaluations on three real-world scRNA-seq datasets show that scNODE achieves higher predictive performance than state-of-the-art methods. We further demonstrate that scNODE’s predictions help cell trajectory inference under the missing timepoint paradigm and the learned latent space is useful for in silico perturbation analysis of relevant genes along a developmental cell path.

Availability and implementation

The data and code are publicly available at https://github.com/rsinghlab/scNODE.

1 Introduction

A fundamental challenge in biology is understanding how gene expression changes over time during cell development (Moris et al. 2016, Fleck et al. 2022). With the advent of high-throughput single-cell RNA sequencing (scRNA-seq), researchers now routinely profile gene expression at single-cell resolution (Hwang et al. 2018, Chen et al. 2019), revealing substantial heterogeneity within the same tissue type or developmental stage (Chen et al. 2022a, Ding et al. 2022). Profiling scRNA-seq data at multiple timepoints allows us to understand how biological processes unfold within cell populations by inferring cellular trajectories, identifying cell fate transitions, and characterizing differentially expressed genes (Liu et al. 2022, Qiu et al. 2022).

Temporal scRNA-seq experiments, however, have significant limitations that impede developmental studies. Due to the substantial time and resources involved in conducting these experiments, researchers generally only profile gene expression at discrete and sparsely sampled timepoints (Ding et al. 2022). Since it is infeasible to make continuous-time observations, this results in information loss between consecutively measured discrete timepoints. Therefore, it is crucial to develop a model that predicts realistic in silico gene expression at any timepoint, whether between (interpolation) or beyond the measured time intervals (extrapolation), to enable single-cell temporal analyses (Fig. 1A and B).

Model overview. (A) The measurements in temporally recorded single-cell data are sparse and unaligned, such that some timepoints are not measured and measurements at different timepoints contain different sets of cells. The datasets in this study sample cellular populations at uniformly spaced time points, although uniform spacing is not a requirement (as depicted in this graphic). (B) scNODE is used to predict expression at unmeasured timepoints. Predictions can be used to analyze cell state transitions and developmental trajectories. (C) scNODE uses VAEs to find an optimal low-dimensional representation. scNODE then uses this representation to solve neural ODEs and predict gene expression at any timepoint t=t1,t2,… starting from the first timepoint t = 0. Optimal models are fitted by minimizing the Wasserstein distance between observations and predictions at measured timepoints. scNODE also introduces a dynamic regularization to learn a latent space robust to distribution shift.
Figure 1.

Model overview. (A) The measurements in temporally recorded single-cell data are sparse and unaligned, such that some timepoints are not measured and measurements at different timepoints contain different sets of cells. The datasets in this study sample cellular populations at uniformly spaced time points, although uniform spacing is not a requirement (as depicted in this graphic). (B) scNODE is used to predict expression at unmeasured timepoints. Predictions can be used to analyze cell state transitions and developmental trajectories. (C) scNODE uses VAEs to find an optimal low-dimensional representation. scNODE then uses this representation to solve neural ODEs and predict gene expression at any timepoint t=t1,t2, starting from the first timepoint t =0. Optimal models are fitted by minimizing the Wasserstein distance between observations and predictions at measured timepoints. scNODE also introduces a dynamic regularization to learn a latent space robust to distribution shift.

Single-cell gene expression prediction has been investigated in previous studies (Cao et al. 2021) through various generative models. Existing methods use generative adversarial networks (GANs) (Marouf et al. 2020) or statistical frameworks (Zappia et al. 2017, Risso et al. 2018, Li and Li 2019, Sun et al. 2021) to fit characteristics of the observed single-cell gene expression and augment the dataset. However, these methods are designed for single-cell gene expression measured at a single time snapshot and do not consider developmental dynamics.

Several computational methods have been developed to learn gene expression dynamics by inferring cell trajectories (Trapnell et al. 2014, Qiu et al. 2017, Herman et al. 2018, Saelens et al. 2019, Wolf et al. 2019, Li 2023) from temporal scRNA-seq. However, most of them infer pseudo-times, which analyze cell differentiation based on geometric distances in low-dimensional spaces. Hence, they do not accurately model how cells develop in real physical time. Sagittarius method (Woicik et al. 2023) predicts gene expressions at future timepoints but requires cell matching between timepoints. This matching is hard to obtain for real-world scRNA-seq as the cells are lysed during the experiment. RNA velocity based methods (La Manno et al. 2018, Bergen et al. 2021, 2020) are used to uncover developmental trends and underlying kinetics of gene expression. These methods require deeply sequenced datasets to obtain spliced and unspliced counts as prior information, and most of them are nongenerative methods. Waddington-OT (Schiebinger et al. 2019) and TrajectoryNet (Tong et al. 2020) interpolate single-cell gene expression between two timepoints in a developmental trajectory but are unable to extrapolate beyond the observed timepoints.

Recently, a few generative methods have modeled cell differentiation trajectory and predicted gene expression at multiple unobserved time points. For example, PRESCIENT (Yeo et al. 2021) applies principal component analysis (PCA) to reduce high-dimensional gene space to a low-dimensional representation. It then applies neural ordinary differential equations (ODEs) (Chen et al. 2018) in this low-dimensional space to model the cell developmental trajectory. Multiple studies (Ding et al. 2018, Tran et al. 2021, Xiang et al. 2021) have found that PCA-based low-dimensional representations of scRNA-seq expression cannot capture complex relationships in the highly heterogeneous single-cell data. Thus, PCA may have the issue of overcrowding representation (Kobak and Berens 2019, Tran et al. 2021), where cells of different types are poorly separated in PCA-based low-dimensional space, and cellular variations are lost. MIOFlow (Huguet et al. 2022) replaces PCA with a geodesic variational autoencoder (VAE) to learn nonlinear low-dimensional representations that better retain cellular variations. However, this method fixes its low-dimensional space at the beginning, followed by neural ODE modeling. We hypothesize that a fixed low-dimensional representation obtained using observed timepoints may not generalize to an unobserved timepoint, if its underlying distribution differs substantially.

We propose a single-cell neural ODE (scNODE) framework, which integrates a VAE (Kingma and Welling 2019) with neural ODE to predict accurate gene expression at any unmeasured timepoint using a dynamic low-dimensional space (Fig. 1C). First, scNODE uses VAE to obtain a low-dimensional representation that retains cellular variations. scNODE then predicts single-cell gene expression for unmeasured timepoints with a neural ODE. Importantly, scNODE uses a dynamic regularization term to incorporate the dynamic manifold learned from neural ODE as a prior. Therefore, the learned VAE space is robust to the distribution shift between observed and unobserved time points. We evaluate scNODE on three real-world scRNA-seq datasets. Our results demonstrate that scNODE predicts single-cell gene expression more accurately than state-of-the-art methods for unobserved timepoints (interpolation and extrapolation). We further show that scNODE gene expression predictions assist with cell developmental trajectory inference. Finally, scNODE learns an interpretable low-dimensional space, which enables conducting in silico perturbation analysis of relevant genes to study cell development.

2 Materials and methods

Notation: Let X(t)Rnt×p denote the gene expression matrix at the tth time point of nt cells by p genes. Given gene expression measurements at observed timepoints {X(t)}tT, our goal is to predict gene expression at any timepoint. Here, T{0,1,2,} denotes the observed timepoint indices. Note that we assume the same set of highly variable genes (HVGs) is measured at each time point, but due to the destruction of cells during scRNA experiments, a different set of cells is measured at each time point. Furthermore, let Z(t)Rnt×d denote the d-dimensional (dp) latent variable at the tth time point.

2.1 scNODE uses VAE to learn complex low-dimensional space

scNODE uses a VAE to learn a low-dimensional representation (also known as latent space) of the single-cell gene expression measurements at the observed timepoints. A VAE is a neural network-based generative model that maps high-dimensional data to a low-dimensional representation and is widely used in many single-cell studies (Wang and Gu 2018, Grønbech et al. 2020, Lin et al. 2020, Svensson et al. 2020), showing superior performance. The benefit of using VAEs for single-cell data over PCA is that the nonlinearity of neural networks can more effectively capture complex cell relationships and cellular variations.

scNODE first pre-trains a VAE to obtain a latent space that captures the gene expression of single cells at all available training timepoints. Specifically, scNODE trains the VAE component to perform multi-timepoint modeling by inputting the gene expression of all observed cells XALL=CONCAT(X(t) | tT). VAE is composed of two neural networks: (i) the encoder network Encϕ maps expression from gene space Rp to a low-dimensional latent space Rd, parameterized by a Gaussian distribution N(μ,σ), and (ii) the decoder network Decθ maps samples in the latent space to gene space in the opposite direction to reconstruct the input. Given XALL, VAE learns latent representations through
(1)
The encoder and decoder networks are parameterized by ϕ and θ, correspondingly. VAE minimizes a combination of the (i) mean squared error (MSE) between input gene expression and the reconstructed gene expression from the decoder and (ii) the Kulback-Leibler (KL) divergence (Csiszár 1975) between the latent distribution and a standard Gaussian prior:
(2)

KL divergence is an information-theoretic quantity that quantifies the distance between two probability distributions. By using a Gaussian prior, the KL divergence forces the encoder to find latent representations of the cells that are well-separated. scNODE pre-trains the VAE component with Equation (2), such that the encoder latent space preserves the variation of all observed cells. Next, we model the developmental dynamics of this latent space.

2.2 scNODE uses neural ODE to model cell dynamics

scNODE uses ODEs to model cell developmental dynamics in the latent space learned by the VAE. An ODE is an equation describing how a quantity x changes with respect to an independent variable y, such that dx=f(x;y)dy where function f represents the derivative. Therefore, we can use differential equations to model how gene expression changes with respect to time. However, for the high-dimensional data, finding the solution of the derivative function f through numerical methods is intractable and computationally expensive (Kidger 2021). Therefore, recent studies adopt neural networks to approximate the derivative function and have proposed neural ODEs (Chen et al. 2018). Neural ODEs (formulated with respect to time) are useful for constructing continuous-time trajectories and have been explored before to model single-cell development (Matsumoto et al. 2017, Chen et al. 2022b, Qiu et al. 2022).

scNODE uses neural ODEs to parameterize the continuous dynamics of gene expression in the latent space. Specifically, scNODE quantifies changes of cell latent representation Z(t) (from VAE encoding) at time t through a neural ODE
(3)
Here, Driftω is a nonlinear neural network with parameterization ω, modeling the developmental velocities in the latent space, such that Driftω(Z(t);t) represents the direction and strength of cellular transitions. scNODE computes the latent representation Z(0) of cells at the first timepoint t =0 through the encoder network (pre-trained in the previous step) and predicts the subsequent cell states step-wise at any timepoint t as follows:
(4)

Here, hyperparameter Δt denotes step size and drift term Driftω(Z(t),t)Δt represents the forward steps taken in the latent space. While there are several methods that can solve this ODE, in this work, we apply the commonly used first-order Euler method [in Equation (4)] for convenience of explanation. However, in our implementation, one can specify any ODE solver.

To fit the continuous trajectory (controlled by Driftω) to the observations, scNODE minimizes the difference between the input and the reconstructed gene expression. Specifically, at each measured timepoint tT, scNODE uses the decoder Decθ to convert latent variables Z(t) generated from Equation (4) back to the gene space through X^(t)=Decθ(Z(t)). Because we have no correspondence between true cells and cells generated from the ODE model, scNODE utilizes the Wasserstein metric (Cuturi 2013) to measure the distance between distributions defined by ground truth X and predictions X^ as
(5)

Here, Π(X,X^) denotes the set of all transport plans between each cell of X and X^ and Dij represents the 2 distance, such that the Wasserstein metric adopts the minimal-cost transport plan Γ to measure the data dissimilarity.

2.3 scNODE uses dynamic regularization for learning a robust latent space

The VAE and its corresponding latent space are trained using training (or observed) timepoints. In the temporal scRNA-seq data where gene expression distribution changes substantially, a fixed low-dimensional representation obtained using training timepoints may not generalize to a testing (or unobserved) timepoint. To overcome this distribution shift issue, scNODE uses a dynamic regularization term to update the VAE space dynamically such that it captures both cellular variations and the developmental dynamics of the scRNA-seq data. Specifically, scNODE minimizes the difference between the latent variables generated by the VAE (i.e. Encϕ(X)) and the dynamics learned by the ODE (i.e. Z). Because we have no correspondence between them, scNODE again uses Wasserstein distance to evaluate their difference at each observed timepoint tT and defines the dynamic regularization as
(6)
Therefore, scNODE jointly updates VAE and neural ODE, optimizing parameters ϕ, θ, and ω, by minimizing the regularized loss function
(7)
so that the overall dynamics update the final latent space of scNODE through dynamic regularization and corresponding hyperparameter β.

Our dynamic regularization improves upon previous generative models (Schiebinger et al. 2019, Tong et al. 2020, Yeo et al. 2021, Huguet et al. 2022), which fix the latent space at the beginning and then learn the cell dynamics. scNODE uses the dynamic manifold learned from neural ODE as a prior and enforces VAE latent space to incorporate it using dynamic regularization. Therefore, updating the VAE ensures that scNODE fits the data better and learns a latent space that is robust to distribution shift when generating gene expression for unobserved timepoints. Previous work by Connor et al. (2021) similarly introduced a linear dynamic system-based model to learn a smooth low-dimensional manifold and then incorporated this manifold (as a prior) into VAE to encourage the latent space to fit the data manifold better. This resolved the issue that VAE latent representations sometimes would not match the true data manifold and poorly defined natural paths between data points. Therefore, our dynamic regularization will result in improved representation learning.

3 Results

3.1 Experimental setup

3.1.1 Datasets

We use three publicly available scRNA-seq datasets to demonstrate the capabilities of scNODE in predicting developmental dynamics from real-world single-cell gene expression data. These datasets are summarized in Table 1, have >10 timepoints, and cover various species and tissues. For each timepoint, gene expression is measured at a developmental stage (ZB), every hour (DR), or every 12 h (SC). To make computations tractable, we relabel timepoints with consecutive natural numbers starting from 0. In particular, the meaning of interpolating or extrapolating is defined by these data-specific units. For example, extrapolating one timepoint in the DR dataset means predicting gene expressions for the next hour. We use the data after removing batch effects among different timepoints. In each experiment, we select the top 2000 most HVGs from the datasets and normalize the unique molecular identifier count expression through a log transformation with pseudo-count.

Table 1.

Data descriptions of the three real-world scRNA-seq datasets used in experiments.

IDDatasetSpeciesNo. of cellsNo. of timepointsSource
ZBzebrafish embryoDanio rerio38 73112Broad Single-Cell Portal SCP162 (Farrell et al. 2018)
DRdrosophilaDrosophila melanogaster27 38611GSE190149 (Calderon et al. 2022)
SCSchiebinger2019Mus musculus236 28519Schiebinger et al. (2019)
IDDatasetSpeciesNo. of cellsNo. of timepointsSource
ZBzebrafish embryoDanio rerio38 73112Broad Single-Cell Portal SCP162 (Farrell et al. 2018)
DRdrosophilaDrosophila melanogaster27 38611GSE190149 (Calderon et al. 2022)
SCSchiebinger2019Mus musculus236 28519Schiebinger et al. (2019)
Table 1.

Data descriptions of the three real-world scRNA-seq datasets used in experiments.

IDDatasetSpeciesNo. of cellsNo. of timepointsSource
ZBzebrafish embryoDanio rerio38 73112Broad Single-Cell Portal SCP162 (Farrell et al. 2018)
DRdrosophilaDrosophila melanogaster27 38611GSE190149 (Calderon et al. 2022)
SCSchiebinger2019Mus musculus236 28519Schiebinger et al. (2019)
IDDatasetSpeciesNo. of cellsNo. of timepointsSource
ZBzebrafish embryoDanio rerio38 73112Broad Single-Cell Portal SCP162 (Farrell et al. 2018)
DRdrosophilaDrosophila melanogaster27 38611GSE190149 (Calderon et al. 2022)
SCSchiebinger2019Mus musculus236 28519Schiebinger et al. (2019)

3.1.2 Training and testing

We test scNODE’s performance in predicting gene expression at unobserved timepoints. Specifically, for each dataset, we remove several timepoints to test whether scNODE can recover these left-out observations. We design three tasks: (i) easy tasks where we remove uniformly spaced timepoints in the middle of the measured time range for interpolating, (ii) medium task where we remove the last few timepoints for extrapolating beyond the measured time range, and (iii) hard tasks where we combine the interpolation and extrapolation schemes. Supplementary Table S1 shows the timepoints removed in each task. We consider the left-out timepoints to be the testing set, and the remaining timepoints are used to train the model. In each case, HVGs are selected based on cells corresponding to the training timepoints in order to avoid data leakage.

3.1.3 Baselines

We compare scNODE with two main state-of-the-art generative methods that are capable of both interpolating and extrapolating gene expression:

  • PRESCIENT: Yeo et al. (2021) propose a generative model, called Potential eneRgy undErlying Single Cell gradIENTs (PRESCIENT), to learn the differentiation landscape from single-cell time-series gene expression data. PRESCIENT maps single-cell gene expression to a lower-dimensional PCA space and models cell differentiation with a neural ODE.

  • MIOFlow: Huguet et al. (2022) integrate geodesic VAE and neural ODE to model the paths of cells in a lower-dimensional latent space while cellular variations are preserved.

3.1.4 Hyperparameter tuning

In each task for every dataset, we select corresponding hyperparameters for all methods (our and baselines) that yield the minimum averaged Wasserstein distance using the 3-fold cross-validation scheme. We use Optuna (Akiba et al. 2019) to automatically determine the optimal hyperparameters and use sufficiently large hyperparameter ranges for search and evaluation. The hyperparameter ranges of scNODE and baselines are listed in Supplementary Table S2. We set the latent space dimension as 50, use the first-order Euler ODE solver, and set ODE step size Δt=0.1 for all methods, and run each method for sufficient iterations to ensure they converge. We let every model predict 2000 cells at each testing timepoint to ensure a fair metric comparison. Moreover, we evaluate scNODE performance using different hyperparameter settings, conduct ablation studies, and give heuristic guidance on how to set hyperparameters in real-world scenarios in Supplementary Section S6.3.

3.2 scNODE can accurately predict gene expression at unobserved timepoints

We compare scNODE’s performance with baseline methods for predicting gene expression for left-out timepoints. Figure 2 (along with Supplementary Figs. S1–S3) visualizes true gene expression values and model predictions for left-out testing timepoints in 2D Uniform Manifold Approximation and Projection (UMAP) (McInnes et al. 2018) space. Our results indicate scNODE’s predictions align well with ground truth. Qualitative evaluation of all scRNA-seq datasets indicates that scNODE can accurately predict gene expression at missing timepoints.

2D UMAP visualization of true and predicted gene expressions in hard tasks. Gray points represent training data.
Figure 2.

2D UMAP visualization of true and predicted gene expressions in hard tasks. Gray points represent training data.

We then quantitatively evaluate model predictions. We use the Wasserstein distance to measure the similarity between true and predicted gene expression at each testing timepoint, where a lower value corresponds to more accurate predictions. This evaluation metric has been used in previous studies (Tong et al. 2020, Yeo et al. 2021) because Wasserstein distance learns a soft mapping across the distributions, and we do not have one-to-one cell matching between true and predicted gene expression for single-cell measurements. Table 2 shows the Wasserstein distance of left-out testing timepoints averaged on five trials for defined hard tasks, and Supplementary Table S3 shows metrics for easy and medium tasks. scNODE clearly outperforms the baselines in most cases. In other cases where scNODE has the second best predictions, such as t =7 of SC hard task, scNODE has similar performance as MIOFlow but significantly performs better than PRESCIENT. Moreover, in medium and hard tasks, where extrapolations are required, scNODE can have substantial improvement over baselines. For example, at t =15 of SC hard task (in Table 2), Wasserstein distance of scNODE predictions is around 132, while that of PRESCIENT and MIOFlow are about 150 and 162 respectively. In Supplementary Fig. S4, we further evaluate all model predictions when extrapolating more timepoints. As expected, their extrapolations become less accurate the farther out predictions are made. But scNODE still demonstrates improvement in most cases. Therefore, scNODE has consistent good performance, especially in more challenging prediction tasks.

Table 2.

Wasserstein distance of predictions in hard tasks (inter- and extrapolation).a

MethodZB/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =10t =11
scNODE579.10508.55440.92517.81652.36707.10
MIOFlow580.18516.59453.61536.35671.23734.42
PRESCIENT1381.961002.62730.974701.29916.51973.17
MethodZB/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =10t =11
scNODE579.10508.55440.92517.81652.36707.10
MIOFlow580.18516.59453.61536.35671.23734.42
PRESCIENT1381.961002.62730.974701.29916.51973.17
MethodDR/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =9t =10
scNODE445.82464.78535.78600.18585.60718.20
MIOFlow443.56469.51532.93617.48680.41852.02
PRESCIENT524.38511.61539.38621.31575.45718.56
MethodDR/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =9t =10
scNODE445.82464.78535.78600.18585.60718.20
MIOFlow443.56469.51532.93617.48680.41852.02
PRESCIENT524.38511.61539.38621.31575.45718.56
MethodSC/hard task
Interpolation
Extrapolation
t =5t =7t =9t =11t =15t =16t =17t =18
scNODE55.2259.89103.26140.81132.86148.89137.90151.13
MIOFlow55.0761.80108.72156.51162.12191.40189.39215.74
PRESCIENT85.3687.47114.16142.03150.53161.59147.23155.06
MethodSC/hard task
Interpolation
Extrapolation
t =5t =7t =9t =11t =15t =16t =17t =18
scNODE55.2259.89103.26140.81132.86148.89137.90151.13
MIOFlow55.0761.80108.72156.51162.12191.40189.39215.74
PRESCIENT85.3687.47114.16142.03150.53161.59147.23155.06
a

Bold numbers denote the best prediction, and underlined numbers represent the second best.

Table 2.

Wasserstein distance of predictions in hard tasks (inter- and extrapolation).a

MethodZB/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =10t =11
scNODE579.10508.55440.92517.81652.36707.10
MIOFlow580.18516.59453.61536.35671.23734.42
PRESCIENT1381.961002.62730.974701.29916.51973.17
MethodZB/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =10t =11
scNODE579.10508.55440.92517.81652.36707.10
MIOFlow580.18516.59453.61536.35671.23734.42
PRESCIENT1381.961002.62730.974701.29916.51973.17
MethodDR/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =9t =10
scNODE445.82464.78535.78600.18585.60718.20
MIOFlow443.56469.51532.93617.48680.41852.02
PRESCIENT524.38511.61539.38621.31575.45718.56
MethodDR/hard task
Interpolation
Extrapolation
t =2t =4t =6t =8t =9t =10
scNODE445.82464.78535.78600.18585.60718.20
MIOFlow443.56469.51532.93617.48680.41852.02
PRESCIENT524.38511.61539.38621.31575.45718.56
MethodSC/hard task
Interpolation
Extrapolation
t =5t =7t =9t =11t =15t =16t =17t =18
scNODE55.2259.89103.26140.81132.86148.89137.90151.13
MIOFlow55.0761.80108.72156.51162.12191.40189.39215.74
PRESCIENT85.3687.47114.16142.03150.53161.59147.23155.06
MethodSC/hard task
Interpolation
Extrapolation
t =5t =7t =9t =11t =15t =16t =17t =18
scNODE55.2259.89103.26140.81132.86148.89137.90151.13
MIOFlow55.0761.80108.72156.51162.12191.40189.39215.74
PRESCIENT85.3687.47114.16142.03150.53161.59147.23155.06
a

Bold numbers denote the best prediction, and underlined numbers represent the second best.

Next, we validate that scNODE is more robust to distribution shift when testing timepoints have substantially different distributions from training data compared to baseline methods. Specifically, for the hard task of all three datasets, we first compute the distribution shift level for each testing timepoint as the averaged pairwise 2 distance between cells from training and testing timepoints. Here, a higher value indicates the testing point is more different from the training data. Then, we define scNODE’s improvement as the difference between its performance (calculated as Wasserstein distance between model predictions and ground truth) and the performance of the best baseline model. Thus, a higher value indicates that scNODE improves the predictions over the baseline. Figure 3 shows that scNODE improvements positively correlate with the distribution shift level (Spearman correlation ρ0.3 and is as high as 0.93.), implying scNODE achieves more improvements for significant distribution shifts. In addition, Supplementary Fig. S4 shows that scNODE’s performance is more stable than baselines when extrapolating multiple timepoints. The ablation study in Supplementary Table S6 further shows that excluding dynamic regularization worsens scNODE predictions. Therefore, due to the dynamic regularization introduced in our framework, scNODE obtains significant improvements over baseline models when distribution shifts of testing timepoints are large from the training timepoints.

scNODE improvements over baseline models versus distribution shift levels of testing points on hard tasks. Each point denotes one testing timepoint with scNODE improvements averaged on five trials. ρ: Spearman’s ρ correlations between improvements and distribution shift.
Figure 3.

scNODE improvements over baseline models versus distribution shift levels of testing points on hard tasks. Each point denotes one testing timepoint with scNODE improvements averaged on five trials. ρ: Spearman’s ρ correlations between improvements and distribution shift.

3.3 scNODE predictions for unobserved timepoints help recover cell trajectories

To determine if scNODE can aid with temporal downstream analysis, we carry out trajectory analysis on both real and predicted single-cell gene expression. Specifically, we focus on the hard task of all three datasets. Figure 4 visualizes the ZB data with all timepoints, after the removal of testing timepoints, and with predictions from all models. We apply partition-based graph abstraction (PAGA) (Wolf et al. 2019), a method that computes the topological structures of cellular populations and has been used in previous studies (Luecken and Theis 2019, Saelens et al. 2019, Han et al. 2020) to understand cell developmental topologies. We find that after timepoint removal, the topology breaks down due to the gaps between timepoints, which can impede the trajectory analysis, while scNODE predictions recover smooth and continuous trajectories. To quantitatively compare scNODE with other models in helping infer cell trajectories, we use the Ipsen-Mikhailov (IM) distance (Ipsen 2004, Saelens et al. 2019) to measure the similarity between the cell trajectory graphs constructed in each case. IM(G1,G2) is a graph similarity measurement defined as the square-root difference between the Laplacian spectrum of graphs G1 and G2. It ranges from 0 to 1, where 0 indicates maximum similarity between two graph structures and 1 indicates maximum dissimilarity. We find IM(Gtrue,Gremoval)=0.200 and IM(Gtrue,GscNODE)=0.093, indicating GscNODE is more similar to Gtrue than Gremoval such that scNODE predictions help recovering cell trajectories. Moreover, IM(Gtrue,GscNODE) being smaller than IM(Gtrue,GMIOFlow) and IM(Gtrue,GPRESCIENT) (Fig. 4) implies that scNODE predictions for missing timepoints best help to infer cell trajectories. We observe same trend for DR and SC datasets as well, where IM distance is lowest for scNODE (Supplementary Fig. S8).

Cell trajectories of ZB data with all timepoints, after removal of timepoints in the hard task, and with model predictions. The connective structure is constructed with PAGA, where black nodes represent cell clusters and edges connect two nodes if their expressions are similar. We show the IM index between Gtrue and the corresponding graph.
Figure 4.

Cell trajectories of ZB data with all timepoints, after removal of timepoints in the hard task, and with model predictions. The connective structure is constructed with PAGA, where black nodes represent cell clusters and edges connect two nodes if their expressions are similar. We show the IM index between Gtrue and the corresponding graph.

3.4 scNODE’s interpretable latent space assists with perturbation analysis

scNODE’s ability to learn an informative and robust latent space allows it to detect key driver genes for cell developmental paths and simulate cells with perturbed gene expression. This ability can be useful for performing in silico perturbation predictions. We validate this hypothesis by using the differential dynamics learned from the ZB dataset. We choose ZB data because the original dataset provides cell type labels at the last timepoint, which enables us to evaluate the perturbation predictions.

We first train scNODE with all timepoints in the ZB dataset and map all cells to the latent space with the learned scNODE encoder. In the latent space, we construct a most probable path between any two points through the Least Action Path (LAP) method (Qiu et al. 2012, 2022, Wang et al. 2014), which finds the optimal path between two cell states while minimizing its action and transition time (see Supplementary Section S6.5). Here, we construct LAP paths from cells at the starting point (i.e. t =0) to presomitic mesoderm (PSM) and Hindbrain cell populations (Fig. 5A).

scNODE perturbation analysis results. (A) 2D UMAP visualization of the least action path between cells at the starting point (t = 0) and the PSM and Hindbrain cell populations, respectively. (B) Gene expression value of the top-rank DE gene of PSM path (TBX16) and Hindbrain path (SOX3). (C) The ratio of PSM cells in predictions of different perturbation levels when perturbing DE genes and random non-DE genes.
Figure 5.

scNODE perturbation analysis results. (A) 2D UMAP visualization of the least action path between cells at the starting point (t =0) and the PSM and Hindbrain cell populations, respectively. (B) Gene expression value of the top-rank DE gene of PSM path (TBX16) and Hindbrain path (SOX3). (C) The ratio of PSM cells in predictions of different perturbation levels when perturbing DE genes and random non-DE genes.

Once the optimal path is constructed, we use the Wilcoxon rank-sum test to find differentially expressed (DE) genes for each path, representing key driver genes for the developmental trajectories. Figure 5 shows the expression of top-rank DE genes for the PSM path (TBX16) and Hindbrain path (SOX3), such that their respective expression levels vary along the path (as well as in adjacent regions). Warga et al. (2013) have found TBX16 regulates intermediate mesoderm cell fate. Other studies (Vriz et al. 1996, Dee et al. 2008) have found SOX3 regulates neural fates and is involved in central nervous system development.

Finally, we perturb the expression profiles of key genes in all cells at the starting timepoint by multiplying their expression values with different levels of coefficient {103,,103} to mimic overexpressing and knocking-out, and let scNODE predict trajectories for the perturbed gene expression. The perturbations are expected to result in changes in cell fates. We classify predicted cells with a Random Forest (RF) classifier (Breiman 2001) trained with all unperturbed cells. We find overexpressing TBX16 expression results in the increment of PSM cell ratios (Fig. 5C) to over 70%, while perturbing random non-DE genes leads to no changes in cell ratios. Moreover, the ratio of Hindbrain cells decreased by around 25% when we overexpressed SOX3 expressions (Supplementary Fig. S9). These results indicate that scNODE learns an interpretable latent space, where we can detect development-related DE genes and conduct in silico perturbation analysis.

4 Conclusion

We propose a generative model called scNODE to predict gene expression during cell development. Given a time-series scRNA-seq dataset, our model can both interpolate and extrapolate gene expression. Predicting gene expression for unobserved timepoints enables critical downstream analyses, such as differential gene expression analysis, perturbation analysis, and trajectory inference. To make accurate predictions, scNODE uses a nonlinear dimensionality reduction approach, neural ODE, and dynamic regularization. The regularization improves the robustness of the learned latent representation to distribution shifts in unobserved timepoints, resulting in better predictive performance.

We find that the primary challenge for the temporal scRNA-seq prediction task is extrapolating to multiple timepoints beyond the last observed timepoint, as the accuracy deteriorates the farther timepoint predictions. Moreover, time units in predictions are defined by time intervals in the data. For example, a generative model cannot accurately produce gene expression for every hour if the model is trained on observations that are only measured every day, especially when the development is rapid or contains complex trajectories.

In future work, we will incorporate knowledge about cell proliferation—which is known to be important to cellular differentiation—into the model to improve extrapolating performance. Also, although RNA velocity methods can be used as complementary approaches to scNODE to learn fine-grained dynamics from coarse-grained timepoints. We will also incorporate other modalities, such as single-cell chromatin accessibility (scATAC) datasets, which exhibit a regulatory mechanism that scRNA-seq data might not capture.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

None declared.

Funding

This project was funded by the National Institutes of Health (NIH) award 1R35HG011939-01. This paper was published as part of a supplement financially supported by ECCB2024.

Data availability

All data used in this study are available at https://github.com/rsinghlab/scNODE.

References

Akiba
T
,
Sano
S
,
Yanase
T
 et al. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA,
2019
,
2623
31
.

Bergen
V
,
Lange
M
,
Peidli
S
 et al.  
Generalizing RNA velocity to transient cell states through dynamical modeling
.
Nat Biotechnol
 
2020
;
38
:
1408
14
.

Bergen
V
,
Soldatov
RA
,
Kharchenko
PV
 et al.  
RNA velocity–current challenges and future perspectives
.
Mol Syst Biol
 
2021
;
17
:
e10282
.

Breiman
L.
 
Random forests
.
Mach Learn
 
2001
;
45
:
5
32
.

Calderon
D
,
Blecher-Gonen
R
,
Huang
X
 et al.  
The continuum of drosophila embryonic development at single-cell resolution
.
Science
 
2022
;
377
:
eabn5800
.

Cao
Y
,
Yang
P
,
Yang
JYH.
 
A benchmark study of simulation methods for single-cell RNA sequencing data
.
Nat Commun
 
2021
;
12
:
6911
.

Chen
G
,
Ning
B
,
Shi
T.
 
Single-cell RNA-seq technologies and related computational data analysis
.
Front Genet
 
2019
;
10
:
441123
.

Chen
RT
,
Rubanova
Y
,
Bettencourt
J
 et al.  
Neural ordinary differential equations
.
Adv Neural Inf Process Syst
 
2018
;
31
:
6572
83
.

Chen
W
,
Guillaume-Gentil
O
,
Rainer
PY
 et al.  
Live-seq enables temporal transcriptomic recording of single cells
.
Nature
 
2022a
;
608
:
733
40
.

Chen
Z
,
King
WC
,
Hwang
A
 et al.  
DeepVelo: single-cell transcriptomic deep velocity field learning with neural ordinary differential equations
.
Sci Adv
 
2022b
;
8
:
eabq3745
.

Connor
M
,
Canal
G
,
Rozell
C.
Variational autoencoder with learned latent structure. In: The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, Virtual Event. PMLR,
2021
,
2359
67
.

Csiszár
I.
 
I-divergence geometry of probability distributions and minimization problems
.
Ann Probab
 
1975
;
3
:
146
58
.

Cuturi
M.
 
Sinkhorn distances: lightspeed computation of optimal transport
.
Adv Neural Inf Process Syst
 
2013
;
26
:
2292
300
.

Dee
CT
,
Hirst
CS
,
Shih
Y-H
 et al.  
Sox3 regulates both neural fate and differentiation in the zebrafish ectoderm
.
Dev Biol
 
2008
;
320
:
289
301
.

Ding
J
,
Condon
A
,
Shah
SP.
 
Interpretable dimensionality reduction of single cell transcriptome data with deep generative models
.
Nat Commun
 
2018
;
9
:
2002
.

Ding
J
,
Sharon
N
,
Bar-Joseph
Z.
 
Temporal modelling using single-cell transcriptomics
.
Nat Rev Genet
 
2022
;
23
:
355
68
.

Farrell
JA
,
Wang
Y
,
Riesenfeld
SJ
 et al.  
Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis
.
Science
 
2018
;
360
:
eaar3131
.

Fleck
JS
,
Jansen
SMJ
,
Wollny
D
 et al.  
Inferring and perturbing cell fate regulomes in human brain organoids
.
Nature
 
2022
;
621
:
1
8
.

Grønbech
CH
,
Vording
MF
,
Timshel
PN
 et al.  
scVAE: variational auto-encoders for single-cell gene expression data
.
Bioinformatics
 
2020
;
36
:
4415
22
.

Han
X
,
Zhou
Z
,
Fei
L
 et al.  
Construction of a human cell landscape at single-cell level
.
Nature
 
2020
;
581
:
303
9
.

Herman
JS
,
Sagar
N
,
Gruen
D.
 
FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data
.
Nat Methods
 
2018
;
15
:
379
86
.

Huguet
G
,
Magruder
DS
,
Tong
A
 et al.  
Manifold interpolating optimal-transport flows for trajectory inference
.
Adv Neural Inf Process Syst
 
2022
;
35
:
29705
18
.

Hwang
B
,
Lee
JH
,
Bang
D.
 
Single-cell RNA sequencing technologies and bioinformatics pipelines
.
Exp Mol Med
 
2018
;
50
:
1
14
.

Ipsen
M.
 
Evolutionary reconstruction of networks
.
Function and Regulation of Cellular Systems
 
2004
;
241
9
.

Kidger
P.
 On neural differential equations. University of Oxford, 2021.

Kingma
DP
,
Welling
M.
 
An introduction to variational autoencoders
.
FNT Mach Learn
 
2019
;
12
:
307
92
.

Kobak
D
,
Berens
P.
 
The art of using t-SNE for single-cell transcriptomics
.
Nat Commun
 
2019
;
10
:
5416
.

La Manno
G
,
Soldatov
R
,
Zeisel
A
 et al.  
RNA velocity of single cells
.
Nature
 
2018
;
560
:
494
8
.

Li
Q.
 
scTour: a deep learning architecture for robust inference and accurate prediction of cellular dynamics
.
Genome Biol
 
2023
;
24
:
1
33
.

Li
WV
,
Li
JJ.
 
A statistical simulator scDesign for rational scrna-seq experimental design
.
Bioinformatics
 
2019
;
35
:
i41
i50
.

Lin
E
,
Mukherjee
S
,
Kannan
S.
 
A deep adversarial variational autoencoder model for dimensionality reduction in single-cell RNA sequencing analysis
.
BMC Bioinformatics
 
2020
;
21
:
1
11
.

Liu
B
,
Hu
X
,
Feng
K
 et al.  
Temporal single-cell tracing reveals clonal revival and expansion of precursor exhausted t cells during anti-PD-1 therapy in lung cancer
.
Nat Cancer
 
2022
;
3
:
108
21
.

Luecken
MD
,
Theis
FJ.
 
Current best practices in single-cell RNA-seq analysis: a tutorial
.
Mol Syst Biol
 
2019
;
15
:
e8746
.

Marouf
M
,
Machart
P
,
Bansal
V
 et al.  
Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks
.
Nat Commun
 
2020
;
11
:
166
.

Matsumoto
H
,
Kiryu
H
,
Furusawa
C
 et al.  
SCODE: an efficient regulatory network inference algorithm from single-cell RNA-seq during differentiation
.
Bioinformatics
 
2017
;
33
:
2314
21
.

McInnes
L
,
Healy
J
,
Melville
J.
Umap: uniform manifold approximation and projection for dimension reduction. arXiv, arXiv:1802.03426,
2018
, preprint: not peer reviewed.

Moris
N
,
Pina
C
,
Arias
AM.
 
Transition states and cell fate decisions in epigenetic landscapes
.
Nat Rev Genet
 
2016
;
17
:
693
703
.

Qiu
X
,
Ding
S
,
Shi
T.
 
From understanding the development landscape of the canonical fate-switch pair to constructing a dynamic landscape for two-step neural differentiation
.
PloS One
 
2012
;
7
:
e49271
.

Qiu
X
,
Mao
Q
,
Tang
Y
 et al.  
Reversed graph embedding resolves complex single-cell trajectories
.
Nat Methods
 
2017
;
14
:
979
82
.

Qiu
X
,
Zhang
Y
,
Martin-Rufino
JD
 et al.  
Mapping transcriptomic vector fields of single cells
.
Cell
 
2022
;
185
:
690
711
.

Risso
D
,
Perraudeau
F
,
Gribkova
S
 et al.  
A general and flexible method for signal extraction from single-cell RNA-seq data
.
Nat Commun
 
2018
;
9
:
284
.

Saelens
W
,
Cannoodt
R
,
Todorov
H
 et al.  
A comparison of single-cell trajectory inference methods
.
Nat Biotechnol
 
2019
;
37
:
547
54
.

Schiebinger
G
,
Shu
J
,
Tabaka
M
 et al.  
Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming
.
Cell
 
2019
;
176
:
928
43
.

Sun
T
,
Song
D
,
Li
WV
 et al.  
scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured
.
Genome Biol
 
2021
;
22
:
163
.

Svensson
V
,
Gayoso
A
,
Yosef
N
 et al.  
Interpretable factor models of single-cell RNA-seq via variational autoencoders
.
Bioinformatics
 
2020
;
36
:
3418
21
.

Tong
A
,
Huang
J
,
Wolf
G
 et al. Trajectorynet: a dynamic optimal transport network for modeling cellular dynamics. In: Proceedings of the 37th  International Conference on Machine Learning, ICML 2020, Virtual Event. PMLR,
2020
,
9526
36
.

Tran
D
,
Nguyen
H
,
Tran
B
 et al.  
Fast and precise single-cell data analysis using a hierarchical autoencoder
.
Nat Commun
 
2021
;
12
:
1029
.

Trapnell
C
,
Cacchiarelli
D
,
Grimsby
J
 et al.  
The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells
.
Nat Biotechnol
 
2014
;
32
:
381
6
.

Vriz
S
,
Joly
C
,
Boulekbache
H
 et al.  
Zygotic expression of the zebrafish sox-19, an hmg box-containing gene, suggests an involvement in Central nervous system development
.
Mol Brain Res
 
1996
;
40
:
221
8
.

Wang
D
,
Gu
J.
 
VASC: dimension reduction and visualization of single-cell RNA-seq data by deep variational autoencoder
.
Genomics Proteomics Bioinf
 
2018
;
16
:
320
31
.

Wang
P
,
Song
C
,
Zhang
H
 et al.  
Epigenetic state network approach for describing cell phenotypic transitions
.
Interface Focus
 
2014
;
4
:
20130068
.

Warga
RM
,
Mueller
RL
,
Ho
RK
 et al.  
Zebrafish Tbx16 regulates intermediate mesoderm cell fate by attenuating FGF activity
.
Dev Biol
 
2013
;
383
:
75
89
.

Woicik
A
,
Zhang
M
,
Chan
J
 et al.  
Extrapolating heterogeneous time-series gene expression data using sagittarius
.
Nat Mach Intell
 
2023
;
5
:
699
713
.

Wolf
FA
,
Hamey
FK
,
Plass
M
 et al.  
PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells
.
Genome Biol
 
2019
;
20
:
9
.

Xiang
R
,
Wang
W
,
Yang
L
 et al.  
A comparison for dimensionality reduction methods of single-cell RNA-seq data
.
Front Genet
 
2021
;
12
:
646936
.

Yeo
GHT
,
Saksena
SD
,
Gifford
DK.
 
Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions
.
Nat Commun
 
2021
;
12
:
3222
.

Zappia
L
,
Phipson
B
,
Oshlack
A.
 
Splatter: simulation of single-cell RNA sequencing data
.
Genome Biol
 
2017
;
18
:
174
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data