Abstract

Circadian rhythm plays a critical role in regulating various physiological processes, and disruptions in these rhythms have been linked to a wide range of diseases. Identifying molecular biomarkers showing differential circadian (DC) patterns between biological conditions or disease status is important for disease prevention, diagnosis, and treatment. However, circadian pattern is characterized by three key components: amplitude, phase, and MESOR, which poses a great challenge for DC analysis. Existing statistical methods focus on detecting differential shape (amplitude and phase) but often overlook MESOR difference. Additionally, these methods lack flexibility to incorporate external knowledge such as differential circadian information from similar clinical and biological context to improve the current DC analysis. To address these limitation, we introduce a novel Bayesian hierarchical model, BayesDCirc, designed for detecting differential circadian patterns in a two-group experimental design, which offer the advantage of testing MESOR difference and incorporating external knowledge. Benefiting from explicitly testing MESOR within the Bayesian modeling framework, BayesDCirc demonstrates superior FDR control over existing methods, with further performance improvement by leveraging external knowledge of DC genes. Applied to two real datasets, BayesDCirc successfully identify key circadian genes, particularly with external knowledge incorporated. The R package “BayesDCirc” for the method is publicly available on GitHub at https://github.com/lichen-lab/BayesDCirc.

Introduction

Circadian rhythms are intrinsic, |$\sim $|24-h oscillations of behavior and physiology, synchronized with the light-dark cycle. Present in virtually all cells throughout the body, the circadian clock governs a wide array of physiological and behavioral processes, including sleep-wake cycles, hormone release, and metabolic activities [1–4]. Studies have linked the disruption in circadian gene expression to diseases including cancer [5–7], type 2 diabetes [8], sleep disorders [9], major depression disorder [10], schizophrenia [11], Alzheimer’s disease [12], metabolic disorders [13], and cardiovascular diseases [14], highlighting the significant role of molecular circadian clocks in physiological processes and disease outcomes.

Unraveling molecular circadian clock has emerged as a significant field in recent years, particularly following the 2017 Nobel Prize award for the discovery of the molecular mechanisms controlling circadian rhythm. To achieve this purpose, studies, which utilize different multi-omics data measured by microarray or next-generation sequencing to interrogate the rhythmic expression of genes at tissue or organ level, have emerged. Consequently, genome-wide gene expression profiles have been profiled, which enable the discovery of circadian genes in the postmortem brain [15, 16], skeletal muscle [17], liver [18], and blood [19]. In addition, the discovery of circadian genes has been extended to cell type-specific and single cell level with the advent of single-cell RNA sequencing [20]. Besides evaluating rhythmic gene expression, the study of molecular circadian clock is further extended to other molecular types including but not limited to DNA methylation [21], transcription factor and histone modification [22], proteomics [23], and metabolomics [24]. The growing popularity of molecular circadian studies by leveraging multi-omics has resulted in a surge of research and a subsequent influx of data in public repositories. This expanding body of knowledge presents an unparalleled opportunity to deepen our understanding of the molecular clock’s role in regulating various physiological processes and disease outcomes.

Differential circadian (DC) pattern refers to the variation in circadian rhythmicity across different conditions, such as between health and disease states, different tissues, or varying environmental contexts. This concept is crucial as it helps to elucidate how circadian rhythms adapt to physiological changes and external influences. Recently, identifying DC patterns of molecular profiles such as gene expression due to varied experimental conditions is emerging as a pivotal research area [19, 25, 26]. Accordingly, DC analysis has been performed to compare circadian rhythmicity of molecular profiles under different conditions, so researchers can identify key regulatory elements and pathways that may contribute to differential responses. For instance, the disruption of transcriptomic circadian rhythmicity patterns have been linked with aging in both human studies [15] and mouse studies [27]. The traditional approach for DC analysis is comparing circadian rhythmicity between two experimental conditions, typically relying on a hard threshold (P <0.05) for rhythmicity detection in each condition independently. Rhythmicity is then considered differential only if one condition meets this threshold while the other does not [19, 28]. While straightforward, this approach often falls short in accurately quantifying the significance level (i.e. P value). To address these limitations, several algorithms have been developed for DC analysis, including LimoRhyde [29], HANOVA, and robustDODR [30]. LimoRhyde is essentially a linear regression model that follows the strategy of cosinor regression. It models the gene expression as a combination of intercept, experimental condition, time components (sine and cosine terms), and interaction between experimental condition and time components. DC analysis is conducted by jointly testing the two coefficients of the interaction terms, corresponding to a jointly test for the change of amplitude (A) and phase (⁠|$\phi $|⁠) in the form of a cosine wave (Fig. 1A). HANOVA, another approach for DC analysis, is based on harmonic regression, to test the change in amplitude and phase as quantified by regression coefficient. Unlike HANOVA, robustDODR further addresses the potential outliers in the gene expression by incorporating a robust rank-based dispersion measure into the statistical test while still focusing on testing both the amplitude and phase change.

Illustration of DC pattern of terminologies and graphical representation of the proposed method BayesDCirc. (A) Illustration of a Cosine wave fitting and explanation of terminologies; (B) differential shape (amplitude and phase); (C) differential MESOR; (D) graphical representation of BayesDCirc. The shaded nodes indicate observed variables. The unshaded nodes indicate circadian parameters. The dashed nodes indicate prefixed hyper-parameters. The arrows show data generative process.
Figure 1

Illustration of DC pattern of terminologies and graphical representation of the proposed method BayesDCirc. (A) Illustration of a Cosine wave fitting and explanation of terminologies; (B) differential shape (amplitude and phase); (C) differential MESOR; (D) graphical representation of BayesDCirc. The shaded nodes indicate observed variables. The unshaded nodes indicate circadian parameters. The dashed nodes indicate prefixed hyper-parameters. The arrows show data generative process.

Although these widely applicable methods show promise in DC analysis, they cannot detect DC patterns driven by MESOR, representing the average value around which a rhythmic biological variable fluctuates over time. In addition, these approaches, which perform DC analysis, cannot incorporate external knowledge such as DC genes from related tissues or studies, to improve the current analysis. Given the limitation of the existing frequentist-based methods, there is a significant need for a Bayesian statistical approach that can detect the DC patterns driven by MESOR and integrate the external knowledge, which will improve the performance for detecting DC genes. Moreover, a Bayesian method can improve the DC analysis by borrowing information across genome-wide transcriptome data, particularly when the sample size in the experimental design is moderate. Unlike frequentist methods, Bayesian methods provide full probability distributions for parameter estimates rather than just point estimates, which offers a more comprehensive understanding of the uncertainty surrounding model parameters and enable probabilistic inferences. Despite these advantages, a Bayesian method for DC analysis has yet to be fully developed.

To address these research gaps and fully leverage the advantages of Bayesian approaches, we propose a novel Bayesian method, named BayesDCirc, for detecting DC genes in a two-group experimental design. BayesDCirc empowers DC analysis in several aspects: (i) BayesDCirc enables the detection of DC genes driven by MESOR by explicitly testing MESOR, a factor overlooked by existing methods; (ii) BayesDCirc can incorporate external knowledge, such as the previously identified DC genes, into its modeling framework, which strengthen the selection of DC genes in this study; (iii) BayesDCirc adopts conjugate priors for parameters, resulting in closed form for posterior distributions, and uses efficient Gibbs sampling with a spike-and-slab prior to control the false discovery rate (FDR) in selecting DC genes; (iv) the developed R package “BayesDCirc” is publicly available on GitHub, to facilitate its application in the research field. Consequently, BayesDCirc demonstrates superior performance in comprehensive simulation studies in terms of FDR control. Additionally, the performance of BayesDCirc is further improved by incorporating external knowledge of DC genes. BayesDCirc is further applied to two real datasets and successfully identify more DC genes benefiting from detecting differential MESOR and incorporating external knowledge.

Method

We developed a novel Bayesian hierarchical model denoted BayesDCirc (A Bayesian hierarchical model for detecting differential circadian pattern) in a two experimental groups from genome-wide transcriptomic data. BayesDCirc enables the detection of DC patterns, which include differential shape (amplitude and phase) and differential MESOR, in a single framework. The decision framework for identifying DC patterns is based on FDR control. Besides, if external knowledge, such as the proportion of DC patterns, in a similar experimental design, is available, our method can improve the detection of DC patterns. The motivation, annotation, and statistical inference are detailed in the following sections.

Notations for a cosinusoidal wave fitting

Our model postulates that the connection between gene expression value and circadian time can be represented by a cosine wave as depicted in Fig. 1A, where |$y$| denotes the gene expression value and |$t$| represents the circadian time. The parameters in the conisor model include |$A$| as the amplitude, |$M$| for the MESOR (Midline Estimating Statistic of Rhythm), and |$\omega $| for the wave’s frequency, where |$\omega = \frac{2\pi }{Period}$|⁠. To mimic the diurnal cycle, we set the period to 24 h by default. The phase of the cosine wave, or the time of the peak (⁠|$t_{p}$|⁠), is indicated by |$\phi $|⁠. For simplicity, we will omit the unit “hours” for period, phase, and other related quantities throughout the rest of the sections. Accordingly, expression level of one circadian gene for Group |$c$| is represented as

(1)

where |$c$| indicates the experimental group (⁠|$c=1,2$|⁠); |$A_{c}$|⁠, |$\phi _{c}$|⁠, and |$M_{c} $| are amplitude, phase, and MESOR for Group |$c$|⁠; |$i (1\le i \le n)$| indicates subject |$i$| and |$n$| is the number of subjects; |$\varepsilon _{ci}$| is the error term for subject |$i$|⁠, |$\varepsilon _{ci} \sim \mathit{N}(0, \sigma ^{2})$|⁠, which are assumed to be identically and independently distributed. For DC analysis, we are comparing cosinusoidal wave curves from two experimental groups. Particularly, two common DC patterns of interest are

  • Differential shape: |$\Omega _{S}$| versus |$\Omega _{\overline{S}}$| (Fig. 1B).
  • Differential MESOR: |$\Omega _{M}$| versus |$\Omega _{\overline{M}}$| (Fig. 1C).

Bayesian hierarchical model

In the two-group comparison, we rewrite the Equation 1 after the Triangular transformation

(2)

where |$E_{cg}=A_{c}\cos (\omega \phi _{c})$|⁠, |$F_{cg}=A_{c}\sin (\omega \phi _{c})$|⁠, |$x_{i}=\cos (\omega t_{i})$|⁠, and |$z_{i}=\sin (\omega t_{i})$|⁠. By denoting |$E_{2g}=E_{1g}+\mu _{Eg}$|⁠, |$ F_{2g}=F_{1g}+\mu _{Fg}$|⁠, and |$ M_{2g}=M_{1g}+\mu _{Mg}$|⁠. Comparing with Group 1, the equation for Group 2 can be rewritten as |$Y_{2gj}= (E_{1g} +\mu _{Eg} ) X_{j}+ (F_{1g} + \mu _{Fg} ) Z_{j}+ (M_{1g} + \mu _{Mg}) +\varepsilon _{2gj}$|⁠, where |$\mu _{Eg} $|⁠, |$\mu _{Fg}$|⁠, and |$\mu _{Mg}$| are denoted as “differential coefficients.” DC patterns can be determined by testing whether the differential coefficients are zero. Specifically, |$\mu _{Eg} $| and |$\mu _{Fg}$| jointly represent the differential shape and |$\mu _{Mg}$| represents differential MESOR. We further introduce |$I_{g}$| as an indicator variable for measuring differential shape, where |$I_{g} = 1$| indicates that the gene |$g$| has differential shape (i.e. |$\mu _{Eg} \ne 0$| or |$\mu _{Fg} \ne 0$|⁠) and |$I_{g} = 0$| indicates that the gene |$g$| has the same circadian shape (i.e. |$\mu _{Eg} = 0$| and |$\mu _{Fg} = 0$|⁠). Similarly, |$L_{g}$| is another indicator variable for measuring differential MESOR, with |$L_{g}=1$| indicating MESOR difference (i.e. |$\mu _{Mg} \ne 0$|⁠) and |$L_{g}=0$| indicating the same MESOR (i.e. |$\mu _{Mg} \ne 0$|⁠). To facilitate the modeling of the differential coefficients, we adopt the spike-and-slab prior, which combines a Gaussian distribution (centered at 0, known as the “slab”) with a point mass at 0 (known as the “spike”)[31]. The “spike” represents a probability distribution concentrated at or near zero, promoting the differential coefficients to be exactly or nearly zero. The “slab” on the other hand represents a broader probability distribution encompassing a range of nonzero values, thereby allowing the differential coefficients to have nonzero coefficients. By applying a spike-and-slab prior to the differential coefficients, one can directly estimate the likelihood of each differential coefficient being nonzero. Additionally, the spike-and-slab prior offers computational advantages, including an analytical form for the posterior, which has been detailed in the methods section. For |$I_{g}$| and |$L_{g}$|⁠, we set

where |$\tau _{1}^{2}$| and |$\tau _{0}^{2}$| represent the variance of slab component and spike component, respectively. If |$\tau _{1}^{2}> \tau _{0}^{2}$| and |$\tau _{0}^{2} \rightarrow 0$|⁠, |$\mu _{Eg} | I_{g} = 1$| and |$\mu _{Fg} | I_{g} = 1$| are likely to be nonzero, and |$\mu _{Eg} | I_{g} = 0$| and |$\mu _{Fg} | I_{g} = 0$| is likely to be close to zero. By applying this spike-and-slab prior, genes showing DC shapes are likely to be selected (i.e. |$I_{g} = 1$|⁠). The same reasoning is using |$L_{g}$| and spike-and-slab prior to select genes showing differential MESOR. We further assume independent Bernoulli distribution as the prior distribution for |$I_{g}$| and |$L_{g}$| as

where |$p_{I}$| and |$p_{L}$| are the proportion of genes showing differential shape and MESOR, respectively.

We apply independent conjugate priors to each component in |$\mathbf{\Theta }$|=(⁠|$E_{1g}$|⁠, |$F_{1g}$|⁠, |$M_{1g}$|⁠, |$\sigma _{1g}^{2}$|⁠, |$\sigma _{2g}^{2}$|⁠, |$p_{I}$|⁠, |$p_{L}$|⁠, |$\tau _{1}^{2}$|⁠, |$\tau _{0}^{2}$|⁠, |$\delta _{1}^{2}$|⁠, |$\delta _{0}^{2}$|⁠), as follows: |$E_{1g} \sim \mbox{N} (\mu _{E}, \sigma _{E_{1}}^{2})$|⁠; |$F_{1g} \sim \mbox{N} (\mu _{F}, \sigma _{F_{1}}^{2})$|⁠; |$M_{1g} \sim \mbox{N} (\mu _{M}, \sigma _{M_{1}}^{2})$|⁠; |$\sigma _{1g}^{2} \sim \mbox{Inv}\Gamma (a_{\sigma _{1}}, b_{\sigma _{1}})$|⁠; |$\sigma _{2g}^{2} \sim \mbox{Inv}\Gamma (a_{\sigma _{2}}, b_{\sigma _{2}})$|⁠; |$p_{I} \sim \mbox{Beta}(a_{p_{I}}, b_{p_{I}})$|⁠; |$p_{L} \sim \mbox{Beta}(a_{p_{L}}, b_{p_{L}})$|⁠; |$\tau _{1}^{2} \sim \mbox{Inv}\Gamma (a_{\tau _{1}}, b_{\tau _{1}})$|⁠; |$\tau _{0}^{2} \sim \mbox{Inv}\Gamma (a_{\tau _{0}}, b_{\tau _{0}})$|⁠; |$\delta _{1}^{2} \sim \mbox{Inv}\Gamma (a_{\delta _{1}}, b_{\delta _{1}})$|⁠; |$\delta _{0}^{2} \sim \mbox{Inv}\Gamma (a_{\delta _{0}}, b_{\delta _{0}})$|⁠. Here, |$\mu _{E_{1}}$|⁠, |$\sigma _{E_{1}}^{2}$|⁠, |$\mu _{F_{1}}$|⁠, |$\sigma _{F_{1}}^{2}$|⁠, |$\mu _{M_{1}}$|⁠, |$\sigma _{M_{1}}^{2}$|⁠, |$a_{\sigma _{1}}$|⁠, |$b_{\sigma _{1}}$|⁠, |$a_{\sigma _{2}}$|⁠, |$b_{\sigma _{2}}$|⁠, |$a_{p_{I}}$|⁠, |$b_{p_{I}}$|⁠, |$a_{p_{L}}$|⁠, |$b_{p_{L}}$|⁠, |$a_{\tau _{1}}$|⁠, |$b_{\tau _{1}}$|⁠, |$a_{\tau _{0}}$|⁠, |$b_{\tau _{0}}$|⁠, |$a_{\delta _{1}}$|⁠, |$b_{\delta _{1}}$|⁠, |$a_{\delta _{0}}$|⁠, |$b_{\delta _{0}}$| are hyperparameters. The graphical representation of BayesDCirc can be found in Fig. 1D. The procedure of detailed Gibbs sampling can be found in the Supplementary Notes.

Decision framework

We apply FDR to declare DC patterns. We will use differential shape as an example to illustrate the procedure to calculate FDR, and the same reasoning holds for calculating FDR for differential MESOR. We denote |$P^{I}_{g} = \mbox{Pr} (g \in \Omega _{\bar{S}}| I_{g} = 1)$| as the probability of gene |$g$| not showing differential shape given |$I_{g}=1$|⁠, which is also referred as the local FDR [32]. A gene |$g$| is declared differential shape if |$P^{I}_{g} \leq \eta $| given a threshold |$\eta $| (⁠|$0 < \eta <1$|⁠) and the expected number of false discoveries is |$\sum _{g} P^{I}_{g} \chi (P^{I}_{g} \le \eta )$|⁠, where |$ \chi $| is the indicator function with value 1 if the inner expression is true, and 0 otherwise. Therefore, the expected FDR is |$FDR_{I}(\eta ) = \cfrac{\sum ^{G}_{g=1} P^{I}_{g} \cdot \chi (P^{I}_{g}\leq \eta )}{max(1,\sum ^{G}_{g=1} \chi (P^{I}_{g}\leq \eta ))}$| [33]. In practice, |$P^{I}_{g}$| is estimated as |$1-\frac{1}{N}\sum ^{N_{iteration}-N_{burn in}}_{k =N_{burn in}+1} I_{g} (k)$|⁠, where |$N_{burn in}$| is the number of burn-in samples, |$N_{iteration}$| is the total number of Gibbs sampling iterations and |$k$| is the index of Gibbs sampling iteration. Similarly, we denote |$P^{L}_{g} = \mbox{Pr} (g \in \Omega _{\bar{M}} | L_{g} = 1)$|⁠, as the probability of gene |$g$| not showing differential MESOR given |$L_{g}=1$|⁠. We estimate |$P^{L}_{g}$| in the same way as |$P^{I}_{g}$|⁠. Throughout this manuscript, we set |$ N_{iteration} = 10\,000$|⁠, |$ N_{burnin} = 5000$|⁠, and we use a nominal FDR level of 0.05 to declare differential shape or MESOR unless otherwise specified.

Hyperparameter selection

For hyperparameters of spike-and-slab prior, which include |$\tau _{1}^{2}$|⁠, |$\tau _{0}^{2}$|⁠, |$\delta _{1}^{2}$|⁠, and |$\delta _{0}^{2}$| requiring |$\tau _{1}^{2}> \tau _{0}^{2}$| and |$\delta _{1}^{2}> \delta _{0}^{2}$|⁠, we adopt a data-driven approach to determine these parameters. Specifically, we fit a linear model for each gene in each group, resulting a total of |$2G$| linear models |$y_{cgi} = \hat{E_{cg}} x_{i} + \hat{F_{cg}} z_{i} + \hat{M_{cg}}, c=1,2$| to obtain initial estimates of these parameters for each gene in each group. Based on the posterior distribution of these parameters, we calculate |$T^\tau _{g} = (\hat{E_{1g}} - \hat{E_{2g}})^{2} + (\hat{F_{1g}} - \hat{F_{2g}})^{2}$| and |$T^\delta _{g} = (\hat{M_{1g}} - \hat{M_{2g}})^{2}$|⁠, which are used to select the top genes for estimating the hyperparameters for |$\tau _{1}^{2}$| and |$\tau _{0}^{2}$|⁠, and |$\delta _{1}^{2}$| and |$\delta _{0}^{2}$|⁠. By ranking |$T_{g}$| in a decreasing order, we select top |$p$| percent (default |$p$|=5%) of all |$G$| genes denoted as |$T_{top}$|⁠, we then set |$a_{\tau _{1}} = pG$|⁠, |$a_{\delta _{1}} = p\frac{G}{2}$|⁠, |$b_{\tau _{1}} = \sum _{g\in T^\tau _{top}} \frac{1}{2} T^\tau _{g}$|⁠, |$b_{\delta _{1}} = \sum _{g\in T^\delta _{top}} \frac{1}{2} T^\delta _{g}$|⁠. Similarly, |$a_{\tau _{0}}$|⁠, |$a_{\delta _{0}}$|⁠, |$b_{\tau _{0}}$|⁠, and |$b_{\delta _{0}}$| are determined by the bottom genes denoted as |$T_{bottom}$| such that |$a_{\tau _{0}} = pG$|⁠, |$a_{\delta _{0}} = p\frac{G}{2}$|⁠, |$b_{\tau _{0}} = \sum _{g\in T^\tau _{ bottom }} \frac{1}{2} T^\tau _{g}$|⁠, |$b_{\delta _{0}} = \sum _{g\in T^\delta _{ bottom }} \frac{1}{2} T^\delta _{g}$|⁠. For other hyperparameters, we assign noninformative prior such as |$\mu _{E_{1}} = 0$|⁠; |$\sigma _{E_{1}}^{2} = 100$|⁠; |$\mu _{F_{1}} = 0$|⁠; |$\sigma _{F_{1}}^{2} = 100$|⁠; |$\mu _{M_{1}} = 0$|⁠; |$\sigma _{M_{1}}^{2} = 100$|⁠; |$a_{\sigma _{1}} = b_{\sigma _{1}} = 0.001$|⁠; |$a_{\sigma _{2}} = b_{\sigma _{2}} = 0.001$|⁠; |$a_{p_{I}}=b_{p_{I}}=a_{p_{L}}=b_{p_{L}}=1$|⁠.

Incorporating external knowledge

BayesDCirc can incorporate external information, which is the proportion of genes showing DC patterns from a similar study, as hyperparameters in the hierarchical model to improve the analysis in this study. When such external knowledge is available, we assume |$\Psi _{I}$| as the gene set showing DC shapes (⁠|$\Psi _{I} = \{g: 1 \le g \le G$|⁠) and |$\Psi _{L}$| as the gene set showing differential MESOR from the prior knowledge (⁠|$\Psi _{L} = \{g: 1 \le g \le G$|⁠), while |$\Psi _{\bar{I}}$| and |$\Psi _{\bar{L}}$| are the prior collection of non-DC genes. Consequently, |$|\Psi _{I}| = G - |\Psi _{\bar{I}}|$| and |$|\Psi _{L}| = G - |\Psi _{\bar{L}}|$|⁠, where |$G$| is total number of genes, |$|\Psi _{I}|$|⁠, |$|\Psi _{L}|$|⁠, |$|\Psi _{\bar{I}}|$|⁠, and |$|\Psi _{\bar{L}}|$| are the numbers of genes in |$\Psi _{I}$|⁠, |$\Psi _{L}$|⁠, |$\Psi _{\bar{I}}$|⁠, and |$\Psi _{\bar{L}}$|⁠, respectively. In the hierarchical model, |$p_{I}$| and |$p_{L}$| represent the proportion of genes showing different circadian shape and different MESOR, respectively, which are assumed to follow a beta distribution such as |$p_{I} \sim \mbox{Beta}(a_{p_{I}}, b_{p_{I}})$| and |$p_{L} \sim \mbox{Beta}(a_{p_{L}}, b_{p_{L}})$|⁠. In the default setting, the hyperparameters |$a_{p_{I}}, b_{p_{I}}$| and |$a_{p_{L}}, b_{p_{L}}$| are assigned the noninformative prior with value of 1, which assumes a uniform distribution of |$p_{I}$| or |$p_{L}$| across all genes. To incorporate the external knowledge, we adopt different hyperparameters of the beta distribution for gene set in |$\Psi _{I}$| and |$\Psi _{\bar{I}}$|⁠, respectively, and similarly for |$\Psi _{L}$| and |$\Psi _{\bar{L}}$|⁠. For genes in |$\Psi _{\bar{l}}$| and |$\Psi _{\bar{L}}$|⁠, we will stick with the noninformative prior with value of 1. For genes in |$\Psi _{I}$| and |$\Psi _{L}$|⁠, we design priors for |$p_{I}$| and |$p_{L}$| so that these genes can borrow information from the prior knowledge, enhancing their likelihood of being classified as DC genes. Taking |$\Psi _{I}$| as the example, we can assume |$p_{I}$| of all genes in |$\Psi _{I}$| follow |$\mbox{Beta}(|\Psi _{I}|*(1-\alpha ), |\Psi _{I}|*\alpha )$|⁠, where |$\alpha $| is the nominal FDR level, default 0.05. The expected values for |$p_{I}$| is |$1-\alpha = \frac{|\Psi _{I}|*(1-\alpha )}{|\Psi _{I}|*(1-\alpha ) + |\Psi _{I}|*\alpha }$|⁠, aligning with the observed proportion of DC genes. The same principle is applied to the hyperparameter settings for |$p_{L}$| of all genes in |$\Psi _{L}$|⁠.

Simulation studies

We have developed comprehensive simulation settings to rigorously evaluate the performance of our proposed methods together with competing methods. BayesDCirc was applied to two classes of simulated data. The first set of simulation setting allows the DC pattern is driven by the change of one parameter at a time (e.g. amplitude |$A$|⁠, phase |$\phi $|⁠, and MESOR |$M$|⁠) between two groups at various levels of magnitude, sample size, noise level, and violation of Gaussian assumption. The second set of simulation setting assesses the performance of detecting DC patterns driven by the change of arbitrary two parameters between the two groups. In both simulation settings, performance evaluation focused on FDR control and power analysis.

Performance measure

Performance is evaluated based on FDR control and power analysis. FDR is defined as |$FDR(\eta ) = \cfrac{\sum ^{G}_{g=1} P_{g} \cdot \chi (P_{g}\leq \eta )}{max(1,\sum ^{G}_{g=1} \chi (P_{g}\leq \eta ))}$|⁠, where |$P_{g}$| is obtained from Gibbs sampling as mentioned in Section 2.3. We denote the FDR for detecting genes showing differential shape (amplitude |$A$| and phase |$\phi $|⁠) as |$FDR_{I}$| and genes showing differential MESOR (⁠|$M$|⁠) as |$FDR_{L}$|⁠. A gene |$g$| is declared differential shape or MESOR if |$FDR_{I}$| or |$FDR_{L}$| is less than the nominal level |$\alpha $| (default 0.05). Power is defined as |$Power=\cfrac{\sum _{g \in \Omega } \chi (FDR^{g} \leq \alpha )}{|\Omega |}$|⁠, where |$\Omega $| is the collection of genes with either shape change or MESOR change, |$\sum _{g \in \Omega } I(FDR^{g} \leq \alpha )$| is the number of genes with either shape change or MESOR change, identified by BayesDCirc on a nominal level |$\alpha $|⁠. Accordingly, we have corresponding power |$Power_{S}$| and |$Power_{M}$|⁠.

Single-parameter simulation

Simulation setup

Without loss of generality, we sample circadian time |$t_{1}$| and |$t_{2}$| from uniform distribution Uniform(0,24) and let the total number of genes as |$G=1000$|⁠. We set the baseline parameters for the two groups as |$ A_{1g} = A_{2g} = 5 $|⁠; |$\phi _{1g}=\phi _{2g} \sim \mbox{uniform} (0, 24)$|⁠; |$M_{1g}=M_{2g} \sim \mbox{uniform} (3, 7)$|⁠; |$\sigma _{1}^{2} = \sigma _{2}^{2} =0.25$|⁠. We denote |$p_{S} (0\leq p_{S} \leq 1)$| and |$p_{M} (0\leq p_{M} \leq 1)$| as the proportion of genes showing differential shape and differential MESOR between the two groups, respectively. For genes showing DC patterns, we define the difference of circadian parameters as |$\mu _{A} = A_{2} - A_{1}$|⁠, |$\mu _\phi = \phi _{2} - \phi _{1}$|⁠, and |$\mu _{M} = M_{2} - M_{1}$|⁠. We will vary the magnitude level of |$\mu _{A}$|⁠, |$\mu _\phi $|⁠, |$\mu _{M}$| to evaluate their impact on the model performance. After obtaining |$t_{c}, A_{c}, \phi _{c}, M_{c}, c=1,2$|⁠, we generate the gene expression |$y_{c}$| by equations 1. We focus on investigating the impact of |$\mu _{A}$|⁠, |$\mu _\phi $|⁠, |$\mu _{M}$| and sample size on the detection of DC patterns. Additionally, we evaluate the robustness of proposed model by varying different noise level and violation of Gaussian assumption.

We start by examining the impact of magnitude of the change in circadian parameters between the two groups, represented by |$\mu _{A}$|⁠, |$\mu _\phi $|⁠, and |$\mu _{M}$|⁠. Specifically, to evaluate |$\mu _{A}$|⁠, we set |$\mu _{A} = 2$|⁠, |$\mu _\phi = 0$|⁠, |$\mu _{M} = 0$| for DC genes, with |$p_{S} = 0.1 $| and |$p_{M} = 0$|⁠. For the assessment of |$\mu _\phi $|⁠, we set |$\mu _{A} = 0$|⁠, |$\mu _\phi = 2$|⁠, |$\mu _{M} = 0$| for DC genes, with |$p_{S} = 0.1 $| and |$p_{M} = 0$|⁠. When evaluating |$\mu _{M}$|⁠, we set |$\mu _{A} = 0$|⁠, |$\mu _\phi = 0$|⁠, |$\mu _{M} = 2$| for DC genes, with |$p_{S} = 0$| and |$p_{M} = 0.1 $|⁠. In each scenario, we vary each of the three magnitude from small (1) to medium (2) to high (3). To further examine the impact of sample size on model performance, we fix |$\mu _{A}$|⁠, |$\mu _\phi $|⁠, |$\mu _{M}$| to the value of 2 and vary the sample size from small (12) to medium (24) to high (48). Additionally, we vary the noise level from small (0.125) to medium (0.25) to high (0.5), and violation of Gaussian assumption with three levels of degree freedom from small (3), medium (5) in t-distribution and high (Inf) corresponding to a Gaussian distribution. Our BayesDCirc is compared with three other competing methods (i) LimoRhyde (ii) HANOVA (iii) robustDODR in terms of FDR control and power. For each test, we set the nominal level of FDR at 0.05.

FDR control and power analysis

We evaluate BayesDCirc together with three competing methods HANOVA, Limorhyde, and robustDODR in detecting genes with DC patterns driven by one parameter (Amplitude |$A$|⁠, Phase |$\phi $|⁠, and MESOR |$M$|⁠) at a time. FDR is controlled at a nominal |$\alpha $| level 5%. For BayesDCirc, we report |$FDR_{I}$| to measure the performance in detecting differential shape (amplitude |$A$| and phase |$\phi $|⁠) and |$FDR_{L}$| for MESOR |$M$|⁠. For other competing methods, which can only detect differential shape, we use |$FDR_{I}$| for measuring the performance. Our analysis focuses on two main factors in the simulation: (1) varying magnitudes of parameter difference between the two groups (Fig. 2A) and (2) different sample sizes per group (Fig. 2B). In each scenario, the simulation is repeated 10 times and the average of FDR is reported. Overall, BayesDCirc controls the FDR around the nominal level, ranging between 0.033 and 0.079 for amplitude, 0.057 to 0.08 for phase and 0.067 to 0.096 for MESOR.

FDR control analysis for single-parameter simulation at nominal $\alpha $ level 5%. BayesDCirc along with three competing methods HANOVA, limorhyde, and robustDODR in detecting genes with DC patterns driven by one parameter ($A$, $\phi $, $M$) at a time. (A) Simulations settings included varying magnitudes for each parameter with small (S), medium (M), and large (L) magnitude differences between two groups, corresponding to 1, 2, and 3. (B) Simulation settings included varying different sample sizes under different parameter configurations with small (S), medium (M), and large (L), corresponding to small ($n$=12), medium ($n$=24), and large ($n$=48). Simulation in each scenario is repeated 10 times, mean FDR is reported and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to $A$, $\phi $, and $M$.
Figure 2

FDR control analysis for single-parameter simulation at nominal |$\alpha $| level 5%. BayesDCirc along with three competing methods HANOVA, limorhyde, and robustDODR in detecting genes with DC patterns driven by one parameter (⁠|$A$|⁠, |$\phi $|⁠, |$M$|⁠) at a time. (A) Simulations settings included varying magnitudes for each parameter with small (S), medium (M), and large (L) magnitude differences between two groups, corresponding to 1, 2, and 3. (B) Simulation settings included varying different sample sizes under different parameter configurations with small (S), medium (M), and large (L), corresponding to small (⁠|$n$|=12), medium (⁠|$n$|=24), and large (⁠|$n$|=48). Simulation in each scenario is repeated 10 times, mean FDR is reported and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to |$A$|⁠, |$\phi $|⁠, and |$M$|⁠.

Specifically, when amplitude difference between the two groups is small, BayesDCirc is slightly conservative for detecting differential amplitude, likely due to its relatively lower power under small magnitude. Limorhyde achieves a similar level of FDR control with FDR close the nominal level and demonstrates higher power than BayesDCirc (0.94 versus 0.84) (Figure S1). robustDODR and HANOVA, on the other hand, show significant FDR inflation and lower power (0.71 for robustDODR and 0.74 for HANOVA). As the magnitude increases, both BayesDCirc and Limorhyde become comparable. BayesDCirc starts with a slightly inflated FDR (0.079) with a moderate magnitude to an FDR (0.06) with a large magnitude, close to the nominal level, while Limorhyde maintains a consistently conserved FDR around 0.04, regardless of the magnitude. In contrast, robustDODR and HANOVA have a severely inflated FDR and lower power than BayesDCirc and Limorhyde (Figure S1) across all levels of magnitude. When phase differences are present between the two groups, both BayesDCirc and Limorhyde continue to outperform other methods in FDR control, with BayesDCirc’s FDR gradually approaching the nominal level as magnitude increases, in contrast to Limorhyde’s stable FDR. All methods achieve nearly identical power levels in this setting, which is unsurprising given that they are all designed to jointly test amplitude and phase (Figure S1). If the DC pattern is driven by differential MESOR, BayesDCirc presents a robust performance in both FDR control and power across all levels of magnitude. BayesDCirc’s FDR remains close to the nominal level and approaches it as magnitude increases while at the same time maintains a high power (Figure S1). In contrast, other methods, which cannot detect differential MESOR, are not included in the comparison. When we evaluate each method under varying sample sizes and observe similar trends (Fig. 2B, Figure S2), as well as under different noise levels and deviations from Gaussian distribution assumptions, the results remain consistent across scenarios (Figures S3–S6).

Multiple-parameter simulation

Simulation setup

In multiple-parameter simulation setting, we will evaluate FDR control at nominal |$\alpha $| level of 5% for BayesDCirc, when the DC pattern is driven by two parameters (⁠|$A$| & |$\phi $|⁠, |$A$| & |$M$|⁠, |$\phi $| & |$M$|⁠) simultaneously. All parameters are set the same as single-parameter setting, with variations in the combination of two parameters. For instance, we assign 10% genes to exhibit change in both |$A$| and |$\phi $| and vary magnitude of |$A$| and |$\phi $| at different levels (⁠|$\mu _{A} =\mu _\phi = 1,2,3$|⁠) with |$\mu _{M} = 0$|⁠. When both |$A$| and |$M$| changes, we set 10% genes with concurrent change in both parameters, varying magnitude of both parameters at different levels (⁠|$\mu _{A} =\mu _{M} = 1,2,3$|⁠). Similarly, we set 10% genes with change in both |$\phi $| and |$M$| at different magnitude (⁠|$\mu _\phi =\mu _{M} = 1,2,3$|⁠).

We compare BayesDCirc to three competing methods HANOVA, Limorhyde, and robustDODR in detecting genes with the change of shape parameters (⁠|$A$| and |$\phi $|⁠) simultaneously as all these methods are designed for this purpose. Our results show that when multiple shape parameter change, BayesDCirc performs comparably to Limorhyde, consistently controlling the FDR at the nominal level across different groups, including variations in magnitude, sample size, noise, and violation of Gaussian distribution. This trend aligns with single-parameter simulations discussed above, where only |$A$| or |$\phi $| changes. Specifically, while the FDR is slight conservative with the sample size is small, as the magnitude increases, the noise decreases, and the t-distribution approximates to normal distribution, the observed FDR is closer to the nominal level (Fig. 3A). Additionally, the power of different methods remains comparable (Figure S7A). Since only BayesDCirc can detect genes with differential MESOR, we report the performance of BayesDCirc exclusively for change involving differential MESOR (Fig. 3B). Specifically, we report FDR for differential shape (⁠|$FDR_{I}$|⁠) and for differential MESOR (⁠|$FDR_{L}$|⁠) separately, and use |$FDR_{I}$| for both amplitude and phase. When multiple parameters change and include MESOR, BayesDCirc demonstrate robust FDR control close to the nominal level, whether changes occur in both |$A$| and |$M$| or |$\phi $| and |$M$|⁠. Together, all the evidence indicate that BayesDCirc is robust in detecting genes showing DC patterns either differential shape or differential MESOR. Since the underlying DC patterns and the level of parameter change in real data application are unknown, the comprehensive simulation studies demonstrate that BayesDCirc is an optimal solution for the DC analysis.

FDR control analysis for multiple-parameter simulation at nominal $\alpha $ level 5%. (A) BayesDCirc along with three competing methods HANOVA, Limorhyde, and robustDODR in detecting genes with DC patterns driven by shape parameters ($A$ & $\phi $) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both shape parameters $A$ & $\phi $. (B) BayesDCirc in detecting genes with DC patterns driven by two parameters including MESOR ($A$ & $M$ and $\phi $ & $M$) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both $M$ & $A$ and $\phi $ & $M$. Simulation in each scenario is repeated 10 times, mean FDR is reported, and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to $A$, $\phi $, and $M$.
Figure 3

FDR control analysis for multiple-parameter simulation at nominal |$\alpha $| level 5%. (A) BayesDCirc along with three competing methods HANOVA, Limorhyde, and robustDODR in detecting genes with DC patterns driven by shape parameters (⁠|$A$| & |$\phi $|⁠) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both shape parameters |$A$| & |$\phi $|⁠. (B) BayesDCirc in detecting genes with DC patterns driven by two parameters including MESOR (⁠|$A$| & |$M$| and |$\phi $| & |$M$|⁠) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both |$M$| & |$A$| and |$\phi $| & |$M$|⁠. Simulation in each scenario is repeated 10 times, mean FDR is reported, and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to |$A$|⁠, |$\phi $|⁠, and |$M$|⁠.

FDR control after incorporating external knowledge

Comparing BayesDCirc without and with incorporating external knowledge for FDR control analysis for single-parameter simulation at nominal $\alpha $ level 5% (BayesDCirc-NP versus BayesDCirc-P) in detecting genes with DC patterns driven by one parameter ($A$, $\phi $, $M$) at a time. (A) Simulations settings included varying magnitudes for each parameter with small (S), medium (M), and large (L) magnitude differences between two groups, corresponding to 1,2, and 3. (B) Simulations settings included varying different sample sizes under different parameter configurations with small (S), medium (M), and large (L), corresponding to small ($n$=12), medium ($n$=24), and large ($n$=48). Simulation in each scenario is repeated 10 times, mean FDR is reported, and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to $A$, $\phi $, and $M$.
Figure 4

Comparing BayesDCirc without and with incorporating external knowledge for FDR control analysis for single-parameter simulation at nominal |$\alpha $| level 5% (BayesDCirc-NP versus BayesDCirc-P) in detecting genes with DC patterns driven by one parameter (⁠|$A$|⁠, |$\phi $|⁠, |$M$|⁠) at a time. (A) Simulations settings included varying magnitudes for each parameter with small (S), medium (M), and large (L) magnitude differences between two groups, corresponding to 1,2, and 3. (B) Simulations settings included varying different sample sizes under different parameter configurations with small (S), medium (M), and large (L), corresponding to small (⁠|$n$|=12), medium (⁠|$n$|=24), and large (⁠|$n$|=48). Simulation in each scenario is repeated 10 times, mean FDR is reported, and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to |$A$|⁠, |$\phi $|⁠, and |$M$|⁠.

BayesDCirc provides the flexibility to incorporate external knowledge such as genes with DC patterns identified from other studies with similar clinical or biological context. To include the external knowledge in the simulation, we randomly select 50% genes showing DC patterns as gene set |$\Psi _{I}$|⁠, with the remaining genes forming gene set |$\Psi _{\bar{I}}$| (see section 2.5). In the Bayesian hierarchical model of BayesDCirc, the proportion of genes |$p_{I}$| and |$p_{L}$| showing DC patterns, follows a beta distribution with default hyperparameter set as noninformative prior with value 1.To include external knowledge, |$p_{I}$| of each gene in |$\Psi _{I}$| is set as |$p_{I} \sim \mbox{Beta}(|\Psi _{I}|*(1-\alpha ), |\Psi _{I}|*\alpha )$|⁠, where the nominal level |$\alpha $| is set as 0.05 by default. For genes in |$\Psi _{\bar{l}}$|⁠, we stick with the noninformative prior with value of 1. The same principle is applied to the prior setting for |$p_{L}$| of genes in |$\Psi _{L}$| and |$\Psi _{\bar{L}}$|⁠.

We compare BayesDCirc without and with incorporating external knowledge, denoted as BayesDCirc-NP and BayesDCirc-P, in the one-parameter setting. Overall, adding external knowledge improves the FDR control, with observed FDR from BayesDCirc-P aligning more closely with the nominal level than that from BayesDCirc-NP for each parameter change. This improvement is seen across nearly all levels of magnitude, sample size, noise, degree of freedom of t-distribution, with BayesDCirc-P showing reduced FDR inflation compared to BayesDCirc-NP (Fig. 4, S8, S9). We further extend the comparison to the multiple-parameter setting (Fig. 5), where the DC pattern is driven by both shape and MESOR parameters (⁠|$A$| & |$\phi $|⁠, |$A$| & |$M$|⁠, |$\phi $| & |$M$|⁠). Similar to single-parameter setting, BayesDCirc-P reduces FDR inflation closer to the nominal level for both shape and MESOR parameters across various levels of magnitude, sample size, noise and degree of freedom for t-distribution. These evidence, collectively, demonstrate that incorporating external knowledge can potentially improve BayesDCirc’s performance in terms of FDR control.

Comparing BayesDCirc without and with incorporating external knowledge for FDR control analysis for multiple-parameter simulation at nominal $\alpha $ level 5% (BayesDCirc-NP versus BayesDCirc-P) in detecting genes with DC patterns driven by two parameters ($A$ & $\phi $, $A$ & $M$, $\phi $ & $M$) at a time. (A) Comparing BayesDCirc-NP and BayesDCirc-P in detecting genes with DC patterns driven by shape parameters ($A$ and $\phi $) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both shape parameters $A$ and $\phi $. (B) Comparing BayesDCirc-NP and BayesDCirc-P in detecting genes with DC patterns driven by two parameters including MESOR ($A$ & $M$) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both A & $M$. (C) Comparing BayesDCirc-NP and BayesDCirc-P in detecting genes with DC patterns driven by two parameters including MESOR ($\phi $ and $ M $) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both $\phi $ and $M $. Simulation in each scenario is repeated 10 times, mean FDR is reported, and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to $A$, $\phi $, and $M$.
Figure 5

Comparing BayesDCirc without and with incorporating external knowledge for FDR control analysis for multiple-parameter simulation at nominal |$\alpha $| level 5% (BayesDCirc-NP versus BayesDCirc-P) in detecting genes with DC patterns driven by two parameters (⁠|$A$| & |$\phi $|⁠, |$A$| & |$M$|⁠, |$\phi $| & |$M$|⁠) at a time. (A) Comparing BayesDCirc-NP and BayesDCirc-P in detecting genes with DC patterns driven by shape parameters (⁠|$A$| and |$\phi $|⁠) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both shape parameters |$A$| and |$\phi $|⁠. (B) Comparing BayesDCirc-NP and BayesDCirc-P in detecting genes with DC patterns driven by two parameters including MESOR (⁠|$A$| & |$M$|⁠) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both A & |$M$|⁠. (C) Comparing BayesDCirc-NP and BayesDCirc-P in detecting genes with DC patterns driven by two parameters including MESOR (⁠|$\phi $| and |$ M $|⁠) simultaneously. The simulations settings included varying magnitude, sample size, noise, and violation of Gaussian distribution at small (S), medium (M), and large (L) level for both |$\phi $| and |$M $|⁠. Simulation in each scenario is repeated 10 times, mean FDR is reported, and the standard error of the mean FDR is marked on the bar plots. Amplitude, phase, and MESOR correspond to |$A$|⁠, |$\phi $|⁠, and |$M$|⁠.

Real data application

We evaluate BayesDCirc alongside competing methods on two real datasets. The first dataset, referred to as “Human brain aging data,” investigates the effects of normal aging on circadian rhythms in gene expression across two brain regions (Brodmann’s area 11 (BA11) and BA47) in the human prefrontal cortex. The second dataset, referred to as “Mouse multiple tissues aging data,” examines the impact of aging on rhythmically expressed genes across three ages (6, 18, and 27 months) and six organs/tissues (hypothalamus, lung, heart, kidney, skeletal muscle, and adrenal gland) in male mice [15, 27]. Additionally, we assess whether incorporating external knowledge further enhances BayesDCirc’s performance. Specifically, for “Human brain aging data,” we leverage DC knowledge from another brain region (BA47) to improve the DC analysis for the targeted brain region (BA11). For “Mouse multiple tissues aging data,” we focus on the heart as the targeted tissue and use DC information from related skeletal muscle as the external knowledge.

Human brain aging data

The study population of “Human brain aging data” consists of 146 human postmortem brain samples from 146 individuals, with each sample profiled with genome-wide transcriptome data measured by microarray from both BA11 and BA47 (GSE71620). For all samples, time of death has been adjusted to Zeitgeber time, representing circadian time, by accounting for time zone, latitude, longitude, and altitude. To assess the effect of aging on the gene expression rhythm in the human brain, the 146 individuals are divided into two groups-young group and old group, based on the median age of 53 years, resulting in balanced sample sizes for both age groups. Given that both BA11 and BA47 are from human prefrontal cortex, they likely share circadian clock genes and molecular pathways. This makes it is valuable to explore whether leveraging DC knowledge from BA47 can enhance the DC analysis for the targeted region BA11.

By default, we apply BayesDCirc for identifying genes showing DC patterns from BA11, both with and without external knowledge from BA47, to demonstrate the flexibility and advantage of Bayesian framework. The analysis proceeds in three steps: (i) identifying circadian genes in both young and old groups for BA11 and BA47, focusing the DC analysis on circadian genes present in both groups. We then combine the circadian genes from both BA11and BA47; (ii) performing DC analysis using BayesDCirc on the union of circadian genes from step (i) to obtain |$FDR_{I}$| and |$FDR_{L}$| from BA47; (iii) Using |$FDR_{I}$| and |$FDR_{L}$| from BA47 obtained from step (ii) as hyperparameters for the prior distribution of |$p_{I}$| and |$p_{L}$|⁠, representing the expected proportion of genes showing differential shape and MESOR, respectively (⁠|$p_{I} \sim \mbox{Beta}(a_{p_{I}}, b_{p_{I}})$| and |$p_{L} \sim \mbox{Beta}(a_{p_{L}}, b_{p_{L}})$|⁠) (Section 2.5); (iv) BayesDCirc is then used to detect genes showing differential shape and MESOR from BA11 after incorporating the external knowledge derived from step (iii). Competing methods are applied on the same union set on BA11 to obtain the |$FDR_{I}$| as comparison.

Specifically, |$diffCircadian$| R package is used to identify circadian genes in BA11(P|$value<0.05$|⁠) [34], resulting in 1921 circadian genes in young group, 3497 circadian genes in old group, and 193 circadian genes common to both age groups. Similarly, in BA47,1864 circadian genes are identified in the young group, 2400 circadian genes in the old group, and 178 common circadian genes. The union of circadian genes from BA11 and BA47, totaling 346 genes, is used for the DC analysis. We apply BayesDCirc-NP on BA47 to identify genes showing differential shape and MESOR, with each set recorded as |$\Psi _{I}$| and |$\Psi _{L}$|⁠. These values are then used to construct the informative prior for each of the 346 circadian genes. As a benchmark, we apply BayesDCirc-NP and BayesDCirc-P along with Limorhyde, robustDOOR, and HANOVA on BA11 to identify genes showing DC patterns at a nominal FDR level of 5%.

As a result, BayesDCirc-NP identifies two genes, |$FKBP5$| and |$MARCH4$|⁠, exhibiting differential MESOR. robustDODR finds |$FKBP5$| and |$ALDH1B1$|⁠, HANOVA detects |$ALDH1B1$| and LimoRhyde identifies nine genes: |$DDIT4$|⁠, |$CALML5$|⁠, |$APOLD1$|⁠, |$NOC4L$|⁠, |$MARCH4$|⁠, |$CRYBB2P1$|⁠, |$FKBP5$|⁠, |$ALDH1B1$|⁠, and |$MIR128-1$|⁠. According to the literature, |$FKBP5$|⁠, a gene rhythmically expressed in most tissues, is involved in the interaction between circadian clock and the stress response system. Notably, |$FKBP5$| is implicated in multiple stress-related psychiatric disorders, many of which exhibit circadian rhythm abnormalities [35, 36]. Although |$FKBP5$| is detected by other methods designed to detect differential shape, it clearly shows differential MESOR rather than differential shape between young and old groups. This suggests that the identification of |$FKBP5$| by competing methods could be a false positive, underscoring the advantage of BayesDCirc in specifically detecting differential MESOR (Fig. 6). By incorporating external knowledge from BA47, BayesDCirc-P expand its findings to include five additional genes showing differential shape, which include |$ALDH1B1$|⁠, |$C11orf24$|⁠, |$CALML5$|⁠,|$SIAH3$|⁠, and |$TRIB2$|⁠, and 3 additional genes showing differential MESOR such as |$TRIB2$|⁠, |$RNF165$|⁠, and |$PDE7B$|⁠. Remarkably, |$PDE7B$|⁠, identified as differential MESOR gene, is a well-known circadian gene with a critical role in molecular clock function [15]. This gene is also potentially linked to the reduced amplitude of cAMP circadian oscillation, which is affected by aging in rat Leydig cells [37]. Altogether, BayesDCirc demonstrates robust capability in identifying genes showing DC patterns and it is further empowered by incorporating external knowledge.

Two circadian genes show differential MESOR in human BA11 brain region.
Figure 6

Two circadian genes show differential MESOR in human BA11 brain region.

Mouse multiple tissues aging data

This study examines the circadian transcriptome in multiple tissues at multiple time points throughout the lifespan of male mice. It includes six tissues from the hypothalamus, lung, kidney, skeletal muscle, heart, and adrenal gland from young (6 months), aged (18 months), and old (27 months) male mice. Tissues are collected every 4 h over a 48-h period, resulting in a total of 12 time points to track circadian transcriptome changes. Each tissue at each time point is profiled with RNA-seq to capture genome-wide transcriptome profile (GSE201207). Counts per million (CPM) normalization is applied to the RNA-seq data, and logarithm of CPM is used as the measure of gene expression. Genes with mean CPM less than 1 are filtered out, resulting in 16 155 gene for the circadian analysis. In the subsequent analysis, we will focus on detecting genes of DC patterns in the heart by leveraging DC information from skeletal muscle due to their shared circadian clock genes and molecular pathways[38]. Given that the majority of heart’s mass is cardiac muscle cells, skeletal muscle serves as an ideal source of external knowledge to improve the detection of DC patterns in the heart.

Following the previously described steps, for heart, we identify 3214 circadian genes in young group and 2660 circadian genes in old group, with 1026 common circadian genes in both age groups, using diffCircadian R package (P|$value<0.05$|⁠). For skeletal muscle, we detect 2496 circadian genes in young group and 1859 circadian genes in old group, with 506 circadian genes common in both age groups. After taking the union of two sets of circadian genes, we obtain 1396 circadian genes for the DC analysis. We apply all methods, including BayesDCirc-NP, BayesDCirc-P, Limorhyde, robustDOOR, and HANOVA, to identify genes showing DC patterns at a FDR nominal level of 5%. Surprisingly, none of the competing methods detect any genes showing DC patterns. In contrast, BayesDCirc-NP identifies five genes showing differential MESOR, such as |$Aqp8$|⁠, |$Cilp$|⁠, |$Fstl4$|⁠, |$Myoc$|⁠, and |$Nlrc3$|⁠.

By incorporating external knowledge, BayesDCirc-P finds two additional genes showing differential shape: |$Nav3$| and |$Per2$|⁠, along with six additional genes showing differential MESOR: |$Aplnr$|⁠, |$Cdkn1a$|⁠, |$Col3a1$|⁠, |$Gck$|⁠, |$Rflnb$|⁠, and |$Slc25a30$|⁠. Evidence suggests these genes play important roles in aging and heart. Notably, |$Per2$| (Period Circadian Regulator 2), a core circadian gene, has been linked to cardioprotection and is considered as a therapeutic target for cardiovascular diseases due to its central role in circadian clock [39]. Additionally, |$Per2$| expression is associated with skeletal muscle contraction, highlighting its crucial role in both cardiac and muscle circadian regulation [40]. The gene expression pattern of |$Per2$| demonstrates that it is more like differential amplitude rather than differential phase (Fig. 7A). The gene |$Apln$| and its receptor |$Aplnr$| are also noted as potential anti-aging factors, with age-related decline observed in multiple organs including the heart. Restoring |$Apln$| in old mice has been shown to reduce cardiac hypertrophy, suggesting its potential as a therapeutic target [41]. Furthermore, mice deficient in either |$Apln$| or |$Aplnr$| exhibit significant alterations in muscle function with age [42]. The gene |$Cdkn1a (p21)$|⁠, a clock-controlled gene and cell cycle inhibitor, display rhythmic gene expression patterns across peripheral organs such as liver, heart and skeletal muscle, and is regulated by the circadian clock to modulate cellular proliferation [43]. The gene |$Gck$| (Glucokinase),a key gene for whole-body glucose homeostasis and influenced by the circadian clock, has been linked to both aging and coronary heart disease [44]. The three genes, which show significant differential MESOR between old and young groups (Fig. 7B,C,D) further validate the effectiveness of incorporating external knowledge in BayesDCirc.

Four circadian genes showing DC patterns in mouse heart.
Figure 7

Four circadian genes showing DC patterns in mouse heart.

Discussion

With the advent of multi-omics data, DC analysis has become a widely used approach for studying molecular profiles, such as gene expression, to identify genes showing DC patterns. These genes offer valuable insights into the role of circadian rhythms in biological processes, aid in understanding disease mechanism, and improve biomarker discovery, ultimately contributing to disease prevention, diagnosis, and treatment. Detecting DC patterns poses significant challenges as circadian rhythms are characterized by three key components: amplitude, phase, and MESOR. Most existing methods, primarily frequentist-based methods like Limorhyde, robustDOOR, and HANOVA, focus on testing differential shape (amplitude and phase) while overlooking MESOR, an essential component representing the average level of molecular profile (e.g. gene expression) across a circadian cycle. Additionally, multiple circadian studies are conducted in similar clinical or biological context such as related diseases, tissues, organs, or cell types, where integrating findings from circadian studies could enhance the DC analysis for the targeted study. Nevertheless, existing frequentist-based methods are limited in their ability to effectively leverage such enriched external knowledge.

To address these research gaps, we propose BayesDCirc, a novel Bayesian hierarchical method, for detecting genes of DC patterns in a two-group experimental design. BayesDCirc enables the detection of genes showing various DC patterns, which include differential shape and MESOR, by explicitly testing shape (amplitude and phase) and MESOR, respectively. Additionally, benefiting from Bayesian modeling, BayesDCirc allows for the integration of external knowledge, such as the previously identified DC genes, to improve the identification of DC genes in this study. Though fully Bayesian, BayesDCirc employs an efficient Gibbs sampling for statistical inference, benefiting from conjugate priors for parameters, which yields closed-form posterior distributions. For instance, detecting DC patterns across 1000 genes with a sample size of 24 would take less than 1 min. Furthermore, BayesDCirc introduces a dual spike-and-slab prior for selecting differential shape and MESOR, which also enable the incorporation of the external knowledge by modeling the proportion of genes showing DC patterns from external knowledge as the prior. The selection of genes showing DC patterns are based on FDR, which are derived from the Gibbs sampling. Extensive simulations, covering scenarios where each or combinations of the three circadian components (amplitude, phase, and MESOR) vary, demonstrate BayesDCirc’s superior performance in identifying two types of DC patterns. Performance is further enhanced when BayesDCirc incorporates external knowledge. Application to two real datasets illustrates BayesDCirc’s increased power in identifying genes showing DC patterns, especially differential MESOR. These findings are also corroborated by literature, attributed to its strength in detecting differential MESOR and leveraging external knowledge.

BayesDCirc offers promising directions for future research. First, while this study focuses on detecting DC patterns within the transcriptome, the approach can readily extend to other omics data type such as transcription factor, histone modification,DNA methylation, proteomics, and metabolomics to map circadian rhythms across molecular domains. Such findings can reveal age-dependent and tissue-specific circadian biomarkers, offering insight into circadian rhythm disruption in disease context. Second, BayesDCirc’s flexible framework, which models gene expression as a function of MESOR, phase, and amplitude along with their circadian interactions, can accommodate additional covariates, enabling adjustments for confounders to provide robust comparisons. Despite BayesDCirc’s strengths, it has some limitations. Although it can detect differential shape, it currently does not distinguish between amplitude and phase. Future work will involve refining BayesDCirc to statistically separate these components. Additionally, BayesDCirc currently supports only two-group comparisons, and extending it to multifactorial designs is an immediate goal. This extension is essential as experimental designs grow increasingly complex, incorporating multiple conditions, batches, platforms, and treatments. We plan to explore these directions to advance and enhance BayesDCirc in future studies.

Key Points
  • A novel Bayesian hierarchical model designed for detecting DC patterns in a two-group experimental design.

  • The Bayesian hierarchical model offers the advantage of testing MESOR difference and incorporating external knowledge.

  • Systemically evaluating the Bayesian hierarchical model in detecting DC patterns, and comparing it with existing methods in terms of the FDR and statistical power.

  • Implemented the Bayesian hierarchical model in an R software package, which has been made publicly available on GitHub.

Author contributions

Y.Z. contributed to the method development, data analysis, and R package preparation. Y.Z., H.D., Z.H., and L.C. contributed to the conceptualization and the development of the idea in the manuscript. All co-authors have reviewed the manuscript.

Competing interests

No competing interest is declared.

Funding

L.C. is supported by National Institutes of Health (R35GM142701).

References

1.

Badia
 
P
,
Myers
 
B
,
Boecker
 
M
. et al. .  
Bright light effects on body temperature, alertness, EEG and behavior
.
Physiol Behav
 
1991
;
50
:
583
8
.

2.

Cagnacci
 
A
,
Elliott
 
JA
,
Yen
 
SS
.
Melatonin: A major regulator of the circadian rhythm of core temperature in humans
.
J Clin Endocrinol Metabol
 
1992
;
75
:
447
52
.

3.

Dijk
 
DJ
,
Duffy
 
JF
,
Czeisler
 
CA
.
Circadian and sleep/wake dependent aspects of subjective alertness and cognitive performance
.
J Sleep Res
 
1992
;
1
:
112
7
.

4.

Jung
 
CM
,
Khalsa
 
SB
,
Scheer
 
FA
. et al. .  
Acute effects of bright light exposure on cortisol levels
.
J Biol Rhythms
 
2010
;
25
:
208
16
.

5.

Lee
 
Y
.
Roles of circadian clocks in cancer pathogenesis and treatment
.
Exp Mol Med
 
2021
;
53
:
1529
38
.

6.

Zhu
 
X
,
Maier
 
G
,
Panda
 
S
.
Learning from circadian rhythm to transform cancer prevention, prognosis, and survivorship care
.
Trends Cancer
 
2024
;
10
:
196
207
.

7.

Zhou
 
L
,
Zhang
 
Z
,
Nice
 
E
. et al. .  
Circadian rhythms and cancers: The intrinsic links and therapeutic potentials
.
J Hematol Oncol
 
2022
;
15
:
21
.

8.

Gabriel
 
BM
,
Altintaş
 
A
,
Smith
 
JAB
. et al. .  
Disrupted circadian oscillations in type 2 diabetes are linked to altered rhythmic mitochondrial metabolism in skeletal muscle
.
Sci Adv
 
2021
;
7
:
eabi9654
.

9.

Kim
 
JH
,
Duffy
 
JF
.
Circadian rhythm sleep-wake disorders in older adults
.
Sleep Med Clin
 
2018
;
13
:
39
50
.

10.

Germain
 
A
,
Kupfer
 
DJ
.
Circadian rhythm disturbances in depression
.
Hum Psychopharmacol Clin Exp
 
2008
;
23
:
571
85
.

11.

Boiko
 
DI
,
Chopra
 
H
,
Bilal
 
M
. et al. .  
Schizophrenia and disruption of circadian rhythms: An overview of genetic, metabolic and clinical signs
.
Schizophr Res
 
2024
;
264
:
58
70
.

12.

Homolak
 
J
,
Mudrovčić
 
M
,
Vukić
 
B
. et al. .  
Circadian rhythm and Alzheimer’s disease
.
Med Sci
 
2018
;
6
:
52
.

13.

Shimizu
 
I
,
Yoshida
 
Y
,
Minamino
 
T
.
A role for circadian clock in metabolic disease
.
Hypertens Res
 
2016
;
39
:
483
91
.

14.

Thosar
 
SS
,
Butler
 
MP
,
Shea
 
SA
.
Role of the circadian system in cardiovascular disease
.
J Clin Invest
 
2018
;
128
:
2157
67
.

15.

Chen
 
CY
,
Logan
 
RW
,
Ma
 
T
. et al. .  
Effects of aging on circadian patterns of gene expression in the human prefrontal cortex
.
Proc Natl Acad Sci
 
2016
;
113
:
206
11
.

16.

Seney
 
ML
,
Cahill
 
K
,
Enwright
 
JF
3rd
. et al. .  
Diurnal rhythms in gene expression in the prefrontal cortex in schizophrenia
.
Nat Commun
 
2019
;
10
:
1
11
.

17.

Hodge
 
BA
,
Wen
 
Y
,
Riley
 
LA
. et al. .  
The endogenous molecular clock orchestrates the temporal separation of substrate metabolism in skeletal muscle
.
Skeletal muscle
 
2015
;
5
:
17
.

18.

Hughes
 
ME
,
DiTacchio
 
L
,
Hayes
 
KR
. et al. .  
Harmonics of circadian gene transcription in mammals
.
PLoS Genet
 
2009
;
5
:
e1000442
.

19.

Möller-Levet
 
CS
,
Archer
 
SN
,
Bucca
 
G
. et al. .  
Effects of insufficient sleep on circadian rhythmicity and expression amplitude of the human blood transcriptome
.
Proc Natl Acad Sci
 
2013
;
110
:
E1132
41
.

20.

Wen
 
S
,
Ma
 
D
,
Zhao
 
M
. et al. .  
Spatiotemporal single-cell analysis of gene expression in the mouse suprachiasmatic nucleus
.
Nat Neurosci
 
2020
;
23
:
456
67
.

21.

Tang
 
H
,
Chen
 
S
,
Yi
 
L
. et al. .  
Circadian rhythms correlated in DNA methylation and gene expression identified in human blood and implicated in psychiatric disorders
.
Am J Med Genet B Neuropsychiatr Genet
 
2025
;
198
:
e33005
.

22.

Koike
 
N
,
Yoo
 
SH
,
Huang
 
HC
. et al. .  
Transcriptional architecture and chromatin landscape of the core circadian clock in mammals
.
Science
 
2012
;
338
:
349
54
.

23.

Wang
 
Y
,
Song
 
L
,
Liu
 
M
. et al. .  
A proteomics landscape of circadian clock in mouse liver
.
Nat Commun
 
2018
;
9
:
1553
.

24.

Dallmann
 
R
,
Viola
 
AU
,
Tarokh
 
L
. et al. .  
The human circadian metabolome
.
Proc Natl Acad Sci
 
2012
;
109
:
2625
9
.

25.

Hughey
 
JJ
,
Butte
 
AJ
.
Differential phasing between circadian clocks in the brain and peripheral organs in humans
.
J Biol Rhythms
 
2016
;
31
:
588
97
.

26.

Hsu
 
PY
,
Harmer
 
SL
.
Circadian phase has profound effects on differential expression analysis
.
PloS One
 
2012
;
7
:
e49853
.

27.

Wolff
 
CA
,
Gutierrez-Monreal
 
MA
,
Meng
 
L
. et al. .  
Defining the age-dependent and tissue-specific circadian transcriptome in male mice
.
Cell Rep
 
2023
;
42
:
111982
.

28.

Pelikan
 
A
,
Herzel
 
H
,
Kramer
 
A
. et al. .  
Studies overestimate the extent of circadian rhythm reprogramming in response to dietary and genetic changes
.
BioRxiv
 
2020
;
2020
12
.

29.

Singer
 
JM
,
Hughey
 
JJ
.
LimoRhyde: A flexible approach for differential analysis of rhythmic transcriptome data
.
J Biol Rhythms
 
2019
;
34
:
5
18
.

30.

Thaben
 
PF
,
Westermark
 
PO
.
Differential rhythmicity: Detecting altered rhythmicity in biological data
.
Bioinformatics
 
2016
;
32
:
2800
8
.

31.

Ishwaran
 
H
,
Rao
 
JS
. Spike and slab variable selection: frequentist and bayesian strategies.
Ann. Statist
. 2005;
33
:730–73. .

32.

Efron
 
B
,
Tibshirani
 
R
.
Empirical bayes methods and false discovery rates for microarrays
.
Genet Epidemiol
 
2002
;
23
:
70
86
.

33.

Newton
 
MA
,
Noueiry
 
A
,
Sarkar
 
D
. et al. .  
Detecting differential gene expression with a semiparametric hierarchical mixture method
.
Biostatistics
 
2004
;
5
:
155
76
.

34.

Ding
 
H
,
Meng
 
L
,
Liu
 
AC
. et al. .  
Likelihood-based tests for detecting circadian rhythmicity and differential circadian patterns in transcriptomic applications
.
Brief Bioinform
 
2021
;
22
.

35.

Landgraf
 
D
,
McCarthy
 
MJ
,
Welsh
 
DK
.
Circadian clock and stress interactions in the molecular biology of psychiatric disorders
.
Curr Psychiatry Rep
 
2014
;
16
:
1
11
.

36.

Zannas
 
AS
,
Jia
 
M
,
Hafner
 
K
. et al. .  
Epigenetic upregulation of FKBP5 by aging and stress contributes to NF-|$\kappa $|-B–driven inflammation and cardiovascular risk
.
Proc Natl Acad Sci
 
2019
;
116
:
11370
9
.

37.

Baburski
 
AZ
,
Sokanovic
 
SJ
,
Andric
 
SA
. et al. .  
Aging has the opposite effect on cAMP and cGMP circadian variations in rat Leydig cells
.
J Comp Physiol B
 
2017
;
187
:
613
23
.

38.

Fagiani
 
F
,
Di Marino
 
D
,
Romagnoli
 
A
. et al. .  
Molecular regulations of circadian rhythm and implications for physiology and diseases
.
Signal Transduct Target Ther
 
2022
;
7
:
41
.

39.

Oyama
 
Y
,
Walker
 
LA
,
Eckle
 
T
.
Targeting circadian PER2 as therapy in myocardial ischemia and reperfusion injury
.
Chronobiol Int
 
2021
;
38
:
1262
73
.

40.

Small
 
L
,
Altintaş
 
LRC
. et al. .  
Contraction influences Per2 gene expression in skeletal muscle through a calcium-dependent pathway
.
J Physiol
 
2020
;
598
:
5739
52
.

41.

Rai
 
R
,
Ghosh
 
AK
,
Eren
 
M
. et al. .  
Downregulation of the apelinergic axis accelerates aging, whereas its systemic restoration improves the mammalian healthspan
.
Cell Rep
 
2017
;
21
:
1471
80
.

42.

Vinel
 
C
,
Lukjanenko
 
L
,
Batut
 
A
. et al. .  
The exerkine apelin reverses age-associated sarcopenia
.
Nat Med
 
2018
;
24
:
1360
71
.

43.

Gréchez-Cassiau
 
A
,
Rayet
 
B
,
Guillaumond
 
F
. et al. .  
The circadian clock component BMAL1 is a critical regulator of p21WAF1/CIP1 expression and hepatocyte proliferation
.
J Biol Chem
 
2008
;
283
:
4535
42
.

44.

Xu
 
L
,
Zheng
 
D
,
Wang
 
L
. et al. .  
GCK gene-body hypomethylation is associated with the risk of coronary heart disease
.
Biomed Res Int
 
2014
;
2014
:
151723
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Supplementary data