Abstract

DNA methylation plays a crucial role in transcriptional regulation. Reduced representation bisulfite sequencing (RRBS) is a technique of increasing use for analyzing genome-wide methylation profiles. Many computational tools such as Metilene, MethylKit, BiSeq and DMRfinder have been developed to use RRBS data for the detection of the differentially methylated regions (DMRs) potentially involved in epigenetic regulations of gene expression. For DMR detection tools, as for countless other medical applications, P-values and their adjustments are among the most standard reporting statistics used to assess the statistical significance of biological findings. However, P-values are coming under increasing criticism relating to their questionable accuracy and relatively high levels of false positive or negative indications. Here, we propose a method to calculate E-values, as likelihood ratios falling into the null hypothesis over the entire parameter space, for DMR detection in RRBS data. We also provide the R package ‘metevalue’ as a user-friendly interface to implement E-value calculations into various DMR detection tools. To evaluate the performance of E-values, we generated various RRBS benchmarking datasets using our simulator ‘RRBSsim’ with eight samples in each experimental group. Our comprehensive benchmarking analyses showed that using E-values not only significantly improved accuracy, area under ROC curve and power, over that of P-values or adjusted P-values, but also reduced false discovery rates and type I errors. In applications using real RRBS data of CRL rats and a clinical trial on low-salt diet, the use of E-values detected biologically more relevant DMRs and also improved the negative association between DNA methylation and gene expression.

INTRODUCTION

The P-value is a probability that measures the observed result of the statistical test falls into the acceptance or rejection category under a null hypothesis. In doing so, it provides a judgment level on statistical significance, the smaller the P-value the stronger the evidence against the null hypothesis. In biomedical research, a P-value of less than 0.05 is usually considered statistically significant. In high-dimension omics studies, P-values are usually corrected through P-value adjustment methods to control ‘false positive’ findings [1–5]. Up until now, P-values and adjusted P-values have become one of the most standard reporting statistics occurring in published medical literature [6–11].

However, debates over the use of P-values and their adjustments have continued over the past few decades. This has been largely due to examples and cases where the misuse and/or misinterpretation of the results of statistical tests and P-values has led to negative outcomes or potentially questionable over-confidence in results [6, 12–18]. In 2016, the American Statistical Association (ASA) issued a P-value guideline to attempt to supervise the proper use and interpretation of P-values [19]. Despite this, criticisms of P-values have continued to be widely reported, as highlighted in the 2019 Nature paper ‘Retire statistical significance’, and endorsed by more than 800 scholars worldwide [20]. This clearly indicates that the P-value controversy is far from resolved. In another 2019 example, the ASA published a further special issue of 43 papers on ‘Statistical Inference in the 21st Century: A World Beyond P < 0.05’, again suggesting that the scientific community needs to move beyond its addiction to the P-values [21, 22].

To overcome the limitation of P-values and adjusted P-values, a theoretical framework for E-values has been recently proposed to address the multiple hypothesis testing problem [23]. E-values can be directly interpreted as ‘betting scores’ or ‘likelihood ratios’, or can be interpreted as ‘how likely is the rejection to be true?’. Some evidence is being put forward that E-values have advantages for multiple testing corrections under certain theoretical settings [23]. However, the application of E-values to real-world data still requires further investigation. First of all, the definition of E-values is too broad to be applied in real-world data as E-values could be derived from any non-negative function the expectation of which is no more than 1. Second, due to the fundamental limitations of using P-values [23], transforming P-values to E-values via a calibrator function is not always valid. This is partly because the computing of P-values requires the full distribution of test statistics, often requiring asymptotic analysis. Without a large sample size, computing P-values by asymptotic analysis is far from robust. Therefore, direct E-value calculation, rather than transforming of P-values to E-values via a calibration function, is necessary when moving toward real data applications.

DNA methylation is a well-known aspect of epigenetic modification that plays a vital role in gene regulation. Reduced representation bisulfite sequencing (RRBS) is becoming increasingly popular for its use in analyzing genome-wide methylation profiles at a single-base resolution [24–27]. One of the main goals of RRBS studies is to identify differentially methylated regions (DMRs) between different biological conditions. Such detected DMRs have revealed important mechanisms of epigenetic regulation as mediated by alterations in DMR methylation. However, up to the present, most DMR detection tools have utilized P-values and adjusted P-values in their genome-wide DNA methylation studies. As common in other fields and studies, in these DMR detection studies a DMR is generally considered statistically significant when the adjusted P-value is less than 0.05. However, little clear analysis has been conducted to provide any comparisons between, or to analyze the relative merits of, the behaviors of P-values and adjusted P-values versus E-values in the identification of DMRs using RRBS.

In this study, we have proposed a method to calculate E-values for DMR detection in RRBS data. We also provided the R package ‘metevalue’ to facilitate user-based calculation of E-values. To assess the performance of E-values in identifying DMRs, we further generated various RRBS benchmarking datasets using our simulator ‘RRBSsim’. The performance of the detection of DMRs using P-values and their adjusted P-values or E-values is compared here in terms of accuracy (ACC), area under ROC curve (AUC), false discovery rate (FDR), type I error and power. Four commonly used DMR detection tools (Metilene, MethylKit, BiSeq and DMRfinder) were then tested, revealing the comparably excellent performance of E-values when applied to RRBS data to identify DMRs. Finally, to further demonstrate their utility in real RRBS data, we applied E-values to identify DMRs by Metilene in Dahl salt-sensitive (SS/JrHsdMcwiCrl, referred to as CRL) rats [30, 31] and a clinical trial on low-salt diet [32].

METHODS

Simulated RRBS data

RRBS sequence reads on the Illumina platform were generated using our simulator, RRBSsim, with default parameters (sequencing depth of 30× and read length of 100 base pairs), as described in our recent studies [28, 29]. Briefly, the reference genome (hg19) was randomly digested into variable sized regions as Msp1 fragments. The reference sequence of these Msp1 fragments in FASTA format was then inputted to RRBSsim. Once the baseline methylation level of a genomic region had been specified, the methylation value of CpG sites within the region was fitted with a Beta distribution by modeling real RRBS data from our recent lung cancer study [30]. The real RRBS data contained 18 tumors and adjacent normal tissues of non-small cell lung cancer, which can be accessed on the Sequence Read Archive (SRA) database (SRP125064). The RRBS data generated from the RRBSsim simulator also accounted for the strong dependence upon methylation levels of adjacent CpG sites within the genomic region [29].

Since RRBS experiments are often performed with a small number of biological replicates (ranging from a few to dozen subjects), in our simulation two experimental groups (treatment versus control groups) were set, each with eight RRBS samples. Most of the CpG islands (CGIs) are located in gene promoter regions which are often differentially methylated under certain disease conditions. We randomly chose approximately 50% of these CGIs and 25% of CGI shores (with 2 kb to their neighboring CGIs) as hyper- or hypo-methylated DMRs. The predefined DMRs contained CpG sites varying from a few to a thousand, with an average of 80 CpG sites per DMR. There were different degrees of methylation differences between the two groups on these predefined DMRs (<0.2, 0.2–0.4 and >0.4). A total of 100 repeats were carried out in our study. Details in generating the RRBS simulation data were described previously [28, 29]. Simulation codes and a set of simulated RRBS data as example data were deposited in the GitHub repository (https://github.com/yfyang86/metevalue-demos/tree/main/simulation). The simulated RRBS data in FASTQ format were analyzed using our in-house pipeline as described in the next section.

Real RRBS data and mRNA-seq data in applications

The raw RRBS and mRNA-seq data of CRL rats and a clinical trial on low-salt diet came from our previous studies [31–33]. Briefly, the CRL rat experiment involved rats on 0.4% NaCl (low-salt) diet since weaning or switched to a 4% NaCl (high-salt) diet for 3 weeks. Both RRBS and mRNA-seq data were obtained from renal T lymphocytes of CRL rats in low-salt and high-salt diet groups. In each group, there were four pooled samples with three rats in each pool [31, 32].

We recently conducted a clinical trial on 2 week low-salt diet (1200 mg/day Na diet) [32]. Both RRBS and mRNA-seq data of biopsied arterioles were obtained from the same subjects before and after low-salt diet (referred to as pre-diet and post-diet, respectively). There were 10 subjects in each group with each subject having both RRBS and mRNA-seq data. The detailed research protocol for this can be found in our previous study [33]. The RRBS and mRNA-seq data (Supplementary Table S1) have been deposited in the National Genomics Data Center under the accession numbers OMIX004080 and OMIX004085 (https://ngdc.cncb.ac.cn/omix).

Analysis of mRNA-seq data

Our in-house pipeline described previously [31, 33] was used to reanalyze the mRNA-seq data from CRL rats and a clinical trial on low-salt diet. The reference genome was rn6 for rats and hg19 for humans.

Analysis of RRBS data

Our in-house pipeline [32–36] was used to analyze both the simulated RRBS data and real RRBS data from CRL rats and a clinical trial on low-salt diet. Briefly, adapters were first removed from the output reads. The low-quality sequences at both ends of the reads (quality score < 20) were then further trimmed. These trimmed reads were then mapped to the reference genome (hg19 for humans and rn6 for rats) using Bismark (v0.16.1) [37]. Finally, the methylation call at each CpG site was extracted from the sequence alignment files (BAM). At each CpG site, the methylation rate was calculated as the percentage of unconverted cytosines in each library. RRBS data from different libraries were merged based on CpG coordinates. Transcription start site (TSS) regions were defined as 1000 bp upstream and downstream of the TSS. The genomic positions of CGIs on the reference genomes (hg19 for humans and rn6 for rats) were downloaded from the UCSC Genome Browser (http://genome.ucsc.edu/). CGI shores were defined as ±2 kb flanking the islands.

DMR detection tools

Four commonly used tools, Metilene, MethylKit, BiSeq and DMRfinder, have shown excellent performance in the identification of DMRs, and these were used in the evaluations of this study [28]. Metilene identifies genomic regions dynamically to maximize the absolute methylation difference between groups. P-values of each region are then calculated using the Wilcoxon rank-sum test [38]. MethylKit clusters CpG sites on predefined regions or tiling windows, and then calculates the P-value of each region using logistic regression or Fisher’s exact test [39]. BiSeq first clusters CpG sites, prunes the rejected CpG clusters and then merges the remaining CpG clusters into DMRs [40]. DMRfinder extracts CpG methylation counts from the sequence alignment files, clusters valid CpG sites into genomic regions and applies the Wald test to calculate the P-value for each region [28]. In all analyses, these DMR detection tools were run on default settings to identify differentially methylated regions between the two experimental groups.

P-values and adjusted P-values

Each computational tool was performed separately to identify DMRs between the two experimental groups. We first filtered out any identified regions with an absolute methylation difference of less than 0.05 between the two groups. This pre-filtering procedure is routinely done in detecting DMRs from RRBS data due to the potential for small methylation changes linked to potential bisulfite conversion errors during library preparation. P-values with the remaining regions were then adjusted using the Bonferroni or Benjamini–Hochberg (BH) methods [41, 42], implemented in the R statistical package ‘stats’ (https://www.r-project.org/).

A computational approach to estimate E-values

To facilitate E-value calculation toward real data applications, we provided a computational approach as follows. Specifically, two experimental groups (treatment and control groups) were considered. Suppose that there are |$ {m}_1$| samples in the treatment group, |$ {m}_2 $| samples in the control group and |$ k $| CpG sites that are located in a given DMR. The methylation rate in the given DMR was first averaged across |$ k $| CpG sites for each sample and the averaged methylation rate denoted |$ r=\left({\overline{r}}_{11},\dots, {\overline{r}}_{1{m}_1};{\overline{r}}_{21},\dots, {\overline{r}}_{2{m}_2}\right)$|⁠. We assumed that the averaged methylation rates in treatment and control groups were normally distributed. Considering different mean and variance parameters of the averaged methylation rate between treatment and control groups, we denoted |$ {\overline{r}}_{ij}\sim N\left({\mu}_i,{\sigma}_i^2\right),\,j=1,\cdots, {m}_i;i=1,2 $|⁠. Under the null hypothesis, there is no difference in the average methylation rate between treatment and control groups. Therefore, we denoted |$ {\overline{r}}_{ij}\sim N\left(\mu, {\sigma}^2\right),j=1,\cdots, {m}_i;i=1,2 $|⁠. The E-value was defined as a ratio of the observed likelihood falling into the null hypothesis |$ {\varTheta}_0=\left\{\left({\mu}_1,{\mu}_2,{\sigma}_1^2,{\sigma}_2^2\right):{\mu}_1={\mu}_2\in R,{\sigma}_1={\sigma}_2\in{R}^{+}\right\} $| to the likelihood on the entire parameter space |$\varTheta = \{\left({\mu}_1,{\mu}_2,{\sigma}_1^2,{\sigma}_2^2\right){:}\,{\mu}_i\in R,{\sigma}_i^2\in{R}^{+}, i=1,2\}$|

(1)
  • where unknown parameters |${\mu}_i,\mu, {\sigma}_i^2,{\sigma}^2$| were estimated from RRBS data using maximum likelihood estimation. It is worth noting that E-value tends to be small if the distributions of the averaged methylation rates in the treatment and control groups are identical. Instead of computing the limiting distribution of |$E(r)$| to generate P-values, we directly considered |$E(r)>20$| to be statistically significant. Based on the recommended calibration function |$f\left(E(r)\right)=\min \left\{1,1/E(r)\right\}$| for transforming E-values to P-values [23], the threshold of 20 for E-value is equivalent to our usual statistical significance level of a P-value or an adjusted P-value of 0.05.

We developed an R package named ‘metevalue’ to implement the estimation of E-value to identify DMRs in RRBS studies. The package ‘metevalue’ is a user-friendly interface that allows researchers to apply various tools to identify DMRs using E-values and their adjustments in epigenetic studies. It can be accessed directly through The Comprehensive R Archive Network (CRAN) (https://cran.r-project.org). Given a data matrix containing methylation levels in each CpG site for each sample and an input file containing methylation regions, ‘metevalue’ generates an E-value for any given methylation region. Examples of calculating E-values in RRBS and RNA-seq data are also provided in ‘metevalue’. These example data sets and the source codes for ‘metevalue’ are also available in the GitHub repository (https://github.com/yfyang86/metevalue).

Criteria for evaluating the performance of various tools

In our simulation, we already knew the DMRs. As defined in our previous study [28], if a region detected by the tool covers more than 50% of CpG sites in the predefined DMR, the tool detects a true DMR. True negative (TN) or false positive (FP) referred to the situation where a predefined non-DMR was identified as a non-DMR or DMR, respectively. Similarly, a false negative (FN) or true positive (TP) referred to the situation where a predefined DMR was identified as a non-DMR or DMR, respectively. This is summarized in the confusion matrix (Supplementary Table S2). The performance of these tools was then evaluated using multiple criteria following the confusion matrix. This included ACC, AUC, FDR, type I errors and power. The performance metrics were defined as follows:

Statistical analysis

Pearson correlation analysis was performed to quantify the linear correlation between log-transformed gene expression level and DNA methylation level. Wilcoxon rank-sum tests were used to compare the differences of each performance metric between using P-value, adjusted P-value (BH or Bonferroni) and E-value. The following comparisons between two groups were considered: E-value versus P-value, E-value versus BH, and E-value versus Bonferroni.

RESULTS

A workflow of evaluating E-values in identifying DMRs

Figure 1 illustrates a comprehensive comparative workflow for detecting DMRs using E-values, P-values or adjusted P-values. Briefly, the four commonly used tools (Metilene, MethylKit, BiSeq and DMRfinder) were applied to detect DMRs. Each tool detected a list of DMRs with P-values. Using methylation rates and DMRs from each tool as input data, the R package ‘metevalue’ provided E-values as an additional column to the output from each DMR detection tool. Any identified regions of an absolute methylation with a difference of less than 0.05 between the two groups was filtered out. Then, P-values with the remaining regions were adjusted using the Bonferroni or BH methods, as implemented in the R statistical package ‘stats’. Regions with either a P-value <0.05, adjusted P-value <0.05 or E-value >20 were considered to be identified as statistically significant DMRs.

A workflow for comprehensive evaluations of E-values for detecting DMR.
Figure 1

A workflow for comprehensive evaluations of E-values for detecting DMR.

RRBS data were generated using our simulator ‘RRBSsim’ to comprehensively evaluate the performance of P-values, adjusted P-values and E-values in identifying DMRs. A region identified as a DMR covering more than 50% of CpG sites in the predefined DMR was considered as correctly identified. True negatives (TN) or false positives (FP) were then assigned to refer to predefined non-DMRs that had been identified as non-DMRs or DMRs, respectively. False negatives (FN) or true positives (TP) were similarly assigned to refer to predefined DMRs that had been identified as non-DMRs or DMRs, respectively (Supplementary Table S2).

Finally, to evaluate the performance of E-values and adjusted P-values using the BH method in a real RRBS data application, the DMR detection tool ‘Metilene’ was further applied to our recent RRBS studies of CRL rats and our low-salt clinical trial.

E-values improved the concordance of DMRs detected by different tools

To evaluate the similarity between DMRs from different detection tools in simulated RRBS data, the overlap of DMRs predicted by different tools using P-values, BH, Bonferroni or E-values was analyzed (Figure 2). In general, BiSeq showed the largest proportion of DMRs that overlapped with other detection tools, while MethylKit showed the lowest proportion of DMRs that overlapped with other detection tools, with Metilene and DMRfinder lying in between. We concluded that using E-values could further improve the consistency of DMRs detected by different tools, as compared to using P-values, BH or Bonferroni. Ultimately, the percentage of DMRs detected by all four tools increased by 17.5% when using E-values.

The concordance of DMRs detected by different tools using P-values, adjusted P-values or E-values in simulated RRBS data. In each panel, the DMRs identified by the four tools are visualized using a Venn diagram.
Figure 2

The concordance of DMRs detected by different tools using P-values, adjusted P-values or E-values in simulated RRBS data. In each panel, the DMRs identified by the four tools are visualized using a Venn diagram.

E-values improved accuracy and reduced false discovery rates

ACC was used to evaluate the accuracy of identifying true DMRs. All DMR detection tools yielded the highest ACC when using E-values rather than P-values, BH or Bonferroni (Figure 3). The average ACC increased by up to 7.9% when using E-values compared to using P-values, BH or Bonferroni.

FDR and ACC of identifying DMRs using P-values, adjusted P-values and E-values in simulated RRBS data. FDR: false discovery rate; ACC: accuracy. In each panel, the y-axis represents FDR and the x-axis represents ACC. The average FDR and ACC in 100 repeats are presented.
Figure 3

FDR and ACC of identifying DMRs using P-values, adjusted P-values and E-values in simulated RRBS data. FDR: false discovery rate; ACC: accuracy. In each panel, the y-axis represents FDR and the x-axis represents ACC. The average FDR and ACC in 100 repeats are presented.

In the identification of DMRs, the FDR was highest when using P-values and decreased when using adjusted P-values. However, the FDR could be further reduced (up to 27.2%) by using E-values. Surprisingly, BiSeq could control FDR regardless of using P-values, adjusted P-values or E-values, while in Metilene, MethyKit and DMRfinder, the FDRs were largely out of control, even with BH-adjusted P-values. Taken together, using E-values not only improved accuracy but also reduced the false discovery rate of these tools in detecting DMRs compared to using P-values or adjusted P-values.

E-values increased AUC

Among all DMR detection tools, using adjusted P-values with the Bonferroni method generally yielded the lowest AUC, while using E-values yielded the highest AUC. AUC was significantly increased by up to 14.4% for all DMR detection tools when using E-values compared to P-values or adjusted P-values (Figure 4).

AUC of identifying DMRs using P-values, adjusted P-values and E-values in simulated RRBS data. AUC: area under receiver operating characteristic curve. The white dots in the box represent the average value of AUC. ****P ≤ 0.0001.
Figure 4

AUC of identifying DMRs using P-values, adjusted P-values and E-values in simulated RRBS data. AUC: area under receiver operating characteristic curve. The white dots in the box represent the average value of AUC. ****P ≤ 0.0001.

E-values improved power and reduced type I error

In general, DMR detection tools using P-values yielded higher power than those using adjusted P-values (BH or Bonferroni), while DMR detection tools using E-values, especially for BiSeq and Metilene, were more powerful than those using P-values (Figure 5).

Type I error and power of identifying DMRs using P-values, adjusted P-values and E-values in simulated RRBS data. In each panel, the y-axis represents type I error and the x-axis represents power. The average type I error and power in 100 repeats are presented.
Figure 5

Type I error and power of identifying DMRs using P-values, adjusted P-values and E-values in simulated RRBS data. In each panel, the y-axis represents type I error and the x-axis represents power. The average type I error and power in 100 repeats are presented.

Among all DMR detection tools, using P-values yielded a higher number of type I errors than using adjusted P-values or E-values. Conversely, using E-values could reduce type I errors compared to using P-values or BH-adjusted P-values. With the exception of MethyKit, all other DMR detection tools could control type I errors below 0.05, regardless of whether P-values, adjusted P-values or E-values were used. In MethylKit, the type I error exceeded 0.12 and was not well controlled when using P-value or adjusted P-value, while the type I error was well controlled to less than 0.05 when using E-values. These results suggested that the use of E-values in these DMR detection tools provided good control over type I errors and improved the ability to identify DMRs.

E-values identified DMRs of more biological relevance

To illustrate the utility of E-values, we applied them to identify DMRs in real RRBS datasets. It is well known that high sodium intake increases the risk of cardiovascular and renal diseases. Therefore, a better identification of DMRs from CRL rats or our clinical trial on low-salt diet should be able to identify more genes related with cardiovascular and renal diseases. In addition, DNA methylation in gene promoters plays a crucial role in transcriptional regulation. A better identification of DMRs should be able to reflect the strong negative correlation between the methylation levels of DMR detected in the gene promoters and their corresponding gene expressions.

The first real RRBS dataset came from our previous studies [31, 32] that analyzed genome-wide DNA methylome and transcriptome of renal T lymphocytes from CRL rats treated with high- or low-salt diets. Metilene was previously used to identify DMRs between high-salt and low-salt diet CRL rats using BH-adjusted P-values. This resulted in the identification of 193 DMRs located at TSS regions with BH-adjusted P-values <0.05. When using E-values >20, Metilene identified 233 DMRs located at TSS regions (Figure 6A). The methylation levels of these identified DMRs were concentrated around 0.1, which was consistent to the expected low methylation level in TSS regions (Figure 6B). Among the identified DMRs in TSS regions, 141 overlapped DMRs were detected by both BH-adjusted P-values and E-values. Genes with their TSS located within these overlapped DMRs were related to diseases such as Type 2 Diabetes. 92 DMRs were identified only when using E-values rather than BH-adjusted P-values. Genes with their TSS located within these unique DMRs were enriched in several diseases including blood pressure, hypertension and kidney diseases (Figure 6C). We then calculated Pearson correlation between the methylation level of these DMRs and the corresponding gene expression. Interestingly, DMRs detected using E-values were more negatively correlated with their corresponding gene expressions than those DMRs detected using BH-adjusted P-values (Figure 6D–E).

Comparison of DMRs detected in CRL rats by using E-values or BH-adjusted P-values. (A) Venn diagram of DMRs detected by E-values and BH-adjusted P-values. (B) Methylation levels in these detected DMRs. (C) Number of disease-associated genes with TSS located within the detected DMRs. (D–E) The negative association between gene expression and methylation level in the DMRs as detected by E-values or BH-adjusted P-values. Metilene was used to detect DMRs in the real RRBS data analysis. DMRs carrying TSS regions were used for gene ontology analysis.
Figure 6

Comparison of DMRs detected in CRL rats by using E-values or BH-adjusted P-values. (A) Venn diagram of DMRs detected by E-values and BH-adjusted P-values. (B) Methylation levels in these detected DMRs. (C) Number of disease-associated genes with TSS located within the detected DMRs. (DE) The negative association between gene expression and methylation level in the DMRs as detected by E-values or BH-adjusted P-values. Metilene was used to detect DMRs in the real RRBS data analysis. DMRs carrying TSS regions were used for gene ontology analysis.

The second real RRBS dataset came from our clinical trial in which 10 subjects were placed on 2 week low-salt diet (1200 mg sodium/day) [33]. Both RRBS and mRNA-seq data of biopsied arterioles were obtained from the same subjects before and after low-salt diet (referred to as pre-diet and post-diet, respectively). The original Metilene had implemented the Wilcoxon rank-sum test for independent experimental groups. Here, we replaced the Wilcoxon rank-sum test in Metilene by Wilcoxon signed-rank test for pre- and post-diet paired samples. Then, the revised Metilene using BH-adjusted P-values or E-values was applied to identify DMRs between pre-diet and post-diet groups. DMRs detected using E-values showed stronger negative correlations with their corresponding gene expressions than DMRs detected using BH-adjusted P-values (Supplementary Figure S1). These real RRBS data analyses revealed that the use of E-values leads to detection of DMRs of increased biological relevancy.

DISCUSSION

In this study, we have proposed a method to calculate E-values for DMR detection in RRBS data and provided an R package ‘metevalue’ as a user-friendly interface to implement the calculation of E-values that is compatible with various DMR detection tools. We then generated various RRBS benchmarking datasets using our simulator ‘RRBSsim’ to comprehensively evaluate the performance of P-values, adjusted P-values and E-values in identifying DMRs. Such benchmarking analyses showed that using E-values, rather than P-values or adjusted P-values, not only significantly improved ACC, AUC and power, but also reduced FDR and type I errors in identifying DMRs in RRBS studies (Table 1). The real data applications from CRL rats and our clinical trial on low-salt diet demonstrated that the use of E-values detected more disease-related genes containing these DMRs and improved the negative correlation between DNA methylation and gene expression. This suggests that E-values detect biologically more relevant DMRs than P-values.

Table 1

Summary of the performance of E-values, P-values and their adjustments

SoftwareCriteriaACCAUCFDRType I errorPower
MetileneE-values11221
BH23333
Bonferroni34114
P-values42442
MethylKitE-values11114
BH32331
Bonferroni22222
P-values32341
BiSeqE-values11221
BH32333
Bonferroni44114
P-values22442
DMRfinderE-values11222
BH22333
Bonferroni34114
P-values43441
SoftwareCriteriaACCAUCFDRType I errorPower
MetileneE-values11221
BH23333
Bonferroni34114
P-values42442
MethylKitE-values11114
BH32331
Bonferroni22222
P-values32341
BiSeqE-values11221
BH32333
Bonferroni44114
P-values22442
DMRfinderE-values11222
BH22333
Bonferroni34114
P-values43441

Note: Performance is ranked numerically from 1 (best) to 4 (worst).

Table 1

Summary of the performance of E-values, P-values and their adjustments

SoftwareCriteriaACCAUCFDRType I errorPower
MetileneE-values11221
BH23333
Bonferroni34114
P-values42442
MethylKitE-values11114
BH32331
Bonferroni22222
P-values32341
BiSeqE-values11221
BH32333
Bonferroni44114
P-values22442
DMRfinderE-values11222
BH22333
Bonferroni34114
P-values43441
SoftwareCriteriaACCAUCFDRType I errorPower
MetileneE-values11221
BH23333
Bonferroni34114
P-values42442
MethylKitE-values11114
BH32331
Bonferroni22222
P-values32341
BiSeqE-values11221
BH32333
Bonferroni44114
P-values22442
DMRfinderE-values11222
BH22333
Bonferroni34114
P-values43441

Note: Performance is ranked numerically from 1 (best) to 4 (worst).

Improving accuracy and reducing errors are major considerations in the development of any computational tools, especially when it comes to identifying DMRs. To our knowledge, our study is the first to provide an E-value computation tool and to evaluate its performance in various DMR detection tools for RRBS data. Our findings indicate that multiple testing correction may greatly influence the performance of DMR detection tools and highlight the applicability of E-values in identifying DMRs in RRBS studies. This provides valuable information to guide other researchers toward the advancement of sequence-based DMR analysis and may also provide an example for E-value incorporation that could lead to similar improvements for other areas of biological data analysis.

In addition to their superior performance, E-values are easy to interpret and compute. They can be directly interpreted as ‘betting scores’ or ‘likelihood ratios’, or can be interpreted as ‘how likely is the rejection to be true?’. Neither asymptotic analysis or full distribution of test statistics is required to derive E-values. On the contrary, it is the use of P-values and adjusted P-values that have fundamental limitations in these areas [23]. Neither P-values nor adjusted P-values can answer the question ‘how likely is the rejection to be true?’. Furthermore, the computing of P-values and adjusted P-values requires a known full distribution of test statistics and often requires asymptotic analysis. Without a large sample size, the computing P-values by asymptotic analysis is far from robust [23]. Therefore, it is not always easy to accurately compute P-values and adjusted P-values. In omics data analysis, it remains challenging to control the FDR using statistical methods that generate incorrect raw P-values [42, 43].

There have been different terms in literature that refer to the ‘E-value’, such as ‘betting score’, ‘Skeptic’s capital’ and ‘S-value’. They have been used interchangeably to express the same definition that represents a measure of the ratio of the observed likelihood falling into the null hypothesis to the likelihood on the entire parameter space. The E-value can also be interpreted as a Bayes factor. In the P-value guideline issued in 2016, the ASA also recommended the use of Bayes Factors as an alternative approach to prevent misuse and misconception of P-values [15]. Over recent years, researchers have adopted Bayes factor analysis to improve the accuracy of the identification of causal gene variants in genome-wide association studies [44]. In the present study, we have extended E-values to the detection of DMRs in RRBS data and provided an R package to calculate E-values for DMR detection tools including Metilene, MethylKit, BiSeq and DMRfinder. Our study suggests that the use of E-values may be a promising approach to address the multiple testing correction problems when using RRBS data to identify differentially methylated regions.

Several caveats should be acknowledged in our study. First, we considered just eight samples in each experimental group. Despite the small sample size in this study, RRBS experiments including animal studies and the clinical trial in our previous studies have often been performed with small number of biological replicates (ranging from a few to a dozen subjects), and this is considered sufficient to reveal the importance of epigenetic regulation involving DMR methylation changes [45–48]. Second, we used the default setting of each DMR detection tool, focusing mainly on the performance of P-values, adjusted P-values and E-values in each DMR detection tool. Third, we evaluated the performance of the E-value in the detection of DMR using RRBS data. This benchmark workflow would also be potentially applicable to identify DMRs in methylation array data and whole genome bisulfate sequencing data and is also be suitable for identifying differentially methylated CpG sites due to these DMR detection tools containing optional analysis of CpG sites. Overall, our benchmark workflow sheds light on potential general applications to other types of genomic data besides DNA methylation. However, the applicability toward the construction of E-values in other specific types of genomic data still requires further investigation.

CONCLUSIONS

In summary, we have proposed a method to calculate E-values for the detection of DMRs in RRBS data and provided the R package ‘metevalue’ as a user-friendly interface to implement E-value calculations into various DMR detection tools. Our comprehensive benchmarking analyses demonstrated that using E-values not only significantly improved accuracy, AUC and power, over that of P-values or adjusted P-values, but also reduced false discovery rates and type I errors. In applications upon real RRBS data, the use of E-values detected biologically more relevant DMRs and improved the negative association between DNA methylation levels and their associated gene expressions. Taken together, E-value is a superior alternative to P-value and its use in DNA methylation studies is likely to receive increasing attention related to genome-wide epigenetic analysis.

Key Points
  • An E-value computation tool is provided as an alternative framework to P-value and its adjustments in DNA methylation studies.

  • Using E-values not only significantly improved accuracy, area under ROC curve and power, but also reduced false discovery rates and type I errors compared to P-values or adjusted P-values.

  • Using E-values detected biologically more relevant differentially methylated regions (DMRs) in our real reduced representation bisulfite sequencing data.

  • Our findings highlight the applicability of E-values in the detection of DMRs and provide valuable information that may guide researchers in advancing the analysis of other types of genomic data.

ACKNOWLEDGEMENTS

We thank Ms Chao Bi at the Bioinformatics Core Facility of Zhejiang University School of Medicine, and Mathematics and Science College of Shanghai Normal University for providing computing capacity, Drs Tiantian Mao and Taizhong Hu at University of Science and Technology for hosting lectures introducing E-value theory, Dr Ruodu Wang at University of Waterloo for comments and suggestions on applying E-value theory, Christopher Wood of Zhejiang University for English grammar polishing, and the anonymous reviewers and editors for reading and commenting on the manuscript.

FUNDING

The Natural Science Foundation of Shanghai (20JC1413800), the Medical Health Science and Technology Key Project of Zhejiang Provincial Health Commission (WKJ-ZJ-2007), the Key Research & Development Program of Zhejiang Province of China (2021C03126), the National Natural Science Foundation of China (82072857), the National Institutes of Health (HL149620) and the American Heart Association (15SFRN23910002).

DATA AVAILABILITY

The real RRBS data and gene expression data from Dahl salt-sensitive (SS/JrHsdMcwiCrl, referred to as CRL) rats and the clinical trial on low-salt diet have been deposited in the National Genomics Data Center under the accession numbers OMIX004080 and OMIX004085, respectively (https://ngdc.cncb.ac.cn/omix). A set of simulated RRBS data has also been deposited as example data in the GitHub repository at https://github.com/yfyang86/metevalue-demos/tree/main/simulation.

CODE AVAILABILITY

The R package ‘metevalue’ can be accessed directly through The Comprehensive R Archive Network (CRAN) (https://cran.r-project.org). Its source code is also available in the GitHub repository at https://github.com/yfyang86/metevalue. Source codes for generating simulated RRBS data used in our study can be also found in the GitHub repository at https://github.com/yfyang86/metevalue-demos/tree/main/simulation.

AUTHORS’ CONTRIBUTION

P.L. and X.P. designed the study. Y.L. generated the RRBS simulation data from RRBSsim and prepared the algorithms for DMR detection tools. H.L. performed the data analysis. H.L., Y.Y. and X.P. developed the algorithm and its associated R package to calculate E-values. L.Z. and Y.Y. provided technical support for high performance computation. D.L.M., S.K. and M.L. generated real RRBS and RNA-seq data from CRL rats and from the clinical trial on low-salt diet. X.Z., R.Y., D.L.M., S.K. and M.L. discussed and commented upon the study. Y.Y., P.L. and X.P. wrote the manuscript. All authors commented and approved the manuscript.

Author Biographies

Yifan Yang is the R&D director for the Transwarp Technology (Shanghai) Co., Ltd, and a part-time graduate advisor of Shanghai Normal University. His research interests are survival analysis, bioinformatics, knowledge graphs and big data.

Haoyuan Liu is a master’s student at the Shanghai Center for Mathematical Science of Fudan University, which he joined after his undergraduate studies in the Department of Mathematics at Shanghai Normal University. His research interests are statistics and bioinformatics.

Yi Liu is a PhD student at Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine. Her research interests are statistics and bioinformatics.

Liyuan Zhou is a research scientist at Sir Run Run Shaw Hospital, Zhejiang University School of Medicine. She received her PhD in bioinformatics from Zhejiang University. Her research interests are complex trait analysis, bioinformatics and statistical genetics.

Xiaoqi Zheng is a professor in the Department of Mathematics at Shanghai Normal University. His research interests are bioinformatics and computational biology.

Rongxian Yue is a professor in the Department of Mathematics at Shanghai Normal University. His research interests are statistical design of experiments and statistical computing.

David L. Mattson is the department chair and professor in the Department of Physiology, Medical College of Georgia, Augusta University. His research interests include normal and pathophysiological regulation of renal function and arterial blood pressure.

Srividya Kidambi is a professor and chief in the Division of Endocrinology, Metabolism, and Clinical Nutrition at the Department of Medicine, Medical College of Wisconsin/Froedtert Hospital. Her research interests are treatment of conditions associated with severe uncontrolled blood pressure, blood pressure in young people and blood pressure secondary to hormonal disorders.

Mingyu Liang is a professor and vice chair for Interdisciplinary and Translational Research in the Department of Physiology, Kohler Co. Chair and Director of Center of Systems Molecular Medicine at the Medical College of Wisconsin. His research interests are (epi)genomics and precision medicine, regulatory RNA and cellular metabolism, as related to hypertension and cardiovascular and kidney diseases.

Pengyuan Liu is a professor at Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, and an Adjunct Professor at the Department of Physiology, Medical College of Wisconsin, USA. His research interests are non-coding RNA, bioinformatics and computational biology.

Xiaoqing Pan is a professor in the Department of Mathematics at Shanghai Normal University. Her research interests are statistical genetics and epigenomics in complex traits.

REFERENCES

1.

Korthauer
 
K
,
Kimes
 
PK
,
Duvallet
 
C
, et al.  
A practical guide to methods controlling false discoveries in computational biology
.
Genome Biol
 
2019
;
20
(
1
):
118
.

2.

Jafari
 
M
,
Ansari-Pour
 
N
.
Why, when and how to adjust your P values?
 
Cell J
 
2019
;
20
(
4
):
604
7
.

3.

Knight
 
R
,
Vrbanac
 
A
,
Taylor
 
BC
, et al.  
Best practices for analysing microbiomes
.
Nat Rev Microbiol
 
2018
;
16
(
7
):
410
22
.

4.

Meijer
 
RJ
,
Goeman
 
JJ
.
Multiple testing of gene sets from gene ontology: possibilities and pitfalls
.
Brief Bioinform
 
2016
;
17
(
5
):
808
18
.

5.

van den
 
Oord
 
EJ
.
Controlling false discoveries in genetic studies
.
Am J Med Genet B Neuropsychiatr Genet
 
2008
;
147b
(
5
):
637
44
.

6.

Gelman
 
A
,
Robert
 
CP
.
Revised evidence for statistical standards
.
Proc Natl Acad Sci U S A
 
2014
;
111
(
19
):
E1933
.

7.

Ziliak
 
ST
.
The Validus Medicus and a new gold standard
.
Lancet
 
2010
;
376
(
9738
):
324
5
.

8.

Tyanova
 
S
,
Temu
 
T
,
Cox
 
J
.
The MaxQuant computational platform for mass spectrometry-based shotgun proteomics
.
Nat Protoc
 
2016
;
11
(
12
):
2301
19
.

9.

Rothman
 
KJ
.
A show of confidence
.
N Engl J Med
 
1978
;
299
(
24
):
1362
3
.

10.

Greenland
 
S
,
Senn
 
SJ
,
Rothman
 
KJ
, et al.  
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
.
Eur J Epidemiol
 
2016
;
31
(
4
):
337
50
.

11.

Davis
 
ML
,
Huang
 
Y
,
Wang
 
K
.
Rank normalization empowers a t-test for microbiome differential abundance analysis while controlling for false discoveries
.
Brief Bioinform
 
2021
;
22
(
5
):bbab059.

12.

Huber
 
W
.
A clash of cultures in discussions of the P value
.
Nat Methods
 
2016
;
13
(
8
):
607
.

13.

Halsey
 
LG
,
Curran-Everett
 
D
,
Vowler
 
SL
,
Drummond
 
GB
.
The fickle P value generates irreproducible results
.
Nat Methods
 
2015
;
12
(
3
):
179
85
.

14.

Sterne
 
JA
,
Davey Smith
 
G
.
Sifting the evidence-what’s wrong with significance tests?
 
BMJ
 
2001
;
322
(
7280
):
226
31
.

15.

Nuzzo
 
R
.
Scientific method: statistical errors
.
Nature
 
2014
;
506
(
7487
):
150
2
.

16.

Lew
 
MJ
.
Bad statistical practice in pharmacology (and other basic biomedical disciplines): you probably don’t know P
.
Br J Pharmacol
 
2012
;
166
(
5
):
1559
67
.

17.

Ioannidis
 
JP
.
Why most discovered true associations are inflated
.
Epidemiology
 
2008
;
19
(
5
):
640
8
.

18.

Ioannidis
 
JP
.
Contradicted and initially stronger effects in highly cited clinical research
.
JAMA
 
2005
;
294
(
2
):
218
28
.

19.

Wasserstein
 
RL
,
Lazar
 
NA
.
The ASA statement on P-values: context, process, and purpose
.
The American Statistician
 
2016
;
70
(
2
):
129
33
.

20.

Amrhein
 
V
,
Greenland
 
S
,
McShane
 
B
.
Scientists rise up against statistical significance
.
Nature
 
2019
;
567
(
7748
):
305
7
.

21.

Murtaugh
 
PA
.
In defense of P values
.
Ecology
 
2014
;
95
(
3
):
611
7
.

22.

Wasserstein
 
RL
,
Schirm
 
AL
,
Lazar
 
NA
.
Moving to a world beyond “p < 0.05”
.
The American Statistician
 
2019
;
73
(
sup1
):
1
19
.

23.

Vovk
 
V
,
Wang
 
R
.
E-values: calibration, combination and applications
.
The Annals of Statistics
 
2021
;
49
(
3
):
1736
, 1719–
54
.

24.

Yin
 
L
,
Dai
 
Y
,
Jiang
 
X
, et al.  
Role of DNA methylation in bisphenol a exposed mouse spermatocyte
.
Environ Toxicol Pharmacol
 
2016
;
48
:
265
71
.

25.

Plongthongkum
 
N
,
Diep
 
DH
,
Zhang
 
K
.
Advances in the profiling of DNA modifications: cytosine methylation and beyond
.
Nat Rev Genet
 
2014
;
15
(
10
):
647
61
.

26.

Meissner
 
A
,
Gnirke
 
A
,
Bell
 
GW
, et al.  
Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis
.
Nucleic Acids Res
 
2005
;
33
(
18
):
5868
77
.

27.

Gu
 
H
,
Bock
 
C
,
Mikkelsen
 
TS
, et al.  
Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution
.
Nat Methods
 
2010
;
7
(
2
):
133
6
.

28.

Liu
 
Y
,
Han
 
Y
,
Zhou
 
L
, et al.  
A comprehensive evaluation of computational tools to identify differential methylation regions using RRBS data
.
Genomics
 
2020
;
112
(
6
):
4567
76
.

29.

Sun
 
X
,
Han
 
Y
,
Zhou
 
L
, et al.  
A comprehensive evaluation of alignment software for reduced representation bisulfite sequencing data
.
Bioinformatics
 
2018
;
34
(
16
):
2715
23
.

30.

Sun
 
X
,
Yi
 
J
,
Yang
 
J
, et al.  
An integrated epigenomic-transcriptomic landscape of lung cancer reveals novel methylation driver genes of diagnostic and therapeutic relevance
.
Theranostics
 
2021
;
11
(
11
):
5346
64
.

31.

Abais-Battad
 
JM
,
Alsheikh
 
AJ
,
Pan
 
X
, et al.  
Dietary effects on Dahl salt-sensitive hypertension, renal damage, and the T lymphocyte transcriptome
.
Hypertension
 
2019
;
74
(
4
):
854
63
.

32.

Dasinger
 
JH
,
Alsheikh
 
AJ
,
Abais-Battad
 
JM
, et al.  
Epigenetic modifications in T cells: the role of DNA methylation in salt-sensitive hypertension
.
Hypertension
 
2020
;
75
(
2
):
372
82
.

33.

Kidambi
 
S
,
Pan
 
X
,
Yang
 
C
, et al.  
Dietary sodium restriction results in tissue-specific changes in DNA methylation in humans
.
Hypertension
 
2021
;
78
(
2
):
434
46
.

34.

Roberts
 
ML
,
Kotchen
 
TA
,
Pan
 
X
, et al.  
Unique associations of DNA methylation regions with 24-hour blood pressure phenotypes in black participants
.
Hypertension
 
2022
;
79
(
4
):
761
72
.

35.

Liu
 
P
,
Liu
 
Y
,
Liu
 
H
, et al.  
Role of DNA de novo (de)methylation in the kidney in salt-induced hypertension
.
Hypertension
 
2018
;
72
(
5
):
1160
71
.

36.

Liu
 
Y
,
Liu
 
P
,
Yang
 
C
, et al.  
Base-resolution maps of 5-methylcytosine and 5-hydroxymethylcytosine in Dahl S rats: effect of salt and genomic sequence
.
Hypertension
 
2014
;
63
(
4
):
827
38
.

37.

Krueger
 
F
,
Andrews
 
SR
.
Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications
.
Bioinformatics
 
2011
;
27
(
11
):
1571
2
.

38.

Jühling
 
F
,
Kretzmer
 
H
,
Bernhart
 
SH
, et al.  
Metilene: fast and sensitive calling of differentially methylated regions from bisulfite sequencing data
.
Genome Res
 
2016
;
26
(
2
):
256
62
.

39.

Akalin
 
A
,
Kormaksson
 
M
,
Li
 
S
, et al.  
methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles
.
Genome Biol
 
2012
;
13
(
10
):
R87
9
.

40.

Hebestreit
 
K
,
Dugas
 
M
,
Klein
 
H-U
.
Detection of significantly differentially methylated regions in targeted bisulfite sequencing data
.
Bioinformatics
 
2013
;
29
(
13
):
1647
53
.

41.

Hochberg
 
Y
.
A sharper Bonferroni procedure for multiple tests of significance
.
Biometrika
 
1988
;
75
(
4
):
800
2
.

42.

Benjamini
 
Y
,
Hochberg
 
Y
.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
J R Stat Soc B Methodol
 
1995
;
57
(
1
):
289
300
.

43.

Sham
 
PC
,
Purcell
 
SM
.
Statistical power and significance testing in large-scale genetic studies
.
Nat Rev Genet
 
2014
;
15
(
5
):
335
46
.

44.

Morisawa
 
J
,
Otani
 
T
,
Nishino
 
J
, et al.  
Semi-parametric empirical Bayes factor for genome-wide association studies
.
Eur J Hum Genet
 
2021
;
29
(
5
):
800
7
.

45.

Sharp
 
AJ
,
Migliavacca
 
E
,
Dupre
 
Y
, et al.  
Methylation profiling in individuals with uniparental disomy identifies novel differentially methylated regions on chromosome 15
.
Genome Res
 
2010
;
20
(
9
):
1271
8
.

46.

Lister
 
R
,
Pelizzola
 
M
,
Dowen
 
RH
, et al.  
Human DNA methylomes at base resolution show widespread epigenomic differences
.
Nature
 
2009
;
462
(
7271
):
315
22
.

47.

Laporte
 
M
,
le Luyer
 
J
,
Rougeux
 
C
, et al.  
DNA methylation reprogramming, TE derepression, and postzygotic isolation of nascent animal species
.
Sci Adv
 
2019
;
5
(
10
):
eaaw1644
.

48.

Grimm
 
SA
,
Shimbo
 
T
,
Takaku
 
M
, et al.  
DNA methylation in mice is influenced by genetics as well as sex and life experience
.
Nat Commun
 
2019
;
10
(
1
):
305
.

Author notes

Yifan Yang, Haoyuan Liu and Yi Liu contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)

Supplementary data