-
PDF
- Split View
-
Views
-
Cite
Cite
Kelly P. Stanton, Jiaqi Jin, Roy R. Lederman, Sherman M. Weissman, Yuval Kluger, Ritornello: high fidelity control-free chromatin immunoprecipitation peak calling, Nucleic Acids Research, Volume 45, Issue 21, 1 December 2017, Page e173, https://doi.org/10.1093/nar/gkx799
- Share Icon Share
Abstract
With the advent of next generation high-throughput DNA sequencing technologies, omics experiments have become the mainstay for studying diverse biological effects on a genome wide scale. Chromatin immunoprecipitation (ChIP-seq) is the omics technique that enables genome wide localization of transcription factor (TF) binding or epigenetic modification events. Since the inception of ChIP-seq in 2007, many methods have been developed to infer ChIP-target binding loci from the resultant reads after mapping them to a reference genome. However, interpreting these data has proven challenging, and as such these algorithms have several shortcomings, including susceptibility to false positives due to artifactual peaks, poor localization of binding sites and the requirement for a total DNA input control which increases the cost of performing these experiments. We present Ritornello, a new approach for finding TF-binding sites in ChIP-seq, with roots in digital signal processing that addresses all of these problems. We show that Ritornello generally performs equally or better than the peak callers tested and recommended by the ENCODE consortium, but in contrast, Ritornello does not require a matched total DNA input control to avoid false positives, effectively decreasing the sequencing cost to perform ChIP-seq. Ritornello is freely available at https://github.com/KlugerLab/Ritornello.
INTRODUCTION
Reliable and precise characterization of where proteins, such as transcription factors (TFs), interact with the genome, enables biologists to understand how gene expression is regulated at the molecular level. The human genome, for example, encodes about 1500 TFs (1) and many of them directly recognize and bind to specific DNA sequences to regulate gene expression. Therefore, identification of where each TF binds to the DNA is critical for reconstructing the complex regulatory network of gene expression. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) is a powerful tool for detecting protein–DNA interactions at the genome-wide scale and has become the method of choice. In a ChIP-seq experiment, first, proteins interacting with the DNA are chemically attached to the DNA using formaldehyde-mediated crosslinking. Then the DNA is fragmented into short pieces and antibodies specifically targeting the protein of interest are used to pull down DNA fragments bound by that protein. Finally, the immunoprecipitated DNA fragments are released from the protein of interest and subjected to high-throughput DNA sequencing. The resulting sequenced reads are mapped to a reference genome and computational peak calling algorithms are applied to process mapped reads and infer protein-binding positions.
TFs usually bind to short specific DNA sequences (motifs) and generate sharp point-source peaks (2). For most ChIP-seq experiments currently available, only one of the two 5′ ends of each double-stranded DNA fragment has been sequenced (single end sequencing), so the read coverage near the point-source peaks follow a characteristic bimodal shape. However, calling peaks accurately from a large quantity of mapped reads is nontrivial and over 40 algorithms have been developed (3–44) since the ChIP-seq technology was first introduced (45). Peak calling remains challenging due to the presence of artifactual-binding events (false positives) and background noise from reads outside of peaks, multi-binding events with overlapping read contributions and variability of experimental quality. Additionally, for most peak calling algorithms, matched negative controls, which are usually DNA samples obtained without performing immunoprecipitation (total DNA input control) or immunoprecipitated by non-specific antibodies (IgG control), are often required to control the false positive rate (FPR).
Performing a negative control experiment for each sample, effectively doubles the sequencing cost of ChIP-seq, limiting the number of samples that can be run per experiment. Peak calling algorithms that do not use the control (including those that have the option to run with or without it) have been developed; however, they underperform due to the lack of a detailed characterization of ChIP-seq signal and noise.
Binding events can also occur in close proximity to one another and it is often difficult to resolve how many binding sites are present and precisely where binding occurs. BRACIL (46) and CSDeconv (47) use blind deconvolution algorithms to resolve individual peaks at multi-binding loci but are not scalable for peak calling and are thus used for post processing when peaks have been identified by other peak callers. GEM (23) incorporates de novo motif discovery into the peak identification process aiding in resolving individual peaks, but may not be suitable if the TF of interest does not bind to DNA directly or does not have any specific motif.
ChIP-seq experiments can also be of varying quality. Collective efforts by large consortia have provided guidelines on how to evaluate the quality and signal-to-noise ratio of ChIP-seq experiments. The opposing strand cross-correlation between the read coverage on the positive and that on the negative strands has been used to assess experimental quality by ENCODE (2). The cross-correlations of ChIP-seq as well as input control experiments exhibit two modes, one at or around their respective average fragment lengths and an additional one at or around their respective read lengths. High quality experiments tend to have a greater contribution from the fragment length mode, while low quality experiments and input controls tend to have a larger contribution from the read length mode. Specifically, to assess the quality of ChIP-seq experiments, ENCODE recommends two metrics, the normalized strand coefficient (NSC) and the relative strand correlation (RSC)(2). The NSC is the ratio between the the fragment length mode and the baseline for large offsets of cross-correlation. The RSC is the ratio between the fragment length mode and the read length mode of the cross-correlation. If the NSC or RSC scores are low, indicating poor experimental quality, ENCODE recommends repeating the experiment. Given the considerable cost of repeating a ChIP-seq experiment, it is useful to be able to ‘rescue’ samples with suboptimal quality for use as additional replicates, rather than discarding them.
Here, we present Ritornello, a novel algorithm for finding binding sites of TFs. Ritornello is based on both digital signal processing (DSP) and statistical techniques. In the current work, we contribute the following innovations and insights:
a peak caller, that does not require a matched control and still maintains a low FPR, outperforming even algorithms that use the control.
an efficient method to perform full deconvolution of multi-binding events on a genome wide scale.
samples of low quality can be ‘rescued’, instead of being discarded.
a rigorous characterization of the binding signals and artifacts in the presence of noise in ChIP-seq data.
a non-parametric approach to calculate the fragment length distribution (FLD) for any single-end NGS experiment.
We benchmarked Ritornello against MACS2 (3) and GEM (23), two algorithms recommended by the ENCODE consortium (2). In the default modes each requires the matched control. We demonstrated that Ritornello, a matched control free method, outperformed MACS2 and GEM.
MATERIALS AND METHODS
We have developed Ritornello to find candidate peaks efficiently, with minimal use of memory and computation time, by using a DSP technique called a matched filter, classify candidate peaks as true-binding events or artifacts based on their shape and finally test candidate-binding positions for significance based on comparison to a model absent of binding at that position. The scheme of the Ritornello method is detailed in Figure 1.
![Overview of the Ritornello approach. Step 1: the reads are mapped to the genome and the distributions of read starts $\Pr [R^p]$ and $\Pr [R^n]$ are calculated. Step 2: the FLD $\Pr [F]$ is calculated using only those areas identified as background coverage (free of peaks and artifacts). Step 3: the expected distribution of reads around a binding event is calculated from a model including the FLD $\Pr [F]$ as well as the distribution of relative-binding positions within fragments $\Pr [K]$. Initially, K is modeled with a uniform distribution or equivalently K ∼ β(α, α) with α = 1. Step 4: the expected distribution of reads around a binding event is used to locate candidate-binding event peaks (e.g. bk − 1, bk and bk + 1) by identifying the positions with the highest match with impulse response function. Step 5: candidate-binding events are classified as either read length artifacts (e.g. bk + 1) or are retained as putative-binding events (e.g. bk − 1 and bk) based on the shape of the local cross-correlation between opposing strands. Step 6: the binding intensity, $\beta ^p_k+\beta ^n_k$, of each peak, bk are deconvolved from the mixture of local peaks bk − 1 and background noise using a maximum likelihood approach. The likelihood ratio test is then applied to determine the significance of the peak. Step 7: the distribution of relative-binding positions within fragments, $\Pr [K]$, is updated using a maximum likelihood estimate of α, where K ∼ β(α, α). This is obtained using a combined likelihood model for the top 200 most significant peaks. Steps 3–6 are repeated using the new $\Pr [K]$.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/nar/45/21/10.1093_nar_gkx799/2/m_gkx799fig1.jpeg?Expires=1747987861&Signature=N4bfZxnHFPxTuyFJVDb0xpqysuVRzIfkdMfNBtNKSwVLIWy8RN5N--sFFKoZoED2IfR--QkzS5tj~PusbGK1AoSMYJF1C8kwqdTWM43KDZV1UwJJOgOJUNtywLo6FhFqdPEBWMDjZB6WGkPLc7S7L3Dj~qqH5xR6VdJIKxlUN8VZVayYn-QrwHK3yB5Dp5Yzs-d3ehmCqYMmQqTID8JlrImLf05ibehxhB-cenZRLNGD1RDMBzurSW7S4rGHFrCwypVlmWYmy5zJOOOvKAB~YkgV2CrZ7vA0LRilmKScePtopWXhSUAaPj2v8cyd5z0o2ihOZNenq1jZ6r9wAzu5pw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Overview of the Ritornello approach. Step 1: the reads are mapped to the genome and the distributions of read starts |$\Pr [R^p]$| and |$\Pr [R^n]$| are calculated. Step 2: the FLD |$\Pr [F]$| is calculated using only those areas identified as background coverage (free of peaks and artifacts). Step 3: the expected distribution of reads around a binding event is calculated from a model including the FLD |$\Pr [F]$| as well as the distribution of relative-binding positions within fragments |$\Pr [K]$|. Initially, K is modeled with a uniform distribution or equivalently K ∼ β(α, α) with α = 1. Step 4: the expected distribution of reads around a binding event is used to locate candidate-binding event peaks (e.g. bk − 1, bk and bk + 1) by identifying the positions with the highest match with impulse response function. Step 5: candidate-binding events are classified as either read length artifacts (e.g. bk + 1) or are retained as putative-binding events (e.g. bk − 1 and bk) based on the shape of the local cross-correlation between opposing strands. Step 6: the binding intensity, |$\beta ^p_k+\beta ^n_k$|, of each peak, bk are deconvolved from the mixture of local peaks bk − 1 and background noise using a maximum likelihood approach. The likelihood ratio test is then applied to determine the significance of the peak. Step 7: the distribution of relative-binding positions within fragments, |$\Pr [K]$|, is updated using a maximum likelihood estimate of α, where K ∼ β(α, α). This is obtained using a combined likelihood model for the top 200 most significant peaks. Steps 3–6 are repeated using the new |$\Pr [K]$|.
Derive fragment length distribution via deconvolution
For fragments overlapping a binding position, the positive strand mapping reads will be upstream of the binding site whereas negative strand reads will be downstream of the site. As such, the distance between a read and the binding position is dependent on the fragment length, which most peak calling algorithms must estimate to obtain accurate predictions for binding locations. Specifically, most current peak callers only estimate the average fragment length rather than the whole distribution. Our first innovation for Ritornello is calculating, not just the mean fragment length, but the entire sample specific empirical FLD from single-end reads (step 2 of Figure 1). Ritornello utilizes this FLD, as a key component, for more accurate peak predictions, which we will describe in detail below.
We note that the local coverage on either Rp or Rn is unlikely to be uniform when the FLD changes as a function of genomic position on that strand. Therefore, to estimate the fragment lenght distribution, we exclude regions that clearly do not satisfy the assumptions made in Equations (1) and (3) by retaining only regions whose coverage is roughly locally uniform for Rp in Equation (1) and Rn in Equation (3).
Why the fragment length distribution can be inferred from single end data?
We have just shown that the fragment length distribution can be derived via deconvolution from |$\Pr [R^P]$| and |$\Pr [R^n]$| when those distributions are related by Equations (1) or (3). This is intuitive in paired-end sequencing. However, in single-end sequencing, only one end is randomly selected from each fragment, and it is hard to imagine how the fragment length information can be inferred. For a given set of fragments, single end sequenced reads are a sub-sample of paired-end sequenced reads. Thus, |$\Pr [R^P]$| and |$\Pr [R^n]$|, the only inputs to Equations (2) or (4), should be the same for the single-end reads as that in the paired-end, resulting in similar estimated FLDs, |$\Pr [F]$|.
To directly demonstrate that the computed |$\Pr [F]$| of singled-end data is a reasonable approximation for the FLD, we have applied Equations (6) and (7) to a pseudo single-end data created by randomly sampling one end from each fragment of a paired-end experiment. In Figure 2, we show that the FLD calculated using Equations (6) and (7) from this pseudo single-end data (black) is similar to the true FLD of the paired-end sample (green).
![Ritornello captures the FLD from single-end sequencing data. Given a set of fragments, singled end sequenced reads are a subsample of paired-end sequenced reads. When enough single end fragments are sampled the distribution of read coverage on the positive and negative strands $\Pr [R^p]$ and $\Pr [R^n]$ are equivalent to their paired-end counterparts. A single read from each paired-end fragment of an EZH2 sample was randomly chosen to simulate singled end data. The FLD calculated by Ritornello from this data (black curve) closely approximates the true FLD (green curve).](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/nar/45/21/10.1093_nar_gkx799/2/m_gkx799fig2.jpeg?Expires=1747987861&Signature=Wq6qzJWmdzYrMhQ-XDa-~5QVN3KY4STetbgqWieRUqSFSdfxEYjAQtECCoE4LnZzzh1tqYVTg-wtgPrcRxiJzkbefW3vcCswjTbqalirHWZ79Kn-HAs3FQ~VqqdWhFrKvKJ318Me~UYs9LHrJ5GTKDMytOjnzYSWj4~0AIj6sC6k7uu6OFPKXr3bMWyvFL8BxzgU2hOE0tfBXYSYb9SY6ZL97JPo45LZzJHuTphQh51afja1OkHeAcThx6iqp8aV5YKZKrVASS4SQUqYjSSfUi02FNPB8uxQblZkklc433T32PfLY2G3ndqEBQFZfs9XT9XS5GLQIOaWKQapTrcrkQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Ritornello captures the FLD from single-end sequencing data. Given a set of fragments, singled end sequenced reads are a subsample of paired-end sequenced reads. When enough single end fragments are sampled the distribution of read coverage on the positive and negative strands |$\Pr [R^p]$| and |$\Pr [R^n]$| are equivalent to their paired-end counterparts. A single read from each paired-end fragment of an EZH2 sample was randomly chosen to simulate singled end data. The FLD calculated by Ritornello from this data (black curve) closely approximates the true FLD (green curve).
Local fragment length distribution varies around binding events
The fragments generated from a binding event overlap the binding site, thus any given fragment must be at least as long as the distance from its start position to the binding site. This creates dependence between the fragment length and genomic position on both strands because reads that are further from the binding site are necessarily longer on aggregate. Consequently, neither the positive nor the negative strand read coverage is independent of the fragment length within a binding event, making it inappropriate to recover the FLD using Equation (2), as shown by simulation in Figure 3D. In contrast, in simulated event free regions, it is shown that the FLD can be correctly recovered using Equation (2) as seen in Figure 3A.

The presence of binding events and read length artifacts hinders reconstruction of the FLD by deconvolution. Coverage patterns (positive strand in red and negative strand in blue) were generated from reads sampled randomly from either end of simulated fragments. Equation (2) was applied to infer the FLD, FLD (black). The actual FLD was calculated from the simulated fragments (green). The read length is shown in gray. (A) The FLD, inferred from reads in simulated binding regions, deviates from the true FLD. (B) The FLD, inferred from genomic background coverage outside of binding events, agrees with the true FLD. (C) In the presence of read length column artifacts, the inferred FLD deviates from the true FLD and exhibits a ‘phantom peak’ at read length. (D) In the presence of read length missing-column artifacts, the inferred FLD deviates from the true FLD and exhibits a ‘phantom peak’ at read length.
Local fragment length distribution varies around read length artifacts
In addition to binding events, we have observed local artifactual patterns that create dependence between fragment length and genomic position on both strands, preventing the reconstruction of FLD. These artifactual patterns fall into the following two categories:
a pile of reads whose start positions constitute a read length width column pattern on the positive strand, followed by a read length width column pattern on the negative strand, a read length downstream. We refer to this artifact as a column artifact. We simulate it in Figure 3B and show it in Figure 4A.
a binding peak (or background read coverage) but with a read length width column pattern of missing reads on the positive strand followed by a read length width column pattern of missing reads on the negative strand, a read length downstream. We refer to this as a missing-column artifact. We simulate it in Figure 3C and show it in Figure 4B.

Column and missing-column artifacts and the nonrandom FLD in their neighborhoods. Examples of read length column (A) and missing-column (B) artifacts in a single-end human GM12878 cell anti-Serum Response Factor (SRF) ChIP sample on chromosome 1. Examples of a read length column (C) and missing-column artifact (D) in a paired-end MEF input control sample on chromosomes 1 and 18 respectively. Positive strand reads are shown in red while negative strand reads are shown in blue. The artifact center positions x where the sample genome differs from the reference are marked with asterisks. The paired-end scatterplots show each read’s position (x-axis) and associated fragment length (y-axis) to demonstrate the dependence relationship between genomic position and fragment length. Every positive strand read (red point) is accompanied by a negative strand read (blue point) originating from the same fragment. For clarity, in the paired-end column artifact plot (C), we have plotted each fragment and separated them to two groups such that in one group all fragments have positive strand reads within the positive strand column (light red column) and in the other group all fragments have negative strand reads within the negative strand column (light blue column). Explicitly, reads from fragments contributing positive strand columns are shown between the pink lines, while those from fragments contributing to the negative strand column are shown between cyan lines. The paired-end missing-column artifact (D) has a column of missing reads on the positive strand followed by a column of missing reads on the negative strand. The positive strand column of missing reads is linked to a diagonal of missing reads on the negative strand, representing the associated missing downstream fragment ends and are together outlined in light red. Similarly, the negative strand column of missing reads is linked to a diagonal of missing reads on the positive strand, representing the associated missing upstream fragment ends, and are together outlined in light blue. We highlight two fragments to demonstrate the coupling between reads on the positive strand and reads on the negative strand.
Based on paired end data, we have observed that the FLD is invariant with respect to genomic position with the exception of these artifacts and binding events which Ritornello excludes from the FLD calculation using Equations (6) and (7). Both of these artifacts cause local disturbances in the FLD. This is easier to see in paired-end sequenced reads. The paired-end column artifact shown in Figure 4C contains fragments with length distributed according to the sample’s FLD. However, the fragments are organized such that longer fragments extend further from center of the artifact (denoted with an asterisk) than shorter fragments, implying F is dependent on Rp and Rn. The paired end missing-column artifact shown in Figure 4D is composed of fragments organized such that the range of possible fragment lengths is restricted based on genomic position. Specifically, on the positive strand, lengths of fragments (such as fragments A and B) must lie outside the range [dp, dp + |$w$|] where dp is the distance from the positive strand read to the center of the missing-column artifact (denoted with an asterisk). Likewise, on the negative strand, the lengths of fragments (such as fragments C and D) must lie outside the range [dn, dn + |$w$|] where dn is the distance from the negative strand read to the center of the missing-column artifact (denoted with an asterisk). Thus, column and missing-column artifacts are composed of fragments organized such that both strands are dependent on fragment length.
We note that if we were using Equations (2) or (4) without invoking Equations (5)–(8) at these artifactual regions, the resulting FLD would have two modes, one associated with the ‘phantom peak’ near the read length, and one associated with the predominant fragment length. ENCODE observed these two modes in the opposing strand cross-correlation, which is related to the FLD (Equations (1) and (3)). However, when these artifactual regions are filtered out as is done in equations (5)–(8), the ‘phantom peak’ is greatly attenuated. Thus, these artifactual areas give rise to a low quality RSC (2,49), the ENCODE measures of the ‘phantom peak’ using cross-correlation. The cause of these artifacts is likely incorrect mapping which is described in the next section.
Incorrect mapping leads to read length artifacts
The read length used in any sequencing experiment is determined subsequent to the collection of fragments. Therefore, read length artifacts are associated with sequencing or post-sequencing procedures. Each read in a ChIP-seq experiment is sequenced from the sample’s genomic DNA, which can differ from the reference genome used in the alignment step. Comparative genome assembly algorithms used for sequence alignment work by comparing the nucleotide sequence for each read to the sequence of the reference genome and assigning the read to the location that gave the best alignment score, usually based on fuzzy string matching. If the nucleotide sequence of the sample’s genome, from which the reads are sampled, disagrees with the reference genome at location x, the mapping algorithm can fail to assign the appropriate reads to that location. This could occur if the number of mismatched bases (or indels) per read exceeds a predetermined cutoff or simply if the reads belonging to that location map better to another region with higher sequence similarity. When this occurs there will be a discontinuity in coverage across |$w$| nucleotides (where |$w$| is the read length) because there are exactly |$w$| possible read start positions where the read would overlap the mismatched base (or indel). On the positive strand the discontinuity is upstream of x and on the negative strand the discontinuity is downstream of x. This results in an missing-column artifactual coverage pattern of incorrectly mapped reads to the upstream sequence highlighted in yellow as seen in Figure 5. Further, reads that fail to map to the correct location can instead map to another location with higher sequence similarity as determined by the mapping algorithm. This would result in a column artifact as seen in the downstream sequence highlighted in yellow in Figure 5. These artifacts tend to occur in interspersed repetitive regions such as the sequences shown in yellow in Figure 5.

Read length artifacts likely stem from mapping problems. Reads that map to their correct locations do not give rise to artifacts (left). Differences between the reference and sequenced genomes can produce missing-column artifacts and additionally the coverage can be relocated to a region of higher sequence similarity forming a column artifact (right).
For paired-end each relocated read in a column artifact, the associated read from the same fragment is also relocated as shown in Figure 4C. Likewise for each missing read in a missing-column artifact, the associated read from the same fragment is also missing as shown in Figure 4D. In principle, the missing column artifact problem may be mitigated by suggesting multiple possible mappings (which can be done using the bowtie −k option or efficiently using other algorithms (50)). However, while this may fill in some of the missing columns, it will create additional column artifacts, which would need to be resolved.
Derive the expected read coverage distribution around binding events
Matched filtering for rapid and accurate localization of candidate peaks
TFs bind to a small fraction of the genome, thus to improve efficiency we only test candidate peak positions (step 6 of Figure 1) that closely match our expected peak shape (|$\Pr [-FK]$| and |$\Pr [FK]$|), as identified by a matched filter (51) (step 4 of Figure 1).
Usually γ is chosen to normalize the expected power of the noise after application of the filter to one. However, a given γ that normalizes the noise in low coverage areas to one, necessarily will give higher power in areas of higher coverage. Therefore, it is infeasible to specify a single γ when the noise is not globally stationary. As a result, the standard way of detecting the desired output signal by thresholding using a fixed signal to noise ratio is not applicable.
Remove false positive-binding events using cross-correlation
Genomic regions that have similarly high read coverage in both the ChIP sample and the negative control are false positive-binding events. Most current peak calling algorithms rely on negative controls (usually total DNA input) to control the FPR. To the best of our knowledge, if negative controls are not available, false positive events would not be filtered out by current peak calling algorithms. We have discovered that most of the significant false positive events are in fact the read length artifacts described earlier and exemplified in Figure 6. We have already shown that the cross-correlations and deconvolved FLDs of regions with read length artifacts exhibit the ‘phantom peak’ at read length (Figure 3).

Local cross-correlation differentiates true-binding events from read length artifacts (false positives). (A) Local cross-correlation of a binding event in anti-ATF2 K562 ChIP sample peaks near the average fragment length. (B) Local cross-correlation of a column artifact in anti-ATF2 K562 ChIP sample peaks near the read length. (C) A scatterplot of the maximum local log cross-correlation up to 10 bp beyond the read length versus the maximum local log cross-correlation in the range of 10 bp beyond the read length to 0.75Fmax (D) A scatterplot of the maximum local binarized cross-correlation (using unique reads) up to 10 bp beyond the read length versus the maximum local binarized cross-correlation in the range of 10 bp beyond the read length to 0.75Fmax.
Ritornello identifies regions containing false positive events by detecting the ‘phantom peak’ in their cross-correlations; true-binding events contain no such ‘phantom peak’ in thier cross-correlations. To this end, we employ a machine learning approach, building a classifier to distinguish between read length artifacts and true-binding events. To extract features for the classifiers, we calculated cross-correlation locally from bj − Fmax to bj + Fmax around each candidate peak. The features include: (i) the maximum value of the cross-correlation function in the range between zero and read length, which is denoted by c1 and (ii) the maximum value of the cross-correlation function in the range between the read length and the maximum fragment length Fmax, which is denoted by c2. In the neighborhoods of binding events c2 is expected to be higher than c1, whereas in neighborhoods of read-length artifacts we expect c1 to be larger than c2. We added additional features to account for consecutive read length artifacts of varying amplitudes and large amplifications such as due to polymerase chain reaction (PCR). For this purpose, we binarized the coverage in the positive and negative strands by setting positions with read count >0 to 1. We then performed a running mean smoothing on the binarized coverage, calculated cross-correlation and extracted the following features: (i) the maximum value of the binarized smoothed cross-correlation function in the range between zero and read length, which is denoted by d1 and (ii) the maximum value of the binarized smoothed cross-correlation function in the range between the read length and the maximum fragment length Fmax, which is denoted by d2.
We build a classifier using logistic regression, with features: |$\lbrace \frac{c_2}{c_1},\frac{d_2}{d_1}\rbrace$|. The instances used to build this classifiers include manually classified peaks obtained as follows: we first applied MACS2 (negative control free mode) to four TF ChIP-seq datasets generated by ENCODE, subsequently selected the top 200 peaks for each of four samples and finally manually labeled regions with typical binding shape as true positives (see Figure 6A) and regions with characteristic read length artifact as false positives (see Figure 6B). We trained this model using a five fold cross-validation and achieved high performance with AUROC of 0.993. This set of features is scale-free and thus our trained classifier is generalizable to any ChIP-seq sample and does not need to be retrained. Ritornello incorporates this trained classifier (step 5 of Figure 1) to flag artifactual locations as false positives without the need for a paired total DNA input or IgG control.
Deconvolving single events from local coverage
The read coverage near an event is a mixture of reads generated by that event, noise and any neighboring events. In order to accurately quantify each event, it is essential to deconvolve its binding intensity (number of reads originating from each event) from this mixture. Fragments originating from different events in close proximity may overlap. Consequently, it is difficult to quantify the number of reads coming from each binding event. Further, fragments originating from non ChIP-ed DNA, off targeted sequencing due to antibody inefficiency, as well as other sources, contribute to background noise. We model the read coverage around each candidate peak using a generalized linear model to deconvolve its binding intensity.
The relationships in Equations (23) and (25) use the following theorem: if X1…Xn are independent Poisson distributed random variables, Pois(λ1)…Pois(λn), then their sum X1 + … + Xn is Poisson distributed, Pois(λ1 + … + λn).
We then find the maximum likelihood estimates for parameters |$\beta ^p_q$| and |$\beta ^n_q$|. The sum, |$\beta ^p_j + \beta ^n_j$|, is reported as the binding intensity for the peak at bj.
We note that this is formally a Poissson generalized linear model with identity link function. Such a model has the advantage that it can resolve multiple peaks in close proximity, such as double-binding or triple-binding events. To our knowledge only BRACIL (46) and CSDeconv (47) are designed to deconvolve adjacent-binding events, in particular double-binding events and use different models. Those two algorithms are less efficient and therefore require as input a set of peaks from other peak callers.
Ritornello implements a dogleg optimization (the Newton–Raphson method coupled with initial gradient descent), which is much faster than traditional Expectation Maximization or Markov Chain Monte Carlo methods (52), enabling the rapid deconvolution of all loci detected in previous steps.
Testing candidate peaks for significance
In the previous section we quantified the intensity, βj, of each candidate peak. Here we determine the significance of each of these candidate-binding events using a likelihood ratio test based on the likelihood we derived in Equations (27). The null model |$H^0_j$| is obtained by setting both |$\beta ^p_j=0$| and |$\beta ^n_j=0$|. We note that we use the term null model at each position bj to refer to the model involving a zero-binding intensity at bj but with potentially nonzero |$\beta _{q_0}^p$| and |$\beta _{q_0}^n$| as well as |$\beta _q^p$| and |$\beta _q^n$| in neighboring candidate events. The alternative model |$H^1_j$| uses full parameterization including non vanishing |$\beta _j^p$| and |$\beta _j^n$| at the location of interest bj.
Up to this point, we have obtained an initial list of putative-binding events. This was based on inferring an impulse response function given by the product distribution of the FLD and a uniformly distributed K (step 3 Figure 1). To further refine the impulse response function, we find the estimate of α that maximizes the combined likelihood of the 200 most significant putative events (step 7 of Figure 1). As shown in Figure 7, the |$\Pr [-FK]$| and |$\Pr [FK]$| derived from this procedure closely match the shape of highly abundant peaks. Finally, we repeat the peak identification, artifact testing, and likelihood ratio testing (steps 4–6 of Figure 1) using the updated hp and hn and report final list of significant peaks.

The parameterized filter closely matches the peak shape. Shown is an example peak in a human SRF sample. The parameterized filter for this sample is shown in gray. The peak location as determined by the filter is shown by a dashed line.
FDR correction
To control the false discovery rate, we adjust the P-values, pi, associated with each hypothesis test by applying the Benjamini–Hochberg (54) approach to obtain q-values, qi, which control the FDR as follows: |$q_i=p_i \frac{m}{k_i}$|, where ki is the P-value rank by significance and m, is the total number of potential hypothesis tested. Finally, we ensure monotonicity using the Benjamini–Yekutieli correction (55).
Usually one performs all hypothesis tests and m is set to that number. For the likelihood ratio test that Ritornello performs, it is important to note that the null hypothesis is the absence of a peak at a position of interest in the presence of any amount of coverage due to uniform background signal or neighboring peaks. The appropriate m for Ritornello is then, not the number of candidate peaks tested, as these were previously enriched for peak shape coverage by the matched filter, but rather the total number of windows exceeding the minimum read count threshold before selecting those that match the filter.
RESULTS
To assess Ritornello’s performance, we compared it against the MACS2 (3) and GEM (23) peak callers, which have both been recommended by the ENCODE consortium (56,57). We use 14 single-end TF ChIP-seq experiments from the ENCODE project (57), each with two biological replicates (see Table 1 and Supplementary Table S1). Matched DNA input or IgG controls were also available (Supplementary Table S1).
Although matched input controls are optional for some methods, they are used by ChIP-seq peak calling algorithms to avoid calling many spurious false positives. Both MACS2 and GEM have options to run with or without the matched input control. To show that Ritornello, which is a matched control free approach, avoids calling these false positives we benchmark it against MACS2 (with and without the matched control), and GEM (with and without the matched control) on each of the 28 samples. We refer to MACS2 with the control as MACS2-I and without the control as MACS2. We refer to GEM with the control as GEM-I and without the control as GEM. To control for the variable size of reported peaks, we used the interval from 100 bp upstream to 100 bp downstream of each peak summit identified by MACS2 or GEM, and of each peak location identified by Ritornello. If multiple peaks called from a single algorithm overlapped each other using this 200 bp window, only the peak with the more significant q-value is considered. Mitochondria were excluded from all analysis. We compare the performance of these algorithms in terms of:
the percentage of peaks unique to Ritornello with respect to each control utilizing sample.
the similarity between characteristic coverages in binding events predicted uniquely by Ritornello, MACS2-I, or GEM-I to the coverages of strong binding events predicted by multiple algorithms.
motif enrichment.
Additionally, we demonstrate that Ritornello predicts very few false positives in negative controls suggesting that Ritornello has a low FPR. We also observe that Ritornello produces results that are reproducible among technical replicates as determined by the irreproducible discovery rate algorithm (58) (Supplementary Table S2).
Most peaks called by Ritornello are common to MACS2-I or GEM-I
Control samples are used to eliminate false positive peaks during ChIP-seq peak calling. It is therefore important to measure how many peaks are unique to Ritornello with respect to the control using algorithms. The percentages of peaks unique to Ritornello with respect to MACS2-I and GEM-I are shown in Table 2. Ritornello tends to call few (<10%) unique peaks in all samples with the exceptions of E2F4 and ATF2. Although Ritornello tends to report fewer peaks than MACS2-I and GEM-I, these peaks are most often a subset of the peaks reported by those input using algorithms. Peaks that are called by MACS2-I and GEM-I but not by Ritornello typically have less reads and hence the shape of the read coverage at these loci is not well defined.
TF ChIP-seq experiments (two replicates each)
TF . | Cell type . |
---|---|
NRF1 | K562 |
SRF | GM12878 |
REST | H1 |
MAX | K562 |
ATF2 | H1 |
E2F4 | GM12878 |
GATA1 | MEL |
MYC | MEL |
CTCF | Myotube |
ELK1 | K562 |
SRF | H1 |
MAX | H1 |
YY1 | H1 |
REST | K562 |
TF . | Cell type . |
---|---|
NRF1 | K562 |
SRF | GM12878 |
REST | H1 |
MAX | K562 |
ATF2 | H1 |
E2F4 | GM12878 |
GATA1 | MEL |
MYC | MEL |
CTCF | Myotube |
ELK1 | K562 |
SRF | H1 |
MAX | H1 |
YY1 | H1 |
REST | K562 |
TF . | Cell type . |
---|---|
NRF1 | K562 |
SRF | GM12878 |
REST | H1 |
MAX | K562 |
ATF2 | H1 |
E2F4 | GM12878 |
GATA1 | MEL |
MYC | MEL |
CTCF | Myotube |
ELK1 | K562 |
SRF | H1 |
MAX | H1 |
YY1 | H1 |
REST | K562 |
TF . | Cell type . |
---|---|
NRF1 | K562 |
SRF | GM12878 |
REST | H1 |
MAX | K562 |
ATF2 | H1 |
E2F4 | GM12878 |
GATA1 | MEL |
MYC | MEL |
CTCF | Myotube |
ELK1 | K562 |
SRF | H1 |
MAX | H1 |
YY1 | H1 |
REST | K562 |
Percentage of peaks unique to Ritornello as compared to MACS2-I and GEM-I
Comparing Ritornello and alternative methods based on unique peak coverage patterns
We next investigate the peak shape and motif content for samples where Ritonello called >10% unique peaks relative to MACS2-I or GEM-I. For this analysis, we compared the peaks that are uniquely reported by Ritornello to the corresponding algorithms in Figure 2. To be fair when comparing read coverage of unique peaks, we averaged the local distributions of read start positions (pileup) for the top 200 most significant unique peaks of each algorithm (pileup plots in Figure 8 and additionally Supplementary Figures S1 and 2) and found that the characteristic patterns (black curves whose shapes match the impulse response functions) associated with the highest intensity peaks reported by all algorithms tend to be similar to the patterns obtained by aggregating the pileups of peaks uniquely reported by Ritornello but in contrast are less similar to the aggregated pileups of peaks uniquely reported by MAC2-I or GEM-I. These results provide evidence that peaks uniquely reported by Ritornello may be more likely to be indicative of binding events than those uniquely reported by MACS2-I or GEM-I.

Pileup of read start positions for ATF2 rep2 peaks in H1 cells obtained by Ritornello only (blue) compared with that obtained from (A) MACS2 only(red) and (B) GEM only(green). The pileups of peaks common to Ritornello and MACS2 (A) or Ritornello and GEM (B) are shown in black. The pileups of read start positions for Ritornello best match the pileups of common peaks. Additionally Ritornello unique peaks have comparable levels of motif enrichment as compared to the other algorithms.
Comparing Ritornello and alternative methods based on motif occurrence rate
Availability of genuine validations of TF-binding events inferred by TF ChIP-seq peak callers is limited. Therefore, one of the measures used by practitioners for assessing the quality of peak callers is the fraction of predicted TF-binding events that overlap with the characteristic binding motif of the relevant TF. Employing the same 28 public ChIP-seq samples, we compare the motif enrichment for the top 1000 peaks reported by each algorithm or all peaks when less than 1000 are reported as shown in (Table 3). The annotated motifs specific to the TFs are downloaded from the JASPAR CORE 2014 motif library (59). Genomic locations that match the position weight matrix of each JASPAR motif were identified using PWMScan with default P-value cutoff at 0.00001 (60). Predicted peaks whose binding centers are within 100 bp from the corresponding JASPAR motif are classified motif containing. The peaks found by Ritornello had the highest motif occurrence rate compared with MACS2-I, MACS2, GEM-I and GEM in 15 out of the 28 samples. MACS2-I, MACS2, GEM-I and GEM had the highest motif occurrence rate in 8, 1, 3 and 1 samples respectively. This suggests that Ritornello, which is a control free peak caller, is able to identify the most significant true-binding events at comparable rates to input using peak callers.
Percentage of the top 1000 peaks (or all peaks when <1000) within 100 bp of a motif obtained by Ritornello, MACS2-I, MACS, GEM-I and GEM
![]() |
![]() |
![]() |
![]() |
Additionally, we compare the number of motif containing peaks reported by each algorithm as a function of significance (Figure 9 and Supplementary Figures S3–9). We see that Ritornello exhibits improved motif enrichment, with respect to the other algorithms, for peaks reported in ATF2 rep2 and comparable enrichment to control utilizing algorithms in E2F4 rep1 and rep2. In general, Ritornello shows comparable motif enrichment to the input utilizing algorithms.

Motif containing peaks as a function of the rank of the q-value for (A) ATF2 rep2, (B) E2F4 rep1, (C) E2F4 rep2 and (D) CTCF rep1. The number of peaks displayed is slightly more than the number of peaks reported by Ritornello.
Ritornello identifies true-binding events in low quality samples
ENCODE recommends discarding low quality samples as determined by the NSC and RSC scores. Ritornello discards read length artifacts that give rise to low RSC and uses a matched filtering approach that maximizes the signal to noise ratio measured by the NSC, we therefore conjectured that Ritornello may be able to rescue these low quality samples. We compared the performance of Ritornello with that of MACS2-I, MACS2, GEM-I and GEM on samples with suboptimal quality based on the NSC and RSC scores. The ENCODE Consortium has suggested repeating experiments with NSC values <1.05 and RSC values <0.8. Using these criteria we identified that out of the 28 samples we investigated, four samples have suboptimal quality. These four experiments include: ATF2 H1 replicate 1 (NSC = 1.04, RSC = 0.62), ATF2 H1 replicate 2 (NSC = 1.04, RSC = 0.74), ELK1 K562 replicate 1 (NSC = 1.03, RSC = 0.64) and ELK1 K562 replicate 2 (NSC = 1.05, RSC = 0.73).
We observed that in these four samples, the pileups of peaks predicted by Ritornello have a characteristic bimodal shape of TF binding and have much stronger read coverage than their matched input controls (Figure 10 and Supplementary Figures S10–12). This suggests that there are numerous significant binding events that can be captured in low quality ChIP-seq samples. Additionally, the pileups of peaks reported by MACS2 or GEM have either non-uniform read coverage or a narrow bimodal shape (similar to column artifacts) as in Figure 10 and Supplementary Figures S10–12. It is worth noting, however, that when provided controls, the MACS2-I and GEM-I reported peaks also have a strong bimodal shape with less signal in the control sample. This illustrates that Ritornello reliably rescues peaks from low quality samples, while MACS2-I and GEM-I might be able to avoid calling false positive artifacts from low quality samples as well.

Pileup of read start positions for peaks identified by: Ritornello, (A) and (D) (blue); MACS2-I, (B) and (E) (dark red); and MACS2, (C) and (F) (light red) in samples of low quality and their matched controls. Matched controls are shown in black. The pileups of peaks detected by Ritornello show smooth bimodal shapes and similarly to MACS2-I, have stronger read coverage in the ChIP as compared to their matched controls. In contrast, the pileups of peaks detected by MACS2 have a narrower shape and irregular spikes similar to the negative controls.
Ritornello obviates the need for a matched input control
To demonstrate that Ritornello calls few false positives that could otherwise be avoided with the aid of a match control, we show that Ritornello calls few peaks in control samples as seen in Table 4. We used the filter learned in the ChIP channel to look for regions in the corresponding control that match this pattern. This is performed by running Ritornello, supplying the filter found in the ChIP experiment rather than having it learned from the control data. The number of peaks Ritornello reported in each matched control is a proxy for the number of potential false positive peaks called by Ritornello in its corresponding ChIP-seq experiment. We found that Ritornello called very few peaks in the matched controls, which corresponds to a FPR in ChIP of <0.05 for 25 out of the 28 samples. The samples with the two highest FPR’s use IgG as a control, so it is likely that Ritornello is picking up non-specific binding events, which are unlikely to be a source of noise in an actual ChIP-seq sample.
Ritornello tends to call many more peaks in ChIP-seq samples than in control samples
![]() |
![]() |
![]() |
![]() |
DISCUSSION
In this work, we demonstrated that we could infer the entire FLD, rather than only the mean fragment length, using a deconvolution approach from single-end TF ChIP-seq experiments. We derived an experiment-specific probabilistic model to mathematically describe the well-known bimodal shape of TF binding. Using this bimodal shape, we applied the matched filter technique from signal-processing to identify potential TF-binding sites and used a Poisson GLM to deconvolve the binding intensities and test the significance of each putative-binding event. Our model efficiently deconvolves the effect of neighboring peaks as well as noise to resolve multiple adjacent-binding events. We compared Ritornello (a control-free approach) with two popular algorithms recommended by ENCODE, MACS2-I and GEM-I, which require matched controls to reduce false positives. We found that Ritornello outperforms these other methods when input is unavailable and performs similarly when it is in terms of reproducibility between biological replicates, motif enrichment and the coverage patterns of unique reproducible peaks.
We also identified artifactual-binding regions where the local cross-correlation peaks at read length instead of around fragment length. We elucidated that these artifactual regions contribute to the phantom peaks associated with poor experimental quality. Current peak calling algorithms, such as MACS2 and GEM, rely on matched control samples to remove a substantial fraction of these artifacts. We provide an extensive description of this specific category of artifacts and their origin, and offer an automated approach to filter out artifacts without requiring matched controls. Taken together, Ritornello offers an alternative that obviates the need for a match control, demonstrating that one can safely reduce the total experimental cost of TF ChIP-seq experiments, while providing superior analytic results.
ENCODE provides a blacklist of genomic regions which contains artifactually high read coverage in different ChIP-seq experiments (61). This manually curated blacklist largely overlaps with repetitive regions in the genome (61). The blacklist has several drawbacks: (i) it does not cover all artifactual regions, (ii) it is not generalizable to different cell types and (iii) it is only available for human and mouse. We note that a few peak calling tools, such as PePr (62), optionally remove artifacts in regions where the local read coverage in the ChIP is similar to that in the matched control. However, these methods still require a matched control. Ritornello is capable of removing artifacts independently without requiring either prior knowledge of a blacklist or matched negative controls.
Many variables influence the quality of ChIP-seq experiments and our ability to infer true-binding events from the data. These include factors such as antibody efficiency, DNA fragmentation, PCR amplification, sequencing depth, read mapping quality etc. Each of these factors may vary from one sample to another. Ritornello is designed to implicitly take into consideration these experiment specific parameters from raw data and is applicable to a wide variety of protocols. Additionally, it does not require any tuning of parameters. One limitation of Ritornello is that it is designed to detect point-source peaks such as TF-binding events. For broad-source peaks, such as epigenetic modifications, we recommend other peak callers.
We note that there are scenarios where Ritornello is expected to exceed the performance of control utilizing algorithms. These include: (i) when the control is of poor quality and (ii) positions where methods considering controls fail to eliminate read length artifacts. Additionally, in regions with very low coverage or high amounts of noise that obscure the binding shape of potential peaks, Ritornello will not call-binding events, and will thus report peaks more conservatively than MACS2-I and GEM-I.
We demonstrated that in TF ChIP-seq experiments that would otherwise be discarded due to low quality, Ritornello (as well as MACS2-I or GEM-I) might reliably recover true-binding events amidst the high levels of artifacts. Often repeat experiments with the same reagents are of poor quality according to ENCODE’s metric and thus algorithms capable of handling such data are required. If, however, the repeated experiments were to improve results, the previous poor quality samples may still be of value to strengthen the findings of the higher quality samples.
SOFTWARE AVAILABILITY
Ritornello was programmed in C++ using the FFTW (63) library for fast computation of the Fourier transform and the Samtools (64) library for interfacing with the sequence alignment/map format which has become the standard in high throughput sequencing. Ritornello is freely available for download at https://github.com/KlugerLab/Ritornello together with a detailed tutorial. Further analysis and graphics were made using the R statistical language (65).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
The authors thank Francesco Strino, Vladimir Rokhlin, Ronen Talmon and Ronald Coifman for insightful discussions.
FUNDING
National Institute of Health (NIH) [U54HG006996-03 to S.W., 1R01HG008383-01A1 to Y.K., R01 GM086852 to Y.K.]. Funding for open access charge: NIH [1R01 HG008383-01A1 to Y.K, R01 GM086852 to Y.K.].
Conflict of interest statement. None declared.
REFERENCES
Author notes
These authors contributed equally to the paper as first authors.
Comments