Adam Buckle, Nick Gilbert, Davide Marenduzzo, Chris A Brackley, capC-MAP: software for analysis of Capture-C data, Bioinformatics, Volume 35, Issue 22, November 2019, Pages 4773–4775, https://doi.org/10.1093/bioinformatics/btz480
Abstract
Capture-C is a member of the chromosome-conformation-capture family of experimental methods which probes the 3D organization of chromosomes within the cell nucleus. It provides high-resolution information on the genome-wide chromatin interactions from a set of ‘target’ genomic locations, and is growing in popularity as a tool for improving our understanding of cis-regulation and gene function. Yet, analysis of the data is complicated, and to date there has been no dedicated or easy-to-use software to automate the process. We present capC-MAP, a software package for the analysis of Capture-C data.
Implemented with both ease of use and flexibility in mind, capC-MAP is a suite of programs written in C++ and Python, where each program can be run separately, or an entire analysis can be performed with a single command. It is available under an open-source licence at https://github.com/cbrackley/capC-MAP, as well as via the conda package manager, and should run on any standard Unix-style system.
Supplementary data are available at Bioinformatics online.
1 Introduction
Over recent years the family of experimental methods based on chromosome-conformation-capture (3C) has grown (Han et al., 2018), with different variants used to generate data at different resolutions, using different methods of detection, e.g. PCR, microarray or next-generation sequencing (NGS) technologies. These methods are used to probe the interactions between different chromatin regions in vivo, and to uncover the three-dimensional organization of chromosomes and genomes. In the nearly two decades since they were first developed, they have revolutionized our understanding of genome organization and function (Denker and de Laat, 2016).
The 3C-based protocols range from the original ‘one-to-one’ 3C method (Dekker et al., 2002), which measures interactions between selected pairs of genomic loci; through the ‘one-to-all’ style 4C method (Simonis et al., 2006), where genome-wide interactions for a single selected locus are obtained; to high-throughput ‘all-to-all’ HiC (Lieberman-Aiden et al., 2009), which uses NGS to obtain genome-wide chromatin interaction maps. Capture-C (or NG Capture-C) is a relatively recent addition to the 3C family, developed by Hughes et al. (2014); it uses oligo-capture technologies, a frequently cutting restriction enzyme, and NGS to deliver high-resolution cis-interaction profiles for up to hundreds of target loci from a single experiment. While HiC can provide a large-scale overview of chromosome interactions, deep sequencing is required to get good spatial resolution, which is costly. Capture-C is a ‘many-to-all’ assay which gives interaction profiles for a set of ‘targets’ at near restriction enzyme fragment resolution (Davies et al., 2016; Hughes et al., 2014).
The popularity of the Capture-C method has grown (Andrey et al., 2017; Buckle et al., 2018; Furlan et al., 2018), but the analysis of the data is complicated—it requires non-standard use of bioinformatics tools as well as some bespoke data treatment. Most work using the method to date has used custom analysis scripts accessible only to experts in bioinformatics and programming. While some analysis tools designed to treat HiC data now also support Capture-C, these are not optimized for the method and are limited in functionality: there has been a lack of easy-to-use, dedicated software. Here we introduce capC-MAP, a software package for the analysis of Capture-C data.
2 capC-MAP
capC-MAP is implemented as a suite of programs written in C++ and Python. It calls several common external software packages [cutadapt (Martin, 2011), bowtie (Langmead et al., 2009) and samtools (Li et al., 2009)], as well as performing Capture-C specific processing steps (namely, ‘target’ and ‘reporter’ restriction enzyme fragments are identified, invalid interactions are removed and pile-ups of interactions are generated for each target). Full details of these processing steps are given in Supplementary Information.
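To illustrate what identifying restriction enzyme fragments involves, the following minimal Python sketch digests a sequence in silico and looks up the fragment containing a given position. It is not capC-MAP's implementation: the function names `digest` and `fragment_index` are hypothetical, and the default site `GATC`, cut immediately before the G as for DpnII, is an assumption chosen for illustration.

```python
import re
from bisect import bisect_right


def digest(sequence, site="GATC", cut_offset=0):
    """Return half-open (start, end) fragments of `sequence` cut at `site`.

    `cut_offset` is the cut position relative to the start of the site
    (0 means the cut falls immediately before the first base of the site).
    """
    seq = sequence.upper()
    # A lookahead pattern finds every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site.upper()) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    # Cuts at the very ends of the sequence do not create new fragments.
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def fragment_index(fragments, position):
    """Return the index of the fragment containing a 0-based `position`."""
    starts = [start for start, _ in fragments]
    return bisect_right(starts, position) - 1


if __name__ == "__main__":
    frags = digest("AAGATCTTGATCCC")
    print(frags)
    print(fragment_index(frags, 9))
```

In a real analysis the same lookup would be applied to aligned read positions, so that each read is assigned to a target or reporter fragment before pile-ups are counted.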
An entire Capture-C analysis can be performed with a single command, or, for bespoke analyses, the capC-MAP component programs can be run separately. Usage of the software is documented in its user manual, but a typical workflow is as follows. First, the user builds an index for the reference genome to which the data will be aligned; then capC-MAP is used to generate a restriction enzyme fragment map from the same reference. These steps only need to be performed once for a given genome. Finally, the capC-MAP pipeline is run on the input data (a pair of fastq files obtained from paired-end sequencing). The main output is an ‘interaction profile’ (reads versus genome position) for each target locus (Fig. 1). This signal is proportional to the number of cells within the population where the target was spatially proximate to the given genomic position. capC-MAP can perform normalization to remove biases arising because different oligos have different capture efficiencies, and different binning and smoothing options can be applied to reveal features at different length scales (compare tracks in Fig. 1). capC-MAP outputs are in the ‘bedGraph’ file format, which can be read by many downstream analysis and visualization tools. Full details are given in the capC-MAP user manual.
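The binning, smoothing and normalization steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm used by capC-MAP: the function names are hypothetical, reads are assigned to bins by fragment midpoint, smoothing is a simple running mean, and normalization scales a profile to a fixed total (reads per million), all of which are assumptions made for the example.

```python
from collections import defaultdict


def bin_profile(fragments, bin_size):
    """Sum per-fragment counts into fixed-width bins.

    `fragments` is an iterable of (chrom, start, end, count) tuples with
    0-based half-open coordinates; each count goes to the bin that
    contains the fragment midpoint.
    """
    binned = defaultdict(float)
    for chrom, start, end, count in fragments:
        midpoint = (start + end) // 2
        binned[(chrom, midpoint // bin_size)] += count
    return dict(binned)


def smooth(binned, half_width):
    """Running mean over 2 * half_width + 1 bins; missing bins count as 0.

    Only bins that already hold data are reported.
    """
    smoothed = {}
    for chrom, idx in binned:
        window = range(idx - half_width, idx + half_width + 1)
        total = sum(binned.get((chrom, j), 0.0) for j in window)
        smoothed[(chrom, idx)] = total / len(window)
    return smoothed


def normalize(binned, scale=1e6):
    """Rescale a profile so that its values sum to `scale`."""
    total = sum(binned.values())
    if total == 0:
        return dict(binned)
    return {key: value / total * scale for key, value in binned.items()}


def to_bedgraph(binned, bin_size):
    """Format a binned profile as sorted, tab-separated bedGraph lines."""
    lines = []
    for (chrom, idx), value in sorted(binned.items()):
        start = idx * bin_size
        lines.append(f"{chrom}\t{start}\t{start + bin_size}\t{value:g}")
    return lines


if __name__ == "__main__":
    counts = [("chr1", 0, 100, 2), ("chr1", 100, 300, 4), ("chr1", 900, 1100, 6)]
    profile = bin_profile(counts, bin_size=500)
    print("\n".join(to_bedgraph(profile, bin_size=500)))
```

Larger bins and wider smoothing windows trade resolution for a less noisy signal, which is why profiles of the same target can look quite different at different length scales.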
To our knowledge capC-MAP is the only software specific to the Capture-C experimental design where an interaction profile is obtained for each targeted restriction enzyme fragment. It is possible to instead use software designed to treat HiC data, and then perform additional Capture-C specific analysis steps manually. One tool which has facilities to do this is HiC-Pro; in Supplementary Information we compare the performance of HiC-Pro and capC-MAP. For testing we used a dataset with 35 targets captured from mouse erythroid cells [obtained from Davies et al. (2016)], and a dataset with 446 targets captured from developing mouse midbrain cells [obtained from Andrey et al. (2017)]. capC-MAP identifies a higher proportion of PCR duplicates, finds on average about twice as many informative reads, and performs the analysis in under a quarter of the time taken by HiC-Pro. This highlights the fact that the packages are optimized for different types of data. Full details are given in Supplementary Information and Supplementary Table S1. Another common method, which is similar to but distinct from Capture-C, is ‘Capture HiC’; there, different experimental designs are common, and other software may be more appropriate (see Supplementary Information).
capC-MAP is freely available under the GNU General Public License v3.0, and can be obtained from https://github.com/cbrackley/capC-MAP, with user manual at capc-map.readthedocs.io. It can also be installed via the conda package manager. capC-MAP comes with a small example dataset, and several ‘worksheets’ showing examples of how plots such as Figure 1 can be generated using different tools (either R packages or command-line based tools common in NGS bioinformatics). In Supplementary Information we give full details of the processing steps performed by the software, as well as background details on the Capture-C method.
Funding
This work was supported by the European Research Council [Grant No. 648050, THREEDCELLPHYSICS] and the UK Medical Research Council [Grant No. MR/J00913X/1].
Conflict of Interest: none declared.
References