HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis

To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the Eterna RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use and extend. Here, we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version and additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the on-line workflow is also faster than software implementations available to most users on their local computers. Free access: http://hitrace.org.


INTRODUCTION
Capillary electrophoresis (CE) is one of the most powerful and widely used nucleic acid separation techniques available (1).Its conventional application areas include genomic mapping, forensic identification, and genome sequencing (2,3).Recently combined with chemical probing methodologies, CE further provides a powerful and rapid means to map complex DNA and RNA structures at singlenucleotide resolution (4).
In chemical-probing-based RNA structure inference studies, a chemical reagent modifies the RNA of interest, either cleaving it or forming a covalent adduct with it.Commonly used reagents include hydroxyl radicals (5,6), dimethyl sulfate alkylation (DMS) (7), carbodiimide modification (CMCT) (8), and the SHAPE strategy using 2'-OH acylation (9).Subsequent reverse transcription detects the modification sites as stops to primer extension at nucleotide resolution.Traditionally, the resulting cDNA fragments were resolved in sequencing gels followed by individually quantifying band intensities.To resolve these fragments in a high-throughput fashion, capillary electrophoresis (CE) can be used.CE-based chemical probing can produce tens of thousands of individual electrophoretic bands from a single experiment, leading to recent breakthroughs in high-throughput nucleic acid structure mapping: e.g., automated inference of complex RNA structures (4,10-13) such as ribosomes (14), and viruses (15,16).
Analyzing a large number of electrophoretic traces from a high-throughput structure-mapping experiment is time-consuming and poses a significant informatics challenge.It requires a set of robust signal-processing algorithms for accurate quantification of the structural information embedded in the noisy traces.Current software methods for CE analysis include capillary automated footprinting analysis (CAFA) (10), ShapeFinder (11), high-throughput robust analysis for capillary electrophoresis (HiTRACE) (17), fast analysis of SHAPE traces (FAST) (18), and QuShape (19).The most recent of these methods have largely converged on the basic steps of analysis: preprocessing (such as selection of the datacontaining range and baseline adjustment), deconvolution of co-loaded mapping signal traces and reference traces, alignment, peak detection, band annotation, peak fitting, signal decay and background subtraction.
Although these programs are all useful for semi-automated CE data analysis, they suffer from certain limitations.CAFA has effective peak fitting capabilities but lacks alignment and annotation features, thus necessitating laborious efforts for analyzing multiple capillaries.ShapeFinder and QuShape provide sophisticated signal alignment, annotation, and peak fitting capabilities, but have limited cross-capillary analysis capabilities, making these tools less efficient for analyzing data from large-scale experiments involving hundreds of capillaries.FAST is highly optimized for rapid band annotation but relies on the proprietary ABI utility programs, which has made it difficult to customize and extend to various experimental scenarios.HiTRACE is feature-rich and has been extensively used for high-throughput structure mapping studies such as the mutate-and-map strategy (13,20,21), chromatin footprinting, and the massively parallel RNA design project EteRNA (22).However, HiTRACE is a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use, and extend, and a purchased license (The MathWorks, http://www.mathworks.com).HiTRACE-Web is based on the HiTRACE workflow and thus inherits all of its core functionalities for high-throughput CE analysis.Going one step further, HiTRACE-Web provides an integrated and interactive on-line interface.In addition to the standard features previously available, HiTRACE-Web presents additional features such as automated band annotation and adjustment.By making use of powerful multi-core server processors, HiTRACE-Web also runs faster than the previous off-line implementations available to most users on their local computers.To the best of the authors' knowledge, HiTRACE-Web is the first on-line tool for high-throughput CE data analysis.

METHOD OVERVIEW
HiTRACE-Web takes as input a set of nucleic acid structure mapping profiles obtained from a number of capillaries.A profile represents the intensity of a nucleic acid sample in a capillary as a function of electrophoretic time.The peaks in a profile appear as bands in a gray-scale image.Band/peak locations represent individual residues of a nucleic acid sequence.Commonly used ABI sequencers (Life Technologies Corporation, Carlsbad, USA) can separate the products of hundreds of nucleotides, and we assume that each CE profile contains hundreds of bands.The final output of HiTRACE-Web is a set of aligned and annotated profiles with quantified band areas in text and image formats.To deliver the output, HiTRACE-Web works in multiple stages interleaved with user checkpoints: preprocessing, profile alignment, band annotation, peak fitting and quantification.Figure 1 shows the flowchart of the analysis pipeline.
In preprocessing, the user needs to provide information on capillary channel compositions.To facilitate the analysis process, it is common in typical CE experiments to co-load samples with a reference ladder which fluoresces in a different color.Multiple spectral channels thus exist in each capillary, and the user should specify which channels contain the structure mapping signal and the reference (typically a ladder) The next step in preprocessing is to select the range of analysis, since typical profiles carry information only within a subset of the entire electrophoresis run.HiTRACE-Web utilizes edge-detection techniques (23) to select proper regions for further analysis automatically (Step A.2).It is also possible for the user to set the region manually.HiTRACE-Web then carries out a series of additional preprocessing operations on the selected region of capillaries such as adjustment of constant baseline (Step A.3).More details of the preprocessing operations can be found in (17).
The second stage in HiTRACE-Web is alignment.Profiles need to be aligned to each other because the products in different capillaries are subject to slightly different electrophoretic conditions and their detection times are then shifted and scaled compared to each other.Due to the finite number of capillaries an ABI sequencer can handle at a time (e.g., 96 for ABI3730), CE experiments using hundreds of capillaries consist of multiple batches.We designate the first capillary in each batch as the reference and use it for within-batch alignment.There, we align each profile to the reference by finding the optimal shift and scaling amounts that maximize its correlation to the reference (part of Step B.1).After the withinbatch alignment, we carry out additional alignment procedures: between-batch adjustment for global where bands should appear in the data, e.g., primarily at A and C positions for DMS modification of RNA data (24), and with annotations carried forward to the final result files (see below).
Following the profile alignment is the band annotation procedure (Step D.1), which refers to the process of mapping each band in an electrophoretic profile to a position in the nucleic acid sequence.For verification, visual inspection of band annotation results is normally inevitable to certain extent, but the manual band annotation step takes significant human efforts when there are a large number of profiles.
To address this issue, HiTRACE-Web provides an automated band annotation functionality, which allows users to complete initial assignments of hundreds of bands in hundreds of profiles in the order of seconds.
More details of the automated band assignment algorithm are beyond the scope of this server-focused paper and will be described elsewhere.HiTRACE-Web also provides an interactive interface to adjust the band annotation manually so that users can correct any suboptimal assignment they find.
In the last stage, HiTRACE-Web performs peak fitting (Step D.2) to approximate a profile as a sum of Gaussian curves, each of which is centered at the intensity peak location.The peak amplitudes are selected in the least-squares sense by using a standard optimization technique, thus minimizing the deviations of the Gaussian model from the CE profiles (17).As the final output, HiTRACE-Web reports the peak quantification results including the area and location of each band and exports data annotations, sequence, and structure to an RNA data (RDAT) formatted file (25).The RDAT file can then be submitted to the RNA Mapping Database (RMDB) repository for data sharing (http://rmdb.stanford.edu/submit)or structure server for secondary structure model estimation (http://rrmdb.stanford.edu/structureserver)(25), and links to these tools are provided.HiTRACE-web leaves correction for attenuation of band intensity for longer products ('signal decay'), background subtraction, and final normalization to the user.These post-processing tasks are carried out differently by independent groups who have made distinct ad hoc assumptions (19,26,27), and indeed these tasks are left out of some applications, such as the mutate-and-map technique (21), due to the introduction of noise.Robust experimental and computational approaches for post-processing chemical mapping data are in development (T.Mann, PC, RD, in prep.), as are additional features including secondary structure display and error estimation in automatic sequence assignment.Inclusion into HiTRACE-Web will allow these advances to be disseminated rapidly to the RNA structure mapping community.

WEB SERVER
The HiTRACE-Web server utilizes the Apache HTTP Server (The Apache Software Foundation, http://httpd.apache.org/)for basic web services and the MySQL Server (Oracle Corporation, http://www.mysql.com/)for internal data management.For client-side programming, we used CodeIgniter (EllisLab, http://ellislab.com/codeigniter),an open-source web application framework, and coded the web pages in PHP (The PhP Group, http://php.net/) and JavaScript (Mozilla Foundation, https://developer.mozilla.org/en/docs/JavaScript)with jQuery (The jQuery Foundation, http://jquery.com/) and jQuery user-interface (UI) plugins.Interactive UI components also use the canvas elements defined in the HTML5 standard (World Wide Web Consortium, http://www.w3.org/TR/html5/).We used Ajax (http://en.wikipedia.org/wiki/Ajax_(programming)) for asynchronous data transfers between the client and server sides.On the server side, we used Gearman (http://gearman.org),an open-source application framework, to schedule multiple requests.Each user request is handled by a worker written in PHP.The worker creates template code in MATLAB that contains input parameters and links to the user data.The worker also forks the MATLAB interpreter so that it can execute the template code.Most of the timeconsuming operations are multi-threaded, and the current HiTRACE-Web server runs on a 48-core machine (four on-board AMD Opteron 6172 processors with 256GB main memory; Ubuntu Linux version 3.2.0-29).In this environment, processing 52 capillaries (with 60 bands per capillary) takes a few minutes including manual adjustments.HiTRACE-Web supports most of the widely used web browsers including Google Chrome (Version 25.0 or later; http://www.google.com/chrome/),Mozilla Firefox (Version 19.0 or later; http:// www.mozilla.org/),Apple Safari (Version 6.0 or later; http:// www.apple.com/safari/),and Microsoft Internet Explorer (Version 9.0 or later; http://windows.microsoft.com).
Figure 2 explains the overall flow of CE data analysis using HiTRACE-Web.The input file is a zipcompressed collection of .ab1or .fsafiles from ABI sequencers in the ABIF file format.If there are multiple batches, each batch should be presented in a subfolder, as is the typical output from ABI sequencers.After uploading the input file, the user needs to specify the signal and reference channels.
HiTRACE-Web provides a graphical interface for channel selection (Figure 2A).Next, the region of interest is specified either manually or automatically (Figure 2B).HiTRACE-Web then carries out the alignment of the profiles in the selected region, following the procedures outlined previously.The user can visually inspect the intermediate result after alignment (Figure 2C) and go over the previous stages if needed.If satisfactory, the user can start specifying the type of chemical modification data present in each capillary using the graphical interface HiTRACE-Web provides (Figure 2D).The user can then perform band annotation (Figure 2E).HiTRACE-Web permits either manual or automated band annotation; in practice, both types of annotations complement each other: performing automated band annotation first and then adjusting the result manually allows a large number of bands to be annotated quickly and robustly.After band annotation, HiTRACE-Web carries out peak fitting and quantification.The results are provided as downloadable images (Figure 2F), tab-delimited text files (with quantified areas; one column per capillary and one row per residue), and an RDAT file that can be submitted to the repository and structure modeling server available at the RNA Mapping Database (25).More detailed instructions to each analysis stage and explanations of results are available at the HiTRACE-Web homepage.A complete tutorial with example data is also available, with specific help at each step.The data used is from an RNA structure mapping study using the mutate-and-map strategy (13,20,21).
The mapped sequence length was 92 nucleotides, and the data describes six different mapping experiments including SHAPE, DMS and reference ladders.The total number of bands was 552.

SUMMARY
We developed HiTRACE-Web (freely available at http://hitrace.org),an online server for rapid and robust alignment (part of Step B.1), adjustment of smooth baseline (Step B.2), and dynamic-programming-based piece-wise-linear alignment for fine-tuning local disagreements (17) (Step B.3).After alignment, the user checks intermediate results (Step C.1) and specifies the sequence probed and the type of chemical modifications applied to each capillary (Step C.2).The user can currently select among nine options: three types of chemical reagents (DMS, CMCT, and SHAPE), four types of dideoxynucleotides (ddGTP, ddATP, ddTTP, and ddCTP), no modification, or other.HiTRACE-Web provides a convenient and intuitive graphical interface for selecting signal and reference channels and specifying capillary modification types.This specification provides HiTRACE-Web with information on

Figure
Figure 3A and 3B show the correlation of quantification results between HiTRACE-Web and HiTRACE (17) for two different users.The high level of Pearson's correlation coefficients (R 2 of 0.9996 and 0.9999; P<0.001) between band intensities quantified with HiTRACE-Web and those quantified with HiTRACE indicates lack of any major systematic deviations induced by HiTRACE-Web.To test the consistency in band quantification between different users, we let two independent users carry out quantification of the same data using HiTRACE-Web and HiTRACE, as shown in Figure 3C and 3D, respectively.Both tools resulted in excellent agreement between the independent analyses (R 2 of 0.9993 and 0.9986; P<0.001).

Figure 2 .
Figure 2. Overview of data analysis using HiTRACE-Web.(A) Users can select the reference and signal channels using the graphical interface.(B) HiTRACE-Web provides a feature to select the valid region of interest automatically.Alternatively, users can set the range manually.The red rectangles in the figure represent the regions of interest.(C) Before advancing to the next step, HiTRACE-Web provides a snapshot of intermediate results, so that the user can go over the previous steps if needed.(D) HiTRACE-Web provides an intuitive graphical interface to specify the chemical modifications made to capillaries.For each of them, the user can select among nine color-coded options: three chemical reagents (dimethyl sulfate alkylation (DMS) (7), carbodiimide modification (CMCT) (8), the SHAPE strategy using 2'-OH acylation (9)), reference ladders using four dideoxynucleotides (ddGTP, ddATP, ddTTP, and ddCTP), no modification, and other.(E) HiTRACE-Web can carry out band annotation in an automated fashion, thus reducing the analysis time substantially.Furthermore, HiTRACE-Web provides a user-friendly interface for users to fine-tune band annotation results manually.Red circles represent the residue locations.Each type of nucleotide is associated with a different color (G: green, C: cyan, U: blue, and A: red).By clicking the image, user can (de)select the position of individual residues.The autoassigned bands are shown on the right side in gray for easier referencing when manually adjusting the assignment.(F) HiTRACE-Web reports a set of aligned and annotated profiles with quantified peak areas in the image, tab-delimited text, and RDAT (25) formats users can download for further uses.

Figure 3 .Figure 3
Figure 3. Confirming consistency in band quantification results between tools and analyses.(A-B) Correlation of quantification results between HiTRACE-Web and HiTRACE (17) for two independent users.These plots indicate lack of any major systematic deviations induced by HiTRACE-Web.(C-D)HiTRACE-Web gives the same level of consistency between independent analyses as HiTRACE.The data used is from an RNA structure mapping study using the mutate-and-map strategy(13,20,21).A 92nucleotide RNA sequence was treated in six different mapping conditions including SHAPE, DMS and ddTTP, giving 552 bands in total.