Summary: Current web-based genome browsers require repetitive user input to scroll over long distances, alter the drawing density of elements or zoom through multiple orders of magnitude. Generally, either the server or the client is responsible for the majority of data processing, resulting in either servers having to receive and handle data relevant only to one user, or clients redundantly processing widely viewed data. ChromoZoom pre-renders and caches general-use tracks into tiled images on the server and serves them in an interactive web interface with inertial scrolling and precise, fluent zooming via the mouse wheel or trackpad. Custom tracks in several formats can be rendered by client-side code alongside the pre-rendered tracks, which minimizes the server load caused by user-specific rendering and eliminates the need to transmit private data. ChromoZoom thereby enables rapid and simultaneous exploration of curated, experimental and personal genomic datasets.
Supplementary information: Supplementary data are available at Bioinformatics online.
Genome browsers have become an essential tool for experimental and computational biologists. Among well-known web-based browsers, the UCSC Genome Browser has gained popularity for its ready availability, comprehensive library of genomes and curated data and the ability to display custom data uploaded by researchers (Dreszer et al., 2011; Nielsen et al., 2010). Overlaying experimental data (e.g. sequence variation) onto curated tracks (e.g. gene predictions) allows for the formulation and verification of biological hypotheses. Researchers unfamiliar with a newly encountered locus can inspect it within a genome browser to determine the gene layout or potential regulatory elements and design polymerase chain reaction or other experiments.
In response to the repetitive navigation required by traditional browsers, ‘next-generation’ browsers like AnnoJ (http://www.annoj.org), JBrowse (Skinner et al., 2009) and ABrowse (Kong et al., 2012) have been built to take advantage of modern web technologies, adding more seamless interactions that preserve the user’s sense of location while traversing the massive ‘landscape’ of a genome. AnnoJ and JBrowse render most genomic data on the browser side, the former drawing pixels on HTML5 Canvas elements and the latter manipulating standard HTML elements. However, both require preprocessing of custom data by an administrator before it can be rendered within the browser; neither accepts flat files via the web interface in the manner of the UCSC browser. ABrowse renders all data to tiled images, but cannot zoom smoothly and requires custom data to be fully uploaded to the server, which can be prohibitive for large datasets. Finally, none of these browsers features inertial scrolling, a feature popularized by iOS and Google Maps whereby a scrollable surface can be ‘thrown’ with a finger or the cursor to move across long distances.
ChromoZoom attempts to improve the interactivity introduced by ‘next-generation’ genome browsers while adding custom data capabilities and the familiar rendering styles of established browsers like UCSC. The full-window interface can maximize use of vertical display space by wrapping tracks into multiple lines, much like paragraph text. General-use tracks can be added via a dropdown menu (Fig. 1A); the layout updates dynamically. The user can drag lines horizontally and vertically, and they can also ‘throw’ the line with the mouse to scroll smoothly through the genome. Motion of the tracks is never interrupted, preserving the user’s sense of location throughout all navigation operations.
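The ‘throw’ interaction described above can be illustrated with a minimal sketch of inertial scrolling. This is not ChromoZoom’s actual implementation: the function name, the per-frame friction factor and the stopping threshold are all assumptions chosen for illustration. The release velocity of the drag is simply decayed by a constant factor on each animation frame until the motion becomes imperceptible.

```python
def inertial_positions(start_px, velocity_px_per_frame, friction=0.95, min_speed=0.5):
    """Return the scroll offsets visited after a 'throw' until motion dies out.

    All parameter names and default values are illustrative assumptions,
    not values taken from ChromoZoom.
    """
    position = start_px
    velocity = velocity_px_per_frame
    positions = []
    # Keep moving while the per-frame displacement is still noticeable.
    while abs(velocity) >= min_speed:
        position += velocity
        positions.append(position)
        # Exponential decay: each frame retains a fixed fraction of the speed.
        velocity *= friction
    return positions


# Example: a throw of 10 px/frame with strong friction comes to rest quickly.
trail = inertial_positions(0, 10, friction=0.5, min_speed=1)
```

Because the velocity decays geometrically, the total distance travelled is bounded, so a stronger throw carries the view further without ever scrolling indefinitely.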
To zoom in and out, the user can use a familiar slider and buttons (Fig. 1B) or keyboard shortcuts to move all the way from the genome-wide view to the base pair level, or they can position their mouse cursor and use a scroll wheel or trackpad to zoom continuously and precisely at any visible location (a feature unique to ChromoZoom). To see more detail, the user can vertically resize the track using its side label (Fig. 1C), which will ‘unpack’ individual elements and labels as completely as vertical space allows. An orange warning line appears if elements are being cropped by the edge of the track (Fig. 1D). Users can also reorder tracks by dragging the labels in the sidebar and remove them using the ‘show tracks…’ menu. A search bar allows users to specify coordinates or coordinate ranges, for example, ‘chr1:12340-12350’, or keywords like ‘MSH2’, which display a dropdown of matching features (Fig. 1E). Tooltips appear when the user hovers over track features (Fig. 1F), and a click loads a feature description page from the UCSC Browser. A full comparison of features with other current web-based genome browsers is provided in Supplementary Table S1.
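Zooming ‘at any visible location’ amounts to keeping the base pair under the cursor fixed on screen while the scale changes. The following sketch shows one way to compute the new viewport under that constraint; the function and parameter names are hypothetical, and the viewport is assumed to be described by its leftmost genomic coordinate and a scale in base pairs per pixel.

```python
def zoom_at_cursor(view_start_bp, bp_per_px, cursor_px, zoom_factor):
    """Return (new_view_start_bp, new_bp_per_px) after zooming about the cursor.

    zoom_factor > 1 zooms in and zoom_factor < 1 zooms out. The names and
    the viewport model are illustrative assumptions.
    """
    # Genomic coordinate currently under the cursor.
    anchor_bp = view_start_bp + cursor_px * bp_per_px
    # Zooming in by a factor f shows f times fewer base pairs per pixel.
    new_bp_per_px = bp_per_px / zoom_factor
    # Choose the new start so that anchor_bp stays at the same pixel offset.
    new_view_start_bp = anchor_bp - cursor_px * new_bp_per_px
    return new_view_start_bp, new_bp_per_px


# Example: viewport starts at 1000 bp, 10 bp/px, cursor 50 px from the left edge.
new_start, new_scale = zoom_at_cursor(1000, 10, 50, 2)
```

Applying this update in many small steps, one per wheel or trackpad event, gives a continuous zoom in which the feature under the cursor never moves.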
ChromoZoom is the first online genome browser to provide client-side parsing and rendering of user-provided custom data, initiated by clicking the ‘custom tracks … ’ button (Fig. 1G). Browser Extensible Data (BED) tracks containing range-based features and ‘wiggle’ (WIG) tracks of continuous quantitative data, formatted according to UCSC’s guidelines (http://genome.ucsc.edu/FAQ/FAQformat.html), can be read from the client’s local disk or a public URL and are plotted adjacent to the normal tracks. The full suite of zooming, panning, reordering and expansion interactions applies equally to custom tracks. For large datasets, the user can provide a track line with a bigDataUrl pointing to a pre-indexed BED or WIG data file in bigBed/bigWig format (Kent et al., 2010) or to sequence variations in tabix-compressed Variant Call Format (VCFTabix; Danecek et al., 2011; Li, 2011), again formatted according to UCSC’s guidelines, and the application will seamlessly fetch data for the current view with AJAX and render graphical data within the browser. ChromoZoom is, therefore, ideal for exploration of experimental data by researchers, enabling the visualization of custom results alongside a dynamic representation of curated genomic information.
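To make the custom track input concrete, the sketch below parses a small BED-style text consisting of an optional track line followed by feature lines. It is written in Python for illustration only (ChromoZoom performs this step in client-side browser code), the function names are hypothetical, and only the first four BED columns are read. Track-line attributes such as name or bigDataUrl are collected as key=value pairs, with quoted values allowed.

```python
import shlex


def parse_track_line(line):
    """Parse 'track key=value key="quoted value" ...' into a dict."""
    tokens = shlex.split(line)  # honours double-quoted values
    return dict(tok.split("=", 1) for tok in tokens[1:] if "=" in tok)


def parse_custom_track(text):
    """Return (track_attributes, features) from BED-style text.

    Features are (chrom, start, end, name) tuples; BED coordinates are
    0-based and half-open. Illustrative sketch, not ChromoZoom's parser.
    """
    track, features = {}, []
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines, comments and 'browser' configuration lines.
        if not fields or fields[0].startswith("#") or fields[0] == "browser":
            continue
        if fields[0] == "track":
            track = parse_track_line(raw)
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        features.append((chrom, start, end, name))
    return track, features


# Example input with a made-up track name and two features.
example = 'track name="My peaks" type=bed\nchr1\t100\t200\tpeakA\nchr1\t300\t400\n'
track, features = parse_custom_track(example)
```

A track line that carries a bigDataUrl attribute would have no feature lines of its own; in that case the returned attributes tell the client which remote indexed file to query for the current view.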
ChromoZoom uses a local installation of the UCSC Genome Browser to generate and pre-cache tiled PNG images (via Ruby scripts). Rake (‘Ruby make’) directs the creation of a configuration (YAML) file for each genome, the capture of tile images and the creation of a JSON file to initialize the web interface. A Ruby extension written in C maximizes image-processing performance. ImageMagick is used for many image operations and the Nokogiri library for HTML parsing. The eight tracks available on http://chromozoom.org/ for the hg19 assembly and the 19 tracks for the sacCer3 assembly consume 64 GB and 5.4 GB of disk space, respectively. Tile generation scripts are run in parallel across a computing cluster.
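The idea of serving pre-cached tiles can be sketched as a mapping from a genomic view to tile indices at a given zoom level. The tile width, the indexing scheme and the function name below are assumptions for illustration; the actual layout used by ChromoZoom’s tile cache is not described here. Python is used only for the sketch, whereas the tile generation itself is done with Ruby scripts.

```python
def tiles_for_view(view_start_bp, view_end_bp, bp_per_px, tile_width_px=1000):
    """Return (index, tile_start_bp, tile_end_bp) for tiles covering a view.

    Coordinates are 0-based and half-open. The tile width and indexing
    scheme are hypothetical, not taken from ChromoZoom.
    """
    bp_per_tile = bp_per_px * tile_width_px
    first = view_start_bp // bp_per_tile
    # Subtract 1 so that a view ending exactly on a tile boundary
    # does not request the following tile.
    last = (view_end_bp - 1) // bp_per_tile
    return [(i, i * bp_per_tile, (i + 1) * bp_per_tile)
            for i in range(first, last + 1)]


# Example: at 2 bp/px with 500 px tiles, each tile spans 1000 bp.
needed = tiles_for_view(2500, 4000, 2, tile_width_px=500)
```

Because every user viewing the same region at the same zoom level requests the same tile indices, the corresponding images need to be rendered only once and can then be served as static files.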
The visual style is influenced by Edward Tufte’s principle of maximizing the data-ink ratio (Tufte, 2001). Repetitive high-order digits are removed from the Base Position track, the data are placed front and centre with minimalistic borders and labelling, and control widgets are kept in the margin and in collapsed format until activated by the user.
The authors thank members of the Roth laboratory, in particular Takafumi Yamaguchi and Joseph Mellor, for providing design suggestions and sample custom track data. They thank David Haussler and W. James Kent for advice and encouragement and the Genome Bioinformatics Group at UC Santa Cruz for making source code, data and documentation for the UCSC Genome Browser (http://genome.ucsc.edu/) publicly available.
Funding: National Institutes of Health (HG004233, HL107440); Ontario Research Fund—Research Excellence Award; Canadian Institute for Advanced Research Fellowship; Canada Excellence Research Chair (to F.P.R.).
Conflict of Interest: none declared.