Sequenceserver: a modern graphical user interface for custom BLAST databases

The dramatic drop in DNA sequencing costs has created many opportunities for novel biological research. These opportunities largely rest upon the ability to effectively compare newly obtained and previously known sequences. This is commonly done with BLAST, yet using BLAST directly on new datasets requires substantial technical skills or helpful colleagues. Furthermore, graphical interfaces for BLAST are challenging to install and largely mimic underlying computational processes rather than work patterns of researchers. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver (http://sequenceserver.com), a modern graphical user interface for BLAST. Sequenceserver substantially increases the efficiency of researchers working with sequence data. This is due to innovations at three levels. First, our software can be installed and used on custom datasets extremely rapidly for personal and shared applications. Second, based on analysis of user input and simple algorithms, Sequenceserver reduces the number of decisions the user must make, provides interactive visual feedback, and prevents common errors that would otherwise cause erroneous results. Finally, Sequenceserver provides multiple highly visual and text-based output options that mirror the requirements and work patterns of researchers. Together, these features greatly facilitate BLAST analysis and interpretation and thus substantially enhance researcher productivity.

Introduction

alert message is shown and the BLAST button will remain disabled to avoid BLAST generating meaningless results.
Subsequently, the user selects one or several BLAST databases using checkboxes. Once a first database is selected, additional database selections are limited to those of the same type (i.e., either nucleotide or protein) to eliminate the risk of users combining incompatible databases that would cause BLAST to fail. Once a valid query is entered and a database is selected, the BLAST submission button activates. For most query-database combinations, the single possible basic BLAST algorithm will be used (Supplementary Figure S1). When multiple algorithms are appropriate (e.g., nucleotide query and nucleotide database: BLASTN and TBLASTX are both appropriate), a pull-down in the BLAST submission button allows the user to toggle between them. Sequenceserver's automatic algorithm selection reduces the risk of attempting to perform impossible BLAST queries. Finally, Sequenceserver includes an "advanced parameters" field providing access to all standard BLAST parameters available in the command-line (Camacho et al.,

Figure 1. Partial screenshot of the query interface. Dark red letters highlight the steps involved and some specific features. A: Three or more sequences were pasted into the query field (typewriter font; only the identifier is visible for the third sequence); a message confirms to the user that these are amino acid sequences. B: The Swiss-Prot protein database was the first database to be selected. As a result, additional database selections are limited to protein databases; nucleotide databases are disabled. C: The user entered (optional) advanced parameters to constrain the results to the 10 strongest hits with evalues stronger than 10^-10. D: The BLAST button is automatically activated and labeled "BLASTP" as this is the only possible basic BLAST algorithm for the given query-database combination. As the user's mouse pointer hovers over the BLASTP button, a tooltip indicates that a keyboard shortcut exists for this button.
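The database restriction and the advanced parameters described above can be illustrated with a minimal Python sketch. It is not Sequenceserver's actual code: the `Database` class and the `selectable` and `blast_command` helpers are hypothetical names introduced only for this illustration. The only BLAST+ details relied upon are the standard `-query`, `-db`, `-evalue` and `-max_target_seqs` options, and the fact that `-db` accepts several database names separated by spaces.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Database:
    """Hypothetical record describing one formatted BLAST database."""
    name: str
    seq_type: str  # "nucleotide" or "protein"


def selectable(databases, selected):
    """Return the databases that may still be ticked.

    Before any selection every database is available; afterwards only
    databases of the same type as the first selection remain enabled.
    """
    if not selected:
        return list(databases)
    locked_type = selected[0].seq_type
    return [db for db in databases if db.seq_type == locked_type]


def blast_command(algorithm, query_file, selected, advanced=""):
    """Assemble a BLAST+ argument list (the command is not executed here)."""
    if not selected:
        raise ValueError("select at least one database")
    if len({db.seq_type for db in selected}) != 1:
        raise ValueError("cannot mix nucleotide and protein databases")
    db_arg = " ".join(db.name for db in selected)
    # shlex.split tokenizes the free-text advanced parameters like a shell would.
    return [algorithm, "-query", query_file, "-db", db_arg] + shlex.split(advanced)


databases = [
    Database("swissprot", "protein"),
    Database("my_proteins", "protein"),
    Database("my_genome", "nucleotide"),
]
still_enabled = selectable(databases, [databases[0]])
command = blast_command(
    "blastp", "query.fa", [databases[0]], "-evalue 1e-10 -max_target_seqs 10"
)
```

Here the advanced parameters string corresponds to the constraint shown in panel C of Figure 1: at most 10 hits, each with an evalue of at most 10^-10. Returning an argument list rather than a single string keeps the sketch independent of shell quoting rules.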

As a result of performing a BLAST query, an HTML report including graphical overviews is shown in the web browser (Figure 2; an interactive version of this figure is at http://sequenceserver.com/paper/fig2interactive). This report will feel familiar to users of NCBI BLAST but includes many additions and revisions for improved navigation,

and another link to display the sequence in the browser. This sequence viewing interface (Supplementary Figure S2) includes GenBank-style visualization for readability and displays appropriate coordinate information when the mouse pointer hovers over one or selects multiple residues (Gómez et al., 2013). Furthermore, each hit includes a checkbox making it possible to simultaneously download a selection of multiple hit sequences as a single FASTA file.
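Combining several selected hits into a single FASTA file, as described above, amounts to concatenating one record per hit. The following Python sketch shows the idea under stated assumptions: `to_fasta` is a hypothetical helper, hits are assumed to be available as (identifier, sequence) pairs, and the 60-character line width is an arbitrary choice rather than anything Sequenceserver is known to use.

```python
def to_fasta(hits, width=60):
    """Format (identifier, sequence) pairs as one multi-record FASTA string."""
    records = []
    for identifier, sequence in hits:
        # Wrap the sequence into fixed-width lines for readability.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + identifier + "\n" + "\n".join(lines) + "\n")
    return "".join(records)


# Made-up identifiers and sequences, for illustration only.
selected_hits = [("hit_1", "MKVLAAGIVG"), ("hit_2", "MSTNPKPQRK")]
fasta_text = to_fasta(selected_hits, width=5)
```

Each record starts with a `>` header line followed by the wrapped sequence, so the resulting text can be saved directly as a `.fa` file for follow-up analyses.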

We created Sequenceserver to overcome many of the challenges of performing BLAST on custom datasets. Below we review known applications of Sequenceserver, compare it to alternatives, consider its compatibility with tools for follow-up analyses and discuss future directions.

Google Analytics referral links to http://sequenceserver.com, social web statistics, and community engagement through the mailing list indicate growing adoption of Sequenceserver (Table 1)

//sequenceserver.com/paper/fig2interactive. Three amino acid sequences were compared against the Swiss-Prot database using BLASTP with an evalue cutoff of 10^-10 and keeping only the 10 strongest hits per query. This screenshot shows a portion of the results for the first query. Dark red letters highlight some of the specific features of this report.
A: An index overview summarizes the query and database information and provides clickable links to query-specific results. B: Results for the first query are shown. These include a graphical overview indicating which parts of the query sequence align to each hit, a tabular summary of all hits, and alignment details for each hit. C: The first hit is selected for download; its alignment details have been folded away. D: The user is studying the second hit; the mouse pointer hovers over the link to the hit's UniProt page.
researchers. For instance, the authors collectively run 12 private servers and only 4 public servers at the time of this classroom's BLAST server from anywhere and the instructor to control the software and database versions so that the results for any given query are consistent and predictable, as is critical for a classroom setting and not feasible using publicly available servers such as NCBI.

Metric                                              Statistic
Hits to sequenceserver.com in the last 12 months    11,050
Mailing list users                                  102
Mailing list threads                                137
Twitter mentions                                    102
Sites with publicly available instances             52

bugs (Sametinger, 1997

To ensure a fluid user experience that increases researcher productivity, we designed Sequenceserver around eight modern user interface design principles. First, the interface contains only essential information so as to minimize distractions for the user. Second, the information is laid out in a clear and hierarchically structured manner. As part of this, we paid special attention to typography, using Roboto (https://www.google.com/fonts/specimen/Roboto) for headings and Open Sans (https://www.google.com/fonts/specimen/Open+Sans) for normal text. These free, contemporary typefaces were designed to maximize legibility and overall aesthetics across electronic devices and print media. Third, we used automation where possible to minimize the number of decisions required from the user.

For example, based on query type and database selection we limit the choices for algorithm selection (except in the case of nucleotide-nucleotide searches, only a single BLAST algorithm is possible; see Supplementary Figure S1).

Fourth, we use interactive visual feedback and cues for step-by-step discovery of the workflow. For example, the BLAST button remains disabled until the user has provided query sequence(s) and selected target databases.

Figure S1. Automatic BLAST algorithm selection. BLAST includes five basic algorithms (right column). Arrows indicate how Sequenceserver automatically selects an appropriate BLAST algorithm based on the sequence types of query (left column) and selected databases (middle column). For the first three combinations of query and database types, only one algorithm is possible. The circle indicates that for nucleotide query and nucleotide database, the user can choose between BLASTN and TBLASTX.
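The selection logic summarized in Figure S1 can be expressed as a small lookup table. The Python sketch below is an illustration and not Sequenceserver's implementation: the names `guess_sequence_type` and `candidate_algorithms` are hypothetical, and the 90% threshold used to call a query "nucleotide" is an assumption made for the example. The mapping from query and database types to the five basic BLAST algorithms follows standard BLAST usage.

```python
# Which basic BLAST algorithms fit each (query type, database type) pair.
ALGORITHMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],
}

NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Guess "nucleotide" or "protein" from residue composition.

    The threshold is an assumed value for illustration. Returns None when
    the input contains no letters, so that no type is claimed.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return None
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def candidate_algorithms(query_type, database_type):
    """Return the applicable algorithms; an empty list keeps the button disabled."""
    return ALGORITHMS.get((query_type, database_type), [])


query_type = guess_sequence_type("MKVLAAGIVGLLLAQ")
options = candidate_algorithms(query_type, "protein")
```

A single-element result corresponds to a button labeled with that algorithm, a two-element result to the pull-down offering BLASTN and TBLASTX, and an empty result (missing query or no database selected) to the disabled button.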