Summary:xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator.
Availability:xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org.
Modern high-throughput technologies generate large amounts of genomic, transcriptomic, proteomic and metabolomic data. However, existing open source web-based tools for QTL analysis, such as webQTL (Wang et al., 2003) and QTLNetwork (Yang et al., 2008), are not easily extendable to different settings and computationally scalable for whole genome analyses. xQTL workbench makes it easy to analyse large and complex datasets using state-of-the-art QTL mapping tools and to apply these methods to millions of phenotypes using parallelized ‘Big Data’ solutions (Trelles et al., 2011). xQTL workbench also supports storing of raw, intermediate and final result data, and analysis protocols and history for reproducibility and data provenance. Use of Molgenis (Swertz et al., 2010a) helps to customize the software. All is conveniently accessible via standard Internet browsers on Windows, Linux or Mac (and using Java, R for the server).
xQTL workbench provides visualization of QTL profiles, single and multiple QTL mapping methods, easy addition of new QTL analyses, scalable data management and analysis tracking.
2.1 Explore QTL profiles
Through the web interface, users can explore mapped QTLs, and underlying information by viewing QTL plots in an interactive scrollable and zoomable window. xQTL workbench has support for other common image formats, such as PNG, JPG, SVG and embedded postscript; useful for publishing scientific results online, and on paper. From the output, main database identifiers, such as gene, protein and/or metabolite identifiers are automatically linked-out to matching external web pages of public database such as NCBI, KEGG and Wormbase.
2.2 Single and multiple QTL mapping
xQTL workbench wraps R/qtl (Arends et al., 2010; Broman et al., 2003) in a web-based analysis framework offering all important QTL mapping routines, such as the EM algorithm, imputation, Haley-Knott regression, the extended Haley-Knott method, marker regression and Multiple QTL mapping. In addition, xQTL workbench includes R/qtlbim, a library that provides a Bayesian model selection approach for mapping multiple interacting QTL (Yandell et al., 2007) and Plink, a library for association QTL mapping on single nucleotide polymorphisms (SNP) in natural populations (Purcell et al., 2007).
2.3 Add new analysis tools
xQTL workbench supports flexible adding of more QTL analysis software: any R-based, or command-line tool, can be plugged in. All analysis results are uploaded, stored and tracked in the xQTL workbench database through an R-API. When new tools are added, they can build on the high-level multi-core computer, cluster and Cloud management functions, based on TORQUE/OpenPBS and BioNode (Prins et al., 2011). xQTL workbench can be made part of a larger analysis pipeline using interfaces to R, Excel, REST and SOAP web services and Galaxy (Goecks et al., 2010).
2.4 Track analysis and monitor performance
When a new analysis protocol or R script is defined, this protocol can easily be applied to new data. Also, xQTL workbench keeps track of history. Re-use of analysis protocols can be done in an automated fashion. Previous analyses can be rerun without resetting parameters. xQTL workbench provides an online overview of past analyses e.g. which analyses were performed, by who, when and display settings applied.
2.5 Scalable data management
xQTL workbench has a consistency checking database based on XGAP specification (Swertz et al., 2010b), user interfaces to manage and query genotype and phenotype datasets and support for various database back-ends including HSQL (standalone) and MySQL. Phenotype, genotype and genetic map data can be imported as text (TXT), comma separated (CSV) and Excel files. xQTL workbench handles and stores large data in a new and efficient binary edition of the XGAP format, named XGAPbin (extension .xbin), documented online. Such binary formats are essential when handling, storing and transporting multi-Gigabyte datasets.
2.6 Customizable to research needs
Additional modules for new data modalities can be added using Molgenis software generator (Swertz et al., 2010b). The ‘look and feel’ of xQTL workbench is adaptable to institute or consortium style by changing a simple template, which is described in the xQTL workbench documentation enabling seamless integration into an existing website or intranet site, such as recently for EU-PANACEA model organism project and LifeLines biobank.
We built xQTL workbench on top of Molgenis (Swertz et al., 2004), a Java-based software to generate tailored research infrastructure on-demand (Swertz and Jansen, 2007). From a single ‘blueprint’ describing the whole system, Molgenis auto-generates a full application including user interface, database infrastructure, application programming interfaces in R, REST and SOAP (APIs). Molgenis' flexibility and robustness is proven by the wide range of research projects, e.g. the Nordic GWAS Control database (Leu et al., 2010), EB mutation database (van den Akker et al., 2011) and the Animal observation database (Swertz et al., 2010a).
For data storage, the eXtensible Genotype and Phenotype (XGAP) data model was adopted (Swertz et al., 2010b) and extended for big data. To support the increased demand for computational resources for included mapping routines, we added high-level cluster and cloud management functions for computation. The scalable QTL mapping routines of xQTL workbench are written in R and C. The choice of R ties in with the general practice of using R for QTL mapping. The user interface includes direct access to the R interpreter. Both xQTL workbench and Molgenis are open-source software, and source code is transparently stored and tracked in online source control repositories.
xQTL workbench provides a total solution for web-based analysis: major QTL mapping routines are integrated for use by experienced and inexperienced users. Researchers can upload raw data, run analyses, explore mapped QTL and underlying information, and link-out to important databases. New algorithms can be flexibly added, immediately available to all users. Large analyses can be easily executed on a cluster or in the Cloud. Future work include visualizations and search options to explore the results. We also had an EU-SYSGENET workshop that envisioned further integration of xQTL with analysis tools like HAPPY, databases like GeneNetwork, and the workflow manager TIQS (Durrant et al., 2011).
We thank Konrad Zych for Figure 1.
Funding: National Institutes of Health (GM074244 to KB); Netherlands Organisation for Scientific Research (NWO)/TTI Green Genetics (1CC029RP to P.P.); NWO (Rubicon 825.09.008 to M.A.S), Centre for BioSystems Genomics (CBSG), Netherlands Consortium of Systems Biology (NCSB) (to D.A.), Netherlands Bioinformatics Center (NBIC) (to M.A.S.), all part of Netherlands Genomics Initiative/NWO; Target/LifeLines co-funded by the European Regional Development Fund and NWO (to M.A.S.); and EU-FP7 Projects PANACEA (222936 to K.J.v.d.v.) and EURATRANS (241504 to R.C.J.).
Conflict of Interest: none declared.