Graphics processing units in bioinformatics, computational biology and systems biology

Abstract Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), which limits their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and drawbacks of using these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools.

or to some kind of correlation. These networks can be analyzed from a topological perspective, in order to capture some information on the network structure [3].
In this context, FastGCN exploits GPUs for building interaction networks of co-expressed genes [17]. Interestingly, despite an intense optimization including branch removal from source code, compressed data structures and coalesced access patterns, the GPU version of FastGCN can be slower than a multi-threaded CPU implementation when analyzing small datasets. However, according to the results presented in [17], FastGCN is 63× faster than a single-thread version implemented using the R language.
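To make the idea of a co-expression network concrete, the following minimal sketch links two genes whenever the absolute Pearson correlation of their expression profiles reaches a threshold. The choice of Pearson correlation, the hard threshold of 0.9, the function names and the toy data are all assumptions made for illustration; this is not the algorithm implemented in FastGCN. Each gene pair is evaluated independently of the others, which is the kind of workload that maps naturally onto many GPU threads.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is an illustrative assumption, not a recommended value.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical toy data: four genes measured in four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # proportional to geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # mirror image of geneA
    "geneD": [1.0, 3.0, 1.0, 3.0],  # only weakly related to the others
}
edges = coexpression_edges(toy_expression)
```

The number of pairs grows quadratically with the number of genes, so for small datasets the overhead of moving data to the GPU can outweigh the gain, in line with the observation reported above for small inputs.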
The goal of genome-wide association studies (GWAS) is to determine the genetic variations of a population that are responsible for a specific phenotype. Since this methodology relies on genome-wide information, it is in general computationally challenging. To address this issue, GBOOST [25] was developed to perform gene-gene interaction analysis of large genome data, showing a 40× speed-up and therefore reducing the running time of a GWAS from 2.5 days down to a few hours.
PBOOST [24] is another tool developed in the context of GWAS, which performs a permutation test to detect interacting SNP pairs having a significant association with diseases. In order to assess a meaningful P-value, a huge number of permutations must be tested, a circumstance that motivated the GPU implementation; thanks to GPU acceleration, PBOOST strongly reduces the running time. The authors analyzed $10^7$ permutations for a single SNP pair considering the genomic data from the Wellcome Trust Case Control Consortium [5]: the analysis was completed in 1 minute using an Nvidia Tesla M2090 GPU, instead of the 60 minutes required by an Intel Xeon E5-2650 CPU.
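The structure of a permutation test can be illustrated with a minimal sketch. The statistic used here (the absolute difference between the mean genotype score of cases and controls), the function names and the default number of permutations are hypothetical choices made for illustration; they are not the interaction statistic or the implementation used by PBOOST. The point of the sketch is that every permutation is evaluated independently, which explains why the procedure is a natural candidate for GPU execution.

```python
import random


def association_statistic(scores, phenotypes):
    """Toy statistic: |mean score of cases - mean score of controls|.

    `scores` holds one numeric genotype score per individual,
    `phenotypes` holds 1 for cases and 0 for controls.
    """
    cases = [s for s, p in zip(scores, phenotypes) if p == 1]
    controls = [s for s, p in zip(scores, phenotypes) if p == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(scores, phenotypes, n_permutations=1000, seed=0):
    """Estimate a P-value by shuffling the phenotype labels.

    Shuffling keeps the number of cases and controls fixed while breaking
    any link between genotype and phenotype.
    """
    rng = random.Random(seed)
    observed = association_statistic(scores, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if association_statistic(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 cases with score 2 and 10 controls with score 0.
toy_scores = [2] * 10 + [0] * 10
toy_phenotypes = [1] * 10 + [0] * 10
p_value = permutation_p_value(toy_scores, toy_phenotypes)
```

With the add-one correction, the smallest P-value that can be resolved with N permutations is 1/(N + 1), which is why a very large number of permutations is needed to establish strong significance.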
GWAS also relies on haplotyping and imputation of untyped genotypes, two tasks whose computational cost grows quadratically with the number of reference haplotypes. Mendel-GPU [8] performs accelerated imputation, reducing the running time from around 3 years down to less than a week. Finally, PANET [23] is a GPU-powered tool created to investigate how feedback and feedforward loops determine the robustness of the dynamics of large-scale biological networks. PANET was designed to overcome the limitations of an earlier software tool (namely, NetDS [15]), whose applicability to genome-wide networks was limited by CPU-bound execution. To the best of our knowledge, PANET and Mendel-GPU are the only tools for the computational analysis of biological systems implemented exclusively for the OpenCL framework.

Bayesian Inference
Three notable examples of GPU-based methodologies for Bioinformatics are all based on Bayesian inference: MrBayes [2], PLL [12] and FamSeq [21]. The first two tools are used for the investigation of phylogenetic trees from DNA data, while the third performs variant calling for family-based sequencing data.
The GPU version of MrBayes achieved a 63× speed-up with respect to the equivalent CPU-bound implementation, although it is characterized by a limited applicability [12]. The authors also tested a distributed, multi-GPU execution on the Tianhe-1A supercomputer, using 32 of its 7168 Nvidia M2050 GPUs and obtaining a noticeable 478× acceleration. A further version of MrBayes, named oMC$^3$, was proposed by Chai et al. [7]: this heterogeneous implementation further reduced the running time by simultaneously exploiting multi-threaded CPU computation and GPUs.
The authors of PLL compared the performance of a GPU implementation (running on an Nvidia Tesla C2075) with a strongly optimized CPU version exploiting AVX intrinsics [18], obtaining a maximum speed-up of 2×.
In the case of FamSeq, the GPU was exploited to calculate the posterior probability for $3^n$ kinds of genotypes, where $n$ is the pedigree size: this is a task well suited to the GPU programming paradigm. Thanks to this strategy, the tool achieved a 10× speed-up that, as stated by the authors, makes it possible to call variants for whole genome sequencing data in just 36 hours, instead of the 16 days required by the CPU version.
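The origin of the $3^n$ term can be seen in a minimal sketch that enumerates every joint genotype configuration of a pedigree with n members, each carrying 0, 1 or 2 copies of the alternate allele. The function names, the per-individual likelihood values and the uniform prior are assumptions introduced for illustration only; in particular, a real family-based caller would use a prior encoding Mendelian transmission, which is not modelled here, and this is not the FamSeq implementation.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried by an individual


def joint_posterior(likelihoods, prior):
    """Posterior over all 3**n joint genotype configurations.

    likelihoods[i][g] is P(data of individual i | genotype g).
    prior(config) returns an unnormalised prior weight for a configuration.
    """
    n = len(likelihoods)
    weights = {}
    for config in product(GENOTYPES, repeat=n):  # 3**n configurations
        weight = prior(config)
        for individual, genotype in enumerate(config):
            weight *= likelihoods[individual][genotype]
        weights[config] = weight
    total = sum(weights.values())
    return {config: weight / total for config, weight in weights.items()}


def uniform_prior(config):
    """Placeholder prior (assumption): every configuration is equally likely."""
    return 1.0


# Hypothetical likelihoods for a trio (father, mother, child).
toy_likelihoods = [
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
]
posterior = joint_posterior(toy_likelihoods, uniform_prior)
best_config = max(posterior, key=posterior.get)
```

Each configuration contributes an independent product of terms, so the $3^n$ weights can be computed concurrently and then summed, a pattern that fits the many-thread execution model of GPUs.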

Movement Tracking
Movement tracking algorithms can be used to assist model building and validation. Szafaryn et al. [22] tested a MATLAB application for heart wall tracking, which exploits external CUDA kernels to offload highly parallel activities (e.g., speckle-reducing anisotropic diffusion). According to their results, GPUs yield a considerable speed-up; however, the authors underline the difficulty of porting existing algorithms to GPUs despite their advantages, explicitly stating that the performance improvement is directly proportional to the coding effort.

Quantum Chemistry
Quantum chemistry is based on computationally demanding simulation methods that rely on models of the electronic structure of many-body systems, and exploit approximate solutions of the Schrödinger equation [13]. Different CUDA implementations of these methods (based on Kohn-Sham and Hartree-Fock theories, ab initio electron correlation techniques and quantum Monte Carlo) achieved up to a 100× speed-up with respect to their classic sequential counterparts. For these topics, we refer the interested reader to [10] and references therein.

Further General Techniques
Additional applications of GPUs can be exploited to accelerate the investigation of complex biological systems. For instance, Nvidia's cuFFT library allows the accelerated calculation of the Fast Fourier Transform [20]; Principal Component Analysis, widely used to reduce the complexity of biological systems, can be accelerated on GPUs [1]; Markov Clustering, which can be exploited to identify functional modules in protein-protein interaction networks, is well suited for GPU acceleration, provided that the implementation exploits a sparse matrix data structure [6]; the Fast Non-dominated Sorting Genetic Algorithm (NSGA-II) is useful for solving many multi-objective optimization tasks in Computational and Systems Biology [9].
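To illustrate why a sparse matrix data structure matters for Markov Clustering, the following minimal CPU sketch stores the stochastic matrix as a dictionary of sparse columns and alternates the two standard steps of the algorithm, expansion (matrix squaring) and inflation (element-wise power followed by column normalisation). The function names, the dictionary-based storage, the pruning threshold and the default parameters are assumptions made for illustration; this is not the GPU implementation described in [6].

```python
def build_columns(edges, n_nodes):
    """Column-oriented sparse adjacency matrix (dict of dicts) with self-loops."""
    columns = {j: {j: 1.0} for j in range(n_nodes)}
    for a, b in edges:
        columns[a][b] = 1.0
        columns[b][a] = 1.0
    return columns


def normalise(columns):
    """Scale every column so that its stored entries sum to 1."""
    result = {}
    for j, col in columns.items():
        total = sum(col.values())
        result[j] = {i: v / total for i, v in col.items()}
    return result


def expand(columns):
    """Sparse product M * M, visiting only the stored (non-zero) entries."""
    result = {}
    for j, col in columns.items():
        acc = {}
        for k, v in col.items():
            for i, u in columns[k].items():
                acc[i] = acc.get(i, 0.0) + u * v
        result[j] = acc
    return result


def inflate(columns, power, prune):
    """Element-wise power, normalisation, and removal of negligible entries."""
    powered = normalise({j: {i: v ** power for i, v in col.items()}
                         for j, col in columns.items()})
    pruned = {}
    for j, col in powered.items():
        kept = {i: v for i, v in col.items() if v >= prune}
        if not kept:  # never leave a column empty: keep its largest entry
            best = max(col, key=col.get)
            kept = {best: col[best]}
        pruned[j] = kept
    return normalise(pruned)


def markov_clustering(edges, n_nodes, power=2.0, prune=1e-6,
                      max_iter=100, tol=1e-9):
    """Return the clusters as a sorted list of tuples of node indices."""
    columns = normalise(build_columns(edges, n_nodes))
    for _ in range(max_iter):
        updated = inflate(expand(columns), power, prune)
        delta = max(abs(updated[j].get(i, 0.0) - columns[j].get(i, 0.0))
                    for j in columns
                    for i in set(updated[j]) | set(columns[j]))
        columns = updated
        if delta < tol:
            break
    # Node j joins the cluster of every row i that keeps weight in column j.
    clusters = {}
    for j, col in columns.items():
        for i in col:
            clusters.setdefault(i, set()).add(j)
    return sorted({tuple(sorted(members)) for members in clusters.values()})


# Hypothetical toy network: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
modules = markov_clustering(toy_edges, n_nodes=6)
```

Because inflation drives most entries towards zero, pruning keeps the number of stored values small across iterations; a dense representation would instead store and multiply a full matrix whose size grows with the square of the number of nodes.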

Miscellaneous
Table 1: List of the GPU-powered miscellaneous tools, along with the speed-up achieved and the solutions used for code parallelization. * Refers to a speed-up value achieved on a cluster of GPUs.