Summary: The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases.
Availability and Implementation: Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command
The Ensembl (Flicek et al., 2010) and UCSC (Fujita et al., 2010) genome browsers are the first port of call for a large community of genetics and genomics researchers. Both provide a graphical interface for browsing the genomes of a large number of species, displaying the location of genes, polymorphisms, repeats and regulatory regions. Each database can also be accessed directly via SQL and provides an interface for simple querying of the data: BioMart for Ensembl (Haider et al., 2009) and the Table Browser for UCSC. In addition, the Ensembl team provides a Perl API for advanced scripted access to the data (Flicek et al., 2010).
In recent years, the Python and Ruby scripting languages have gained significant ground in the bioinformatics community (see e.g. Aerts and Law, 2009; Cock et al., 2009; Goto et al., 2010), increasing the need for a programmable interface in these languages. In this article, we describe a second API to the Ensembl database, aimed at the Ruby programming community.
The data available in the Ensembl Genome Browser are stored in a set of MySQL relational databases that are normalized to a certain extent. Every table covers one specific conceptual class of objects, such as ‘genes’ or ‘transcripts’. The Ruby ActiveRecord library is used to map tuples within the Core and Variation tables to objects of a class, and delivers a full API with a limited amount of code.
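The mapping of tuples to objects can be illustrated without a database. The sketch below is not the library's code and does not use ActiveRecord itself; it is a small in-memory imitation of the pattern, in which the names `TableRecord` and `load_rows` and the example rows are hypothetical. It shows how one class per table and one object per tuple yield column readers and ActiveRecord-style `find_by_<column>` finders from the column names alone.

```ruby
# In-memory imitation of the ActiveRecord pattern (illustration only).
# TableRecord, load_rows and the example rows are hypothetical.
class TableRecord
  class << self
    # Register the table contents; each row is a Hash of column => value.
    def load_rows(rows)
      @rows = rows
      # One reader method per column, generated from the column name.
      columns.each { |col| define_method(col) { @attributes[col] } }
    end

    # Column names are read from the data, not written into the class.
    def columns
      (@rows || []).flat_map(&:keys).uniq
    end

    # One object per tuple.
    def all
      @rows.map { |row| new(row) }
    end

    # Finders of the form find_by_<column>(value).
    def method_missing(name, *args)
      col = name.to_s[/\Afind_by_(\w+)\z/, 1]
      return super unless col && columns.include?(col.to_sym)
      row = @rows.find { |r| r[col.to_sym] == args.first }
      row && new(row)
    end

    def respond_to_missing?(name, include_private = false)
      col = name.to_s[/\Afind_by_(\w+)\z/, 1]
      (col && columns.include?(col.to_sym)) || super
    end
  end

  def initialize(attributes)
    @attributes = attributes
  end
end

# One class per table.
class Gene < TableRecord; end

Gene.load_rows([
  { gene_id: 1, stable_id: 'GENE_A', biotype: 'protein_coding' },
  { gene_id: 2, stable_id: 'GENE_B', biotype: 'pseudogene' }
])

gene = Gene.find_by_stable_id('GENE_B')
puts "#{gene.gene_id} #{gene.biotype}"   # prints "2 pseudogene"
```

The `Gene` class itself contains no column names, which is why a full API can be obtained with very little table-specific code.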
Like the Perl API, the Ruby library provides a
The Ruby API to Ensembl is not part of the BioRuby project, but might be linked to it as a plugin in the future.
To connect to the Ensembl database, the user provides the species name (e.g. ‘Homo sapiens’) and an optional release number. It is not necessary to distinguish between the Core and Variation databases; the code internally opens a connection to either when needed. In addition, and in contrast to the Perl API, a single Ruby interface serves all Ensembl releases. The Ruby API is also able to work with Ensembl Genomes databases, where multiple species are stored within the same database (e.g. bacterial, fungal and plant genomes; Kersey et al., 2010).
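How the library resolves a species name and release to a database is not detailed here. As a rough sketch, one plausible approach relies on the public Ensembl naming convention `<species>_<group>_<release>_<assembly>` (for example `homo_sapiens_core_60_37e`) and builds a name pattern that can be matched against the databases on the server. The helper `ensembl_db_pattern` is hypothetical and not part of the API.

```ruby
# Sketch only: ensembl_db_pattern is a hypothetical helper, not part of the API.
# It assumes the Ensembl naming convention
# <species>_<group>_<release>_<assembly>, e.g. homo_sapiens_core_60_37e.
def ensembl_db_pattern(species, release, group = 'core')
  # 'Homo sapiens' -> 'homo_sapiens'
  slug = species.strip.downcase.gsub(/\s+/, '_')
  # The trailing '%' is the SQL LIKE wildcard standing in for the assembly suffix.
  "#{slug}_#{group}_#{release}_%"
end

puts ensembl_db_pattern('Homo sapiens', 60)               # prints "homo_sapiens_core_60_%"
puts ensembl_db_pattern('Homo sapiens', 60, 'variation')  # prints "homo_sapiens_variation_60_%"
```

Because the group is a parameter, the same species and release can be resolved to either the Core or the Variation database without the user naming one explicitly.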
The Ruby Ensembl API provides, to our knowledge, the same functionality as the Perl API as far as the Core and Variation databases are concerned. Class methods cover searching for records: every column in the table is available for querying by preceding it with
The library provides two binaries. The
Future efforts will focus on extending the API to the Compara and FunctionalGenomics databases which provide data for multi-species comparisons and for functional as well as regulatory information.
Figure 1 shows an example of using the Ruby Ensembl API. Lines 1 to 3 load the library and connect to the Core and Variation databases for human release 60. In lines 4 and 5, the BRCA2 gene is retrieved and gene name and location are printed. The
The library described here provides the functionality needed to query the Ensembl database using the Ruby programming language. This API has several advantages compared to the Perl version, including a single API for all releases, terser code, a powerful interactive shell, a more useful implementation of the Slice concept and extensive introspection. The Ruby Ensembl API is also well suited to tasks such as adding background information on candidate genes from the Ensembl database in applications aimed at clinical geneticists (e.g. Annotate-It; Sifrim,A. et al., manuscript in preparation).
From a library maintainer's perspective, the metaprogramming and introspection capabilities of the Ruby language and the ActiveRecord module make it possible to provide full functionality with little code that is easy to maintain. They make the Ruby API very flexible and virtually insensitive to the addition or removal of data columns in tables.
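The insensitivity to schema changes can be sketched in plain Ruby. The example below is an illustration under stated assumptions, not the library's implementation: `record_class_for` and both column lists are hypothetical, and in practice the column names would be discovered from the database at connection time rather than given as literals.

```ruby
# Sketch only: record_class_for and both column lists are hypothetical.
# In a real setting the column names would come from the database schema.
def record_class_for(columns)
  Class.new do
    # Expose the column list that was discovered for this table.
    define_singleton_method(:columns) { columns }

    def initialize(values = {})
      @values = values
    end

    # Generate one reader per column found in the schema.
    columns.each do |col|
      define_method(col) { @values[col] }
    end
  end
end

older_schema = %i[id name]
newer_schema = older_schema + %i[added_column]

OldGene = record_class_for(older_schema)
NewGene = record_class_for(newer_schema)

puts OldGene.method_defined?(:added_column)   # prints "false"
puts NewGene.method_defined?(:added_column)   # prints "true"
```

The same code serves both schema versions: a column that appears in a newer release gains a reader automatically, and one that disappears simply has no reader generated.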
The API is available through the Rubygem system and can be installed with the command
J.A. initiated the project and wrote the Core code and overall framework. F.S. created the Variation API and adapted the code for Ensembl Genomes multi-species databases. Both contributed to the manuscript. We thank the European Bioinformatics Institute for hosting J.A. under the Geek For A Week program, and specifically Glenn Proctor and Andreas Kahari. We thank Marc Hoeppner for useful discussions and Alejandro Sifrim for his contribution to the variation consequence calculation.
Funding: Article processing charges are covered by SymBioSys II (grant number KUL PFV/10/016 SymBioSys).
Conflict of Interest: none declared.