InterMine: extensive web services for modern biology

InterMine (www.intermine.org) is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis.

Documentation available for the InterMine API and client libraries.
Figure S1: Examples of the links to the API documentation that are available from the home page of each InterMine database and through the main navigation tab of the web interface.

b. Cookbooks
The web services section of the InterMine documentation also has several example scripts (http://intermine.readthedocs.org/en/latest/webservices/howdoi/). These "howtos" detail how to accomplish a specific task, e.g. "How Do I Get a Summary of a Gene?", using each of the available client libraries. Users can request new "howtos" by submitting a support ticket.

c. Interactive documentation
For each InterMine instance, a full list of available web services can be accessed using the /service endpoint, with results returned in JSON format.
The same service listing is consumed by IOdocs, which provides InterMine's API documentation (http://iodocs.labs.intermine.org). Here, resources are automatically exposed as executable examples: for each method, a user can review and edit the input parameters and see the result of running the query in the browser.
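As a minimal sketch of how the service listing might be retrieved programmatically, the Python below builds the /service URL for an instance and decodes the JSON it returns. The base URL is a hypothetical placeholder and the helper names are ours, not part of any InterMine library; only the /service endpoint and its JSON output come from the text above.

```python
import json
from urllib.request import urlopen


def service_listing_url(mine_base_url):
    """Return the URL of the /service endpoint for a mine's base URL."""
    return mine_base_url.rstrip("/") + "/service"


def fetch_service_listing(mine_base_url, timeout=30):
    """Fetch and decode the JSON service listing (performs a network request)."""
    with urlopen(service_listing_url(mine_base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical base URL; replace it with the address of a real InterMine instance.
EXAMPLE_URL = service_listing_url("https://mine.example.org/mymine/")
```

Calling fetch_service_listing with the base URL of a real instance should return the decoded listing as ordinary Python dictionaries and lists; its exact structure is best inspected interactively rather than assumed.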

Getting started: Using the client libraries
Each template search, custom query (using the query builder) or results table in the web interface includes a mechanism for generating code using one of the client libraries. The code will generate the same results as those generated by the search or as seen in the results table (figure S2). The generated code can help get you started using the web service client libraries. Working from the generated stub, you can edit the code to perform your intended task. You will probably want to refer to the API documentation for your target language (http://intermine.readthedocs.org/en/latest/webservices/#apiandclientlibraries).
To access code from a template search, some familiarity with the web interface is required. All InterMine databases have a similar web interface, hence reducing the learning curve. Each InterMine web page includes a series of tabs allowing navigation between the search and tool functions. One such tab, 'Templates', provides access to a library of template queries for that InterMine instance. Templates are simple forms with dropdown lists and text boxes with autocompletion, where users can specify filters. The code generated from a template form will reflect any user constraints added to the form (see figure S2). Figure S2 illustrates code accessed from a template search for all GO annotations for a specified gene.

A similar search could be constructed in the query builder, or indeed displayed directly in the query builder from the template form (by clicking the 'Edit query' button). Like the templates, the query builder can also be accessed through the navigation tabs in the web interface. Automatically generated code can be accessed from within the query builder itself or from the results table produced by running the query. The following Python code was generated from such a query in FlyMine:

# Get a new query on the class (table) you will be querying:
query = service.new_query("Gene")

OKUP&value1=CG11348&extra1=&format=tab&size=10.
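The query-string fragment above suggests how a template search can be expressed as a plain HTTP request. The following is a sketch only: the /template/results path, the template name Gene_GO, the base URL and the parameter names name, constraint1 and op1 are assumptions made for illustration, while value1, extra1, format and size are taken from the fragment itself.

```python
from urllib.parse import urlencode


def template_results_url(service_root, template_name, path, op, value,
                         extra="", fmt="tab", size=10):
    """Assemble a template-results URL with a single constraint.

    The endpoint path and the 'name', 'constraint1' and 'op1' parameter
    names are assumptions; check them against the instance's documentation.
    """
    params = {
        "name": template_name,   # hypothetical template name
        "constraint1": path,     # path the constraint applies to
        "op1": op,               # constraint operator, e.g. LOOKUP
        "value1": value,         # user-supplied value
        "extra1": extra,         # optional extra value, empty here
        "format": fmt,           # requested output format
        "size": size,            # maximum number of rows
    }
    return service_root.rstrip("/") + "/template/results?" + urlencode(params)


# Hypothetical service root and template name.
url = template_results_url(
    "https://mine.example.org/mymine/service", "Gene_GO", "Gene", "LOOKUP", "CG11348"
)
```

Building the URL with urlencode rather than by string concatenation keeps special characters in identifiers safely escaped.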

D. The link to generate the query XML for this template. Such XML can be used in HTTP requests.

Code Examples
In this section we provide example code for using the web services, both for common biological tasks (identifier resolution in list upload, and retrieving sequence from a specified biological region) and in two example pipelines: one producing a suggest service that identifies similar genes by their GO annotation, and another performing an enrichment analysis for disease terms on a set of mouse genes, followed by a search

b. Retrieving a subsequence
The 'sequence' endpoint (service/sequence) may be used to retrieve a sequence object or to fetch an indexed subsequence of it, e.g. a chromosome subsequence interval.

Figure S3. The /sequence method exposed in IOdocs.

The service expects an XML query string (see figure S3) with a single output column that resolves to a subsequence object. Optionally, the user may provide subsequence coordinates defining both the fragment start and end (integers).
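A request of the kind just described could be assembled as in the sketch below. It is illustrative rather than definitive: the parameter names query, start and end, the model name "genomic", the example paths and the base URL are all assumptions not confirmed by the text, which states only that the service takes an XML query with a single output column plus optional integer coordinates.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_sequence_query_xml(view, constraint_path, value):
    """Build a one-column query as XML (element and attribute names assumed)."""
    return (
        f'<query model="genomic" view={quoteattr(view)}>'
        f'<constraint path={quoteattr(constraint_path)} op="=" value={quoteattr(value)}/>'
        "</query>"
    )


def build_sequence_url(service_root, query_xml, start=None, end=None):
    """Build a /sequence request URL; 'query', 'start', 'end' are assumed names."""
    params = {"query": query_xml}
    if start is not None:
        params["start"] = int(start)  # coordinates must be integers
    if end is not None:
        params["end"] = int(end)
    return service_root.rstrip("/") + "/sequence?" + urlencode(params)


# Hypothetical paths and service root, for illustration only.
xml = build_sequence_query_xml(
    "Chromosome.sequence", "Chromosome.primaryIdentifier", "2L"
)
url = build_sequence_url("https://mine.example.org/mymine/service", xml, start=0, end=50)
```

Using quoteattr and urlencode means identifiers containing quotes, ampersands or spaces are escaped correctly at both the XML and the URL level.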

CACTCGAGCTGTGACCGCCGCACAGTCAACAACTAACTGCCTTC...GCGCAAAATCAAATTAAGAAATAAATGCGAAAATAACATTG",
"end":5868300}],"executionTime":"2014.01.23 14:22::19","wasSuccessful":true,"error":null,"statusCode":200}

In most cases we would recommend that users unfamiliar with making raw HTTP requests take advantage of the automatic code generation available through the InterMine web interface.
c. Building analysis pipelines
Different web service resources and their methods can be joined to produce automated analysis pipelines.
With basic programming skills it is possible to join fragments of InterMine's generated code to generate powerful analysis pipelines.
An example script to find similar genes which share a gene ontology (GO) annotation is shown below. The script retrieves a list from an InterMine instance and uses the list enrichment method (in this case for GO) to find GO terms enriched among the genes in the list. For the GO term with the highest enrichment (if any), a query retrieves all genes (from the same organism) annotated with that GO term. A final step is to identify (*) which of the genes were already present in the starting list. By editing the URI and LIST parameters, the code can be used to suggest genes by enriched GO terms for any InterMine instance which loads gene GO annotations.

# Find the most enriched term, in this case a disease term.
most_enriched = next(gene_list.calculate_enrichment(TOOL))
#
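The final step of the GO suggestion pipeline described above, marking with (*) the retrieved genes that were already in the starting list, does not depend on any web service call and can be sketched in plain Python. This is not the authors' script; the function name and the gene symbols are made up for illustration.

```python
def mark_known_genes(annotated_genes, starting_list):
    """Pair each retrieved gene with "*" if it was already in the starting list.

    Returns (gene, marker) tuples sorted by gene name; marker is "" for
    genes that are new suggestions.
    """
    known = set(starting_list)  # set membership keeps each lookup cheap
    return [(gene, "*" if gene in known else "")
            for gene in sorted(set(annotated_genes))]


# Made-up identifiers standing in for genes sharing the most enriched GO term.
suggestions = mark_known_genes(["geneC", "geneA", "geneB"], ["geneB", "geneZ"])
for gene, marker in suggestions:
    print(f"{gene}{marker}")
```

In a full pipeline, annotated_genes would come from the query on the most enriched term and starting_list from the list that was analysed.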