‘Multi-omic’ data analysis using O-miner

Abstract Innovations in -omics technologies have driven advances in biomedical research. However, integrating and analysing the large volumes of data generated from different high-throughput -omics technologies remain a significant challenge to basic and clinical scientists without bioinformatics skills or access to bioinformatics support. To address this demand, we have significantly updated our previous O-miner analytical suite, to incorporate several new features and data types to provide an efficient and easy-to-use Web tool for the automated analysis of data from ‘-omics’ technologies. Created from a biologist’s perspective, this tool allows for the automated analysis of large and complex transcriptomic, genomic and methylomic data sets, together with biological/clinical information, to identify significantly altered pathways and prioritize novel biomarkers/targets for biological validation. Our resource can be used to analyse both in-house data and the huge amount of publicly available information from array and sequencing platforms. Multiple data sets can be easily combined, allowing for meta-analyses. Here, we describe the analytical pipelines currently available in O-miner and present examples of use to demonstrate its utility and relevance in maximizing research output. O-miner Web server is free to use and is available at http://www.o-miner.org.


Introduction
Large amounts of data have been generated from highthroughput profiling platforms. Public repositories, such as the Gene Expression Omnibus (GEO) [1], ArrayExpress [2], Sequence Read Archive (SRA) [3] and the European Genome-phenome Archive (EGA) [4], contain thousands of profiles from transcriptomic, genomic and methylation platforms across many experimental conditions and sample types. The exponential growth in the number of available data sets has created many challenges to data analysis and integration. The Bioconductor project (www.bioconductor.org) provides a vast array of open-source packages within the R programming environment for the analysis of data from both array and sequencing platforms. A typical pipeline for the analysis of array/sequencing data will require the use of several R packages and substantial coding skills to navigate between inputs and outputs from one package to another. This is not a simple task for those without any programming or data analysis experience, making the process both difficult and time-consuming. To make these workflows more accessible and useful to biologists and clinicians, it is necessary to create easy-to-use online tools to perform the required bioinformatic and statistical analyses. While several online tools are available for the analysis of data from transcriptomic and genomic experiments such as Babelomics 5.0, ArrayMining, Patchwork, wANNOVAR and the Tumor Aberration Prediction Suite [5][6][7][8][9], most cover just one type of data, or require bioinformatics expertise to interpret the data. Consequently, the need arises for comprehensive and easy-to-use online bioinformatics tools that are able to process raw profiles either individually or as a meta-analysis, while alleviating the need for researchers to invest time and effort in setting up the necessary computational infrastructure.
To satisfy this, we developed O-miner [10] as a solution for the analysis and exploitation of data. All analytical pipelines are designed to run in the R statistical environment and use wellestablished statistical methods from Bioconductor packages. Since its first publication in 2012, we have greatly improved O-miner by adding a number of analytical and graphical features to increase functionality and to improve the usability of our software by the scientific community. Here, we present an overview of O-miner and focus on its enhanced workflows and computational features. We also illustrate examples of use for the analysis of multiple transcriptomics data sets from breast cancer (BC) studies and the post-processing of RNA sequencing (RNA-Seq) data from prostate cancer (PCa) samples to extract biologically meaningful results. Feedback from the user community has resulted in many significant additions and improvements to the O-miner query system, analytical workflows and output layers since the first release. Table 1 summarizes the analytical workflows with corresponding input and output features that are currently supported in O-miner. Details regarding each of the analytical workflows are available from our comprehensive online user guide (http://o-miner.org/guide_2.0.html).

O-miner features
Query submission architecture and data source The basic O-miner request-response internal architecture remains the same. Following the provision of a project name and an email address (optional), users can upload data to O-miner via the user interface either as a zipped archive (of raw CEL files or a normalized data matrix from in-house or public data) or via the input of valid GEO GSE accession number(s). Users are then presented with a tabular form, where they need to assign biological groups to the uploaded data. While previously completed online, the current version of O-miner facilitates the assignment process by offering the option to upload a text file containing the relevant information for each sample. Analysis of individual and multiple data sets from the GEO has also been improved. In addition, extraction of biological information regarding the samples from GEO has been automated, bypassing the requirement for any manual inputting of data. Up to five user-defined data sets from GEO (using GSE accession numbers) can be automatically uploaded to O-miner.
An updated Perl CGI pipeline connects the data submitted through the front-end Web interface to the back-end workflows implemented in R. The results are displayed back to the users, once each of the analytical steps has been completed. Users are notified via email with the URL from where the results can be viewed. The results generated by O-miner are accessible online and are available for download without restrictions. Uploaded data sets and computed results are stored on our system for a period of 2 weeks.

Analytical workflows: updates and additions
The first version of O-miner was composed of two analytical domains. These are genomics and transcriptomics, where the latter contained just one workflow for the analysis of data from the Affymetrix GeneChip Human Genome series only. The updated version includes an improved workflow for transcriptomic data and allows the analysis of data from the Affymetrix GeneChip Mouse Genome. Furthermore, new workflows have been added for the analysis of data from the Illumina expression array, Affymetrix exon array, Affymetrix microRNA (miRNA) array and the downstream processing of data from RNA-Seq experiments.
Previously, the genomics layer contained one workflow for the analysis of data from Affymetrix SNP arrays, which offered the use of several different algorithms for the segmentation stage of the analysis. However, in practice, when several of these are chosen, the analysis became both time-consuming and computationally expensive. The genomics layer has been improved. We have now implemented an improved version of the workflow using just one segmentation model circular binary segmentation (CBS), which is widely used in copy number analysis. This has simplified the process for users and limited the time and computational burden of analyses. We have also added two new workflows to the genomics section: the allelespecific copy number analysis of tumour (ASCAT) workflow and the genome sequencing (post-processing) workflow to estimate copy numbers from whole-exome sequencing (WES) and wholegenome sequencing (WGS) data.
A third, analytical layer for the analysis of data from methylation arrays has also been implemented. A list of data types and -omics platforms currently supported by O-miner is provided in Table 2.

Transcriptomics
The core workflow comprised the following steps: quality control (QC), normalization, filtering, differential expression analysis and the identification of statistically significant gene ontology (GO) terms. For each of the new platforms added to O-miner, the relevant functions to perform these steps have been implemented. In addition, we have increased the functionality of the existing Affymetrix transcriptomics pipeline by adding the option of estimating tumour purity with the algorithm ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumours using Expression data) [11]. To increase the statistical robustness of meta-analyses, users now have the option to use the algorithm COMBAT [12] that eliminates batch effect(s) when integrating data from different studies from the same platform. We have also included an option for survival analysis.
Results can be viewed from a single Web page, with data from each step of the analysis presented in a distinct tab ( Figure 1). The QC assessment for raw .CEL files is implemented using the R package ArrayMvout [13]. These are presented as both a summarized report and as individual plots. ArrayMvout automatically excludes outliers from the analysis. Additional QC checks, from ArrayQualityMetrics [14], can be applied. Optionally, the COMBAT algorithm can be applied to a metaanalysis, which is usually performed after the normalization step. An estimated tumour purity report is also generated (for Affymetrix GeneChip Human Genome series only), if the ESTIMATE algorithm is run.
Normalized data matrices are derived using platformspecific normalization methods followed by a filtering step. An unsupervised hierarchical clustering algorithm is used on the normalized gene expression data to generate dendrograms to show similarity between samples. Filtering reduces the dimensionality of the data using one of the three following methods: interquartile range (IQR) (soft; intermediate; robust), intensity (25 or 50% of samples) or SD (up to the top 40% of the most variable probes). Differential expression analysis is then applied to the filtered matrix of normalized expression values using LIMMA [15] to identify significantly altered probes between the biological groups in the user-defined comparisons. For each comparison, the differentially expressed genes (DEGs), passing the user-imposed cut-offs of adjusted P-values and log fold-change, are presented in a tabular format as an annotated list or graphically as heatmap ( Figure 1A). Users can view boxplots displaying the expression of DEGs across the predefined biological groups ( Figure 1B). Optionally, Venn diagrams can be generated showing unique and overlapping genes that are differentially expressed in up to four biological comparisons ( Figure 1C). The DEG lists can subsequently be used to identify significantly under-and over-represented GO terms using the R package GOstats [16], which includes ontologies relating to molecular function, biological process (BP) and cellular component. For data sets where survival covariates are supplied, three Kaplan-Meier (KM) plots are generated, to show 5, 10 and 15 years of survival rates across different risk groups [17].
The current version also has incorporated additional functionalities to interrogate the data. Previously, users could only visualize expression boxplots for their gene(s) of interest across the biological groups. Users now have the option to visualize the effect of those genes on survival as KM plots. A univariate model is applied to the survival data, and samples are assigned to risk groups based on the median dichotomization of mRNA expression intensities of the respective genes. Users can also identify genes that are co-expressed with their gene(s) of interest. The top 10 genes with the highest correlation, in terms of Pearson product-moment correlation coefficients (PPMCCs) and associated P-values, are presented in a table.
A number of default values are set against the various analysis parameters to assist non-advanced users: ArrayMvout is the default method for detecting outliers in the QC step; data are normalized using Robust Multi-array Average (RMA) and filtered using SD, where the top 40% of the most variable probes on the array are used for differential expression analysis; an adjusted P-value threshold of 0.05 and a log 2 fold-change threshold of 2.0 are imposed to identify DEGs.

RNA-Seq post-processing
Along with array-based transcriptomic data analysis, O-miner now also supports the post-processing of RNA-Seq data ( Figure 2). Analytical steps covered by this pipeline include differential expression, annotation with Ensembl Gene IDs (if data are not already annotated) and the identification of statistically significant GO terms using the R package GOseq. Users can provide raw read counts generated from HTSEQ [18] or Reads Per Kilobase per Million mapped reads (RPKM) values. They can choose between LIMMA and edgeR [19] as methods for differential expression analysis, when raw read counts are provided. However, only LIMMA is available, when a matrix of RPKM values is uploaded. LIMMA applies the voom [20] transformation to raw read counts data to generate log counts per million with associated precision weights to be used for differential expression analysis, whereas EgdeR applies a generalized linear model to the data to calculate differential expression.
Results are presented in the same format as those from the other transcriptomics workflow. Further details of results generated from this pipeline can be found in our 'Examples of use' section.
Genomics O-miner offers two analytical workflows, CBS and ASCAT [21], for the analysis of copy number data generated on Affymetrix array platforms (Table 1, Figures 3 and 4). Both workflows can conduct a complete genomic analysis from raw data files. In addition, the CBS workflow can also perform analysis from multiple entry points with processed data as input. A third workflow is offered for the post-processing of sequencing data, which estimates copy numbers from pre-processed WES and WGS data using the ASCAT algorithm ( Figure 4). Both CBS and ASCAT workflows have common analytical steps, including background correction, allelic crosstalk calibration, nucleotideposition probe sequence effects normalization, probe-level summarization using robust average (SNP 5.0 and SNP 6.0 arrays) or log-additive model (10K, 100K and 500K arrays), polymerase chain reaction fragment-length effects normalization and calculation of raw copy number estimates (log 2 ratios) relative to the chosen reference. QC cluster or aberration density plots are generated for each platform using the R package aroma.affymetrix [22]. Default parameters are set to facilitate the analytical process for non-advanced users. These include a log 2ratio (copy number ratio) threshold of 0.2; a minimum number of 15 consecutive SNPs to define regions of copy number aberration (CNA) as a gain/loss; and an observation percentage of 20% that sets the minimum number of samples for which a copy number event must be observed.
Similar to a transcriptomic analysis, results are viewable from a single Web page, with data from the QC step and sample-specific CNA regions as well as recurrent CNA regions across groups presented in distinct tabs ( Figure 3). CNA regions are reported with corresponding physical and cytogenetic (optional) mapping information. Users can also customize the analyses by selecting to retrieve results from one or more data annotation systems, e.g. Refseq, Ensembl, UCSC and Vega, information on overlapping regulatory elements, such as miRNAs [23] or conserved transcription factor binding sites. Recurrent CNA regions are available in both tabular and graphical formats, and can be viewed either as summarized across all chromosomes simultaneously or individually for each chromosome. Such information can be valuable for identifying putative disease-causing genes [24].

Circular binary segmentation
Both paired (e.g. tumour-normal) and unpaired analyses are available when raw CEL files are provided. If paired normal/baseline samples are not available, a user-defined baseline may be selected. In this case, O-miner generates a pooled average from the unpaired  O-miner conducts unsupervised hierarchical clustering for each sample using the raw log 2 ratios. If users do not specify threshold values, then the log 2 ratio threshold is calculated based on the quantile distribution of segmented copy numbers. These thresholds are then applied to the data to call copy number gains and losses. Figures 3A and B and 4C illustrate various plots that O-miner generates based on filtered or unfiltered log 2 ratio data.
Allele-specific copy number analysis of tumour The ASCAT workflow is useful to study diseases caused by somatic mutations (Figure 4). O-miner accepts only raw .CEL files for this workflow, where both paired and unpaired analyses are available. CalMaTe [25] is used to calculate the log 2 ratios and Ballele frequencies (BAFs) between samples. These values are then processed with the allele-specific piecewise constant fitting (ASPCF) algorithm. For each sample, ASCAT profiles ( Figure  4A) are generated with aberrant cell fraction and tumour ploidy information. Absolute allele-specific copy number calls are estimated with annotated regions of gain, loss or copy neutral loss of heterozygosity (LOH) ( Figure 4B). Frequency and aberration plots are also generated ( Figure 4C), using the R package DNAcopy [26].

Estimation of copy number from WES and WGS data
The workflow to generate copy number information from WES and WGS data takes as input the pre-processed files containing  summary information of reads from the comparison between each tumour-normal pair. Pre-processing steps to analyse these data comprise: QC, alignment and the generation of SNP and indel-variant genotyping information. Based on the numbers of reads supporting the reference and altered alleles for each variant between tumour and normal samples, log 2 ratios (LRR) and BAF values are calculated for each tumour-normal pair with the depth information normalized by dividing the depth of each variant by the median depth across all variants. These files are then used as input to the ASCAT algorithm to estimate copy number calls, annotate the regions of CNA as well as generating frequency and aberration plots.
Methylomics O-miner now offers an analytical workflow to analyse data generated on Illumina Infinium methylation arrays (Table 1, Figure 5). Similar to the transcriptomics workflow, the general structure of the methylomics workflow comprises key steps: QC, normalization and filtering, followed by differential Input data can be provided as either raw files or normalized data. When raw files are provided, red and green IDAT files are uploaded for each sample. O-miner automatically combines the two files and extracts the sample name. Users then need to specify the biological state/source for the sample. Paired or unpaired analyses can be performed, and technical replicates can be flagged. QC analysis on the input data is performed using the R package ChAMP [27], which calculates the proportion of failed probes in each sample. A variety of quality assessment plots are also generated alongside a hierarchical cluster diagram ( Figure 5A). Raw data can be normalized into a matrix of beta values by selecting one of the following methods: BMIQ [28], SWAN [29] and PBC [30]. More information on each of these methods can be found in our online user guide. Normalized data generated by O-miner or user-provided normalized data are then used for filtering and subsequent differential methylation analysis to identify significantly altered probes. An annotated list of the differentially methylated regions is generated. Information is provided regarding the chromosomal location, corresponding hyper-or hypo-methylated genes and whether the probe region overlaps with a known region of differential methylation and/or a CpG island. Boxplots are also generated showing differences in methylation levels between biological groups. Users can also opt for generating Venn diagrams showing the number of probes exclusive or common to each of the biological groups.
GO terms found to be over-or under-represented amongst the differentially methylated probes can also be generated ( Figure 5C). O-miner also allows users to interrogate the correlation between a probe of interest and others present on the array using the PMCC value.

Elaborate documentation
To ensure that first-time users are able to understand all the available parameters for -omics analyses and customize their analysis accordingly, a comprehensive user guide describing each workflow is available online (http://o-miner.org/guide_2.0. html).
Additionally, an 'Examples of use' section is available (http:// o-miner.org/examples_2.0.html), so that users can familiarize themselves with the analytical workflows, input file formats and structure of the output before analysing their own data. Examples are provided for each of the analytical pipelines with various parameter settings. The user interface also contains pop-up help buttons for selected fields to provide additional information.

Examples of use
This section presents how the RNA-Seq and transcriptomics workflows can be applied to biological data to conduct independent and meta-analyses and to draw meaningful conclusions.

Case study 1: Meta-analysis of BC transcriptomics data to investigate the relationship between triple-negative BC and basality
Background BC is one of the leading causes of cancer-associated deaths among women worldwide. It is a heterogeneous disease exhibiting distinct histological and biological characteristics, diversity in clinical behaviour and variability in response to treatment. The ability to reliably classify and address these entities independently has important diagnostic, prognostic and therapeutic implications and is a major step towards a more personalized approach to the treatment of BC.
Seminal studies applying microarray-based technology to BC research demonstrated the phenotypic heterogeneity of BC to be accompanied by a parallelized diversity in transcriptomic profiles, and segregated BCs into five primary molecular subtypes-luminal A, luminal B, basal-like (BL), Her2þ and normal breast-like-each with distinct transcriptomic signatures.
The BL subtype represents 10-25% of all BC and is of particular interest to the cancer research community because of its aggressive clinical behaviour, lower overall survival relative to the other molecular subtypes and lack of targeted therapy. Clinically, this subtype is characterized by a high prevalence in premenopausal women, particularly those of African descent, large tumour size at diagnosis and specific metastatic patterns favouring dissemination to the brain and lungs. For Basal-Like Breast Cancer (BLBC) patients, the first-line treatment would be conventional chemotherapy.
This molecular subtype shares many features with the immunohistochemically defined triple-negative (TN) subgroup, which is characterized by a lack of clinically significant oestrogen receptor (ER), progesterone receptor (PR) and Her2 expression. The terms BLBC and Triple-Negative Breast Cancer (TNBC) have been used interchangeably in the past but, with discordance rates of 30% reported between tumours with the TN phenotype and those with the BL molecular phenotype, it is important that researchers address these definitions as distinct entities.
Triple-negative tumours assigned to the basal-like molecular subgroup (TNBL) have been associated with lower median age at presentation, higher pathological grade, increased tumour size and distinct differences in clinical outcome relative to TN tumours allocated to one of the other molecular subtypes (TNnonBL).
We used O-miner to conduct a multi-cohort meta-analysis of publicly available data to gain a deeper understanding of the relationship between TNBL and TNnonBL tumours.
The ER, PR and Her2 receptor status of each sample were defined by implementing functions within the MCLUST R library. The MCLUST algorithm was set to calculate the Bayesian information criterion for a two-component Gaussian distribution model. In addition, the PAM50 classifier was applied to determine the subtype calls of each sample. The TN samples were then isolated and grouped based on their basality, i.e. TNBL and TNnonBL.
These samples were uploaded to O-miner, and in-depth analyses of their transcriptomic profiles and survival characteristics were conducted.

Interpretation of output
Unsupervised hierarchical clustering to view the underlying structure of the data indicated that the gene expression profiles of TNBL BCs are more similar to each other than to those of TNnonBL BCs ( Figure 6A).
The most DEGs between the two groups included GABRP, ABCA8 and DARC, as well as various cytokeratins, which is in accordance with previous work. By focusing on a given gene, such as GABRP, O-miner presents its expression across the biological groups ( Figure 6A and B). From this, we can examine the behaviour of the gene more clearly. For example, expression of GABRP is higher in TNBL tumours than in TNnonBL tumours, supporting previous research suggesting that GABRP is involved in the initiation and progression of BL tumours. Key GO terms identified as disrupted between TNBL and TNnonBL include antigen processing, cytokine activity and immune response, indicating that multiple immune processing pathways are affected in TNBL relative to TNnonBL [33] ( Figure 6D).
The 5-year KM plot displays a trend for the Basal-Like Triple Negative (BLTN) group to have a poorer overall survival relative to BLnonTN; however, this relationship is not significant (P > 0.05) ( Figure 6C) and disappears >10 and 15 years.
The results from this analysis suggest that TNBL and TNnonBL tumours exhibit unique transcriptomic profiles, with genes and pathways associated with immunological processes and cell signalling being reported as significantly disrupted between the two groups. This not only serves to confirm the 'uniqueness' of each group but also could indicate potential targets that warrant further investigation.
Case study 2: Analysis of PCa sequencing data from The Cancer Genome Atlas Background PCa is the second most common male cancer and the fifth leading cause of cancer-related death in men [34]. It has long natural history and can initiate from disrupted prostate epithelium and progressively develop over many decades [35]. While PCa patients present remarkable diversity both in terms of pathology and clinical presentation [36], which can be partially explained by underlying genetic heterogeneity, in most cases, it is an indolent disease that is unlikely to ever become symptomatic during patients' lifetime [37].
Many studies and collaborative efforts investigated PCa molecular make-up resulting in the identification of key alterations and associated molecular processes involved in the disease development. Therefore, we used PCa as proof of concept to test the robustness of the O-miner RNA-Seq post-processing pipeline for detecting genuine aberrantly expressed genes and also to search for novel gene associations with PCa.

Samples
RNA-Seq data from The Cancer Genome Atlas (TCGA) prostate adenocarcinoma (PRAD) project were downloaded and subjected to QC and alignments steps. These pre-processed data were uploaded to O-miner, and genes/GO terms differentially altered between PCa and normal samples were identified (Figure 7).

Interpretation of output
Several genes identified as differentially expressed by O-miner have previously been identified as PCa biomarkers including PCA3, DLX1, single-minded homolog 2 (SIM2), hepsin (HPN), HOXC6, AMACR ( Figure 7A) as well as MYC, forkhead box O1 (FOXO1), PTEN, RUNX2, MET, RB1, EGF, ERG, EZH2, FOXA1 and SPINK. The most significantly DEG PCA3 encodes prostate cancer gene 3, which is a widely used urine biomarker for PCa [38]. SIM2 encodes a transcription factor involved in PCa onset and progression [39]. HPN is one of the most consistently over-expressed genes in PCa and is associated with disease progression and metastasis [40]. Other top-ranked genes by O-miner and implicated in PCa are the homeobox genes HOXC6 and DLX1, recently proposed as urine-based biomarkers for early disease diagnosis [41], as well as diagnostic marker AMACR [42], which encodes alpha-methylacyl-CoA. Moreover, we noted strong evidence for differential expression for MYC, a well-known oncogene [43] located in frequently amplified region 8q24 in PCa, and FOXO1, a key downstream effector of the tumour suppressor PTEN and critical gene in negative regulation of transcription factor RUNX2, which are also significantly deregulated according to results derived from O-miner.
Several top-ranked genes, which have not been linked with PCa, have been previously associated with other malignancies. For instance, DNAH5 (dynein axonemal heavy chain 5) has an important role in the development of colorectal cancer [44], O-miner | 139 Figure 6. Application of the transcriptomics workflow for the multi-cohort analysis of BC data. Data Collection: A meta-analysis was conducted using O-miner to investigate the effect of basality on TN BCs. Two Affymetrix data sets GSE48390 and GSE21653 were downloaded using GEO data set as the data source option. The subset of samples defined as triple negative, were selected from the File Organiser window. Analysis Parameters: Once all the sample characteristics and survival covariates were provided, the raw data were normalized using RMA and filtered using SD (top 10%). Samples belonging to each of the data sets were specified and the COMBAT algorithm applied to adjust for batch effects. The resulting normalized matrix was subjected to differential expression and survival analyses. All the results are available and easy to download as text and excel files. Results: (A) Unsupervised hierarchical clustering of the gene expression profiles suggests that TNBL BCs are more similar to each other than to TNnonBL BCs. The cluster is annotated with the sample names and biological groups. Each biological group has its own colour. (B) The GABRP gene was reported differentially expressed between the two biological groups. The expression of GABRP between the TNBL and TNnonBL groups can be displayed by boxplots. (C) Survival, the 5-year KM survival plot suggests that the BLTN group has poorer overall survival relative to the BLnonTN group but this relationship is not significant (P>0.05). (D) Statistically significant GO terms between BLTN and BLnonTN groups are displayed, with hyperlinks to external resources provided. whereas COMP (cartilage oligomeric matrix protein) has been recently reported as a novel biomarker contributing to the severity of BC [45].
Results of GO analysis using DEGs are displayed ( Figure 7B). Over-and under-represented GO terms are listed and GO IDs, Pvalues and GO term annotations are presented. Among the most enriched GO terms is steroid hormone metabolic processes (GO:0008207) associated with biosynthesis of cholesterol, whose increased level in the blood was previously linked with an increased risk of PCa and its aggressiveness [46]. Two other highly enriched GO terms (GO:0007411 and GO:0008045) are related to axon guidance, whose associated genes were previously linked with PCa [47] suggesting that these BPs may have important role in prostate tumourigenesis.
Several established PCa biomarkers were identified using O-miner. Moreover, many genes not previously reported in the context of PCa were identified from this analysis. Given the published evidence of their association with other malignancies, they are promising candidates for experimental validation and further exploration to characterize their functional role in PCa. This example illustrates that O-miner provides researchers with the tools required to conduct powerful analyses of publicly available sequencing data.

Limitations and future directions
O-miner is an analytical suite that has filled an existing void for biologists to be able to perform increasingly complex -omics analyses without the need for bioinformatics support or a complex IT infrastructure.
Currently, a key limitation to O-miner is that it requires Next Generation Sequencing (NGS) inputs to be pre-processed. This means that the user needs to conduct QC and sequence alignment on sequencing data. Another limitation is the lack of workflows for Agilent arrays or non-Affymetrix copy number arrays starting from raw data. While O-miner provides links to gene ontologies, we appreciate that the utility of this resource would be greatly enhanced if links to pathway databases, such as KEGG [48] or Reactome [49], were also provided. Finally, O-miner is not yet able to integrate results from the different analytical layers, for example correlating gene expression with copy number information on matched samples.
The flexible design of the analytical modules comprising O-miner allows for the easy addition of further analytical processes to existing pipelines as well as new workflows. In that respect, our future plans include expanding on the analytical, computational and visualization capabilities of the tool and making it even more informative and useful for the research community.

Availability and requirements
Project name: O-miner Project home page: www.o-miner.org Operating system(s): Platform independent; Standard WWW browser (Google Chrome, Safari and Mozilla Firefox).

Key Points
• O-miner provides a number of extensive analytical workflows for the analysis of high-throughput data.
• Data from transcriptomic arrays, pre-processed RNA-Seq, SNP array, WES and WGS data as well as methylation arrays can be analysed with ease.
• Raw data from GEO (multiple projects) or summarized data from TCGA can also be submitted and analysed.
• Results can be viewed online or downloaded in text, graphical and excel format.