Whole-genome microarray chips make it possible to investigate whether the regulation of particular groups of genes is influenced by their chromosomal localization. Chromosome Co-Localization probability calculator (ChroCoLoc) is a publicly available web-based tool for the analysis of co-localization of co-expressed genes identified by microarray experiments.
Microarray gene expression experiments provide the ability to measure the expression of several thousand genes simultaneously, with attempts to represent whole genomes on microarray chips. This presents the possibility of measuring the expression levels of most if not all of the genes expressed in different cell types or cells under differing conditions. It also creates the challenge of interpreting the meaning of the large groups of co-expressed genes identified by these studies.
Here we present Chromosome Co-Localization probability calculator (ChroCoLoc), a tool for the analysis of the chromosomal co-localization of groups of genes. The effects of chromosomal co-localization of genes on their co-expression are of interest in several areas of biological and medical research, e.g. cancer research. An increase in gene expression in a particular chromosomal area may indicate increases in gene copy number owing to duplication of that region. Similarly, a decrease of gene expression in an area may indicate a deletion of the region on a chromosome. Several studies have sought to identify chromosomal regions containing an elevated number of regulated genes (Caron et al., 2001; Chang et al., 2004) or link this information to particular cancer types (Almstrup et al., 2004; Bisognin et al., 2004). ChroCoLoc could also be used in conjunction with other approaches to investigate chromosomal changes, e.g. array CGH, providing a statistical analysis of the changes in gene expression in the regions where chromosomal changes are observed.
ChroCoLoc is a web-based application that calculates the probabilities of groups of co-expressed genes co-localizing on chromosomes. It provides rapid probability calculation for datasets comprising several thousand features, plots features for visualization of putative co-regulation ‘hotspots’ on a chromosome graphic, supports several mammalian species and is easy to use, driven by a web interface.
Other tools, such as MACAT (Tödling et al., 2005), Onto-tools (Draghici et al., 2003), EASE (Hosack et al., 2003) and Caryoscope (Awad et al., 2004), can calculate and/or display the distribution of genes from microarray gene expression data with respect to chromosomal location. With ChroCoLoc we combine statistical analysis with the ease of use of a web application to provide a robust analysis of chromosomal gene expression co-localization and high-quality graphical output. ChroCoLoc has now been incorporated into the publicly available microarray data analysis platform Expression Profiler (Kapushesky et al., 2004), accessible at .
The existing genome annotation, including that of Homo sapiens, clearly shows an uneven chromosomal distribution of identified coding sequences across the genome (Hillier et al., 2005). Therefore, any calculation of the probability that a group of genes in a specified area is co-expressed purely by chance must take this uneven distribution into account.
Thus the problem may be described as the probability that x co-expressed genes are located in the same region, given that the region contains y genes out of the total population of genes in the study. This problem can be solved using the hypergeometric distribution (Weisstein, ). We have implemented the calculation of the probability of gene co-expression according to the hypergeometric distribution in ChroCoLoc using the Perl programming language.
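As a minimal sketch of this calculation (written in Python for illustration, and not the Perl code used in ChroCoLoc), the hypergeometric probability can be computed as follows. The symbols x and y follow the text; the function names and the parameters k (the number of co-expressed genes in the sample) and n_total (the number of genes in the study population) are assumptions introduced here. The text does not state whether the probability of exactly x or of at least x genes is reported, so both are shown.

```python
from math import comb


def hypergeom_pmf(x, y, k, n_total):
    """Probability that exactly x of the y genes in a region are co-expressed.

    k is the number of co-expressed genes in the whole sample and n_total
    is the number of genes in the study population (illustrative names).
    """
    # Outside the support of the distribution the probability is zero.
    if x < max(0, y + k - n_total) or x > min(y, k):
        return 0.0
    return comb(k, x) * comb(n_total - k, y - x) / comb(n_total, y)


def hypergeom_upper_tail(x, y, k, n_total):
    """Probability that at least x of the y genes in a region are co-expressed."""
    return sum(hypergeom_pmf(i, y, k, n_total) for i in range(x, min(y, k) + 1))


if __name__ == "__main__":
    # Toy numbers: 10 genes in the population, 4 of them co-expressed,
    # and a region holding 3 genes of which 2 are co-expressed.
    print(hypergeom_pmf(2, 3, 4, 10))
    print(hypergeom_upper_tail(2, 3, 4, 10))
```

Because the region size y enters the formula directly, a gene-dense region is not favored over a gene-poor one simply for containing more genes.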
Having defined the method for calculating the probability of co-localization of regulated genes, the question arises of how to define a gene-containing region. As our probability calculation takes into account the uneven distribution of genes in the genome, we have considerable freedom in how the proposed chromosomal regions are defined. ChroCoLoc calculates the probability of genes co-localizing by comparing the number of genes from a sample assigned to a particular region with the number assigned to that region in the study population as a whole. As such, ChroCoLoc can calculate the probabilities of co-localization of genes in a sample regardless of how the regions are defined, be it by established stained chromosome banding patterns or by custom-defined chromosome windows. Currently, ChroCoLoc provides karyotypes for visualization of human, mouse, rat and fruit fly data. The probabilities calculated by ChroCoLoc for co-expressed gene co-localization in the chromosomal banding regions of these organisms can be plotted on their karyotypes.
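The independence of the calculation from the region definition can be illustrated with a short Python sketch. It assumes a hypothetical input of (chromosome, position, is_coexpressed) tuples and a hypothetical fixed-width window scheme; any other function that maps a gene to a region label, such as a lookup of cytogenetic bands, could be passed in its place. The names used are not taken from ChroCoLoc.

```python
from collections import Counter


def window_region(chrom, position, window_size=5_000_000):
    """Assign a gene to a fixed-width window (one hypothetical region scheme)."""
    start = (position // window_size) * window_size
    return f"{chrom}:{start}-{start + window_size - 1}"


def count_by_region(genes, region_of):
    """Count genes per region in the population and in the co-expressed sample.

    genes is an iterable of (chrom, position, is_coexpressed) tuples and
    region_of is any function mapping (chrom, position) to a region label.
    """
    population = Counter()
    sample = Counter()
    for chrom, position, is_coexpressed in genes:
        region = region_of(chrom, position)
        population[region] += 1
        if is_coexpressed:
            sample[region] += 1
    return population, sample


if __name__ == "__main__":
    toy_genes = [
        ("1", 100, True),
        ("1", 4_999_999, False),
        ("1", 5_000_000, True),
        ("2", 10, False),
    ]
    population, sample = count_by_region(toy_genes, window_region)
    for region in sorted(population):
        print(region, population[region], sample[region])
```

The two counts per region correspond to y and x in the probability calculation, so changing the region scheme changes only the labels, not the statistics applied to them.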
USAGE AND DATA PRESENTATION
ChroCoLoc is available within the web-based data analysis platform Expression Profiler, with access either as a registered or guest user. Users only need to upload their microarray gene expression data into Expression Profiler to use ChroCoLoc for analysis. The uploaded data should contain chromosomal location information, optionally together with gene identifiers that can mark genes represented by more than one feature on the chip (composite features), preventing the same gene from being counted more than once. The required data formats are described in the online documentation, accessible within Expression Profiler. Integration with BioMart (Kasprzyk et al., 2004) allows automatic retrieval of chromosomal location information for Affymetrix microarrays; hence, for these arrays the user needs only to provide the gene expression matrix with probeset identifiers.
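The handling of composite features described above can be sketched in Python as follows. This is an illustration only: the dictionary keys feature_id and gene_id are hypothetical and do not reflect the Expression Profiler data format, and keeping the first feature seen for each gene is an assumption rather than documented ChroCoLoc behavior.

```python
def collapse_composite_features(features):
    """Keep one feature per gene so that each gene is counted only once.

    features is an iterable of dicts with a 'feature_id' key and an
    optional 'gene_id' key (hypothetical field names). Features without
    a gene identifier are treated as distinct genes.
    """
    seen = set()
    unique = []
    for feature in features:
        gene_id = feature.get("gene_id")
        # Namespace the key so a gene ID can never collide with a feature ID.
        key = ("gene", gene_id) if gene_id else ("feature", feature["feature_id"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(feature)
    return unique


if __name__ == "__main__":
    toy_features = [
        {"feature_id": "probe_a", "gene_id": "GENE1"},
        {"feature_id": "probe_b", "gene_id": "GENE1"},  # composite feature
        {"feature_id": "probe_c"},  # no gene identifier supplied
    ]
    for feature in collapse_composite_features(toy_features):
        print(feature["feature_id"])
```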
ChroCoLoc has a number of filter options available to remove chromosomal regions not containing statistically significant numbers of features. Filters can be set for the minimum number of features per region or the minimum percentage of features regulated in a region, thus restricting the output to statistically interesting regions. The user also has the option to adjust the calculated probabilities for multiple testing using the Bonferroni correction.
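The filtering and correction steps can be sketched in Python as follows. The function names, parameter names and default thresholds are assumptions made for illustration. Applying both filters together and taking the number of tests to be the number of regions with a calculated probability are likewise assumptions, not documented ChroCoLoc behavior.

```python
def passes_filters(n_features, n_regulated, min_features=5, min_percent=0.0):
    """Return True if a region meets both illustrative filter thresholds."""
    if n_features == 0 or n_features < min_features:
        return False
    percent_regulated = 100.0 * n_regulated / n_features
    return percent_regulated >= min_percent


def bonferroni_adjust(p_values):
    """Multiply each probability by the number of tests, capping at 1.0.

    p_values maps a region label to its unadjusted probability.
    """
    n_tests = len(p_values)
    return {region: min(1.0, p * n_tests) for region, p in p_values.items()}


if __name__ == "__main__":
    # Toy probabilities for two hypothetical regions.
    raw = {"region_a": 0.01, "region_b": 0.6}
    print(bonferroni_adjust(raw))
    print(passes_filters(10, 3, min_features=5, min_percent=20.0))
```

The Bonferroni correction is conservative: with many regions tested, only very small unadjusted probabilities remain below a given significance threshold.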
Regions that pass the filter criteria are plotted on a karyotype graphic (Fig. 1) and colored according to the probability of the observed co-localization occurring. The calculated probability information is also presented in tabular form. As an alternative to the calculated probabilities, the user can plot the percentage of genes co-expressed in a particular chromosomal region or the absolute number of genes co-expressed per region.
IMPLEMENTATION AND AVAILABILITY
The application is implemented as a Perl module and has been tested on a 2.8 GHz Pentium 4 server running the Fedora Core 4 Linux operating system. ChroCoLoc has been tested with datasets of various sizes, with sample sizes of 109–853 and population sizes of 3988–19 497 features. The karyogram is available in SVG and PNG formats. SVG output has been tested on Windows 2000/XP using Adobe SVG viewer 3.x () and on Linux using Adobe SVG viewer 3.01. The application has now been incorporated into Expression Profiler, hosted by the European Bioinformatics Institute, where it is available for public use. The source code for ChroCoLoc and the entire Expression Profiler suite is available on SourceForge ().
This work was funded by the EU TEMBLOR grant.
Conflict of Interest: none declared.