Abstract

Whole-genome microarray chips make it possible to investigate whether the regulation of particular groups of genes is influenced by their chromosomal localization. The Chromosome Co-Localization probability calculator (ChroCoLoc) is a publicly available web-based tool for the analysis of co-localization of co-expressed genes identified by microarray experiments.

Availability:

Contact: blake@ebi.ac.uk

INTRODUCTION

Microarray gene expression experiments can measure the expression of several thousand genes simultaneously, and attempts are being made to represent whole genomes on microarray chips. This presents the possibility of measuring the expression levels of most, if not all, of the genes expressed in different cell types or in cells under differing conditions. It also creates the challenge of interpreting the large groups of co-expressed genes identified by these studies. Here we present the Chromosome Co-Localization probability calculator (ChroCoLoc), a tool for the analysis of the chromosomal co-localization of groups of genes.

The effects of chromosomal co-localization of genes on their co-expression are of interest in several areas of biological and medical research, e.g. cancer research. An increase in gene expression in a particular chromosomal area may indicate an increase in gene copy number owing to duplication of that region. Similarly, a decrease in gene expression in an area may indicate a deletion of that region of the chromosome. Several studies have sought to identify chromosomal regions containing an elevated number of regulated genes (Caron et al., 2001; Chang et al., 2004) or to link this information to particular cancer types (Almstrup et al., 2004; Bisognin et al., 2004). ChroCoLoc could also be used in conjunction with other approaches to investigating chromosomal changes, e.g. array CGH, by providing a statistical analysis of the changes in gene expression in the regions where chromosomal changes are observed.

ChroCoLoc is a web-based application that calculates the probabilities of groups of co-expressed genes co-localizing on chromosomes. It provides rapid probability calculation for datasets comprising several thousand features, plots features for visualization of putative co-regulation ‘hotspots’ on a chromosome graphic, supports several mammalian species and is easy to use, being driven by a web interface.

Other tools, such as MACAT (Toedling et al., 2005), Onto-tools (Draghici et al., 2003), EASE (Hosack et al., 2003) and Caryoscope (Awad et al., 2004), can calculate and/or display the distribution of genes from microarray gene expression data with respect to chromosomal location. With ChroCoLoc we combine statistical analysis with the ease of use of a web application to provide a robust analysis of chromosomal gene expression co-localization and high quality graphical output. ChroCoLoc has now been incorporated into the publicly available microarray data analysis platform Expression Profiler (Kapushesky et al., 2004), accessible at .

PROBABILITY CALCULATION

The existing genome annotation, including that of Homo sapiens, clearly shows an uneven chromosomal distribution of identified coding sequences across the genome (Hillier et al., 2005). The calculation of the probability that a group of genes in any one specified area is co-expressed purely by chance must therefore take this uneven distribution into account.

Thus the problem may be described as finding the probability that x co-expressed genes are located in the same region, given that the region contains y genes out of the total population of genes in the study. This problem can be solved using the hypergeometric distribution (Weisstein, ). We have implemented the calculation of the probability of gene co-expression according to the hypergeometric distribution in ChroCoLoc using the Perl programming language.
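The calculation described above can be illustrated with a minimal Python sketch. It is not the ChroCoLoc Perl code, and the function and parameter names are hypothetical. The text does not state whether the exact probability or the upper-tail probability (x or more genes) is reported, so the sketch provides both.

```python
from math import comb


def hypergeom_pmf(k, population, region, sample):
    """Probability that exactly k of the `sample` co-expressed genes fall in
    a region holding `region` of the `population` genes in the study.
    (All names are illustrative assumptions, not taken from ChroCoLoc.)"""
    lowest = max(0, sample - (population - region))
    highest = min(region, sample)
    if k < lowest or k > highest:
        return 0.0  # outcome is impossible for these counts
    favourable = comb(region, k) * comb(population - region, sample - k)
    return favourable / comb(population, sample)


def hypergeom_upper_tail(k, population, region, sample):
    """Probability of observing k or more co-expressed genes in the region
    by chance, i.e. the sum of the exact probabilities from k upwards."""
    highest = min(region, sample)
    return sum(
        hypergeom_pmf(i, population, region, sample)
        for i in range(k, highest + 1)
    )
```

For example, `hypergeom_upper_tail(2, 10, 4, 3)` gives the chance that at least 2 of 3 selected genes lie in a region containing 4 of the 10 genes in the study. Exact integer arithmetic via `math.comb` avoids overflow for populations of several thousand features, at some cost in speed.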

Having defined the method for calculating the probability of co-localization of regulated genes, the question arises of what constitutes a gene-containing region. As our probability calculation takes into account the uneven distribution of genes in the genome, we have considerable freedom in how the proposed chromosomal regions are defined. ChroCoLoc calculates the probability of genes co-localizing by comparing the number of genes from a sample assigned to a particular region with the number assigned to that region in the study population as a whole. ChroCoLoc can therefore calculate the probabilities of co-localization of genes in a sample regardless of how the regions are defined, whether by established stained chromosome banding patterns or by custom-defined chromosome windows. Currently in ChroCoLoc, we provide karyotypes for human, mouse, rat and fruit fly visualization. The probabilities calculated by ChroCoLoc for co-expressed gene co-localization in the chromosomal banding regions of these organisms can be plotted on their karyotypes.
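The comparison of per-region counts in the sample and in the population can be sketched as follows. This is an illustration only: the data structures and the name `region_counts` are assumptions, and the region labels may be band names or any other user-chosen windows.

```python
from collections import Counter


def region_counts(population_regions, selected_ids):
    """Count features per region in the sample and in the whole population.

    population_regions: dict mapping feature id -> region label (hypothetical
        input layout; any labelling scheme works, e.g. cytogenetic bands).
    selected_ids: iterable of feature ids in the co-expressed sample.
    Returns a dict mapping region -> (selected_in_region, total_in_region).
    """
    totals = Counter(population_regions.values())
    selected = Counter(
        population_regions[feature_id]
        for feature_id in set(selected_ids)      # ignore repeated ids
        if feature_id in population_regions      # skip unmapped features
    )
    return {
        region: (selected.get(region, 0), total)
        for region, total in totals.items()
    }
```

Each resulting pair, together with the overall sample and population sizes, supplies the quantities needed for a hypergeometric probability for that region.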

USAGE AND DATA PRESENTATION

ChroCoLoc is available within the web-based data analysis platform Expression Profiler, which can be accessed as either a registered or a guest user. Users need only upload their microarray gene expression data into Expression Profiler to use ChroCoLoc for analysis. The uploaded data should contain chromosomal location information, optionally together with gene identifiers that mark genes represented by more than one feature on the chip (composite features), which prevents the same gene from being counted more than once. The required data formats are described in the online documentation, accessible within Expression Profiler. Integration with BioMart (Kasprzyk et al., 2004) allows automatic retrieval of chromosomal location information for Affymetrix microarrays; for these arrays the user therefore needs to provide only the gene expression matrix with probeset identifiers.
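One way to prevent composite features from being counted more than once is sketched below. The text does not say how ChroCoLoc resolves duplicates, so keeping the first feature seen for each gene identifier is an assumption, as are the tuple layout and the function name.

```python
def collapse_composite_features(features):
    """Keep one feature per gene identifier.

    features: iterable of (feature_id, gene_id, region) tuples, where
        gene_id may be None when no gene identifier was supplied
        (hypothetical layout, not the Expression Profiler file format).
    Features without a gene identifier are always kept, because there is
    no way to tell whether they represent the same gene.
    """
    seen_genes = set()
    kept = []
    for feature_id, gene_id, region in features:
        if gene_id is not None:
            if gene_id in seen_genes:
                continue  # gene already represented by an earlier feature
            seen_genes.add(gene_id)
        kept.append((feature_id, gene_id, region))
    return kept
```

Applying this step before counting features per region means that a gene represented by several probes contributes only once to both the sample and the population totals.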

ChroCoLoc offers a number of filter options to remove chromosomal regions that do not contain statistically significant numbers of features. Filters can be set for the minimum number of features per region or for the minimum percentage of features regulated in a region, thus restricting the output to statistically interesting regions. The user also has the option of adjusting the calculated probabilities for multiple testing using the Bonferroni correction.
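The filtering and correction steps can be sketched as follows. The names and input layout are hypothetical, and the number of tests used for the Bonferroni correction is assumed here to be the number of regions that pass the filters; the text does not specify which count ChroCoLoc uses.

```python
def filter_and_correct(regions, min_features=0, min_percent=0.0,
                       bonferroni=False):
    """Filter regions and optionally apply a Bonferroni correction.

    regions: dict mapping region -> (selected, total, p_value), where
        `selected` is the number of regulated features in the region and
        `total` the number of features there in the whole population.
    Returns a dict mapping region -> (possibly corrected) p-value.
    """
    passing = {
        name: p_value
        for name, (selected, total, p_value) in regions.items()
        if total > 0
        and total >= min_features
        and 100.0 * selected / total >= min_percent
    }
    if not bonferroni:
        return passing
    n_tests = len(passing)  # assumption: one test per surviving region
    # Bonferroni: multiply each p-value by the number of tests, capped at 1.
    return {name: min(1.0, p * n_tests) for name, p in passing.items()}
```

Because the correction is a simple multiplication, it is conservative when many regions are tested; stricter filters reduce the number of tests and therefore the size of the adjustment under the assumption made here.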

Regions that pass the filter criteria are plotted on a karyotype graphic (Fig. 1) and colored according to the probability of the observed co-localization occurring by chance. The calculated probability information is also presented in tabular form. As an alternative to the calculated probabilities, the user can plot either the percentage of genes co-expressed in a particular chromosomal region or the absolute number of genes co-expressed per region.

Fig. 1

Probabilities of co-localization of regulated genes plotted onto a human karyogram. Chromosomal regions are colored according to decreasing probability of co-localization by chance. The table underneath the karyogram displays the chromosome regions where probabilities have been calculated along with the number of features both in the selected set and whole population present in these regions.

IMPLEMENTATION AND AVAILABILITY

The application is implemented as a Perl module and has been tested on a 2.8 GHz Pentium 4 server running the Fedora Core 4 Linux operating system. ChroCoLoc has been tested with datasets of various sizes, with sample sizes of 109–853 features and population sizes of 3988–19 497 features. The karyogram is available in SVG and PNG formats. SVG output has been tested on Windows 2000/XP using Adobe SVG viewer 3.x () and on Linux using Adobe SVG viewer 3.01. The application has now been incorporated into Expression Profiler, hosted by the European Bioinformatics Institute, where it is available for public use. The source code for ChroCoLoc and the entire Expression Profiler suite is available on SourceForge ().

This work was funded by the EU TEMBLOR grant.

Conflict of Interest: none declared.

REFERENCES

Almstrup K., et al. Embryonic stem cell-like features of testicular carcinoma in situ revealed by genome-wide gene expression profiling. Cancer Res., 2004, vol. 64 (pg. 4736-4743).
Awad I.A., et al. Caryoscope: an open source Java application for viewing microarray data in a genomic context. BMC Bioinformatics, 2004, vol. 5 pg. 151.
Bisognin A., et al. Detection of chromosomal regions showing differential gene expression in human skeletal muscle and in alveolar rhabdomyosarcoma. BMC Bioinformatics, 2004, vol. 5 pg. 68.
Caron H., et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science, 2001, vol. 291 (pg. 1289-1292).
Chang C.F., et al. Calculating the statistical significance of physical clusters of co-regulated genes in the genome: the role of chromatin in domain-wide gene regulation. Nucleic Acids Res., 2004, vol. 32 (pg. 1798-1807).
Draghici S., et al. Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res., 2003, vol. 31 (pg. 3775-3781).
Hillier L.W., et al. Generation and annotation of the DNA sequences of human chromosomes 2 and 4. Nature, 2005, vol. 434 (pg. 724-731).
Hosack D.A., et al. Identifying biological themes within lists of genes with EASE. Genome Biol., 2003, vol. 4 pg. R70.
Kapushesky M., et al. Expression Profiler: next generation—an online platform for analysis of microarray data. Nucleic Acids Res., 2004, vol. 32 (pg. W465-W470).
Kasprzyk A., et al. EnsMart: a generic system for fast and flexible access to biological data. Genome Res., 2004, vol. 14 (pg. 160-169).
Toedling J., et al. MACAT—microarray chromosome analysis tool. Bioinformatics, 2005, vol. 21 (pg. 2112-2113).
Weisstein E.W. Hypergeometric Distribution. From MathWorld—A Wolfram Web Resource.

Author notes

Associate Editor: Martin Bishop
