Abstract

Motivation: DNA copy number aberrations are frequently found in different types of cancer. Recent developments of microarray-based approaches have broadened the knowledge on number and structure of such aberrations. High-density single nucleotide polymorphism (SNP) microarrays provide an extremely high resolution with up to 500 000 SNPs per genome. Owing to the enormous amount of data, the detection of common aberrations in large datasets is a great challenge. We describe a novel open source software tool, IdeogramBrowser, which was specifically designed for use with the Affymetrix SNP arrays. It provides an interactive karyotypic visualization of multiple aberration profiles and direct links to GeneCards. Visualization of consensus regions together with gene representation allows the explorative assessment of the data.

Availability: IdeogramBrowser and its source code are freely available under a Creative Commons license and can be obtained from . IdeogramBrowser is a platform-independent Java application.

Contact: hans.kestler@uni-ulm.de

1 BACKGROUND

Aberrations at the DNA copy number level, such as deletions, amplifications and unbalanced translocations, are among the main pathogenetic mechanisms underlying human genetic disorders. Analysis of chromosomal aberrations is particularly important in cancer, where amplification of oncogenes, cell cycle regulators and/or deletion of tumor suppressor genes are involved in the multistep process of cancer development. Rapid and accurate identification of such genetic imbalances is therefore an important factor for the understanding of physiological and pathophysiological processes and may serve as a starting point for new diagnostic methods or therapeutic strategies.

Single nucleotide polymorphism (SNP) arrays from Affymetrix are a widely used tool for the analysis of allelic variations and, additionally, for the detection of copy number aberrations. Besides genotyping, the hybridization signal intensity provides information about the copy number of the chromosomal segments by comparing these intensities to a control set of 110 healthy, ethnically diverse individuals (Wang et al., 2004). For the analysis of these data Affymetrix provides the Chromosome Copy Number Analysis Tool (CNAT) [; Slater et al. (2005)].

Owing to the high number of data points (up to 500 000), the high noise in the data and the small area available to display this information on screen, visual identification of such aberrations is only practical if they span several megabase pairs. Currently, the only way to identify them is to scroll through the data files or to visually scan the individual chromosome representations at a high zoom level, as only one chromosome from a single case is visualized. Such a procedure is not practical for a large number of cases.

According to Huang et al. (2004) and Slater et al. (2005), it is necessary to discard data points that fall below a specific threshold for the copy number estimation or above a certain P-value (single point or genomic smoothed average value). In addition to these thresholds, it is also advisable to screen only for recurrent aberrant regions, as Sebat et al. (2004) and Iafrate et al. (2004) first described the existence of so-called copy number polymorphisms (CNPs). According to Sebat et al. (2004), two otherwise healthy individuals differ in about 11 CNPs on average. Singular findings may therefore be invalidated by CNPs, and consensus region identification is of major importance. Our software tool allows filtering by these thresholds (e.g. a higher threshold might be useful if the DNA is slightly degraded) and therefore expedites the identification of biologically relevant regions (Fig. 1).
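The two thresholds described above can be illustrated with a minimal Python sketch. It is not taken from the IdeogramBrowser source: the `SnpPoint` fields, the function name `filter_points` and the default cut-off values are all hypothetical, and the "threshold for the copy number estimation" is interpreted here as a lower bound on the absolute log2 ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpPoint:
    """One SNP measurement (hypothetical record layout, for illustration only)."""
    chromosome: str
    position: int      # base-pair position on the chromosome
    log2ratio: float   # signed copy number estimate: > 0 gain, < 0 loss
    p_value: float     # P-value reported for the estimate


def filter_points(points, min_abs_log2ratio=0.3, max_p_value=0.01):
    """Keep only points that pass both thresholds.

    A point is discarded if its absolute log2 ratio is below
    `min_abs_log2ratio` or if its P-value is above `max_p_value`.
    Both default values are arbitrary placeholders, not recommendations.
    """
    return [
        p for p in points
        if abs(p.log2ratio) >= min_abs_log2ratio and p.p_value <= max_p_value
    ]
```

Raising `min_abs_log2ratio` corresponds to the stricter setting suggested in the text for slightly degraded DNA.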

Fig. 1

Visualization of the REF-103, MCF-7, SKBR3 and ZR7530 datasets (freely available from Affymetrix) with group size 3 and minimum length 0 is shown. Gene markers are shown on the right-hand side of the karyogram whereas all SNP locations of the selected sample are shown on the left-hand side. Gains (red) and losses (green) are visualized in the usual color coding.

Other software tools in this field include VAMP (La Rosa et al., 2006), a large software framework which is not freely available (only tests on a predefined set of examples are allowed), SNP-VISTA (Shah et al., 2005) which is mainly used in the analysis of sequence data and provides no ideogram representation, and the web-based tool SNPscan (Ting et al., 2006), which requires an installed Apache web-server if run as a local application. In contrast to these approaches the proposed software tool—IdeogramBrowser—is open source, easy to use and specifically designed for the import of the CNT result files of the Affymetrix CNAT. In addition, a tab-separated text format allows the use of external copy-number detection tools. Further file formats can be implemented through the open structure of the software.

2 FUNCTIONALITY

IdeogramBrowser is a platform-independent Java application providing an interactive karyotypic representation which shows gain/loss or LOH markers together with genes. The karyogram and the genes are generated directly from local copies of the two NCBI mapview database files ideogram and seq_gene.md (), which can be updated as needed (currently build 36.2). On selecting a marker, overlapping genes are shown in a list (Fig. 1) with direct hyperlinks to GeneCards for explorative assessment. The visualization is based on the different marker types: LOH score (loss of heterozygosity), log2ratio, single point copy number (SPA), or genomic smoothed analysis of the copy number (GSA, for >10k chips).
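The lookup of genes that overlap a selected marker amounts to an interval overlap test. The following Python sketch shows one simple way to express it; the `Gene` record and the function `overlapping_genes` are hypothetical names chosen for illustration and do not reflect the actual IdeogramBrowser classes or the column layout of seq_gene.md.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Minimal gene annotation (hypothetical layout, for illustration only)."""
    symbol: str
    chromosome: str
    start: int  # first base of the gene, inclusive
    end: int    # last base of the gene, inclusive


def overlapping_genes(genes, chromosome, start, end):
    """Return all genes sharing at least one base with [start, end].

    Two closed intervals overlap exactly when each one starts
    no later than the other one ends.
    """
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]
```

A linear scan is sufficient for a sketch; for a whole-genome annotation, a list sorted by start position or an interval tree would answer such queries faster.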

Individual global thresholds can be chosen for each marker type. Multiple CNT files can be merged into a single sample, which is required, e.g. for the 100k split files. Two alternatives for merging two or more profiles can be chosen: resolution enhancement for interlocked SNP chips or reliability enhancement (differential mode). As the number of aberrations (log2ratios) is often too high, a marker reduction method was included: only contiguous regions (gain, loss or LOH) of a minimum group size (default: three adjacent SNPs of the same polarity) and a minimum sequence length are shown. In addition, for GSA and SPA values an upper bound for the corresponding P-value can be defined. Multiple profiles can be shown in the diagram. They may originate from different types of SNP chips (e.g. 100k or 50k) or from another ASCII file containing aberration information. Consensus regions can be identified by specifying the minimum number of samples which need to be in a certain state (gain, loss or LOH) in the given locus interval. Karyograms can be exported as JPEG, SVG [World Wide Web Consortium (W3C)] (scalable vector graphics format, which is especially suited for high-quality prints), or PDF. Multiple files can be imported at once.
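The marker reduction and the consensus region search described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IdeogramBrowser implementation: the function names, the tuple layouts, and the definition of sequence length as the distance between the first and last SNP of a run are all hypothetical choices. The consensus search additionally assumes that the regions within one sample do not overlap each other.

```python
from itertools import groupby


def reduce_markers(snps, min_group_size=3, min_length=0):
    """Collapse adjacent SNPs of the same state into regions.

    `snps` is a list of (position, state) tuples for one chromosome,
    sorted by position; state is e.g. "gain", "loss", "LOH" or None
    (no aberration). A run is kept only if it contains at least
    `min_group_size` SNPs and spans at least `min_length` base pairs.
    Returns a list of (start, end, state) tuples.
    """
    regions = []
    for state, run in groupby(snps, key=lambda snp: snp[1]):
        run = list(run)
        if state is None:
            continue
        start, end = run[0][0], run[-1][0]
        if len(run) >= min_group_size and end - start >= min_length:
            regions.append((start, end, state))
    return regions


def consensus_regions(sample_regions, state, min_samples):
    """Find intervals where at least `min_samples` samples show `state`.

    `sample_regions` holds one region list per sample, as produced by
    reduce_markers(). A sweep over the interval boundaries tracks how
    many samples are in the requested state at each position.
    """
    events = []
    for regions in sample_regions:
        for start, end, region_state in regions:
            if region_state == state:
                events.append((start, +1))
                events.append((end + 1, -1))  # closed interval ends after `end`
    # At equal positions, process openings before closings so that
    # directly adjacent regions do not split a consensus interval.
    events.sort(key=lambda event: (event[0], -event[1]))

    result, depth, open_start = [], 0, None
    for position, delta in events:
        was_inside = depth >= min_samples
        depth += delta
        is_inside = depth >= min_samples
        if is_inside and not was_inside:
            open_start = position
        elif was_inside and not is_inside:
            result.append((open_start, position - 1))
    return result
```

With `min_samples` set to the total number of samples, only regions shared by every profile remain; lower values allow for recurrent but not universal aberrations.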

This work is supported by the German Science Foundation, SFB 518, Project C5 (AM, HAK), Project B19 (KH) and the Stifterverband für die Deutsche Wissenschaft (HAK).

Conflict of Interest: none declared.

REFERENCES

Huang J., et al. Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum. Genomics, 2004, vol. 1 (pg. 287-299).
Iafrate A.J., et al. Detection of large-scale variation in the human genome. Nat. Genet., 2004, vol. 36 (pg. 949-951).
La Rosa P., et al. VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles. Bioinformatics, 2006, vol. 22 (pg. 2066-2073).
Sebat J., et al. Large-scale copy number polymorphism in the human genome. Science, 2004, vol. 305 (pg. 525-528).
Shah N., et al. SNP-VISTA: an interactive SNP visualization tool. BMC Bioinformatics, 2005, vol. 6 (pg. 292).
Slater H.R., et al. High-resolution identification of chromosomal abnormalities using oligonucleotide arrays containing 116,204 SNPs. Am. J. Hum. Genet., 2005, vol. 77 (pg. 709-726).
Ting J.C., et al. Analysis and visualization of chromosomal abnormalities in SNP data with SNPscan. BMC Bioinformatics, 2006, vol. 7 (pg. 25).
Wang Z.C., et al. Loss of heterozygosity and its correlation with expression profiles in subclasses of invasive breast cancers. Cancer Res., 2004, vol. 64 (pg. 64-71).
World Wide Web Consortium (W3C). Scalable Vector Graphics (SVG) 1.1 Specification.

Author notes

Associate Editor: Martin Bishop
