Motivation: DNA copy number aberrations are frequently found in different types of cancer. Recent developments in microarray-based approaches have broadened the knowledge of the number and structure of such aberrations. High-density single nucleotide polymorphism (SNP) microarrays provide an extremely high resolution with up to 500 000 SNPs per genome. Owing to the enormous amount of data, detecting common aberrations in large datasets is a great challenge. We describe a novel open source software tool, IdeogramBrowser, which was specifically designed for use with the Affymetrix SNP arrays. It provides an interactive karyotypic visualization of multiple aberration profiles and direct links to
Genetic aberrations at the DNA copy number level, such as deletions, amplifications and unbalanced translocations, are among the main pathogenetic mechanisms underlying human genetic disorders. Analysis of chromosomal aberrations is particularly important in cancer, where amplification of oncogenes and cell cycle regulators and/or deletion of tumor suppressor genes are involved in the multistep process of cancer development. Rapid and accurate identification of such genetic imbalances is therefore an important factor in understanding physiological and pathophysiological processes and may serve as a starting point for new diagnostic methods or therapeutic strategies.
Single nucleotide polymorphism (SNP) arrays from Affymetrix are a widely used tool for the analysis of allelic variations and, additionally, for the detection of copy number aberrations. Besides genotyping, the hybridization signal intensities provide information about the copy number of chromosomal segments when they are compared with a control set of 110 healthy, ethnically diverse individuals (Wang et al., 2004). For the analysis of these data, Affymetrix provides the Chromosome Copy Number Analysis Tool (CNAT; Slater et al., 2005).
Owing to the high number of data points (up to 500 000), the high noise in the data and the small area available to display this information on screen, visual identification of such aberrations is only practical if they span several megabase pairs. Currently, the only way to identify them is to scroll through the data files or to visually scan the individual chromosomal representations at a high zoom level, as only one chromosome from a single case is visualized at a time. Such a procedure is not practical for a large number of cases.
According to Huang et al. (2004) and Slater et al. (2005), it is necessary to discard data points below a specific threshold for the copy number estimation and above a certain P-value (single point or genomic smoothed average value). In addition to these thresholds, it is also advisable to screen only for recurrent aberrant regions, as Sebat et al. (2004) and Iafrate et al. (2004) first described the existence of so-called copy number polymorphisms (CNPs). According to Sebat et al. (2004), two otherwise healthy individuals differ in about 11 CNPs on average. Singular findings may therefore be invalidated by CNPs, and consensus region identification is of major importance. Our software tool allows filtering via these thresholds (e.g. a higher threshold might be useful if the DNA is slightly degraded) and therefore expedites the identification of biologically relevant regions (Fig. 1).
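The threshold filtering described above can be sketched in a few lines of Java. This is a minimal illustration and not the IdeogramBrowser source: the class `ThresholdFilter`, the `Marker` fields, the SNP identifiers and the cut-off values are all hypothetical. It also assumes that the copy number estimate is expressed as a log2 ratio and that the P-value is given as a plain probability; if a tool reports a transformed score instead, the comparison would have to be adapted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThresholdFilter {

    /** One SNP measurement; field names are illustrative, not CNT column names. */
    static final class Marker {
        final String snpId;
        final double log2Ratio;
        final double pValue;

        Marker(String snpId, double log2Ratio, double pValue) {
            this.snpId = snpId;
            this.log2Ratio = log2Ratio;
            this.pValue = pValue;
        }
    }

    /** Keeps markers with |log2 ratio| >= minAbsLog2Ratio and P-value <= maxPValue. */
    static List<Marker> filter(List<Marker> markers, double minAbsLog2Ratio, double maxPValue) {
        List<Marker> kept = new ArrayList<>();
        for (Marker m : markers) {
            boolean strongEnough = Math.abs(m.log2Ratio) >= minAbsLog2Ratio;
            boolean significant = m.pValue <= maxPValue;
            if (strongEnough && significant) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical SNP identifiers and values, for illustration only.
        List<Marker> markers = Arrays.asList(
            new Marker("SNP_A", 0.8, 0.001),
            new Marker("SNP_B", 0.1, 0.0001),
            new Marker("SNP_C", -0.9, 0.2),
            new Marker("SNP_D", -0.6, 0.01));
        for (Marker m : filter(markers, 0.5, 0.05)) {
            System.out.println(m.snpId + "\t" + m.log2Ratio + "\t" + m.pValue);
        }
    }
}
```

With the illustrative cut-offs of 0.5 and 0.05, only SNP_A and SNP_D are retained: SNP_B deviates too little and SNP_C has too high a P-value. Raising the first cut-off corresponds to the stricter setting suggested for slightly degraded DNA.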
Other software tools in this field include VAMP (La Rosa et al., 2006), a large software framework which is not freely available (only tests on a predefined set of examples are allowed), SNP-VISTA (Shah et al., 2005) which is mainly used in the analysis of sequence data and provides no ideogram representation, and the web-based tool SNPscan (Ting et al., 2006), which requires an installed Apache web-server if run as a local application. In contrast to these approaches the proposed software tool—IdeogramBrowser—is open source, easy to use and specifically designed for the import of the CNT result files of the Affymetrix CNAT. In addition, a tab-separated text format allows the use of external copy-number detection tools. Further file formats can be implemented through the open structure of the software.
IdeogramBrowser is a platform-independent Java application providing an interactive karyotypic representation that shows gain/loss or loss of heterozygosity (LOH) markers together with genes. The karyogram and the genes are directly generated from local copies of the two NCBI mapview database files
Individual global thresholds can be chosen for each marker type. Multiple CNT files can be merged into a single sample, which is required, for example, for the 100k split files. Two alternatives are available for merging two or more profiles: resolution enhancement for interlocked SNP chips or reliability enhancement (differential mode). Because the number of aberrations (log2 ratios) is often too high, a marker reduction method was included: only contiguous regions (gain, loss or LOH) of a minimum group size (by default, three adjacent SNPs of the same polarity) and a minimum sequence length are shown. In addition, an upper bound for the corresponding P-value can be defined for GSA and SPA values. Multiple files can be imported at once, and multiple profiles can be shown in the diagram. They may originate from different types of SNP chips (e.g. 100k or 50k) or from another ASCII file containing aberration information. Consensus regions can be identified by specifying the minimum number of samples that need to be in a certain state (gain, loss or LOH) within the given locus interval. Karyograms can be exported as JPEG, SVG or PDF; SVG (scalable vector graphics; World Wide Web Consortium, W3C) is especially suited for high-quality prints.
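The marker reduction and the consensus search can be illustrated with a short Java sketch. It is written under stated assumptions and is not taken from the IdeogramBrowser code: the names `MarkerReduction`, `Snp`, `Region`, `reduce` and `consensus` are hypothetical, the SNPs of one chromosome are assumed to be sorted by position, region coordinates are treated as inclusive, and the consensus step uses a simple sweep over region boundaries, which may differ from the method actually implemented in the tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkerReduction {

    enum State { GAIN, LOSS, LOH, NONE }

    /** One SNP call on a single chromosome (hypothetical data model). */
    static final class Snp {
        final long position;
        final State state;
        Snp(long position, State state) { this.position = position; this.state = state; }
    }

    /** A contiguous run of SNPs sharing one state; coordinates are inclusive. */
    static final class Region {
        final long start, end;
        final State state;
        final int snpCount;
        Region(long start, long end, State state, int snpCount) {
            this.start = start; this.end = end; this.state = state; this.snpCount = snpCount;
        }
    }

    /** Collapses position-sorted SNPs into runs and drops runs that are too short. */
    static List<Region> reduce(List<Snp> snps, int minGroupSize, long minLength) {
        List<Region> regions = new ArrayList<>();
        int i = 0;
        while (i < snps.size()) {
            State state = snps.get(i).state;
            int j = i;
            while (j + 1 < snps.size() && snps.get(j + 1).state == state) {
                j++;  // extend the run while the state stays the same
            }
            int count = j - i + 1;
            long start = snps.get(i).position;
            long end = snps.get(j).position;
            if (state != State.NONE && count >= minGroupSize && end - start >= minLength) {
                regions.add(new Region(start, end, state, count));
            }
            i = j + 1;
        }
        return regions;
    }

    /** Returns {start, end} intervals where at least minSamples samples show the state. */
    static List<long[]> consensus(List<List<Region>> samples, State state, int minSamples) {
        List<long[]> events = new ArrayList<>();  // {position, +1 for start / -1 for end}
        for (List<Region> sample : samples) {
            for (Region r : sample) {
                if (r.state == state) {
                    events.add(new long[] {r.start, 1});
                    events.add(new long[] {r.end, -1});
                }
            }
        }
        // Sort by position; at equal positions starts precede ends (inclusive ends).
        events.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(b[1], a[1]));
        List<long[]> result = new ArrayList<>();
        int depth = 0;
        long openStart = 0;
        boolean open = false;
        for (long[] e : events) {
            depth += (int) e[1];
            if (!open && depth >= minSamples) {
                open = true;
                openStart = e[0];
            } else if (open && depth < minSamples) {
                open = false;
                result.add(new long[] {openStart, e[0]});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> a = Arrays.asList(
            new Region(100, 300, State.GAIN, 3), new Region(700, 1000, State.GAIN, 4));
        List<Region> b = Arrays.asList(new Region(250, 800, State.GAIN, 6));
        for (long[] iv : consensus(Arrays.asList(a, b), State.GAIN, 2)) {
            System.out.println(iv[0] + "-" + iv[1]);
        }
    }
}
```

In this toy example, the two samples share a gain in the intervals 250-300 and 700-800, so these are reported as consensus regions when at least two samples are required. Calling `reduce` with a minimum group size of three mirrors the default described above; the minimum sequence length is left to the caller.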
This work is supported by the German Science Foundation, SFB 518, Project C5 (AM, HAK), Project B19 (KH) and the Stifterverband für die Deutsche Wissenschaft (HAK).
Conflict of Interest: none declared.