Abstract

Mayday is a workbench for visualization, analysis and storage of microarray data. It features a graphical user interface and supports the development and integration of existing and new analysis methods. Besides the infrastructural core functionality, Mayday offers a variety of plug-ins, such as various interactive viewers, a connection to the R statistical environment, a connection to SQL-based databases and different data mining methods, including WEKA-library based methods for classification and various clustering methods. In addition, so-called meta information objects are provided for annotating the microarray data. They allow data from different sources to be integrated, a feature that is employed, for instance, in the enhanced heatmap visualization.

Contact: nieselt@informatik.uni-tuebingen.de

Supplementary information: The software and more detailed information, including screenshots, a user guide and test data, can be found on the Mayday home page. The core is published under the GPL (GNU General Public License) and the associated plug-ins under the LGPL (GNU Lesser General Public License).

1 INTRODUCTION

In recent years, the genome-wide measurement of gene expression profiles with DNA microarrays has become a widely used tool in biological and medical research. The large volume of data generated by microarray experiments has raised the challenge of storing, managing, analyzing and visualizing it. A great number of specific evaluation procedures have already been developed to address different biological questions. Programs such as Base (Saal et al., 2002), Expression Profiler (Kapushesky et al., 2004), Gepas (Vaquerizas et al., 2005) and many others focus on various aspects of microarray analysis. To make these methods commonly accessible, a number of tools have been implemented and integrated either in commercial applications such as GeneSpring (SiliconGenetics, 2005) or Spotfire DecisionSite (Spotfire, 2005), or in freely available packages such as Bioconductor (Gentleman et al., 2004) or TIGR TM4 (Saeed et al., 2003). Most state-of-the-art methods are developed by the academic community that conducts and analyzes DNA microarray experiments. Open source solutions are appealing because they allow the academic community to implement these methods quickly and to integrate them into existing applications. This is one reason for the growing importance of Bioconductor, which relies on R, the freely available environment for statistical computing (R Development Core Team, 2005). We believe that the microarray community will benefit from software that supports the development and integration of current and new methods within a user-friendly GUI-based environment. To our knowledge, there is no freely available tool that bridges the gap between the development process and the application of new methods.
Encouraged by the success of highly flexible component-based applications in recent years, we started the development of Mayday, a plug-in-based microarray data analysis workbench. It serves as a graphical software environment both for analyzing microarray data and for developing new methods and algorithms for this purpose.

2 FEATURES

Mayday offers an interactive graphical user interface that provides access to all features of the software. The architecture of Mayday strictly follows a plug-in-based operational model. It consists of a light-weight core providing infrastructural functions such as central data structures and a plug-in manager. Mayday is implemented in Java in order to be compatible with the heterogeneous environments often found in research laboratories. Thus, Mayday is deployable on the three major operating systems Windows, Linux and MacOS. Both the core and the plug-ins are easily distributed and deployed as Java Archive (JAR) files.

The core provides the main data structures of Mayday to store expression data and basic project management facilities. Furthermore, it provides the possibility to annotate the expression data with so-called meta information objects. These are data objects that can represent almost any information type, such as textual (e.g. from databases) and numerical data (e.g. from statistical analyses). The plug-in manager scans for and loads plug-ins to make them accessible to the user. Plug-ins are implemented according to a well-defined and straightforward application programming interface (API).
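The division of labour described above (core data structures, meta information objects, a plug-in contract and a manager that makes plug-ins available) might look roughly like the following minimal Java sketch. All type and method names here (`Probe`, `AnalysisPlugin`, `MeanAnnotator`, `PluginRegistry`) are hypothetical and chosen only for illustration; they are not taken from Mayday's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; not Mayday's actual classes.

/** One expression profile (e.g. a gene or probe) with attached meta information. */
final class Probe {
    final String name;
    final double[] values;
    // Meta information objects: arbitrary annotations (text, numbers, ...) keyed by a label.
    final Map<String, Object> metaInfo = new LinkedHashMap<>();

    Probe(String name, double... values) {
        this.name = name;
        this.values = values.clone();
    }
}

/** Contract that every plug-in fulfils so that a manager can list and run it. */
interface AnalysisPlugin {
    String name();
    void run(List<Probe> probes);
}

/** Example plug-in: stores each probe's mean expression as numerical meta information. */
final class MeanAnnotator implements AnalysisPlugin {
    @Override public String name() { return "Mean annotator"; }

    @Override public void run(List<Probe> probes) {
        for (Probe p : probes) {
            double sum = 0.0;
            for (double v : p.values) sum += v;
            p.metaInfo.put("mean", p.values.length == 0 ? Double.NaN : sum / p.values.length);
        }
    }
}

/** Minimal registry standing in for a plug-in manager (no JAR scanning in this sketch). */
final class PluginRegistry {
    private final Map<String, AnalysisPlugin> plugins = new LinkedHashMap<>();

    void register(AnalysisPlugin plugin) { plugins.put(plugin.name(), plugin); }

    List<String> names() { return new ArrayList<>(plugins.keySet()); }

    void run(String name, List<Probe> probes) {
        AnalysisPlugin plugin = plugins.get(name);
        if (plugin == null) throw new IllegalArgumentException("Unknown plug-in: " + name);
        plugin.run(probes);
    }
}
```

In this sketch the result of one plug-in is stored on the data itself, so any plug-in that runs later can read the `"mean"` annotation without knowing which plug-in produced it.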

In order to address the core tasks that arise during evaluation of microarray data we have focused our efforts on the development of plug-ins that can be assigned to four classes: data management, visualization, statistical analysis and data mining, such as classification and clustering of microarray data (Fig. 1).

Fig. 1

Different data viewers provided by Mayday. Synthetic data were generated for demonstration purposes and analyzed with various methods implemented in Mayday. Screenshots of the main access window and context menu (top left) and different viewers: enhanced heatmap (bottom left), multi-dimensional plot using R (top center), circular dendrogram of a clustering of the genes derived with the unweighted pair group method using arithmetic averages (UPGMA; Sokal and Michener, 1958) (top right), multi-profile plot of a k-means clustering colored using a self-organizing map (Kohonen, 1997) clustering (bottom right).

To support data management, we have implemented a database plug-in that connects to an SQL-compliant object-relational database backend, for instance a PostgreSQL server (PostgreSQL, 2005). We also provide database scripts that create and set up a database for gene expression data. The schema of this database is derived from the OMG (Object Management Group)-certified MAGE (MicroArray and GeneExpression) Object Model proposal (OMG, 2003). Using this plug-in, preprocessed ImaGene™ and Affymetrix files as well as plain text files can be imported into the database. The database can also be queried for expression data that fulfil user-defined conditions. Data can be loaded from the database into Mayday and vice versa.

We have developed a set of data mining plug-ins that contains classification methods from the WEKA library (Witten and Frank, 2005) as well as commonly used clustering methods like k-means and Self Organizing Maps. This set also provides phylogenetic clustering methods adapted for gene expression data. These plug-ins share a collection of different distance measures that can also be used for the development and integration of additional clustering algorithms.
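To illustrate how a shared collection of distance measures can serve several clustering algorithms, the following minimal sketch defines a distance interface, two common measures and the assignment step of k-means parameterized by the measure. The names `DistanceMeasure`, `Distances` and `nearestCentroid` are assumptions made for this example and do not come from Mayday; profiles are assumed to have equal length.

```java
/** Hypothetical interface; the name is an assumption, not Mayday's API. */
interface DistanceMeasure {
    double distance(double[] a, double[] b);
}

final class Distances {
    private Distances() {}

    /** Euclidean distance between two profiles of equal length. */
    static final DistanceMeasure EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** Correlation distance: 1 - Pearson correlation (NaN if a profile is constant). */
    static final DistanceMeasure PEARSON = (a, b) -> {
        double meanA = mean(a);
        double meanB = mean(b);
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return 1.0 - cov / Math.sqrt(varA * varB);
    };

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /**
     * Assignment step of k-means: index of the centroid closest to the profile
     * under the given measure, or -1 if no centroid yields a comparable distance.
     */
    static int nearestCentroid(double[] profile, double[][] centroids, DistanceMeasure measure) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = measure.distance(profile, centroids[k]);
            if (d < bestDistance) {
                bestDistance = d;
                best = k;
            }
        }
        return best;
    }
}
```

Because the clustering step only depends on the interface, a new algorithm can reuse every existing measure, and a new measure becomes available to every algorithm.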

Statistical analyses can be performed using the plug-in that connects Mayday to the R environment. Data from Mayday is passed directly to R scripts that are called from within Mayday using a graphical user interface. The results are returned to Mayday and are available for further analysis with other plug-ins. It is possible to exchange both gene expression data and meta information objects between R and Mayday. The adaptation of existing R scripts will be straightforward in most cases.

A collection of several interactive data viewers has been created to support the user during data exploration and hypothesis generation. Currently there is a tabular view, an enhanced heatmap visualization (Gehlenborg et al., 2005) that employs meta information objects to integrate additional information into the visualization, a profile plot, a box plot, as well as a multi-profile and a multi-box plot. The multi-profile and the multi-box plot can be used to easily visualize clustered gene expression data in the context of the whole dataset. In addition, a tree viewer exists for gene expression data that has been analyzed using phylogenetic clustering.

The interactive features of the viewers include but are not limited to zooming, selecting genes that will be highlighted across all viewers, rearranging the order in which genes are displayed and querying web databases for information about selected genes. Furthermore, all views can be exported to a range of graphics formats including JPEG and PNG as well as vector-based Scalable Vector Graphics (SVG).
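Highlighting a selection across all viewers is commonly realized with a shared selection model that notifies every registered viewer (the observer pattern). The sketch below shows this idea under that assumption; the source does not state how Mayday implements it, and the names `SelectionListener`, `SelectionModel` and `HighlightingViewer` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical listener contract; names are illustrative, not Mayday's API. */
interface SelectionListener {
    void selectionChanged(Set<String> selectedGenes);
}

/** Shared model that holds the current selection and notifies every registered viewer. */
final class SelectionModel {
    private final List<SelectionListener> listeners = new ArrayList<>();
    private Set<String> selected = Collections.emptySet();

    void addListener(SelectionListener listener) { listeners.add(listener); }

    void select(Set<String> genes) {
        // Read-only, order-preserving copy so that viewers cannot alter the shared state.
        selected = Collections.unmodifiableSet(new LinkedHashSet<>(genes));
        for (SelectionListener listener : listeners) listener.selectionChanged(selected);
    }

    Set<String> selected() { return selected; }
}

/** Stand-in for a viewer: it only remembers which genes it should highlight. */
final class HighlightingViewer implements SelectionListener {
    final String title;
    Set<String> highlighted = Collections.emptySet();

    HighlightingViewer(String title) { this.title = title; }

    @Override public void selectionChanged(Set<String> selectedGenes) {
        highlighted = selectedGenes; // a real viewer would trigger a repaint here
    }
}
```

A selection made in one viewer is passed to the model once, and every other viewer receives the same set of genes.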

Finally, a number of additional plug-ins are available, which, for instance, implement conditional filters, compute descriptive statistics and import annotation data as meta information objects for use with the enhanced heatmap and other plug-ins.
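As an example of how descriptive statistics and conditional filters can work together, the following sketch computes the sample variance of each expression profile and keeps only the profiles that satisfy a user-defined condition. The class `ProfileStats` and the representation of profiles as a map from gene name to values are assumptions made for this illustration, not Mayday's data structures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of Mayday. */
final class ProfileStats {
    private ProfileStats() {}

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1); NaN for fewer than two values. */
    static double variance(double[] x) {
        if (x.length < 2) return Double.NaN;
        double m = mean(x);
        double sumOfSquares = 0.0;
        for (double v : x) sumOfSquares += (v - m) * (v - m);
        return sumOfSquares / (x.length - 1);
    }

    /** Keeps only the profiles for which the condition holds, preserving input order. */
    static Map<String, double[]> filter(Map<String, double[]> profiles,
                                        Predicate<double[]> condition) {
        Map<String, double[]> kept = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> entry : profiles.entrySet()) {
            if (condition.test(entry.getValue())) kept.put(entry.getKey(), entry.getValue());
        }
        return kept;
    }
}
```

A call such as `ProfileStats.filter(profiles, p -> ProfileStats.variance(p) > 1.0)` would then discard profiles that barely change across experiments; the threshold of 1.0 is arbitrary.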

3 CONCLUSION

The straightforward API and the plug-ins are the most important aspects of our strategy to combine the needs of the algorithm developer and the application user in a single application. The R plug-in plays a special role in this effort since it allows fast prototyping. We expect that this will speed up the process of bringing new analysis methods to the user and make the transition from experimental software to stable, user-friendly software more efficient. Owing to its versatility and usability, Mayday fills a niche in the field of open source microarray data analysis software and will hopefully prove fruitful for the community.

The authors thank Matthias Zschunke, Stephan Symons and Markus Riester for the development of several Mayday plug-ins. Some parts of Mayday were made possible by a Karl-Steinbuch-Scholarship to N.G. sponsored by the MFG Foundation Baden-Württemberg. J.D. and K.N. were supported by the Deutsche Forschungsgemeinschaft, Bioinformatics Initiative AZ BIZ 1/1-3.

Conflict of Interest: none declared.

REFERENCES

Gehlenborg N., et al. A framework for visualization of microarray data and integrated meta information. Information Visualization, 2005, vol. 4 (pg. 164-175)

Gentleman R.C., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 2004, vol. 5 pg. R80

Kapushesky M., et al. Expression Profiler: next generation-an online platform for analysis of microarray data. Nucleic Acids Res., 2004, vol. 32 (pg. W465-W470)

Kohonen T. Self-Organizing Maps. 1997. New York: Springer

OMG. Gene Expression Specification Version 1.1. 2003

PostgreSQL. PostgreSQL - SQL-compliant, open source object-relational database management system. 2005

R Development Core Team. R: A Language and Environment For Statistical Computing. 2005. Vienna, Austria: R Foundation for Statistical Computing

Saal L.H., et al. BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol., 2002, vol. 3 SOFTWARE0003

Saeed A.I., et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques, 2003, vol. 34 (pg. 374-378)

SiliconGenetics. GeneSpring 7.2. 2005

Sokal R.R., Michener C.D. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull., 1958, vol. 38 (pg. 1409-1438)

Spotfire. Spotfire DecisionSite for Functional Genomics. 2005

Vaquerizas J.M., et al. GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data. Nucleic Acids Res., 2005, vol. 33 (pg. W616-W620)

Witten I.H., Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2005. San Francisco: Morgan Kaufmann

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Joaquin Dopazo
