We have developed a comprehensive database (MITOMAP) for human mitochondrial DNA (mtDNA), the first component of the human genome to be completely sequenced [Anderson et al. (1981) Nature 290, 457–465]. MITOMAP uses the mtDNA sequence as the unifying element for bringing together information on mitochondrial genome structure and function, pathogenic mutations and their clinical characteristics, population-associated variation, and gene-gene interactions. As increasingly large regions of the human genome are sequenced and characterized, the need to integrate such information will grow. Consequently, MITOMAP not only provides a valuable reference for the mitochondrial biologist but may also serve as a model for developing information storage and retrieval systems for other components of the human genome.
The human mtDNA is a closed, circular molecule of 16 569 nucleotide pairs (np) located within the cytoplasmic mitochondria (Fig. 1). Each of the several thousand mtDNAs per cell contains a control region, encompassing a replication origin and the promoters, and encodes a large (16S) and a small (12S) rRNA, 22 tRNAs and 13 polypeptides. All of the mtDNA-encoded polypeptides are components of the mitochondrial energy-generating pathway, oxidative phosphorylation (OXPHOS), which is functionally essential and evolutionarily constrained (3).
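Because the molecule is circular, a feature such as the control region can span the junction between np 16 569 and np 1, so interval arithmetic on mtDNA coordinates has to wrap around. The following is a minimal Python sketch, assuming 1-based inclusive coordinates; the helper names are hypothetical, and the control-region coordinates used in the example (np 16024–576 of the standard sequence) are assumed for illustration.

```python
MTDNA_LENGTH = 16569  # np, length of the standard sequence


def region_length(start: int, end: int, genome_length: int = MTDNA_LENGTH) -> int:
    """Size of the 1-based inclusive interval start..end, wrapping past the last np to np 1."""
    return (end - start) % genome_length + 1


def in_region(pos: int, start: int, end: int, genome_length: int = MTDNA_LENGTH) -> bool:
    """True if 1-based position `pos` falls inside the (possibly wrapping) interval."""
    if not 1 <= pos <= genome_length:
        raise ValueError(f"position {pos} outside 1..{genome_length}")
    if start <= end:
        return start <= pos <= end
    return pos >= start or pos <= end  # interval crosses the 16569/1 junction


# Control region, assumed here to span np 16024-576.
print(region_length(16024, 576))   # 1122
print(in_region(100, 16024, 576))  # True
```

Storing every feature as a (start, end) pair and resolving the wrap in one place keeps position-based queries uniform for features that do and do not cross the origin.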
The maternally inherited mtDNA has a very high mutation rate (2), which has produced a wide variety of pathogenic mutations and neutral polymorphisms. MITOMAP attempts to integrate the broad spectrum of available molecular, genetic, functional and clinical information into a single unified resource that can be queried from a variety of perspectives.
MITOMAP is currently implemented with the Sybase relational database management system. It serves both as a self-contained information system for the mitochondrial biologist and as the source of the mtDNA map and clinical information for the Genome DataBase (GDB, Johns Hopkins University) of the international Human Genome Organization (HUGO) and for On-Line Mendelian Inheritance in Man (OMIM, Johns Hopkins University and the National Library of Medicine).
MITOMAP is divided into five interrelated elements: the ‘standard’ mtDNA sequence (1), the functional genetic element data set, the clinical mutation data set, the population variation data set, and the gene-gene interaction data set (Fig. 2 and Table 1).
The functional genetic element data set gives the genomic location, by nucleotide position, of each known functional domain of the mtDNA. It also provides the amino acid sequences of the encoded proteins, the structures of the RNAs and the sequences of the regulatory elements.
The clinical mutation data set provides the nucleotide positions and base changes of the more than 50 base substitutions that have been associated with disease. It also covers the more than 100 mtDNA rearrangements characterized to date, including the nucleotide positions of the breakpoint junctions and the sequences of associated repeat elements. The clinical characteristics associated with the mutations are accessible both through the associated MITOMAP data sets and through links to OMIM.
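The two kinds of clinical record described here, point substitutions and rearrangements, can be modelled compactly. This sketch assumes the widely used notation of reference base, position and variant base for substitutions (for example A3243G) and records a deletion by its first and last deleted nucleotide positions; the class and function names are hypothetical and do not describe MITOMAP's actual schema. The example uses the frequently reported 4977-np "common deletion", taken here as removing np 8483–13459 (coordinates assumed for illustration).

```python
import re
from dataclasses import dataclass

MTDNA_LENGTH = 16569  # np, standard sequence length

# Assumed notation: reference base, 1-based position, variant base (e.g. "A3243G").
_SUBSTITUTION = re.compile(r"([ACGT])(\d+)([ACGT])")


@dataclass(frozen=True)
class Substitution:
    position: int
    ref: str
    alt: str


def parse_substitution(text: str) -> Substitution:
    """Parse a base substitution written as <ref><position><alt>."""
    m = _SUBSTITUTION.fullmatch(text.strip().upper())
    if m is None:
        raise ValueError(f"not a base substitution: {text!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    if ref == alt or not 1 <= pos <= MTDNA_LENGTH:
        raise ValueError(f"invalid substitution: {text!r}")
    return Substitution(pos, ref, alt)


@dataclass(frozen=True)
class Deletion:
    first_deleted: int  # first missing np, 1-based inclusive
    last_deleted: int   # last missing np; may wrap past 16569 to np 1

    @property
    def size(self) -> int:
        """Number of nucleotide pairs removed, counting around the circle."""
        return (self.last_deleted - self.first_deleted) % MTDNA_LENGTH + 1


print(parse_substitution("A3243G"))  # Substitution(position=3243, ref='A', alt='G')
print(Deletion(8483, 13459).size)    # 4977
```

A fuller record would also carry the breakpoint repeat sequences and links to the clinical data, which this sketch leaves out.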
The population variation data set provides access to known polymorphic sites, including restriction site polymorphisms, small insertion-deletion variants and identified sequence changes. The population associations of these variants are given through the available information on mtDNA haplotypes and on the continental distributions and population frequencies of the more informative markers.
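A restriction site polymorphism is a base change that creates or destroys an enzyme recognition site. The sketch below classifies a substitution as a site gain, a site loss, or neither. It assumes a linear sequence (no wrap at the origin) and a palindromic recognition site such as GGCC (HaeIII), so scanning one strand is enough; the function names are hypothetical and the sequence is a toy fragment, not real mtDNA.

```python
def sites_overlapping(seq: str, pos: int, site: str) -> int:
    """Count occurrences of `site` in `seq` that include 1-based position `pos`."""
    i = pos - 1
    first = max(0, i - len(site) + 1)
    last = min(len(seq) - len(site), i)
    return sum(seq[s:s + len(site)] == site for s in range(first, last + 1))


def restriction_site_change(seq: str, pos: int, alt: str, site: str) -> str:
    """Classify a substitution at `pos` as a 'gain', 'loss' or 'none' of `site`.

    Assumes a linear sequence and a palindromic recognition site, so scanning
    one strand is sufficient.
    """
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence")
    before = sites_overlapping(seq, pos, site)
    mutated = seq[:pos - 1] + alt + seq[pos:]
    after = sites_overlapping(mutated, pos, site)
    if after > before:
        return "gain"
    if after < before:
        return "loss"
    return "none"


toy = "ATGGCCTA"  # toy fragment, not real mtDNA
print(restriction_site_change(toy, 4, "A", "GGCC"))  # loss
```

Only windows that overlap the changed base are rescanned, since a substitution cannot affect sites elsewhere in the sequence.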
The gene-gene interaction data set catalogs the known polypeptide associations within the OXPHOS enzyme complexes. It also provides information on nuclear genes that affect mtDNA structure and function.
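The polypeptide associations can be represented as a mapping from each OXPHOS complex to its subunits. The sketch below lists the standard assignment of the 13 mtDNA-encoded polypeptides and derives a reverse index from gene to complex. The nuclear-encoded subunits, which make up most of each complex, are omitted, and the helper names are hypothetical.

```python
from typing import Dict, List

# mtDNA-encoded subunits of each OXPHOS complex. Complex II is entirely
# nuclear-encoded, so it has no entry here.
MTDNA_SUBUNITS: Dict[str, List[str]] = {
    "Complex I": ["ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6"],
    "Complex III": ["CYTB"],
    "Complex IV": ["COI", "COII", "COIII"],
    "Complex V": ["ATP6", "ATP8"],
}

# Reverse index from subunit gene to its complex.
COMPLEX_OF: Dict[str, str] = {
    gene: complex_name
    for complex_name, genes in MTDNA_SUBUNITS.items()
    for gene in genes
}


def mtdna_partners(gene: str) -> List[str]:
    """Other mtDNA-encoded subunits assembled into the same complex as `gene`."""
    complex_name = COMPLEX_OF[gene]
    return [g for g in MTDNA_SUBUNITS[complex_name] if g != gene]


print(len(COMPLEX_OF))          # 13
print(mtdna_partners("ATP6"))   # ['ATP8']
```

Extending the lists with nuclear subunits and adding a second mapping for nuclear genes that act on the mtDNA would cover the rest of this data set.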
The mitochondrial database is available to the general public through the World Wide Web (http://www.gen.emory.edu/mitomap.html). The interface provides both browsing and querying capabilities. Users can browse through the database in its published flat file format [adapted from (3)]. A query interface is provided to perform searches on specific aspects of the mitochondrial genome. For example, the database can be used to answer questions such as ‘Are there any polymorphisms or mutations within a particular gene, and if so, have they been reported to cause disease?’
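The example question maps naturally onto a join between a gene table and a variant table. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the Sybase server; the table layout, column names and function name are hypothetical, and the rows are placeholders rather than MITOMAP data. A NULL disease value stands for a variant with no reported disease association.

```python
import sqlite3


def variants_in_gene(conn: sqlite3.Connection, gene: str):
    """Return (position, ref, alt, disease) for each variant inside `gene`;
    disease is None when no disease association has been reported."""
    sql = """
        SELECT v.position, v.ref, v.alt, v.disease
        FROM variant AS v
        JOIN gene AS g ON v.position BETWEEN g.start_np AND g.end_np
        WHERE g.name = ?
        ORDER BY v.position
    """
    return conn.execute(sql, (gene,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (name TEXT PRIMARY KEY, start_np INTEGER, end_np INTEGER);
    CREATE TABLE variant (position INTEGER, ref TEXT, alt TEXT, disease TEXT);
""")
# Placeholder rows for illustration only.
conn.execute("INSERT INTO gene VALUES ('GENE_A', 1000, 1500)")
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", [
    (1200, "A", "G", "DISEASE_X"),
    (1300, "C", "T", None),
    (2000, "G", "A", None),  # outside GENE_A
])

rows = variants_in_gene(conn, "GENE_A")
disease_linked = [r for r in rows if r[3] is not None]
print(rows)            # [(1200, 'A', 'G', 'DISEASE_X'), (1300, 'C', 'T', None)]
print(disease_linked)  # [(1200, 'A', 'G', 'DISEASE_X')]
```

The `BETWEEN` test assumes `start_np <= end_np`; a feature that spans the np 16 569/1 junction, such as the control region, would need the wrap-around condition `position >= start_np OR position <= end_np` instead.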
Data are taken from published work on the mitochondrial genome. The committee (see ref. 3) regularly searches the literature for new publications. The database is updated as new data are obtained, flat files are generated daily from the database, and queries are performed on the fly so that results are always current. Submissions should be sent to Dr Douglas C. Wallace, Attn: Mitochondrial Genome Committee, Department of Genetics and Molecular Medicine, Emory University, 1462 Clifton Rd. NE, Atlanta, GA 30322, USA or firstname.lastname@example.org.
This work was supported by a grant from the Emory-Georgia Tech Center for Biotechnology (SBN, MDB and DCW) and by NIH grants GM46915, NS21328, HL30164, NS30164, AG10130, DK45215 and a Muscular Dystrophy Foundation clinical research grant (DCW).