Abstract

The Homeodomain Resource is a searchable, curated collection of information on the homeodomain protein family. The resource is organized in a compact form and provides user-friendly interfaces both for querying the component databases and for assembling customized datasets. The current release (version 5.0, October 2002) contains 1056 full-length homeodomain-containing sequences, 37 experimentally derived structures, 81 homeodomain protein–protein interactions, 84 homeodomain DNA-binding sites and 114 homeodomain proteins implicated in human genetic disorders. A new feature of this release is the inclusion of experimentally derived protein–protein interaction data for homeodomain family members. All entries are cross-linked for easy retrieval of the original records from source databases. The Homeodomain Resource is freely available through the World Wide Web at http://research.nhgri.nih.gov/homeodomain/ .

Received September 23, 2002; Accepted September 24, 2002

INTRODUCTION

Homeodomain-containing proteins are transcription factors that play critical roles in various cellular processes, including body plan specification, pattern formation and cell fate determination during metazoan development ( 1 ). Members of this family are characterized by a helix-turn-helix DNA-binding motif known as the homeodomain. X-ray crystallographic and NMR spectroscopic studies of several homeodomain-containing proteins ( 2–6 ) show that this motif consists of three alpha-helices folded into a compact globular structure with an N-terminal extension. Helices I and II pack against each other in an antiparallel arrangement and lie across from the third helix. This third helix is also referred to as the ‘recognition helix’, because it confers DNA-binding specificity on individual homeodomain proteins. The homeodomain structural motif is highly conserved across species, from yeast to human.

In recent years, a large amount of information has accumulated about members of this protein family, but this information is often not available in an easily accessible format. The Homeodomain Resource directly addresses the need for a comprehensive collection of sequence, structural, genomic and functional information on the homeodomain family. The Resource contains all full-length sequences, homeodomain-only sequence fragments and three-dimensional structures available as of October 2002, and it is updated on an ongoing basis. The genetic information available through the Resource includes a list of human diseases in which homeodomain-containing proteins are implicated, their cytogenetic map locations and the specific mutations underlying each disease. Each entry in the database is carefully selected to ensure non-redundancy, then annotated and cross-referenced.

DATABASE DESCRIPTION

The Homeodomain Resource has expanded considerably since its last release ( 7 ). The current version of the database contains 1056 full-length homeodomain protein sequences from 112 different species (Table 1). In addition to the full-length sequences, the homeodomain portion of each sequence is available in FASTA format. The database can be searched by SWISS-PROT ID, GenBank accession number, gene name, alternative gene names, protein description, sequence and organism name.
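
Because the homeodomain-only sequences are distributed in FASTA format and can be searched by fields such as accession, gene name and organism, a downloaded file can also be parsed and filtered locally. The Python sketch below shows one way to do this. It assumes a hypothetical pipe-delimited header of the form `>accession|gene|organism`; the Resource's actual header layout is not described here, and the field names, helper functions and toy records are illustrative only.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical header layout ">accession|gene|organism" (an assumption, not the Resource's format).
FIELDS = ("accession", "gene", "organism")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped; join them later
    if header is not None:
        yield header, "".join(chunks)


def to_record(header: str, sequence: str) -> Dict[str, str]:
    """Split the hypothetical pipe-delimited header into named fields."""
    record = dict(zip(FIELDS, header.split("|")))
    record["sequence"] = sequence
    return record


def search(records: List[Dict[str, str]], field: str, query: str) -> List[Dict[str, str]]:
    """Case-insensitive substring match on a single field."""
    q = query.lower()
    return [r for r in records if q in r.get(field, "").lower()]


# Toy placeholder records, not real homeodomain sequences.
fasta = ">ACC1|geneA|Mus musculus\nMKR\nAR\n>ACC2|geneB|Homo sapiens\nMSE\n"
records = [to_record(h, s) for h, s in parse_fasta(fasta)]
hits = search(records, "organism", "homo")
```

A substring match is used here for simplicity; exact matching on accession numbers would be the stricter choice for identifier lookups.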

A new feature of this release is the inclusion of experimentally determined protein–protein interaction data for members of the homeodomain protein family. To the best of our knowledge, this is the most comprehensive collection available of documented protein–protein interactions in the homeodomain family. Interaction data were collected through manual literature searches, after which the essential details of each protein–protein interaction were extracted from the experimental data. Manually abstracting the relevant biological information from PubMed required discriminating MeSH terms, used in keyword search combinations that progressed from specific to more general. PubMed titles, abstracts and full texts were searched for keywords indicative of relevant protein–protein interactions (e.g. DNA-independent interaction). The interacting protein partners and the organisms from which they were isolated were catalogued, and the interacting regions were annotated and cross-linked to SWISS-PROT.
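
The specific-to-general search strategy can be illustrated with a small sketch. The Python code below assumes that a query for one protein is progressively relaxed by dropping MeSH terms, and that retrieved titles, abstracts or full texts are then screened for indicative phrases such as "DNA-independent interaction". The particular MeSH terms, the second keyword, the placeholder gene name `GENE_X` and the use of present-day PubMed `[MeSH Terms]` field syntax are assumptions for illustration; the curation itself was performed manually.

```python
from typing import List, Sequence

# Illustrative terms only; the actual MeSH terms and keyword lists used are not given in the text.
MESH_TERMS = ["Protein Binding", "Homeodomain Proteins"]
INDICATIVE_KEYWORDS = ["dna-independent interaction", "protein-protein interaction"]


def specific_to_general(protein: str, mesh_terms: Sequence[str]) -> List[str]:
    """Build PubMed-style queries, most restrictive first, by dropping trailing MeSH terms."""
    queries = []
    for k in range(len(mesh_terms), -1, -1):
        parts = [f'"{protein}"'] + [f'"{term}"[MeSH Terms]' for term in mesh_terms[:k]]
        queries.append(" AND ".join(parts))
    return queries


def matched_keywords(text: str, keywords: Sequence[str]) -> List[str]:
    """Return the indicative keywords that occur in a title, abstract or full text."""
    # Lower-case and turn en dashes into hyphens so 'Protein\u2013protein' matches 'protein-protein'.
    normalised = text.lower().replace("\u2013", "-")
    return [kw for kw in keywords if kw in normalised]


# GENE_X is a hypothetical placeholder name.
queries = specific_to_general("GENE_X", MESH_TERMS)
```

Each relaxed query widens the result set, so in this sketch a curator would inspect the narrowest query first and fall back to broader ones only when needed.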

Figure 1 shows a typical result of a search for specific protein interactions. The output includes the names of the proteins participating in the interaction, information on their interacting domains and the amino acid residues (as a numerical range) required for the interaction. Also shown are the keywords used to identify the entry and the primary PubMed citation. All PubMed citations are hyperlinked back to NCBI Entrez.
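
Each search result combines two partners, their interacting regions with residue ranges, the keywords used to identify the entry and a PubMed citation. A minimal Python sketch of such a record and a lookup by protein name follows. The class and field names are hypothetical and do not reflect the Resource's actual schema, and the record in the usage example is a placeholder rather than real interaction data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interaction:
    """One curated interaction record (hypothetical field names)."""
    protein_a: str
    domain_a: str
    residues_a: Tuple[int, int]  # inclusive start/end residue numbers
    protein_b: str
    domain_b: str
    residues_b: Tuple[int, int]
    keywords: Tuple[str, ...]
    pubmed_id: str


def find_interactions(db: List[Interaction], name: str) -> List[Interaction]:
    """Return records in which either partner matches the name (case-insensitive)."""
    key = name.lower()
    return [r for r in db if key in (r.protein_a.lower(), r.protein_b.lower())]


def describe(r: Interaction) -> str:
    """Render a one-line summary of a record."""
    a_start, a_end = r.residues_a
    b_start, b_end = r.residues_b
    return (f"{r.protein_a} {r.domain_a} ({a_start}-{a_end}) <-> "
            f"{r.protein_b} {r.domain_b} ({b_start}-{b_end}); "
            f"keywords: {', '.join(r.keywords)}; PMID {r.pubmed_id}")


# Placeholder record: names, ranges and PMID are invented for illustration.
db = [Interaction("PROT_A", "homeodomain", (1, 60),
                  "PROT_B", "C-terminal region", (10, 40),
                  ("DNA-independent interaction",), "0000000")]
```

Making the record immutable (`frozen=True`) reflects that curated entries are read far more often than they are changed.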

TECHNICAL IMPROVEMENTS

In addition to a new Web interface for the Homeodomain Resource, a number of back-end modifications have been made to improve both data collection and performance for users. A series of Perl scripts was written for this release to automate the sequence search and update process, eliminating many of the manual steps it previously required. The scripts search SWISS-PROT automatically through the Swiss Institute of Bioinformatics' ExPASy SRS server. The search results are first compared with the existing data, and a list of new entries is generated. The new entries are then examined manually and either added to the database or discarded as false positives. The search program runs automatically each month to update the sequence database. Several other back-end processes were also modified to facilitate routine maintenance. Comments on the Web front-end are welcome, and users are encouraged to inform the authors of any updates or corrections to the database.
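
The core of this update cycle is a set difference between the accessions returned by the monthly SWISS-PROT search and those already stored, followed by a manual accept or reject step. The Python sketch below models only that logic. It is not the authors' Perl code; it assumes the search results have already been reduced to a list of accession strings (the SRS query itself is not shown), and the function names and accessions are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple


def new_entries(search_hits: Iterable[str], existing: Set[str]) -> List[str]:
    """Accessions returned by the search that are not yet in the database (sorted, de-duplicated)."""
    return sorted(set(search_hits) - existing)


def apply_review(candidates: List[str], decisions: Dict[str, bool],
                 existing: Set[str]) -> Tuple[Set[str], List[str]]:
    """Merge curator decisions: True = add to database, False = false positive.

    Candidates without a decision are returned as still pending review.
    """
    accepted = {acc for acc in candidates if decisions.get(acc) is True}
    pending = [acc for acc in candidates if acc not in decisions]
    return existing | accepted, pending


# Placeholder accessions for illustration.
existing = {"ACC1", "ACC2"}
hits = ["ACC2", "ACC3", "ACC4", "ACC3"]
candidates = new_entries(hits, existing)  # ["ACC3", "ACC4"]
updated, pending = apply_review(candidates, {"ACC3": True, "ACC4": False}, existing)
```

Keeping unreviewed candidates in a pending list, rather than adding them automatically, mirrors the manual false-positive check described above.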

Studies utilizing the data within this database should cite this paper, as well as the primary source of any sequence, structural, or related information.

Figure 1. Results from a search against the collection of protein–protein interactions in the Homeodomain Resource. Search results include the names of the two interacting proteins, the region over which the interaction is observed, and the original PubMed citation from which the interaction data was obtained. Information on how these data were collected can be found in the main text.

Table 1. Homeodomain Resource statistics

Statistic                                                   Count
Total sequences available                                   6669
Non-redundant full-length sequences                         1056
Genes/gene symbols                                          889
Distinct organisms                                          112
Three-dimensional structures                                37
Homeodomain proteins implicated in human genetic disorders  114
Homeodomain DNA-binding sites                               84
Protein–protein interactions                                81

References

1. Gehring,W.J., Affolter,M. and Burglin,T. (1994) Homeodomain proteins. Annu. Rev. Biochem., 63, 487–526.
2. Kissinger,C., Liu,B., Martin-Blanco,E., Kornberg,T. and Pabo,C. (1990) Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell, 63, 579–590.
3. Ceska,T., Lamers,M., Monaci,P., Nicosia,A., Cortese,R. and Suck,D. (1993) The X-ray structure of an atypical homeodomain present in the rat liver transcription factor LFB1/HNF1 and implications for DNA binding. EMBO J., 12, 1805–1810.
4. Endo,T., Ohta,K., Saito,T., Haraguchi,K., Nakazato,M., Kogai,T. and Onaya,T. (1994) Structure of the rat thyroid transcription factor-1 (TTF-1) gene. Biochem. Biophys. Res. Commun., 204, 1358–1363.
5. Dekker,N., Cox,M., Boelens,R., Verrijzer,C., van der Vliet,P. and Kaptein,R. (1993) Solution structure of the POU-specific DNA-binding domain of Oct-1. Nature, 362, 852–855.
6. Wolberger,C., Vershon,A.K., Liu,B., Johnson,A.D. and Pabo,C.O. (1991) Crystal structure of a MAT alpha 2 homeodomain-operator complex suggests a general model for homeodomain-DNA interactions. Cell, 67, 517–528.
7. Banerjee-Basu,S., Sink,D.W. and Baxevanis,A.D. (2001) The homeodomain resource: sequences, structures, DNA binding sites and genomic information. Nucleic Acids Res., 29, 291–293.
