The IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla) has provided a centralized repository for the sequences of the alleles named by the WHO Nomenclature Committee for Factors of the HLA System for the past four years. Since its initial release the database has grown and is now the primary source of information for the study of sequences of the human major histocompatibility complex. The initial release of the database contained a limited number of tools. As a result of feedback from our users and developments in the HLA field we have been able to provide new tools and facilities. The HLA sequences have also been extended to include intron sequences and the 3′ and 5′ untranslated regions in the alignments, and new genes such as MICA have been added. The IMGT/MHC database (http://www.ebi.ac.uk/imgt/mhc) was released in March 2002 to provide a similar resource for other species. The first release of IMGT/MHC contains sequences from non-human primates (apes, New World and Old World monkeys), canines and felines. Further species will be added shortly, and the database aims to become the primary source of MHC data for non-human species.
Received September 13, 2002; Accepted September 20, 2002
The IMGT/HLA database is a specialist database for the allelic sequences of the genes in the HLA system, the human major histocompatibility complex. This complex of ∼4 megabases is located within the 6p21.3 region of the short arm of human chromosome 6 and contains in excess of 220 genes. The genes included in the HLA nomenclature, which together comprise the HLA system, are those involved in antigen presentation to T cells and the non-functional genes related to them. The core of the HLA system consists of the many highly polymorphic HLA genes which influence the outcome of cell and organ transplants and for which certain alleles are associated with progression of infectious disease and susceptibility to a wide range of chronic, non-infectious diseases (1, 2). Nucleotide sequences for more than 1500 different alleles of these genes have been determined. The naming of new allelic sequences and their quality control is the responsibility of the WHO Nomenclature Committee for Factors of the HLA System (3). The IMGT/HLA database acts as the repository for these sequences, and is recognized as the primary source of up-to-date and accurate HLA sequences.
The IMGT/MHC database expands the coverage of the major histocompatibility complex to a number of different species. The MHC Class I and II genes are highly conserved between species (4), so extending the IMGT/HLA model to other species is a natural development. The system will be used to coordinate the naming and publication of histocompatibility-related alleles in non-human species. The curators of the IMGT/MHC database are not responsible for the naming and identification of sequences; rather, the database provides a coordinated means of making these data available in a single location and in a standardized format.
Both the IMGT/HLA and IMGT/MHC databases are based upon sequences currently found in the EMBL Nucleotide Sequence Database (EMBL) (5), GenBank (6) and the DNA DataBank of Japan (DDBJ) (7). Indeed, a requirement of all submissions to IMGT/HLA and IMGT/MHC is that they have been submitted to these more general databases. The main difference between the data held within the different databases is that experts in the relevant species further curate the files included in the IMGT/HLA and IMGT/MHC databases. This additional step allows improvements in data quality as well as the addition of more specialized information. As a result there are often differences between the files kept in the different databases. When that is the case, the file in IMGT/HLA or IMGT/MHC should be considered the more accurate.
IMGT/HLA ORGANIZATION AND CONTENT
The first public release of the IMGT/HLA Database (8, 9) was made on 16 December 1998 and was included on the European Bioinformatics Institute's (EBI) server as part of the IMGT project. The database is now updated quarterly. With each release all the tools are updated to include the new sequences, and information on all new and modified sequences is reported. The IMGT/HLA website provides tools both for the retrieval of HLA sequences and for the manipulation and analysis of these sequences. In addition, a large amount of background information is held on the source material from which the sequences were derived. The two main uses of the website are to retrieve information on a single allele and to view sequence alignments based on the official HLA sequences (see Figure 1).
Using the ‘Allele Query Tool’ users can retrieve information about any allele. The tool provides an easy-to-use interface that requires only the allele name, a synonym or previous designation, or a single EMBL/GenBank/DDBJ accession number. The output includes all the information provided in the official WHO Nomenclature Reports as well as other information on the individual and cells from which the sequence was derived. The entry also provides links to all EMBL/GenBank/DDBJ entries. References are provided for all alleles and, where possible, are linked to the PubMed abstract of the paper in which the allele was first described. The nucleotide and protein sequences are also provided in the standard format used in the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla/nomen_pt2.html).
The database provides a tool for viewing sequence alignments based on the official HLA sequences. The interface provided lets the user define a number of key variables. The basic steps in selecting an alignment are to choose the loci and the type of sequence. The types of alignment available currently include coding sequences, genomic sequences and amino acid sequences. The remaining options available allow the user to customize the output to include optional reference sequences, sequence display and selection of specific sequences.
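The idea of displaying aligned alleles relative to an optional reference sequence can be illustrated with a short sketch. It is not the database's own code: the function name, the toy sequences and the display convention (a dash for identity with the reference, the base itself for a difference, and '.' or '*' passed through unchanged as gap or unsequenced positions) are assumptions made for illustration only.

```python
def render_against_reference(reference: str, sequence: str) -> str:
    """Render one aligned sequence relative to a reference of equal length.

    Assumed convention (illustrative only):
      '-'  position identical to the reference
      base position differing from the reference
      '.'  alignment gap, passed through unchanged
      '*'  unsequenced position, passed through unchanged
    """
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have the same length")
    rendered = []
    for ref_base, base in zip(reference, sequence):
        if base in ".*":
            rendered.append(base)   # keep gap / unknown markers as they are
        elif base == ref_base:
            rendered.append("-")    # identity with the reference
        else:
            rendered.append(base)   # show the differing base
    return "".join(rendered)


# Hypothetical aligned sequences, not real HLA alleles.
REFERENCE = "ATGGCC"
TOY_ALIGNMENT = {"toy*01": "ATGACC", "toy*02": "AT*G.C"}

if __name__ == "__main__":
    print(f"{'reference':<10}{REFERENCE}")
    for name, seq in TOY_ALIGNMENT.items():
        print(f"{name:<10}{render_against_reference(REFERENCE, seq)}")
```

Showing only the differences from a chosen reference makes polymorphic positions easy to spot, which is one reason an optional reference sequence is a useful display setting.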
The database also provides a range of search tools. Through a collaboration with the EBI, users can search the IMGT/HLA database using the SRS Browser (10). Sequence similarity searches can also be performed using the BLAST (11) and FASTA (12) search tools. Searches run against EMBL or other general databases cannot guarantee a match against the official sequences; it is therefore recommended that the IMGT/HLA database be used for sequence similarity searches wherever possible.
Whilst most use of the database is confined to the sequence data, the database also contains additional background information, for example on the cells from which the alleles were characterized. Search and retrieval tools are provided to allow the user to query the data on all the cells from which a particular allele has been sequenced. The IMGT/HLA database provides several tools to support the work of HLA typing laboratories, for example the characterization of ambiguities in Sequence Based Typing (SBT). Most SBT strategies currently employed use the exon 2 and exon 3 sequences for HLA class I analysis and exon 2 alone for HLA class II analysis. Because SBT analyses both alleles of a heterozygous sample together, many different pairs of alleles can give the same combined sequence and therefore an ambiguous typing result. To aid the interpretation of SBT results, the database provides a comprehensive list of the ambiguous combinations for each locus.
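The way heterozygous SBT ambiguities arise can be sketched in a few lines of Python. This is a minimal illustration and not the method used to build the database's lists: the allele names and four-base sequences are invented, the sequences are assumed to be pre-aligned and of equal length, and only positions with at most two different bases are handled. Each pair of alleles is merged into the combined signal a sequencer would report (using IUPAC codes at heterozygous positions), and pairs that share the same combined signal are reported as ambiguous.

```python
from collections import defaultdict
from itertools import combinations

# IUPAC nucleotide codes for one base or for an unordered pair of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def heterozygous_read(seq_a: str, seq_b: str) -> str:
    """Merge two aligned allele sequences into one combined SBT signal."""
    if len(seq_a) != len(seq_b):
        raise ValueError("allele sequences must be aligned to equal length")
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(seq_a, seq_b))


def ambiguous_combinations(alleles: dict) -> dict:
    """Group heterozygous allele pairs by combined signal.

    Returns only those signals produced by more than one pair of alleles.
    """
    groups = defaultdict(list)
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(alleles.items()), 2):
        groups[heterozygous_read(seq_a, seq_b)].append((name_a, name_b))
    return {read: pairs for read, pairs in groups.items() if len(pairs) > 1}


# Hypothetical alleles at an invented locus "X"; not real HLA sequences.
TOY_ALLELES = {
    "X*01": "ACGT",
    "X*02": "ATGC",
    "X*03": "ACGC",
    "X*04": "ATGT",
}

if __name__ == "__main__":
    for read, pairs in ambiguous_combinations(TOY_ALLELES).items():
        print(read, pairs)
```

In this toy data set the pairs X*01 + X*02 and X*03 + X*04 both give the combined signal AYGY, so a heterozygous result of AYGY cannot distinguish between them. Such pairs differ only in which polymorphic bases lie on the same chromosome, and that phase information is lost when both alleles are sequenced together.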
As well as providing HLA sequences for retrieval, the IMGT/HLA website also provides the tools for submitting both new and confirmatory sequences to the WHO HLA Nomenclature Committee. This is now the only accepted method for submitting new sequences to the database. Once submitted, a sequence is automatically analysed and annotated and then given a name, before being loaded into the IMGT/HLA database and included in the monthly nomenclature reports (13). The submission tool is capable of holding confidential entries until a set time, thus allowing alleles to be named before publication. The submission of new HLA sequences to the IMGT/HLA database does not replace the submission of these sequences to EMBL/GenBank/DDBJ, as the submission criteria state that the sequences must also have been submitted to these databases.
IMGT/MHC ORGANIZATION AND CONTENT
The IMGT/HLA database provides a working system for study of the allelic sequences of selected genes of the human MHC region. Using this system as a model, a more wide-ranging project has been implemented. MHC sequences of many different species have been reported (14–16), and different nomenclature systems have been used in the naming and identification of new genes and alleles. The IMGT/MHC project works as a centralized platform for the collation and analysis of these sequences. By bringing the MHC sequences of different species together, it is hoped to provide a central resource that will facilitate further research on the MHC of each species and on their comparison.
The first release of the IMGT/MHC database involves the work of research groups specializing in non-human primates, canines and felines. This release includes data from five species of ape, sixteen species of New World monkey, seventeen species of Old World monkey, canines and felines. The nomenclature of each species is the responsibility of the species nomenclature committees (14–16). These committees prepare the allele information in a standardized format which allows it to be quickly loaded into the database and then published via the website.
The content varies between species, but the web site components for all species share common features, including taxonomic information and allele distributions. The sequence alignments and nomenclature reports are made available for online searches. The nomenclature reports provide details of the alleles, previous designations, EMBL/GenBank/DDBJ accession numbers and references (where possible linked to PubMed). The alignments follow the same format as that used in the IMGT/HLA database, which can itself be seen as a subset of IMGT/MHC.
The IMGT/MHC system has a standardized interface for viewing the alignments and nomenclature tables. The information available may vary between species depending on the data submitted by the appropriate nomenclature committee, and in every case the loci and files available depend on the species. Once a species is selected the user can select the type of data and the locus. The interface is also included at the top of the output, to allow the user to perform multiple queries without returning to the main species index.
The initial release contains a limited number of species and a small number of tools. As the database grows and more species are added, many of the tools present in the IMGT/HLA system will be added to the IMGT/MHC website. The files will also be made available to download from the IMGT/MHC website and ftp server, and will be included in the SRS, BLAST and FASTA search engines.
Both databases provide high quality nucleotide and protein sequences with extensive background information. The IMGT/HLA database provides a centralized resource for everybody interested, either centrally or peripherally, in the HLA system. The database and accompanying tools allow the study of HLA alleles from a single site on the World Wide Web. The development of the potentially larger IMGT/MHC database will aid the development of similar projects in other species. The IMGT/MHC project will work as a centralized platform for the collation and analysis of MHC sequences from these species. The primary use of the database will be to facilitate accurate and in-depth comparison of MHC sequences from different species. In order to achieve this aim the IMGT/MHC curators are already in contact with several other research groups and hope to include further species, such as chickens and horses, in the near future.
We would like to acknowledge the support of the following organizations for the IMGT/HLA Database: Dynal, the American Society for Histocompatibility and Immunogenetics (ASHI), the Anthony Nolan Trust (ANT), Biotest, the European Federation of Immunogenetics (EFI), Innogenetics, the National Marrow Donor Program (NMDP), Orchid Diagnostics, Forensic Analytical, Genovision and Pel-Freez.
APPENDIX—ACCESS AND CONTACT
IMGT/HLA Homepage: http://www.ebi.ac.uk/imgt/hla/
IMGT/HLA Submissions: http://www.ebi.ac.uk/imgt/hla/subs/submit.html
IMGT/MHC Homepage: http://www.ebi.ac.uk/imgt/mhc/
Non-human primates: http://www.ebi.ac.uk/imgt/mhc/nhp
¹Anthony Nolan Research Institute and Royal Free Hospital, Pond Street, Hampstead, London NW3 2QG, UK
²Department of Haematology, Royal Free Hospital, Pond Street, Hampstead, London NW3 2QG, UK
³Departments of Structural Biology and Microbiology & Immunology, Stanford University, Stanford, CA 94305, USA
⁴Biomedical Primate Research Centre, Lange Kleiweg 139, Rijswijk, The Netherlands
⁵Centre for Integrated Genomic Medical Research, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, UK
⁶EMBL Outstation, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK