The IUPHAR database is an established online reference resource for several important classes of human drug targets and related proteins. As well as providing recommended nomenclature, the database integrates information on the chemical, genetic, functional and pathophysiological properties of receptors and ion channels, curated and peer-reviewed from the biomedical literature by a network of experts. The database now includes information on 616 gene products from four superfamilies in human and rodent model organisms: G protein-coupled receptors, voltage- and ligand-gated ion channels and, in a recent update, 49 nuclear hormone receptors (NHRs). New data types for NHRs include details on co-regulators, DNA binding motifs, target genes and 3D structures. Other recent developments include curation of the chemical structures of approximately 2000 ligand molecules, providing electronic descriptors, identifiers, link-outs and calculated molecular properties, all available via enhanced ligand pages. The interface now provides intelligent tools for the visualization and exploration of ligand structure-activity relationships and the structural diversity of compounds active at each target. The database is freely available at http://www.iuphar-db.org .
Complementary to the familiar large-scale data banks holding the outputs of genomics research are many smaller, highly focused databases, whose remit is the comprehensive curation of selected subsets of genes through annotation of historic and emerging biological literature. Such resources are often created to address the specific needs of particular research fields and normally rely on the voluntary efforts of interested experts and a community of scientists. These resources can provide distilled data tailored to the requirements of the end user. The need for such facilities will only intensify as the amount and complexity of scientific data increases.
In 2009 we reported the development of an expert-curated database ( 1 ) from the International Union of Basic and Clinical Pharmacology Committee on Receptor Nomenclature and Drug Classification (NC-IUPHAR, http://www.iuphar.org/nciuphar.html ). The IUPHAR database (IUPHAR-DB, http://www.iuphar-db.org ) displays detailed information on the structure, function, anatomy, pathophysiology and pharmacology of G protein-coupled receptors (GPCRs), voltage-gated ion channels (VGICs) and ligand-gated ion channels (LGICs). Members of these protein superfamilies correspond to the biological targets of at least one-third of licensed medicinal drugs ( 2 ).
The database is a result of the voluntary efforts of over 700 international expert scientists coordinated by NC-IUPHAR through its network of subcommittees. The content and curation model of IUPHAR-DB have already been described ( 1 ); here we will focus on advances in the construction, content and user interface. We announce the addition of a fourth class of important drug targets and transcription regulators, the nuclear hormone receptors (NHRs). Furthermore, we provide details on improvements to the curation process for chemical compounds, which have resulted in the first rigorous definition of chemical entities in IUPHAR-DB. We also introduce a range of new features that enhance the usability of the database, including redesigned ligand pages and intelligent tools for structure-based exploration of ligand–target interactions. We hope these will allow interested chemists and pharmacologists to rapidly visualize and appreciate the diversity of ligands, their chemical profiles and the structure-activity relationships represented in the data.
In its current release the database contains information on proteins encoded by 356 GPCR genes, 141 VGIC subunit genes, 52 LGIC subunit genes and 49 NHR genes in human, rat and mouse. The database documents 6533 interactions between these proteins and 2715 distinct ligand molecules. All data included in IUPHAR-DB are curated from the primary literature by expert subcommittees or individual experts appointed by NC-IUPHAR or by in-house curators and then reviewed by external experts. The database currently references 11 332 unique publications, 99% of which are linked to PubMed ( http://www.ncbi.nlm.nih.gov/pubmed ). Supplementary Figure S1 provides charts describing the content of gene families in IUPHAR-DB.
Nuclear hormone receptors
Recently, the database has expanded to provide the first comprehensive annotation of the literature on the pharmacology of 49 human NHR gene products and their rodent orthologues, together with relevant information on their structure and function. The information in IUPHAR-DB complements that available in other online resources such as the Nuclear Receptor Signaling Atlas (NURSA) ( 3 ), an online resource with curated information and experimental data for receptors, ligands and co-regulators, and NureXbase ( http://nurexbase.prabi.fr ), a multi-species genomic database with structural information for NHRs, complexes and endocrine disruptors.
Unlike the other receptors in IUPHAR-DB, which are cell surface proteins that transduce extracellular signals by modulating intracellular concentrations of second messengers or ions, NHRs are intracellular, non-transmembrane proteins with distinct structural and functional properties ( 4 ). NHRs are transcriptional regulators that exert their actions by binding to sequence-specific promoter elements on target genes. NHR ligands include steroid and thyroid hormones, metabolites and xenobiotics; many NHRs are ‘orphan’ receptors, for which the endogenous ligands are still unknown or may not exist.
IUPHAR-DB provides details of recommended nomenclature, synonyms used in the literature, structure, genomic location, physiological function, tissue distribution, functional assays, biologically significant alternatively spliced and RNA-edited variants, the clinical relevance of polymorphisms and phenotypes resulting from alterations in gene expression. Pharmacological information includes the affinities of selected agonists and antagonists, such as approved drugs, commonly used experimental compounds and endogenous ligands. For NHRs, additional ‘custom’ information is provided, including details of consensus DNA response elements, co-binding partners required for DNA binding, co-repressor and co-activator proteins and the main target genes regulated by NHRs. Each NHR in IUPHAR-DB is hyperlinked to the corresponding entries in other relevant databases, including several of the NCBI Entrez databases ( 5 ), the human, mouse and rat genome databases [HGNC ( 6 ), MGI ( 7 ) and RGD ( 8 )] and NURSA. Links to other databases containing information on drug targets and their ligands are provided on a ‘Useful Links’ page, which is regularly updated.
Structural information about receptors and channels has been enhanced by the addition of links to experimentally solved protein structures in the RCSB Protein Data Bank (RCSB PDB) ( 9 ). There is extensive three-dimensional (3D) structural information available on NHRs, so for each receptor we have selected at least three representative structures, including, where available, an ‘apo’ structure (protein with no ligand bound), a protein bound to an agonist and a protein bound to an antagonist. Details have been added to the database for the few GPCR, VGIC and LGIC structures that are known. Database pages also include dynamic links to complete lists of structures available in the RCSB PDB for each protein and from IUPHAR-DB ligands to PDB ligand pages.
Ligand classification and structural information
Previously, information on ligand molecules in IUPHAR-DB was limited to common names and synonyms and PubChem compound identifiers (CIDs) ( 10 ). In 2009 we began an initiative to curate and classify all ligand molecules in IUPHAR-DB, incorporating structural specification, computational molecular descriptors and other information. Rigorous definition of chemical compounds brings many advantages to both database developers and users; for instance knowledge of chemical structure reduces unintended duplication where multiple publications refer to the same compound under different synonyms or fail to provide the chemical structure. It is now possible to unambiguously identify compounds in IUPHAR-DB and use molecular descriptors to search for them in other databases. We have extended the functionality of our in-house curation tools to enable straightforward curation of chemical structures.
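As an illustration of using a molecular descriptor to look a compound up in another database, the following minimal sketch builds a lookup URL from an InChIKey. It assumes PubChem's present-day PUG REST service, which is not mentioned in this article and is not necessarily how the curation was performed; the function name `pubchem_cid_lookup_url` is hypothetical, and the sketch only constructs the URL without making a network request.

```python
from urllib.parse import quote

# Base URL of PubChem's PUG REST service (an assumption of this sketch;
# the article itself only refers to the PubChem download service).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_lookup_url(inchikey: str) -> str:
    """Return a PUG REST URL asking PubChem for the CIDs matching an InChIKey."""
    key = inchikey.strip().upper()
    # A standard InChIKey has three hyphen-separated blocks of 14, 10 and 1 characters.
    parts = key.split("-")
    if [len(part) for part in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return f"{PUG_REST_BASE}/compound/inchikey/{quote(key)}/cids/JSON"


# Aspirin's standard InChIKey, used here only as a familiar example.
print(pubchem_cid_lookup_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Because the key is a fixed-length hash of the structure, the same lookup works regardless of which synonym a publication used for the compound.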
During the chemical curation process we manually checked each compound against the primary literature to ensure that chemical structures were attributed to the correct compound names, removed over 200 duplicates and associated each compound with PubChem CIDs where possible. In all cases we endeavoured to provide correct isomeric forms; however, we found that stereochemical information is often omitted in the pharmacological literature. Consequently, we were obliged to use the non-isomeric representations for all cases where enantiomers or diastereomers are not specified in the primary assay used; this reflects some of the unavoidable historical ambiguity seen in pharmacological data.
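The duplicate detection described above was done manually against the literature. Purely as an illustration of why a structural key helps, the sketch below groups ligand names by a shared identifier and reports any structure listed under more than one name. The record layout and the function names `group_by_structure` and `find_duplicates` are hypothetical; the two InChIKeys are the standard keys for aspirin and caffeine.

```python
from collections import defaultdict


def group_by_structure(records):
    """Group ligand names by InChIKey; records without a key are skipped."""
    groups = defaultdict(list)
    for name, inchikey in records:
        if inchikey:
            groups[inchikey].append(name)
    return dict(groups)


def find_duplicates(records):
    """Return only the structures that appear under more than one name."""
    return {
        key: names
        for key, names in group_by_structure(records).items()
        if len(names) > 1
    }


# Illustrative records: (name as used in a publication, InChIKey or None).
records = [
    ("aspirin", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("acetylsalicylic acid", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("caffeine", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
    ("unnamed peptide", None),  # no structure available, so it cannot be matched
]

print(find_duplicates(records))
```

Entries without a structure cannot be matched this way, which is one reason manual checking against the primary literature remains necessary.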
About two-thirds of the IUPHAR-DB compounds were unambiguously identified in the PubChem compounds database. Utilizing the PubChem download service we were able to retrieve 1944 chemical structures in simplified molecular input line entry specification (SMILES) format ( http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html ). To enable the full and unambiguous identification of IUPHAR compounds, each molecular entity is represented by its common name, a list of synonyms, a systematic name, unique chemical identifiers and several structural descriptors and formats. The Open Babel software package ( http://openbabel.org/wiki/Main_Page ) was used to unify the format of the retrieved SMILES strings and create various structural descriptors. Generated descriptors include two types of SMILES descriptor, one containing isomeric specification of chiral compounds and a second without chiral or isotopic specification, IUPAC International Chemical Identifiers (InChI) and InChI Keys ( http://www.iupac.org/inchi ). Systematic IUPAC names and two-dimensional (2D) images for use on the IUPHAR-DB website were generated using the NCI/CADD Chemical Identifier Resolver from the National Cancer Institute ( http://cactus.nci.nih.gov/chemical/structure ). Various physicochemical properties were calculated using the Chemistry Development Kit (CDK) ( 11 ), including five measures used to assess ‘drug-likeness’, four of which derive from Lipinski's rule of five ( 12 ): molecular weight, predicted LogP (XLogP) ( 14 ) and the numbers of hydrogen bond donors and acceptors, together with topological polar surface area (TPSA) ( 13 ). The number of rotatable bonds is also included because this can be a useful indication of a compound's flexibility and complexity.
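To show how the calculated properties can be used, the following minimal sketch checks a set of precomputed descriptors against the classic rule-of-five thresholds (molecular weight not above 500, LogP not above 5, at most 5 hydrogen bond donors and at most 10 acceptors). Polar surface area is not part of that rule and is not checked here. The `Descriptors` class, the function name and the example values are hypothetical; in the database the values themselves come from the CDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one ligand (values supplied by the caller)."""
    molecular_weight: float
    xlogp: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> list:
    """Return a label for each rule-of-five threshold that the ligand exceeds."""
    checks = {
        "molecular weight > 500": d.molecular_weight > 500,
        "XLogP > 5": d.xlogp > 5,
        "hydrogen bond donors > 5": d.h_bond_donors > 5,
        "hydrogen bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [label for label, exceeded in checks.items() if exceeded]


# Hypothetical descriptor values, chosen only to exercise the checks.
small_ligand = Descriptors(350.0, 2.1, 2, 5)
large_ligand = Descriptors(812.0, 6.3, 7, 12)

print(lipinski_violations(small_ligand))
print(lipinski_violations(large_ligand))
```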
The final set of 2715 distinct compounds was then classified into the following classes: synthetic organic-based compounds, inorganic molecules (e.g. ions), simple naturally occurring bioactive compounds, natural products, peptides (both synthetic and naturally occurring) and an ‘other’ class for the few compounds that do not fall into any of the former categories. Supplementary Figure S1 describes the distribution of IUPHAR-DB ligands into their classes.
Enhanced ligand pages
Each ligand entry in IUPHAR-DB has a dedicated data page displaying information about its structural properties and biological activity ( Figure 1 ). Where available, an image of the ligand’s 2D structure and physicochemical properties are displayed. A series of tabs provide access to further information. A summary tab includes the ligand’s classification, whether it is known to be an approved drug, the systematic name and other synonyms. To help users find further information on licensed drugs and experimentally used compounds we provide links to databases such as DrugBank ( 15 ), the Pharmacogenomics Knowledge Base (PharmGKB) ( 16 ) and Chemical Entities of Biological Interest (ChEBI) ( 17 ).
A second tab contains information on the ligand’s biological activity with tables listing its selectivity at receptors in IUPHAR-DB and links to the relevant receptor pages. A third tab displays a reference list and a fourth tab provides descriptors and download options for the chemical structure. Another tab provides a display of any structurally similar compounds that exist in the database, with lists of their receptor targets and links to the relevant data pages. A useful feature is the ability to quickly launch a chemical editor (by clicking on the ligand’s image or following a link under the ‘Similar compounds’ tab), allowing the structure to be modified and used for structure-based searching of the database (see below for details). Pharmacological data tables on receptor pages have also been enhanced with structural information about ligands, which helps users to visualize the types of compounds that interact with specific receptors.
New search tools
In addition to text-based searches of the database content the search tools now allow receptor and ligand retrieval by external identifier, such as by protein accession or DrugBank identifier. Moreover, users are now able to navigate ligand structure space by drawing a chemical structure or pasting a SMILES string into a chemical editor [ChemAxon’s MarvinSketch ( https://www.chemaxon.com/marvin/sketch/index.php ) Java applet] and performing a structure-based query of the database. The available search methods are substructure, exact, similarity and SMARTS-pattern ( http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html ) matching. Structure-based searches are powered by Dotmatics’ Pinpoint software ( http://www.dotmatics.com/products_pinpoint.jsp ). On the results page users see the structures of matched compounds and a quick overview of their receptor targets, with links to ligand and receptor pages and options to refine the search such as limiting by molecular weight or polar surface area ( Figure 2 ).
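The internals of the Pinpoint search engine are not described here. As a rough illustration of what a similarity search with a property filter involves, the sketch below ranks ligands by the Tanimoto coefficient between sets of fingerprint bits and optionally limits hits by molecular weight. The Tanimoto measure is a common choice that is assumed for this example only; the fingerprints, ligand names, threshold and function names are all hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared bits divided by the bits set in either fingerprint."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_search(query_bits, library, threshold=0.7, max_mw=None):
    """Return (name, score) pairs at or above the threshold, best match first.

    `library` maps a ligand name to (fingerprint bits, molecular weight).
    If `max_mw` is given, heavier ligands are excluded before scoring.
    """
    hits = []
    for name, (bits, mol_weight) in library.items():
        if max_mw is not None and mol_weight > max_mw:
            continue
        score = tanimoto(query_bits, bits)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints (sets of "on" bit positions) and molecular weights.
library = {
    "ligand_a": (frozenset({1, 2, 3, 4}), 300.0),
    "ligand_b": (frozenset({1, 2, 3, 9}), 650.0),
    "ligand_c": (frozenset({7, 8}), 200.0),
}
query = frozenset({1, 2, 3, 4})

print(similarity_search(query, library, threshold=0.5))
print(similarity_search(query, library, threshold=0.5, max_mw=500.0))
```

Substructure and SMARTS-pattern matching need a full cheminformatics toolkit and are not attempted in this sketch.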
While we have obtained chemical structures for the majority of small-molecule ligands in IUPHAR-DB, structural information for most of the approximately 900 endogenous and synthetically derived peptide ligands is lacking. Furthermore, the implemented structure-based search methods are not appropriate for oligopeptides, for which sequence-based methods may be preferable. In a future database release we aim to provide primary sequences and information on the post-translational modification of endogenous peptides and also the amino acid sequences and chemical modifications of synthetic peptides. We also plan to expand the coverage of IUPHAR-DB to include further protein families and new data types.
For a general citation of the resource we recommend citing this article. For citing specific receptor pages we recommend a format similar to the following example: A. P. Davenport, E. J. Mead. Kisspeptin receptor. Last modified on <date>. Accessed on <date>. IUPHAR database (IUPHAR-DB), http://www.iuphar-db.org/DATABASE/FamilyMenuForward?familyId=34 .
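For anyone generating such citations programmatically, the recommended format can be sketched as a small helper. The function name and the two dates in the usage example are hypothetical placeholders; the real dates should be taken from the receptor page and the day of access.

```python
def format_receptor_citation(authors, page_title, last_modified, accessed, url):
    """Build a citation string following the receptor-page format recommended above."""
    author_text = ", ".join(authors)
    return (
        f"{author_text}. {page_title}. Last modified on {last_modified}. "
        f"Accessed on {accessed}. IUPHAR database (IUPHAR-DB), {url}."
    )


citation = format_receptor_citation(
    authors=["A. P. Davenport", "E. J. Mead"],
    page_title="Kisspeptin receptor",
    last_modified="2010-01-01",  # placeholder date, not taken from the database
    accessed="2010-06-01",       # placeholder date
    url="http://www.iuphar-db.org/DATABASE/FamilyMenuForward?familyId=34",
)
print(citation)
```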
Supplementary Data are available at NAR Online.
Funding: British Pharmacological Society, Abbott, GlaxoSmithKline, Incyte, Millipore, Novartis, Servier, UNESCO and Wyeth. Funding for open access charge: Waived by Oxford University Press.
Conflict of interest statement. None declared.
The authors thank Daniel Ormsby at Dotmatics Limited for the Pinpoint license. The authors are grateful to Peter Buneman and Heiko Mueller at the University of Edinburgh for database expertise. The authors especially thank all contributors and members of NC-IUPHAR and its subcommittees for their ongoing support. The authors also acknowledge the support of the British Heart Foundation Centre of Research Excellence Award (RE/08/001). International Union of Basic and Clinical Pharmacology Committee on Receptor Nomenclature and Drug Classification (NC-IUPHAR) members: T.I. Bonner, W.A. Catterall, A.P. Davenport, P. Delagrange, C.T. Dollery, S. Duckles, S. Enna, S.M. Foord, P. Germain, A.J. Harmar, V. Laudet, G. Milligan, R.R. Neubig, E.H. Ohlstein, J. Peters, J.P. Pin, U. Ruegg, D.B. Searls, M. Spedding and M.W. Wright.