OlfactionBase: a repository to explore odors, odorants, olfactory receptors and odorant–receptor interactions

Abstract
Olfaction is a multi-stage process that begins when odorants enter the nose and ends when the brain recognizes the odor associated with them. The process involves many components working together in an intricate, synchronized way. OlfactionBase is a free, open-access web server that aims to bring together knowledge about many aspects of the olfaction mechanism in one place. OlfactionBase contains detailed information on odors, odorants and odorless compounds (with their physicochemical and ADMET properties), olfactory receptors (ORs), odorant- and pheromone-binding proteins, and OR-odorant interactions in human and Mus musculus. The dynamic, user-friendly interface supports exploration of different entities: finding chemical compounds with a desired odor, finding odorants associated with an OR, associating chemical features with odors and ORs, and retrieving sequence information for ORs and related proteins. The data in OlfactionBase on odors, odorants, olfactory receptors, human and mouse OR-odorant pairs, and other associated proteins could help improve our understanding of odor perception and might provide new insights into the mechanisms underlying olfaction. OlfactionBase is available at https://bioserver.iiita.ac.in/olfactionbase/.

The perception of neural signals produces a representation known as 'smell' or 'odor', which is described semantically by perceptual descriptors such as fruity, rose and woody.
The olfaction process poses a daunting challenge because odors are intangible, have a complex molecular basis, and are perceived subjectively (35-39). The odor of any substance (a flower, a plant, etc.) is made up of many odorants, some of which are major contributors and others minor ones. For example, the main constituent of rose smell is (−)-cis rose oxide, while minor constituents include beta-damascenone, farnesol and geraniol. The relationship between odorant chemistry and odor perception is at the core of olfaction research, and it is far from simple: a single odorant can have several odors (eugenol methyl ether is associated with 27 odor perceptions), structurally different chemicals can have virtually the same odor profile (cis-3-hexenol, nonadienal and ligustral all exhibit a green odor), and a slight structural difference can result in distinct odors (the carvone enantiomers (R)-(−)-carvone and (S)-(+)-carvone smell of spearmint and caraway, respectively). This complex relationship remains largely unknown. Odors are further encoded combinatorially: structurally similar odorants can be recognized by different but overlapping sets of olfactory receptors (ORs), which greatly increases the complexity of the problem (40,41). There is still no adequate scientific explanation of how smell is interpreted, particularly in humans. Although some pieces of the puzzle have been discovered in certain species, a comprehensive understanding of the phenomenon is still lacking (42,43).
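To make the idea of combinatorial coding concrete, the minimal sketch below treats each odorant's "receptor code" as the set of ORs it activates and compares two codes with the Jaccard index (shared receptors divided by all receptors involved). The odorant and receptor names are hypothetical placeholders, not OlfactionBase data, and Jaccard overlap is just one simple way to compare such codes.

```python
# Minimal sketch of combinatorial odor coding.
# All receptor sets are hypothetical placeholders, not measured odorant-OR data.

def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of receptors shared by two receptor codes (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Each odorant activates a combination of ORs (its receptor code).
receptor_codes = {
    "odorant_A": {"OR_1", "OR_2", "OR_3"},
    "odorant_B": {"OR_2", "OR_3", "OR_4"},  # overlaps partly with odorant_A
    "odorant_C": {"OR_5", "OR_6"},          # shares no receptor with odorant_A
}

for x, y in [("odorant_A", "odorant_B"), ("odorant_A", "odorant_C")]:
    print(x, y, round(jaccard(receptor_codes[x], receptor_codes[y]), 2))
```

Here odorant_A and odorant_B share two of four receptors (overlap 0.5), while odorant_A and odorant_C share none (overlap 0.0), mirroring "different but overlapping" receptor sets.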
Several databases of odors, chemical compounds and ORs have been published, but they are limited in scope and concentrate on specific aspects (Table 1). Some focus solely on odorant compounds, others on odorant−OR pairs, and others on odors alone. As a result, each is useful only for specific purposes, which motivates a single resource that brings all of this knowledge together. In addition, no database of odorless compounds is available. In this study, we created OlfactionBase to integrate the multidimensional aspects of the major components of the olfaction mechanism, i.e., odors, chemicals (odorants and odorless compounds), ORs, odorant−OR interactions, and other associated proteins (odorant-binding proteins (OBPs), pheromone-binding proteins (PBPs) and chemosensory proteins). OlfactionBase is a manually curated, comprehensive database that collates information from various resources and comprises 106 primary odors, 572 subodor types, 3985 odorant molecules, 1124 odorless compounds and 2067 ORs (human and Mus musculus). OlfactionBase stores 874 odorant−OR interactions (408 human and 466 mouse), thereby providing a platform for comparative analysis and quantitative structure-odor relationship (QSOR) studies. All data were manually compiled and extracted from literature and database searches.
One feature that sets OlfactionBase apart from similar resources is that it presents information as a hierarchy: primary odors, subodors, odorants with their odor profiles, interacting ORs, and chemical profiles, including physicochemical properties, functional groups and ADMET (absorption, distribution, metabolism, excretion, toxicity) properties. The hierarchy is represented dynamically as the Olfaction Wheel, an integrated platform for exploring odor space that starts from the aroma wheels suggested by different researchers and ends at the chemicals associated with an odor, along with their interacting ORs and chemical information. OlfactionBase thus combines three dimensions of odorants: the 'odor space', the 'interaction space' and the 'properties space'. Similarly, the general profile of an OR includes its sequence, family, organism, length and interacting odorants. OlfactionBase also houses 2871 entries for odorant/pheromone-binding and chemosensory proteins. It offers a robust dataset, an innovative visualization, a user-friendly interface and interlinked search options for exploring olfaction-related components on a single platform (Figure 1). By capturing the interplay between odors, odorant molecules and their properties, and olfactory receptor interactions, OlfactionBase may support a better understanding of odor perception and suggest future research directions.
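The odor hierarchy described above (primary odor, subodors, odorants, interacting ORs) can be pictured as nested records. The sketch below is a hypothetical model, not the actual OlfactionBase schema: the class names, fields and example values are placeholders, and chemical properties are omitted for brevity. It shows how walking the hierarchy from a primary odor collects every interacting OR.

```python
from dataclasses import dataclass, field

# Hypothetical model of the odor hierarchy; names and fields are illustrative only.

@dataclass
class Odorant:
    name: str
    receptors: set[str] = field(default_factory=set)  # interacting ORs

@dataclass
class SubOdor:
    name: str
    odorants: list[Odorant] = field(default_factory=list)

@dataclass
class PrimaryOdor:
    name: str
    subodors: list[SubOdor] = field(default_factory=list)

def receptors_for(primary: PrimaryOdor) -> set[str]:
    """Walk primary odor -> subodors -> odorants and collect all interacting ORs."""
    found: set[str] = set()
    for sub in primary.subodors:
        for odorant in sub.odorants:
            found |= odorant.receptors
    return found

# Placeholder data only.
example = PrimaryOdor(
    "primary_X",
    [
        SubOdor("subodor_1", [Odorant("odorant_1", {"OR_a", "OR_b"})]),
        SubOdor("subodor_2", [Odorant("odorant_2", {"OR_b", "OR_c"}), Odorant("odorant_3")]),
    ],
)
print(sorted(receptors_for(example)))  # ['OR_a', 'OR_b', 'OR_c']
```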

DATABASE OVERVIEW
OlfactionBase is a repository with extensive coverage of 5109 chemicals, 2067 ORs, 874 OR-odorant pairs, 106 primary odors and 572 subodors. The chemical data comprise 3985 odorants and 1124 odorless compounds. The chemicals are further classified into 30 functional groups and mapped to the 572 subodors. OlfactionBase houses OR information from two organisms, human (851) and Mus musculus (1215). It also lists information on 2871 OBP/PBP proteins from 190 species.
OlfactionBase has a simple, intuitive interface for querying and browsing odors, chemicals and ORs. Interactive visualizations such as the Olfaction Wheel, together with interlinked text-based and drawing-based (chemical structure) search options, are provided to retrieve relevant information. The Olfaction Wheel lets users browse backward and forward through the odor classifications to reach the corresponding odorant molecules and then view each odorant's chemical profile and interacting ORs. OlfactionBase thus provides a broad spectrum of information that can support olfaction research through its dynamic interface and visualizations.

DATA COMPILATION
OlfactionBase was developed to organize information on the olfaction machinery under one umbrella. A list of odors and subodors was created from nine classification systems. Overlapping terms across these systems were manually examined and consolidated into 106 primary odors and 572 subodors.
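The consolidation step was manual, but its mechanical part (collapsing identical terms from several classification systems) can be sketched as below. The system names and odor terms are hypothetical placeholders; normalizing case and whitespace only merges exact duplicates, so synonyms and near-duplicates would still need the manual review the authors describe.

```python
from collections import defaultdict

# Hypothetical classification systems: primary odor term -> list of subodor terms.
systems = {
    "system_1": {"Floral": ["rose", "jasmine"], "Green": ["grassy"]},
    "system_2": {"floral ": ["Rose", "violet"], "Woody": ["cedar"]},
}

def merge_systems(systems):
    """Union subodors per primary odor across systems, normalizing case and whitespace."""
    merged = defaultdict(set)
    for mapping in systems.values():
        for primary, subodors in mapping.items():
            key = primary.strip().lower()
            merged[key].update(s.strip().lower() for s in subodors)
    # Sort primaries and subodors alphabetically for a stable, readable result.
    return {p: sorted(s) for p, s in sorted(merged.items())}

print(merge_systems(systems))
# {'floral': ['jasmine', 'rose', 'violet'], 'green': ['grassy'], 'woody': ['cedar']}
```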
Chemicals were mapped to 30 functional groups. For each chemical, we provide the standard CAS (Chemical Abstracts Service) number mapped to the corresponding PubChem and ZINC IDs. Because CAS numbers are degenerate and can point to multiple molecules, PubChem IDs were used as the unique primary key for every chemical. Using PubChem IDs, compounds identified […] (56), and OlfactionDB (57), followed by redundancy removal. ADMET properties, selected physicochemical properties and drug-likeness for all 5109 chemicals were then obtained from the admetSAR 2.0 web server (58). Information on human and mouse ORs was collected from the UniProt, HORDE and ORDB databases. For each receptor, we provide its UniProt ID, GenBank accession number, ORDB and HORDE identifiers, and links to the respective databases. Using the UniProt IDs, receptor information such as length, family, subfamily, chromosome, and protein and nucleotide sequences was obtained. Other olfaction-related proteins were collected by keyword searches ('odorant-binding protein', 'chemosensory protein' and 'pheromone binding protein') of the UniProt database. OR-odorant interaction information was collected through an extensive literature survey and database searches, and supporting PubMed evidence is provided for each odorant-OR pair. Figure 2 summarizes the statistics of the entities in OlfactionBase.
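Using PubChem IDs as the primary key means that records from different sources describing the same molecule collapse into one entry, while every CAS number seen for it is retained. The sketch below illustrates that redundancy-removal idea; the function name, record fields, source names, CIDs and CAS strings are all hypothetical placeholders, not values from OlfactionBase.

```python
# Minimal sketch of redundancy removal keyed on PubChem CID.
# CIDs and CAS strings below are placeholders, not real identifiers for any compound.

def merge_by_cid(*sources):
    """Merge (source_name, records) pairs into one entry per PubChem CID.

    Every CAS number seen for a CID is kept, because a CAS number alone
    is not a reliable unique key.
    """
    merged: dict[int, dict] = {}
    for source_name, records in sources:
        for rec in records:
            entry = merged.setdefault(
                rec["cid"], {"cid": rec["cid"], "cas": set(), "sources": set()}
            )
            if rec.get("cas"):
                entry["cas"].add(rec["cas"])
            entry["sources"].add(source_name)
    return merged

source_a = [{"cid": 1001, "cas": "CAS-X"}, {"cid": 1002, "cas": "CAS-Y"}]
source_b = [{"cid": 1001, "cas": "CAS-X"}, {"cid": 1001, "cas": "CAS-Z"}]

merged = merge_by_cid(("source_a", source_a), ("source_b", source_b))
print(len(merged))                 # 2 unique chemicals
print(sorted(merged[1001]["cas"]))  # ['CAS-X', 'CAS-Z']
```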

DATABASE ARCHITECTURE AND WEB INTERFACE
OlfactionBase is a free, open-access web server powered by the MySQL relational database management system. For better performance, the data are organized logically into interrelated tables (Figure 3), each optimized for its purpose. The web server was built with the PHP Laravel framework to serve dynamic pages. Two JavaScript libraries, D3.js and Vue.js, were used to develop the Olfaction Wheel and the inner graphs. Other tools, including Composer, Git, HTML and CSS3, were also used to build the OlfactionBase web application.
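The idea of interrelated tables can be illustrated with a tiny relational sketch. It uses Python's built-in SQLite as a stand-in for MySQL, and the table names, columns and rows are hypothetical; the actual OlfactionBase schema is the one shown in Figure 3. A link table of interactions joins chemicals to receptors, so a single query can count the ORs per odorant.

```python
import sqlite3

# SQLite stands in for MySQL here; tables, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chemical (
    cid  INTEGER PRIMARY KEY,          -- PubChem CID as the unique chemical key
    name TEXT NOT NULL
);
CREATE TABLE receptor (
    uniprot_id TEXT PRIMARY KEY,
    organism   TEXT NOT NULL
);
CREATE TABLE interaction (
    cid        INTEGER NOT NULL REFERENCES chemical(cid),
    uniprot_id TEXT NOT NULL REFERENCES receptor(uniprot_id),
    pmid       TEXT,                   -- supporting PubMed evidence
    PRIMARY KEY (cid, uniprot_id)
);
""")

conn.executemany("INSERT INTO chemical VALUES (?, ?)", [(1, "odorant_1"), (2, "odorant_2")])
conn.executemany("INSERT INTO receptor VALUES (?, ?)", [("R1", "human"), ("R2", "mouse")])
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?)",
    [(1, "R1", None), (1, "R2", None), (2, "R2", None)],
)

# Count interacting receptors per odorant by joining through the link table.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS n_receptors
    FROM chemical c JOIN interaction i ON i.cid = c.cid
    GROUP BY c.cid, c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('odorant_1', 2), ('odorant_2', 1)]
```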
OlfactionBase provides a user-friendly web interface for browsing odorants, odors, olfactory receptors and odorant-OR pairs. On the 'Home' page, users find a summary of OlfactionBase, including the total numbers of OR-odorant pairs with evidence for human and M. musculus, ORs, chemicals and other proteins. The 'Odors' page lists all primary odors and subodors, along with the CAS numbers of their odorants, in tabular format (Figure 4A). Users can select a primary odor and a subodor from the dropdown menus to obtain a list of related compounds. The 'Chemical' page lists odorants and odorless compounds in separate tables. General information on each chemical, such as common name, SMILES, CAS number, molecular weight, molecular formula, PubChem ID, ZINC ID, number of interacting ORs and number of odors, is given in tabular format (Figure 4B). By clicking on 'View', users can view detailed information […] GenBank accession no., (c) chromosome, (d) family and (e) length of the sequence. On the 'OR−odorant pairs' page, users can find information on ORs interacting with odorants (Figure 4F). The supporting evidence for each OR-odorant pair can be viewed by clicking the 'View' button (Figure 4G). Users can explore olfaction-related proteins other than ORs on the 'OBP/PBP' page, which summarises general information on OBPs, PBPs and chemosensory proteins from 190 species (Figure 4H). Users can search for these proteins by UniProt ID, organism and length. All pages are interlinked, so users can easily navigate between them to gain more information about an entity of interest. Figure 5 shows how to look up the 'fruity' subodor and its related odorants, as well as their interacting ORs.
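The 'OBP/PBP' search by UniProt ID, organism and length amounts to combining optional filters. The sketch below shows one way to express that logic; the class, function and parameter names (including the min_len/max_len length range) and the example entries are hypothetical and do not describe the site's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical protein entries; IDs, species and lengths are placeholders.

@dataclass
class ProteinEntry:
    uniprot_id: str
    organism: str
    length: int

def search_proteins(entries, uniprot_id=None, organism=None, min_len=None, max_len=None):
    """Return entries matching every filter that is supplied (None means 'ignore')."""
    hits = []
    for e in entries:
        if uniprot_id is not None and e.uniprot_id != uniprot_id:
            continue
        if organism is not None and e.organism.lower() != organism.lower():
            continue
        if min_len is not None and e.length < min_len:
            continue
        if max_len is not None and e.length > max_len:
            continue
        hits.append(e)
    return hits

entries = [
    ProteinEntry("ID_1", "species_A", 142),
    ProteinEntry("ID_2", "species_B", 160),
    ProteinEntry("ID_3", "species_A", 210),
]
print([e.uniprot_id for e in search_proteins(entries, organism="species_a", max_len=200)])
# ['ID_1']
```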

OLFACTION WHEEL
Various industry-specific odor classification systems, usually drawn as circular diagrams or wheels, have been developed to make odors easier to learn and remember. These visual […] (99) and Wine (100). Odor terms not covered by the nine classification systems were added as 'Other'. The 106 primary odors were arranged in alphabetical order and connected to 572 subodors, 30 functional groups and 3985 odorants.
The Olfaction Wheel comprises four concentric circles (Figure 6). The innermost circle is derived from information compiled from 10 aroma classification systems. The second and third circles list the primary odors and subodors and the relations between them. The fourth circle comprises odorants, each linked to its associated subodors. The first three circles (classification system, primary odors and subodors) appear on the Olfaction Wheel homepage, while the fourth circle appears in a subgraph when navigating deeper into the graph. All nodes are interlinked, which makes forward and backward exploration of the graph easy to follow visually and conceptually. A single click on a node shows the connections between that node, its parents and its child nodes, and is used to navigate the wheel forward and backward. A double click on any node opens the page containing detailed information on that node.
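The forward and backward navigation of the wheel can be modeled as a directed graph from the inner ring outward, where a single click reveals a node's parents and children. The sketch below uses hypothetical node names; it is a graph rather than a strict tree because, as noted earlier, one odorant can be linked to several subodors. The real wheel is rendered with D3.js and Vue.js, so this Python model only mirrors the data relationships.

```python
from collections import defaultdict

# Hypothetical wheel edges (parent, child), from the inner ring outward.
edges = [
    ("system_1", "primary_X"),
    ("primary_X", "subodor_1"),
    ("primary_X", "subodor_2"),
    ("subodor_1", "odorant_1"),
    ("subodor_2", "odorant_1"),  # one odorant can sit under several subodors
]

parents = defaultdict(set)
children = defaultdict(set)
for parent, child in edges:
    parents[child].add(parent)
    children[parent].add(child)

def click_neighbors(node):
    """Nodes a single click would highlight: the node's parents and children."""
    return {
        "parents": sorted(parents.get(node, set())),
        "children": sorted(children.get(node, set())),
    }

print(click_neighbors("primary_X"))
# {'parents': ['system_1'], 'children': ['subodor_1', 'subodor_2']}
print(click_neighbors("odorant_1"))
# {'parents': ['subodor_1', 'subodor_2'], 'children': []}
```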

COMPARISON WITH ODORACTOR
OdoRactor (56) and OlfactionBase appear to overlap in content. OdoRactor is a web server that predicts ORs for small-molecule compounds by combining two separate techniques: characterising an odorant and then predicting its candidate ORs. In contrast, OlfactionBase is an open-access database that brings together experimental knowledge on different aspects of the olfaction mechanism from a variety of sources, as described in the Data Compilation section. Data statistics for both resources are shown in Table 2.

CONCLUSIONS
The current version of OlfactionBase contains comprehensive information on (a) 3985 odorant compounds, (b) 1124 odorless compounds and (c) 2871 OBP/PBP proteins. It also includes 408 and 466 odorant-OR interaction pairs for human and M. musculus, respectively. The odorant−OR […] Figure 2). All data were extracted and annotated from published articles, most of them from the seminal work of Saito et al. (69,83). A typical chemical entry offers a chemical characterization of the compound, including H-bond donors and acceptors, molecular weight and mass, and log P and log K values, among 34 physicochemical properties, 31 pharmacokinetic properties and five drug-likeness violations. A typical receptor entry includes the UniProt, GenPept and GenBank accession codes, name, FASTA sequences (nucleotide and protein), location, family and organism; an odor entry includes the primary odor classification and a list of compounds possessing a particular subodor. Direct links to the corresponding entries in external databases (UniProt, PubChem, ZINC, HORDE, ORDB, PubMed) are given on each entry page. The relational schema underlying the database (Figure 3) is designed specifically to capture OR−odorant interaction pairs and odorant−odor associations as reported in the scientific literature and databases. OlfactionBase offers several navigation and data retrieval options. For compounds, users can perform (a) CAS ID, (b) molecular weight, (c) substructure, (d) odor and (e) functional group-based searches. Similarly, for ORs, (a) GenBank accession number, (b) organism, (c) chromosome, (d) family and (e) length-based searches are available. Users can also explore compounds possessing specific odors based on primary odors and subodors.
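Drug-likeness violations are counts of rule thresholds a compound exceeds. The section does not name the five rules used, so as an illustration only, the sketch below applies Lipinski's rule of five, a widely used drug-likeness filter, to four of the properties listed above (molecular weight, log P, H-bond donors and acceptors). The function name and the property values in the example are hypothetical.

```python
# Illustration only: Lipinski's rule of five is assumed here as an example rule;
# the section does not state which five drug-likeness rules are reported.

def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count Lipinski violations: MW > 500, logP > 5, H-bond donors > 5, acceptors > 10."""
    rules = [
        mol_weight > 500,
        log_p > 5,
        h_donors > 5,
        h_acceptors > 10,
    ]
    return sum(rules)  # True counts as 1

# Hypothetical property values, not taken from OlfactionBase.
print(lipinski_violations(150.2, 2.5, 1, 1))    # 0: small, moderately lipophilic molecule
print(lipinski_violations(620.0, 6.1, 2, 12))   # 3: exceeds MW, logP and acceptor limits
```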
OlfactionBase is a useful and informative resource for investigating odors, odorants and odorant−OR interactions in one place, and it may be a helpful tool for deciphering olfaction mechanisms. It could be valuable for both academic olfaction research and odorant discovery in industry.