The COVID-19 Ontology

Abstract
Motivation: The COVID-19 pandemic has prompted an impressive, worldwide response by the academic community. To support text mining approaches as well as data description, linking and harmonization in the context of COVID-19, we have developed an ontology representing major novel coronavirus (SARS-CoV-2) entities. The ontology has a strong focus on chemical entities suited for drug repurposing, as this is a major target of ongoing COVID-19 therapeutic development.
Results: The ontology comprises 2270 classes of concepts and 38 987 axioms (2622 logical axioms and 2434 declaration axioms). It depicts the roles of molecular and cellular entities in virus-host interactions and in the virus life cycle, as well as a wide spectrum of medical and epidemiological concepts linked to COVID-19. The performance of the ontology has been tested on Medline and the COVID-19 corpus provided by the Allen Institute.
Availability and implementation: The COVID-19 Ontology is released under a Creative Commons 4.0 License and shared via https://github.com/covid-19-ontology/covid-19. The ontology is also deposited in BioPortal at https://bioportal.bioontology.org/ontologies/COVID-19.
Supplementary information: Supplementary data are available at Bioinformatics online.


Introduction
Since the end of 2019, the COVID-19 pandemic has been in the spotlight of the global scientific community. The spectrum of scientific approaches to combat SARS-CoV-2 spans diverse disciplines such as epidemiological modeling (Kucharski et al., 2020; Zhao and Chen, 2020), molecular docking and molecular dynamics on supercomputers (Smith and Smith, 2020) and high-throughput screening using drug-repurposing libraries (Jin et al., 2020; Ton et al., 2020). One major bottleneck facing current strategies to find candidate compounds for drug repurposing is the limited availability of chemical information in the COVID-19 context. Any serious attempt at identifying new or repurposed drugs against SARS-CoV-2 needs to consider what has been published on the respective drug target and which classes of chemical compounds display activity against viral proteins. Nevertheless, various publications claim to have identified new candidates for drug repurposing (Fan et al., 2020), and the literature on COVID-19 targets and putative target-binding ligands is growing rapidly.
To facilitate dedicated literature searches on COVID-19 pathophysiology, epidemiology, targets and medical implications, we have generated a prototypical version of a COVID-19 ontology. This ontology is meant to capture and represent the majority of essential entities and concepts relevant to the COVID-19 research context. The ontology serves two major purposes: (a) as a template to define context in COVID-19-specific text mining approaches, and (b) as a structured system of concepts and categories that helps to bring order to the COVID-19 knowledge space.
We demonstrate the usability of the COVID-19 Ontology in both cases.
The COVID-19 Ontology was assembled using the Protégé ontology editor. It was constructed following the guidelines and principles defined by the Open Biological and Biomedical Ontology (OBO, http://www.obofoundry.org/) Foundry and is aligned with the Basic Formal Ontology (BFO) hierarchy. We applied Ontofox (http://ontofox.hegroup.org) to reuse existing classes from other relevant ontologies. Terms not predefined in other ontologies were added with proper definitions and clear provenance. To increase recall in text mining applications, we have added synonyms for each concept. The COVID-19 Ontology is released under a Creative Commons 4.0 License and shared via https://github.com/covid-19-ontology/covid-19. The ontology is also deposited in BioPortal at https://bioportal.bioontology.org/ontologies/COVID-19 and in the COVID-OLS at https://ols-covid.scaiview.com/ols/ontologies/covid19. The version made available here will be updated continuously. We encourage the community to help us improve the ontology by providing feedback and application examples.
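As an illustration of how class labels and synonyms might be read out of an ontology file, the following minimal sketch parses a small RDF/XML fragment using only the Python standard library. The fragment, its IRIs and its terms are hypothetical and are not taken from the released ontology. The use of the oboInOwl:hasExactSynonym annotation property is a common OBO convention and is assumed here, not confirmed for the COVID-19 Ontology.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookup paths below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}

# Hypothetical two-class fragment; IRIs and terms are made up for illustration.
SAMPLE_OWL = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://example.org/covid#EX_0000001">
    <rdfs:label>risk factor</rdfs:label>
    <oboInOwl:hasExactSynonym>risk factors</oboInOwl:hasExactSynonym>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/covid#EX_0000002">
    <rdfs:label>fever</rdfs:label>
    <oboInOwl:hasExactSynonym>pyrexia</oboInOwl:hasExactSynonym>
    <oboInOwl:hasExactSynonym>elevated body temperature</oboInOwl:hasExactSynonym>
  </owl:Class>
</rdf:RDF>
"""


def extract_terms(owl_xml):
    """Return {class IRI: {"label": str or None, "synonyms": [str, ...]}}."""
    root = ET.fromstring(owl_xml)
    about_attr = "{%s}about" % NS["rdf"]
    terms = {}
    # Only top-level named classes are considered in this sketch.
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(about_attr)
        if iri is None:
            continue  # skip anonymous class expressions
        label_el = cls.find("rdfs:label", NS)
        label = label_el.text.strip() if label_el is not None and label_el.text else None
        synonyms = [
            s.text.strip()
            for s in cls.findall("oboInOwl:hasExactSynonym", NS)
            if s.text
        ]
        terms[iri] = {"label": label, "synonyms": synonyms}
    return terms


if __name__ == "__main__":
    for iri, entry in extract_terms(SAMPLE_OWL).items():
        print(iri, entry["label"], entry["synonyms"])
```

For a full-size ontology, a dedicated OWL or RDF library would likely be a better choice than hand-written XML traversal; the standard-library parser is used here only to keep the sketch self-contained.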

Results
The COVID-19 Ontology comprises 2270 terms, including 2121 terms imported from existing ontologies together with 149 newly defined terms. The ontology covers a wide spectrum of domain-specific topics ranging from epidemiology (risk factors, transmission, etc.), via clinical aspects (such as signs and symptoms, diagnostics, medical intervention) and aspects of prevention and control, to clinical trials, genetic and molecular processes (of both human and virus) and signaling pathways.
We are aware of the parallel development of other ontologies around COVID-19 (see Supplementary Document for a comparison with related ontologies); however, many of them were not suitable for the use cases we define below.

Application in text mining
From the COVID-19 Ontology, we have derived a terminology tailored for use in text mining. For named entity recognition, concepts and entities were used together with their synonyms. The terminology was then integrated into the literature mining engine SCAIView (https://covid.scaiview.com/) and tested in retrieval and entity recognition experiments. Adapting the ontology for text mining purposes enables information retrieval and information extraction for COVID-related research topics such as 'risk factor', 'clinical aspect', 'prevention and control', 'model', 'transmission process', 'signs and symptoms' and 'virology'. These topics are defined via logical axioms, which allows related concepts and classes to be grouped. The search engine SCAIView uses this grouping to aggregate related research documents into topics (see Supplementary Document). The text mining service will be made publicly available in a dedicated, public SCAIView COVID environment.
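The general idea of synonym-based entity recognition followed by topic grouping can be sketched as a simple dictionary matcher. This is not the SCAIView implementation, which is not described here. The terminology entries, the function names and the flat concept-to-topic mapping are hypothetical simplifications; in the ontology itself, topics are defined via logical axioms.

```python
import re
from collections import defaultdict

# Hypothetical mini-terminology: concept -> topic and surface forms (label plus synonyms).
TERMINOLOGY = {
    "fever": {"topic": "signs and symptoms", "forms": ["fever", "pyrexia"]},
    "cough": {"topic": "signs and symptoms", "forms": ["cough", "coughing"]},
    "droplet transmission": {
        "topic": "transmission process",
        "forms": ["droplet transmission", "respiratory droplets"],
    },
    "hypertension": {
        "topic": "risk factor",
        "forms": ["hypertension", "high blood pressure"],
    },
}


def build_matcher(terminology):
    """Compile one case-insensitive regex over all surface forms."""
    form_to_concept = {}
    for concept, entry in terminology.items():
        for form in entry["forms"]:
            form_to_concept[form.lower()] = concept
    # Longer forms first, so multi-word synonyms are preferred over shorter ones.
    forms = sorted(form_to_concept, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in forms)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return pattern, form_to_concept


def tag(text, pattern, form_to_concept):
    """Return (start, end, concept) for every dictionary match in the text."""
    return [
        (m.start(), m.end(), form_to_concept[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def topics_for(text, terminology, pattern, form_to_concept):
    """Group the concepts recognized in the text by their topic."""
    topics = defaultdict(set)
    for _, _, concept in tag(text, pattern, form_to_concept):
        topics[terminology[concept]["topic"]].add(concept)
    return dict(topics)


if __name__ == "__main__":
    pattern, form_to_concept = build_matcher(TERMINOLOGY)
    sentence = "Patients with high blood pressure often presented with pyrexia and coughing."
    print(topics_for(sentence, TERMINOLOGY, pattern, form_to_concept))
```

Mapping every synonym back to a single concept is what lets a document mentioning "pyrexia" be retrieved under the concept "fever" and aggregated under the topic 'signs and symptoms'.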

Organizing the COVID-19 knowledge space
The ontology is used to annotate data and models in the COVID-19 Knowledge Space (https://www.covid19-knowledgespace.de). We use the COVID-19 Ontology for efficient data interchange, data sharing and semantic interoperability among different sources in the COVID-19 Knowledge Space. One of the core components of the COVID-19 Knowledge Space is the Covid Knowledge SuperGraph (https://www.covid19-knowledgespace.de/), which is built by integrating knowledge from literature, publicly available knowledge graphs and disease maps, including proprietary pathophysiology graphs (https://precisionlife.com/news/Covid19-sepsis/). The COVID-19 Ontology is used to achieve semantic interoperability and mapping between the various entities and relationships in the Covid Knowledge SuperGraph.

Discussion
The novel coronavirus has been the prime focus of thousands of scientific studies for several months. Publications are appearing in large numbers, and the sheer volume of information increasingly limits what researchers and clinicians can absorb, which makes tools that help them cope with it crucial. The COVID-19 Ontology captures and represents the essential entities and concepts relevant for COVID-19 research, and is meant to assist text mining approaches and semantic interoperability in the COVID-19 domain. The COVID-19 Ontology described here forms the basis for establishing the reference namespace for the COVID Knowledge Graph, which encapsulates the molecular mechanisms around COVID-19 infection (Domingo-Fernández et al., 2020). With the text mining use case described in this paper, researchers and clinicians have the opportunity to find papers more specific to their interests. It is also worth noting that our ontology is well suited to categorize and annotate the rapidly growing number of COVID-19 portals.