Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator+

Abstract

Summary: Secondary use of clinical data commonly involves annotating biomedical text with terminologies and ontologies. The National Center for Biomedical Ontology (NCBO) Annotator is a frequently used annotation service, originally designed for biomedical data, but not well suited to clinical text annotation. In order to add new functionalities to the NCBO Annotator without hosting or modifying the original Web service, we have designed a proxy architecture that enables seamless extensions by pre-processing the input text and parameters, and post-processing the annotations. We have then implemented enhanced functionalities for annotating and indexing free text: scoring, detection of context (negation, experiencer, temporality), new output formats and coarse-grained concept recognition (with UMLS Semantic Groups). In this paper, we present the NCBO Annotator+, a Web service which incorporates these new functionalities, as well as a small set of evaluation results for concept recognition and clinical context detection on two standard evaluation tasks (CLEF eHealth 2017, SemEval 2014).

Availability and implementation: The Annotator+ has been successfully integrated into the SIFR BioPortal platform, an implementation of NCBO BioPortal for French biomedical terminologies and ontologies, to annotate English text. A Web user interface is available for testing and ontology selection (http://bioportal.lirmm.fr/ncbo_annotatorplus); however, the Annotator+ is meant to be used through the Web service application programming interface (http://services.bioportal.lirmm.fr/ncbo_annotatorplus). The code is openly available, and we also provide a Docker packaging to enable easy local deployment to process sensitive (e.g. clinical) data in-house (https://github.com/sifrproject).

Supplementary information: Supplementary data are available at Bioinformatics online.


I. Proxy Architecture
The NCBO Annotator+ architecture follows the proxy architectural pattern. We have created a front-end REST API that is to be queried instead of the NCBO Annotator API. This service supports all the API functions supported by the NCBO Annotator. When a request uses only parameters of the original NCBO Annotator API, our service merely forwards the request to the NCBO Annotator and returns the result as-is. If a user queries the NCBO Annotator+ API with an extended parameter, that parameter is stripped from the request and the appropriate pre-processing steps are applied; the service then builds a query to the original annotator and, where needed, post-processes the annotation results.
The proxy architecture has been applied to produce the NCBO Annotator+; however, it is generic and is also used for the French Annotator in the SIFR BioPortal (http://bioportal.lirmm.fr/annotator) and for the AgroPortal Annotator (http://agroportal.lirmm.fr/annotator).
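
The request flow described above can be illustrated with a minimal Python sketch. It is not the actual implementation: the function names, the callables `call_annotator` and `postprocess`, and the chosen subset of extended parameter names are assumptions made for illustration only.

```python
# A subset of the extended parameters listed in the API guide (illustrative).
EXTENDED_PARAMS = {
    "semantic_groups", "score", "score_threshold",
    "negation", "experiencer", "temporality",
}


def split_params(params):
    """Separate proxy-only parameters from those forwarded unchanged."""
    extended = {k: v for k, v in params.items() if k in EXTENDED_PARAMS}
    forwarded = {k: v for k, v in params.items() if k not in EXTENDED_PARAMS}
    return extended, forwarded


def handle_request(params, call_annotator, postprocess):
    """Hypothetical proxy entry point.

    call_annotator: callable that queries the original annotator (assumed).
    postprocess: callable applying the extended features (assumed).
    """
    extended, forwarded = split_params(params)
    annotations = call_annotator(forwarded)  # extended parameters are stripped
    if not extended:
        return annotations  # plain request: result is returned as-is
    return postprocess(annotations, extended)
```

With this split, a request containing only original parameters never reaches the post-processing step, which matches the pass-through behaviour described above.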

II. Application Programming Interface Guide
We are fully compatible with the original NCBO Annotator API, which is described in detail on the NCBO website: http://data.bioontology.org/documentation#nav_annotator. The new parameters added by our proxy are listed below, each followed by its description.

semantic_groups=GROUP1,GROUP2,...
Filters the annotations by one or more UMLS Semantic Groups, as defined in: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4300099/

format=[json|brat|quaero|rdf]
Selects the output format of the annotations among json (JSON-LD), brat, quaero (a custom BRAT format for the Quaero annotated corpus), and rdf (a custom RDF format using the Annotation Ontology).

score=[old|cvalue|cvalueh]
Activates annotation scoring. Takes one of the following values: old, cvalue (the c-value score as described in the paper) or cvalueh (a hierarchical version of c-value). The score is added to the annotatedClass of the JSON output as score.

score_threshold=[0-9]+
Filters the annotations by an absolute score threshold. Only annotations with a score above the threshold are returned. Requires score to be activated.
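
As an illustration of the absolute threshold, the following Python sketch applies the same kind of filter on the client side. It assumes only what is stated above, namely that the score appears as `score` inside `annotatedClass`; the function name and the sample records (including their `@id` values) are hypothetical.

```python
def filter_by_score(annotations, threshold):
    """Keep annotations whose annotatedClass.score is strictly above threshold."""
    kept = []
    for ann in annotations:
        score = ann.get("annotatedClass", {}).get("score")
        # Annotations without a score (scoring not activated) are dropped.
        if score is not None and float(score) > threshold:
            kept.append(ann)
    return kept


# Hypothetical sample records, not real service output.
sample = [
    {"annotatedClass": {"@id": "http://example.org/A", "score": 12.5}},
    {"annotatedClass": {"@id": "http://example.org/B", "score": 3.0}},
    {"annotatedClass": {"@id": "http://example.org/C"}},
]
kept = filter_by_score(sample, threshold=5)
print([a["annotatedClass"]["@id"] for a in kept])  # ['http://example.org/A']
```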

confidence_threshold=0-100
Filters the annotations by a threshold between 0% and 100%, relative to the distribution of scores in the output annotations. For example, a value of 90 keeps only annotations with scores in the top 90% of the score distribution.

negation=[true|false]
Activates negation detection with the ConText algorithm. The output is added to the annotations object of the JSON output as negationContext.

experiencer=[true|false]
Activates experiencer detection with the ConText algorithm. The output is added to the annotations object of the JSON output as experiencerContext.

temporality=[true|false]
Activates temporality detection with the ConText algorithm. The output is added to the annotations object of the JSON output as temporalityContext.
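
To show how the extended parameters combine with the original ones, the following Python sketch builds a query URL and reads the three context fields from a result. Several points are assumptions: that the service URL given in the abstract accepts the query string directly, that `text`, `ontologies` and `apikey` are passed as in the original NCBO Annotator API, and that the context fields sit on each item of a result's `annotations` list. The helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Service URL quoted in the abstract; the exact endpoint path is an assumption.
BASE_URL = "http://services.bioportal.lirmm.fr/ncbo_annotatorplus"


def build_query(text, ontologies, apikey, **extended):
    """Build a GET URL mixing original and extended parameters."""
    params = {"text": text, "ontologies": ",".join(ontologies), "apikey": apikey}
    params.update(extended)  # e.g. negation="true", score="cvalue"
    return BASE_URL + "?" + urlencode(params)


def context_rows(result):
    """Return (text, negation, experiencer, temporality) for each annotation."""
    rows = []
    for ann in result.get("annotations", []):
        rows.append((
            ann.get("text"),
            ann.get("negationContext"),
            ann.get("experiencerContext"),
            ann.get("temporalityContext"),
        ))
    return rows


url = build_query(
    "no chest pain", ["MESH"], "YOUR_API_KEY",
    negation="true", experiencer="true", temporality="true",
)
```

A field that was not requested is simply reported as `None` by `context_rows`, so the same helper can be used whichever subset of the three detections is activated.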