Semi-Automatic Knowledge Extraction from COVID-19 Scientific Literature: the COKE Project

Background:
The COVID-19 pandemic highlighted the importance of rapidly updating scientific information. However, the process of drafting guidelines is highly time- and resource-consuming. The COKE Project aims to accelerate and streamline the extraction and synthesis of scientific evidence. To do so, the Project used deep learning to implement a semi-automated system that supports the systematic literature review process. Here we present preliminary results on the automatic classification of abstract sentences in papers related to COVID-19.

Methods:
The tool uses Natural Language Processing algorithms to detect and classify PICO elements and medical terms and to organize abstracts accordingly. We built a BERT + bi-LSTM language model. The tool was trained on a corpus of 24,668 abstracts unrelated to COVID-19. We assessed its performance on a specific COVID-19 topic that had not been covered during training. For manual validation, we randomly selected 50 abstracts. Abstract sentences were classified by 2 domain experts into 7 types: Aim (A), Participants (P), Intervention (I), Outcome (O), Method (M), Results (R), and Conclusion (C). The performance of the tool was compared with that of the experts in terms of precision, recall, and F1.

Results:
The classifier achieved an overall accuracy of 76%. Precision, recall, and F1 were above 75% for all sentence types except I, M, and P.

Conclusions:
The results indicate a promising ability of the semi-automated classifier to predict expert-validated labels on abstracts covering different topics. The proposed tool is expected to significantly reduce the effort required to produce medical guidelines and therefore to have a strong, positive impact, particularly in emergency scenarios. The COKE Project also represents a call to action for similar initiatives aimed at enhancing the information extraction process in medicine.

Key messages:
Rapidly changing healthcare requires fast decisions supported by scientific evidence. This is at odds with human cognitive limits, which reduce the ability to extract information.
The COKE Project aims to speed up the creation of healthcare guidelines by semi-automating parts of the workflow and supporting the human-performed process of extracting and analyzing content.
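The comparison between the tool and the experts in the COKE abstract relies on per-class precision, recall, and F1 plus overall accuracy. The following is a minimal sketch of how such scores can be computed. It is not the project's evaluation code. It assumes one consensus expert label per sentence (the abstract does not say how the 2 experts' labels were combined), and the function name `per_class_scores` and the toy labels are hypothetical.

```python
from collections import Counter

# Sentence types named in the abstract: Aim, Participants, Intervention,
# Outcome, Method, Results, Conclusion.
LABELS = ["A", "P", "I", "O", "M", "R", "C"]


def per_class_scores(expert, predicted, labels=LABELS):
    """Return (per-label precision/recall/F1, overall accuracy).

    `expert` and `predicted` are equal-length lists of label codes,
    one entry per abstract sentence (assumed consensus expert labels).
    """
    if len(expert) != len(predicted):
        raise ValueError("expert and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(expert, predicted):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label was wrong for this sentence
            fn[gold] += 1  # expert label was missed

    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(expert) if expert else 0.0
    return scores, accuracy


# Toy data (invented for illustration): one Intervention sentence
# is misclassified as Method.
expert = ["A", "P", "I", "O", "M", "R", "R", "C"]
predicted = ["A", "P", "M", "O", "M", "R", "R", "C"]
scores, accuracy = per_class_scores(expert, predicted)
```

In this toy case a single I/M confusion lowers recall for I and precision for M at the same time, which shows how one kind of error can pull down the scores of two sentence types.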


Issue/problem:
AGENAS supports the implementation of health policies in direct collaboration with Italian Regions and Autonomous Provinces. To improve public reporting, we aimed to complement the production of technical reports with new forms of timely communication, using National Portals.

Description of the problem:
Between October and December 2020, we designed and implemented the Covid-19 National Portal, including a suite of targeted indicators, fully automated via ad hoc scripts written in PHP and R on top of a relational database using internal and external data sources. Targeted information was widely communicated and continuously updated. Dedicated sections on forecasting and resilience were delivered in collaboration with specialised academic institutions. In 2021, we deployed the Portal for the Transparency of Health Services, broadly oriented towards health issues, the location of services and performance indicators.

Results:
Pre-post comparisons of web analytics for January-April of 2020, 2021 and 2022 showed clear advantages of the Covid-19 Portal. By April 2020, Italy had introduced a national lockdown, while AGENAS covered the topic traditionally, recording 48,122 users overall, with daily peaks below 5,000 sessions. In 2021 and 2022, the number of users rose sharply to 436,280 (with daily peaks of 100,000 sessions) and 421,123 (with daily peaks of 150,000 sessions), respectively. Visits to the Transparency Portal were considerably more limited.

Lessons:
To be widely used, public health information needs to be relevant (responding to personal needs close to home), understandable, accurate and timely. National Portals can gain efficiency through the mediation of search engines, enhanced by: targeted naming (URL), a coherent semantic perimeter (third-level domain in a highly referenced institu-

Issue:
Infodemics (i.e., an overflow of information in physical and digital spaces that makes it difficult for people to make good health decisions) can undermine emergency response, but capacity for infodemic management has been limited in countries thus far. Specifically, there is a need to build capacities in the field with practical and scalable tools.

Description of the problem:
WHO has developed tools and trainings to quickly build and enhance infodemic management (IM) capacity at the country level, such as tools for rapid generation of IM insights and a framework for conducting landscape analyses to establish sustainable IM capacities. These were developed in collaboration with multidisciplinary experts who provided feedback. We sought to create tools that can serve as a basis for introducing evidence generation in health information systems to inform emergency preparedness and response, and for mainstreaming these methods into routine infodemic diagnostics activities.

Results:
The tools and trainings provide a comprehensive framework for diagnosing and addressing infodemics, such as a public health taxonomy to guide digital intelligence analysis and integrated analysis methods for generation of actionable insights. Additionally, the landscape analysis framework outlines steps for assessing strategic needs and assets for routinizing IM functions as part of existing public health systems and programs.

Lessons:
The tools and trainings will be deployed in the field to evaluate utility. Feedback from users in the global WHO infodemic manager community will be systematically captured.

Key messages:
Field responders need practical tools and trainings that guide quick infodemic response during health emergencies. These tools and trainings can be used to diagnose and intervene on infodemics, even in settings where infodemic insights units are not yet established.

Issue:
The COVID-19 pandemic and current recovery efforts have been complicated by a parallel infodemic. The infodemic has manifested itself in the rapid spread of questions, concerns and misinformation that can affect population attitudes and behavior in ways harmful to health: promoting stigma and non-recommended treatments and cures, discrediting science, politicizing health programs and eroding trust in health workers and health systems.

Description:
WHO's COVID-19 Pillar 2 (risk communication, community engagement and infodemic management) developed an integrated public health infodemic insights methodology for weekly analysis of social media, traditional media and other data sources to identify, categorize, and understand the key concerns and narratives expressed, and to inform risk communication and response activities.

Results:
The infodemic characterization, integrated analysis and insights generation consisted of a 3-step mixed-methods approach. First, data was collected from publicly available social and news media and sorted into conversation categories using a COVID-19 public health taxonomy. Second, the dataset was analyzed and compared week-on-week to identify changes in narratives and conversation sentiment. Third, the digital infodemic intelligence was reviewed by a group of subject matter experts and triangulated with other data sources to derive infodemic insights and provide recommendations for action for the week. The methodology has been applied to inform COVID-19 response, COVID-19 vaccine demand promotion, and preparation for mass gatherings or mass immunization campaigns.
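The second step, the week-on-week comparison, can be illustrated with a minimal sketch. It is not WHO's implementation. It assumes each collected item has already been assigned one taxonomy category in step 1, and it tracks only each category's share of the conversation, not sentiment. The category names and the function names `category_shares` and `week_on_week_change` are hypothetical.

```python
from collections import Counter


def category_shares(items):
    """Map each category to its share of all items in one week."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def week_on_week_change(previous_week, current_week):
    """Return the change in conversation share per category.

    Positive values mean the category takes up a larger share of the
    conversation this week than last week.
    """
    prev = category_shares(previous_week)
    curr = category_shares(current_week)
    categories = sorted(set(prev) | set(curr))
    return {c: curr.get(c, 0.0) - prev.get(c, 0.0) for c in categories}


# Toy data (invented): one category label per collected post.
last_week = ["vaccine safety"] * 2 + ["treatment"] * 2
this_week = ["vaccine safety"] * 3 + ["treatment"] * 1
change = week_on_week_change(last_week, this_week)
```

Comparing shares instead of raw counts keeps weeks with different data volumes comparable. The resulting shifts would still go to the expert review and triangulation described in the third step.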

Lessons:
The methodology for infodemic intelligence generation and integration has introduced evidence-based analytical practices for generating infodemic insights and recommendations for action into the work of WHO. It must be further adapted for use by different health programmes and preparedness functions, and is described in the WHO Field Infodemiology Manual.

Key messages:
Health authorities can use infodemic insights to respond to people's concerns, questions and information deficits in a timely and effective manner. An evidence-based methodology has been developed and validated to generate infodemic insights and recommendations for action during an acute health event or emergency.