Extraction of causal relations based on SBEL and BERT model

Abstract Extraction of causal relations between biomedical entities in the form of Biological Expression Language (BEL) poses a new challenge to the community of biomedical text mining due to the complexity of BEL statements. We propose a simplified form of BEL statements [Simplified Biological Expression Language (SBEL)] to facilitate BEL extraction and employ BERT (Bidirectional Encoder Representations from Transformers) to improve the performance of causal relation extraction (RE). On the one hand, BEL statement extraction is transformed into the extraction of an intermediate form—the SBEL statement, which is then further decomposed into two subtasks: entity RE and entity function detection. On the other hand, we use a powerful pretrained BERT model both to extract entity relations and to detect entity functions, aiming to improve the performance of the two subtasks. Entity relations and functions are then combined into SBEL statements and finally merged into BEL statements. Experimental results on the BioCreative-V Track 4 corpus demonstrate that our method achieves state-of-the-art performance in BEL statement extraction, with F1 scores of 54.8% in Stage 2 evaluation and 30.1% in Stage 1 evaluation. Database URL: https://github.com/grapeff/SBEL_datasets


Introduction
Biomedical entity relation extraction (RE) identifies the semantic relationships between biomedical entities (such as genes, proteins, chemicals, diseases and biological processes), for example protein-protein interactions (1)(2)(3), drug-drug interactions (4,5) and relations between chemicals and proteins (7)(8)(9). It is of great significance to the construction of biomedical knowledge bases, precision medicine and new drug discovery. The majority of these entity relations represent a single interaction or regulatory relation between two biomedical entities and cannot fully reflect more complex causal relationships involving multiple biomedical entities; therefore, the form and scope of knowledge they can express are quite restricted. The BioCreative-V community organized a shared task (Track 4; http://www.biocreative.org/tasks/biocreative-v/track-4-bel-task/) to extract causal relations between biomedical entities from the literature in the form of the Biological Expression Language (BEL (6); http://www.openbel.org/), which is appropriate for both machine processing and human reading. BEL can express not only causal relations between entities but also functions around entities. This form of representation has great capacity to express rich domain-specific knowledge; however, it poses new challenges to biomedical text mining.
There are roughly three existing strategies for tackling automatic extraction of BEL statements: rule-based methods, cross-task methods and intra-task methods. The rule-based method, such as that by Ravikumar et al. (10,11), introduces a rule-based semantic analyzer to perform the BEL extraction task; due to the high complexity of BEL statements, it obtains an F1 value of 21.29%. Among cross-task methods, the NCU-IISR system by Lai et al. (12) first uses biomedical semantic role labeling to parse a sentence into a predicate-argument structure and then converts it to a BEL statement, achieving an F1 measure of 32.08%; Choi et al. (13) propose an event-based extraction method and further use coreference resolution to identify more entities and thus more BEL statements, achieving an F1 of 35%. The reason for the low scores of cross-task methods is that information loss is unavoidable when transferring instances between different tasks and, furthermore, the training corpus provided by the BEL task is not used at all. Following the success of deep learning on many NLP (Natural Language Processing) tasks including RE, Liu et al. (14) propose an intra-task method that directly trains a deep-learning model on the BEL training corpus. They cast the BEL extraction task as a combination of two fundamental subtasks, RE and function detection (FD), and use attention-based BiLSTM models to extract relations and functions that are then combined into BEL statements. Through confidence-threshold filtering of detected entity functions, their final BEL statement performance reaches an F1 value of 46.9%. Generally speaking, BEL statement extraction remains a challenging task, with F1 values below 50%.
The reasons are twofold: one is that the training set used for RE and FD is relatively small due to the information loss incurred when it is converted from complex BEL statements, and the other is that the overall performance of RE and FD in the biomedical domain is not sufficiently high.
To address the aforementioned issues, we follow the path of Liu et al. (14) and further introduce the concept of Simplified Biological Expression Language (SBEL) statements, thereby transforming BEL statement extraction into the extraction of SBEL statements, so as to make full use of as many training instances (including relations and functions) as possible. Meanwhile, we employ the powerful BERT model by Devlin et al. (15), which has demonstrated the effectiveness of contextualized word representations when fine-tuning a pretrained language model on a specific task.

Complexity of BEL statements
Selven et al. (16) first proposed BEL in 2011, which is designed to represent complex causal relationships between biomedical entities in the field of life sciences. BEL is not only editable but also easily readable by humans. Figure 1 illustrates an example of a BEL statement: 'complex(p(HGNC:ITGAV), p(HGNC:ITGB6)) increases p(HGNC:TGFB1)'.
BEL statements generally consist of Terms, Functions and Relations (17,18). Terms or entities contain entity identifiers and entity types, along with their namespaces. Entity types include proteins, chemicals, diseases and biological processes. For example, the term 'p(HGNC:TGFB1)' refers to the protein entity (TGFB1) defined in the HGNC (HUGO Gene Nomenclature Committee) namespace. The function 'complex()' expresses the combination of protein (ITGAV) and protein (ITGB6). The causal relationship (a predicate) 'increases' indicates that the combination of the subject (HGNC:ITGAV and HGNC:ITGB6) promotes the abundance of the object (HGNC:TGFB1).
Compared with the conventional RE task, which involves exactly two entities and no functions at all, causal RE in BEL statements poses a significant challenge to the NLP community. Through our analysis of the BioCreative-V Track 4 corpus, the complexity of BEL statements manifests in aspects such as self-relations, multiple relations between the same entity pair, nested relations and nested functions. Conventional binary RE (19,20,21,22) in the biomedical domain can neither deal with the difficulty of self-relations and multiple relations, nor can it tackle the issue of nested relations. If the BEL statement is regarded as a kind of semantic representation and its extraction as semantic parsing, then it is confronted with difficult issues like an insufficient corpus and the erroneous alignment between entity identifiers and their mentions in text.

SBEL statements
Due to the aforementioned complexity of BEL statements and the fact that the proportion of complex statements is relatively low, this paper proposes to use an intermediate form, the SBEL statement, to extract BEL statements. The basic idea is to transform or discard complex structures in BEL statements, while retaining as many relations between entity pairs as possible.
Formally, SBEL statements can be defined as follows:

<SBEL> =: <Subject> <Relation> <Object>
<Subject> =: <Function>(<Entity>) | <Entity>
<Object> =: <Function>(<Entity>) | <Entity>
<Entity> =: <DatabaseID>:<EntityID>
<Relation> =: Increases | Decreases
<Function> =: act | cat | pmod | …

where <Subject> and <Object> represent BEL Terms and <Relation> describes the relationship between the subject and the object. A BEL term or an entity can be modified with a <Function>, which represents a specific biological function, and an <Entity> consists of a database identifier and an entity identifier. In short, an SBEL statement expresses the causal relationship between a subject and an object, each with at most one function. Due to its simplicity, an SBEL statement can be further encoded as a quintuple:

<func1, entity1, relation, func2, entity2>

where func1 and func2 are the functions of entity1 (the subject) and entity2 (the object) respectively, and relation is the one between the subject and the object. The following SBEL example indicates that the entity (HGNC:SUMO1) promotes the catalytic activity of the entity (HGNC:MDM2):

<None, HGNC:SUMO1, increases, cat, HGNC:MDM2>

Different from the research in Liu et al. (14), this paper considers functions with multiple arguments, mainly the 'complex()' function that takes a list of arguments. However, it should be emphasized that the function 'complex()' in a BEL statement must be decomposed into multiple single-argument functions in order to generate multiple SBEL statements.
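As a concrete illustration of the quintuple encoding, the following sketch models an SBEL statement as a Python named tuple and renders it back in the 'func1(entity1) relation func2(entity2)' form described above (the SBEL class and to_bel helper are our illustrative names, not part of any released code):

```python
from typing import NamedTuple, Optional

class SBEL(NamedTuple):
    """An SBEL statement encoded as <func1, entity1, relation, func2, entity2>."""
    func1: Optional[str]   # function on the subject entity, or None
    entity1: str           # subject, as "DatabaseID:EntityID"
    relation: str          # "increases" | "decreases"
    func2: Optional[str]   # function on the object entity, or None
    entity2: str           # object, as "DatabaseID:EntityID"

def term(func: Optional[str], entity: str) -> str:
    """Render a BEL term, omitting the function when it is None."""
    return f"{func}({entity})" if func else entity

def to_bel(s: SBEL) -> str:
    """Concatenate the two terms with the relation."""
    return f"{term(s.func1, s.entity1)} {s.relation} {term(s.func2, s.entity2)}"

example = SBEL(None, "HGNC:SUMO1", "increases", "cat", "HGNC:MDM2")
print(to_bel(example))  # HGNC:SUMO1 increases cat(HGNC:MDM2)
```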

Conversion between BEL and SBEL statements
Conversion of BEL statements to SBEL statements. Since a BEL statement has more powerful expressiveness than an SBEL statement, information loss is unavoidable during the conversion from the former to the latter. Our goal is to retain as much information from the original BEL statements as possible. Specifically, for complex BEL statements, we perform the following processing steps:
(i) Nested relations: among nested BEL statements, we select for conversion the statement that contains only entities (possibly with functions) as the subject and object, so as to produce more SBEL statements and thus increase the training corpus size.
(ii) Nested functions: we pick the intermediate function of an entity as its function and discard its upper functions to ensure that an entity has at most one function. The assumption here is that the keyword expressing the intermediate function is closest to the entity in the text.
(iii) Functions with multiple entities: it is important to note that only the 'complex()' function has multiple arguments. To obtain as many SBEL statements as possible, 'complex()' is distributed to each of its entities to form multiple SBEL statements with the other subject or object.
(iv) Self-relations: a BEL statement containing the same entity in the subject and object is discarded, because the current binary RE model cannot deal with self-relations.
(v) Multiple relations: if there are multiple relationships between two entities, only the first BEL statement is selected for conversion. Since SBEL extraction will ultimately be transformed into a single-label, multi-class binary RE task, only one relation type can be retained between a pair of entities.
(vi) Standard conversion: a BEL statement with two entities as the subject and the object, each with at most one function, is directly converted to an SBEL statement.
For example, the following BEL statement (BEL:20073928) involves nested functions with multiple entities: 'cat(complex(p(HGNC:ITGA2),p(HGNC:ITGB1))) increases bp(GOBP:"cell adhesion")'. After the above conversion, two separate SBEL statements are obtained:

SBEL1: <complex, HGNC:ITGA2, increases, None, GOBP:"cell adhesion">
SBEL2: <complex, HGNC:ITGB1, increases, None, GOBP:"cell adhesion">

Obviously, there is some degree of information loss in conversion steps (i), (ii), (iv) and (v). As we can see from the above BEL statement (BEL:20073928), the function 'cat()' is lost during the conversion from BEL to SBEL.
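Conversion step (iii) can be sketched as a simple distribution of the multi-argument 'complex()' subject over its member entities, one quintuple per entity (the function and variable names here are our own illustration):

```python
def distribute_complex(entities, relation, obj_func, obj_entity):
    """Step (iii): split a multi-argument complex() subject into one SBEL
    quintuple <func1, entity1, relation, func2, entity2> per member entity."""
    return [("complex", e, relation, obj_func, obj_entity) for e in entities]

# BEL:20073928 with the outer cat() already dropped by step (ii):
sbels = distribute_complex(["HGNC:ITGA2", "HGNC:ITGB1"],
                           "increases", None, 'GOBP:"cell adhesion"')
for s in sbels:
    print(s)
# ('complex', 'HGNC:ITGA2', 'increases', None, 'GOBP:"cell adhesion"')
# ('complex', 'HGNC:ITGB1', 'increases', None, 'GOBP:"cell adhesion"')
```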
Merging of SBEL statements into BEL statements. It is much easier to merge SBEL statements into BEL statements than the other way around. We can transform an SBEL quintuple into a BEL statement by concatenating the two entities' functions with the relation, in the form 'func1(entity1) relation func2(entity2)' (note that a function is omitted if it is 'None'). Differently, Ravikumar et al. (10,11) first extract BEL functions and then determine relationships involving functions to complete a BEL statement. In addition to the functions in Liu et al. (14), more functions such as 'complex()' and 'tloc()' are included in our SBEL statements; therefore, when the entity function in an SBEL statement is 'complex()', the merging of 'complex()' functions with the same entity should be performed. The merging strategy is as follows: Subject merging. When the subject function in an SBEL statement is 'complex()', it should be merged with other SBEL statements that also have the 'complex()' function in the subject as well as the same predicate and object. The entities in the subjects of these statements constitute a new entity set, which is assigned to a single 'complex()' function in order to form a new BEL statement.
Object merging. Corresponding to subject merging, when the object function is 'complex()', SBEL statements with the 'complex()' function in the object and the same predicate and subject should be merged. Similarly, the entities in their objects constitute a new entity set, and the set with the 'complex()' function forms a new BEL statement.
In the subsection 'Conversion of BEL statements to SBEL statements', two separate SBEL statements SBEL1 and SBEL2 are taken as examples. Their functions in the subject are 'complex()' and their predicates and objects are the same, and therefore, the original BEL statement (BEL:20073928) without 'cat()' can be obtained by subject merging. Note that the difference between the original statement and the regenerated statement is caused by the conversion from BEL to SBEL, not by the merging of SBEL to BEL since it is intuitive to see that the merging retains all the information in SBEL.
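A minimal sketch of subject merging, assuming quintuples are plain tuples and the entity strings carry their BEL type wrappers (e.g. 'p(...)') so that the regenerated statement matches the original surface form; all function and variable names are our own illustration:

```python
from collections import defaultdict

def merge_subjects(sbels):
    """Subject merging: SBEL quintuples whose subject function is complex()
    and whose relation and object agree are folded into one BEL statement;
    all other quintuples are rendered directly."""
    groups = defaultdict(list)   # (relation, obj_func, obj_entity) -> subject entities
    merged = []
    for f1, e1, rel, f2, e2 in sbels:
        if f1 == "complex":
            groups[(rel, f2, e2)].append(e1)
        else:
            subj = f"{f1}({e1})" if f1 else e1
            obj = f"{f2}({e2})" if f2 else e2
            merged.append(f"{subj} {rel} {obj}")
    for (rel, f2, e2), ents in groups.items():
        obj = f"{f2}({e2})" if f2 else e2
        merged.append(f"complex({','.join(ents)}) {rel} {obj}")
    return merged

sbel1 = ("complex", "p(HGNC:ITGA2)", "increases", None, 'bp(GOBP:"cell adhesion")')
sbel2 = ("complex", "p(HGNC:ITGB1)", "increases", None, 'bp(GOBP:"cell adhesion")')
print(merge_subjects([sbel1, sbel2]))
# ['complex(p(HGNC:ITGA2),p(HGNC:ITGB1)) increases bp(GOBP:"cell adhesion")']
```

As the expected output shows, the two example quintuples regenerate the original statement BEL:20073928 minus the 'cat()' function lost in conversion.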

Statistics of SBEL statements on the corpus
The corpus provided by the BioCreative-V BEL task includes a training set and a test set, both in sentences (18). The statistics of sentences, BEL statements, transformed SBEL statements, relations and functions in the training and test sets are shown in Table 1 from top to bottom, where relations and functions are further broken down into their minor categories. For comparison, the statistics of the transformed set by Liu et al. (14) are also listed in columns 3 and 5. It can be observed from the table that: (i) due to information loss in the conversion from BEL to SBEL, the number of SBEL statements in the training set is smaller than that of BEL statements.

BEL statement extraction based on SBEL
The basic idea behind BEL extraction based on SBEL is that an intermediate format SBEL is adopted between complex BEL statements and fundamental binary and unary relations. First, BEL statements in the original training corpus are transformed into SBEL statements, which are in turn used to train both RE and FD models. Then, these two models are applied to the test set to predict both relations between entities and entity functions, which are further combined to SBEL statements. Finally, we assemble BEL statements from SBEL statements.

SBEL statement extraction based on RE and FD
For the extraction of SBEL statements, we follow a similar path to Liu et al. (14), i.e. the task is decomposed into two subtasks: extraction of binary relations between entities and detection of entity functions. The difference lies in the statements to be decomposed: we decompose SBEL statements into relations and functions, while in the work by Liu et al. (14) the original BEL statements are decomposed directly. The procedure consists of three stages: (i) SBEL statements obtained on the training set are decomposed into relation and function instances. It should be noted that if an entity appears in multiple SBEL statements with different functions, the function that appears first is selected to ensure that one entity has exactly one function. (ii) Two BERT models are trained for RE and FD (treated as unary RE) respectively. The difference between RE and FD is that for RE every pair of entities in the sentence is regarded as a potential instance, while for FD every single entity is regarded as a potential instance. (iii) On the test set, a binary relation is predicted for each pair of entities and one function (unary relation) for each entity. If there is a causal relationship between an entity pair, the two involved entities, their respective functions and the relation are combined to form a quintuple, i.e. an SBEL statement.
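Stage (iii), combining predicted pairwise relations with per-entity functions into quintuples, might look as follows (a sketch under the assumption that predictions are keyed by entity identifiers; 'None' marks the no-relation class, and all names are our own):

```python
def combine_predictions(rel_preds, func_preds):
    """Combine RE predictions (per entity pair) with FD predictions (per
    entity) into SBEL quintuples; pairs without a causal relation are skipped."""
    sbels = []
    for (e1, e2), rel in rel_preds.items():
        if rel == "None":        # no causal relation predicted for this pair
            continue
        sbels.append((func_preds.get(e1), e1, rel, func_preds.get(e2), e2))
    return sbels

rel_preds = {("HGNC:SUMO1", "HGNC:MDM2"): "increases",
             ("HGNC:SUMO1", "HGNC:TP53"): "None"}
func_preds = {"HGNC:MDM2": "cat"}   # entities absent here get function None
print(combine_predictions(rel_preds, func_preds))
# [(None, 'HGNC:SUMO1', 'increases', 'cat', 'HGNC:MDM2')]
```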
In recent decades, research on RE in biomedical field has made great progress, as in the general domain. In addition to conventional machine learning methods such as SVM (Support Vector Machines) (19) and KNN (K-Nearest Neighbor) (20), deep learning methods such as CNN (Convolutional Neural Network) (4) and RNN (Recurrent Neural Network) (22) also exhibit superior performance. In particular, BERT (15), which is a dominant pretrained language model in recent years, not only greatly improves the performance of binary RE in the general domain, but also performs excellently in the biomedical domain [BioBERT (23)]. Naturally, we use BERT to both extract binary relations and entity functions.
BERT is a pretrained language model that uses the Transformer (24) as a feature extractor and converts input sentences or pairs of sentences into sequences of hidden vectors. Furthermore, BERT uses an MLM (Masked Language Model) (25) objective, which predicts randomly masked words in a sequence, so that bidirectional contexts are considered when training word representations. After pretraining, only fine-tuning on a specific task is needed. Figure 2 shows the structural diagram of fine-tuning a sentence-level multi-class classification task (RE and FD) on the BERT model. In the figure, Tok i and Tok j in the input sequence refer to the first and second entities respectively, with their surface names replaced by placeholders. @ and $ are special delimiters marking the two entities respectively. It should be emphasized that for FD we simply mark the single entity with @. E1, E2, …, EN denote the input word vectors, and T1, T2, …, TN denote the contextual representations produced by the BERT model. [CLS] is a special token whose output is used for classification. A fully connected (FC) layer and a softmax layer are stacked on the [CLS] output in order to obtain the classification labels for RE and FD separately.
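The input marking described above can be sketched as follows, assuming token-level entity positions are known (the placeholder strings ENTITY1/ENTITY2 are our stand-ins for the paper's unspecified placeholders):

```python
def mark_entities(tokens, i, j):
    """Build the RE input sequence: replace the two entity mentions at token
    positions i and j with placeholders and wrap them in the delimiters
    @ ... @ and $ ... $; for FD, only one entity would be marked with @."""
    out = []
    for k, tok in enumerate(tokens):
        if k == i:
            out += ["@", "ENTITY1", "@"]
        elif k == j:
            out += ["$", "ENTITY2", "$"]
        else:
            out.append(tok)
    return ["[CLS]"] + out + ["[SEP]"]

print(mark_entities("SUMO1 promotes MDM2 activity".split(), 0, 2))
# ['[CLS]', '@', 'ENTITY1', '@', 'promotes', '$', 'ENTITY2', '$', 'activity', '[SEP]']
```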
The standard BERT pretraining corpora come from the BooksCorpus (26) and the English Wikipedia dataset, so it may not perform best in the biomedical domain. Therefore, we adopt the BioBERT (23) model, which is pretrained on the combination of PubMed abstracts and PubMed Central full-text articles. More importantly, BioBERT achieves excellent performance in several biomedical text mining tasks, including biomedical RE.

Experimentation
This section first introduces the hyper-parameters of our model, then describes the evaluation datasets and metrics and finally details the experimental results.

Hyper-parameter setting
We use the BioBERT version 'biobert-pubmed-v1.1' as the BERT encoder. The fine-tuning parameters of RE and FD are shown in Table 4.

Evaluation datasets
The corpus was provided by the organizer for the BioCreative V BEL task, which contains the training, sample and test sets. There is also a similar task (27) in the 2017 BioCreative VI, which uses the same training set as BC-V, but provides a new test set. However, the new test set is not publicly available, and therefore, we conduct experiments and compare results on the BC-V test set.

Evaluation metrics
We use standard metrics to evaluate the performance at a given level, namely Precision (P), Recall (R) and F1 (F1-measure). Precision refers to the ratio of the number of correct instances to the total number of instances extracted by the system. Recall refers to the ratio of the number of correct instances extracted by the system to the number of gold instances. F1 is the harmonic mean of precision and recall. The three metrics are defined as follows, where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively:

P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 × P × R / (P + R)
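In code, the three metrics are simply:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts; zero denominators yield 0.0 by convention."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(prf1(tp=40, fp=10, fn=60))  # (0.8, 0.4, 0.5333...)
```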

Cross-validation performance of RE and FD on the BC-V training set
Ten-fold cross-validation is performed on the training set, and the results are compared with Liu et al. (14), as shown in Table 5. The three models involved are the Att-BiLSTM model on the training set of Liu et al. (14), BERT (14) (BERT models we trained ourselves on the training set of Liu et al. (14)) and our SBEL-BERT on our SBEL training set. It can be observed that: (i) no matter which corpus is used, the RE performance of the BERT models improves by about 10 units over the Att-BiLSTM model, which is much anticipated since BERT is a more powerful model. The increase largely comes from the 'decreases' relation, which has fewer instances than the 'increases' relation, implying that BERT alleviates the problem of data sparsity better than Att-BiLSTM. (ii) In terms of FD, BERT has no superiority over Att-BiLSTM, obtaining a slightly lower F1 score, which may indicate that the original BERT model is not well suited for FD. However, the precision for the 'act()' function with BERT is much higher than that with Att-BiLSTM. As pointed out in Liu et al. (14), the precision of FD plays a critical role when combining functions into relations: the higher the precision of FD, the greater the contribution of combining functions into relations to form BEL statements. (iii) Our training set for FD includes two additional function types, 'tloc()' and 'complex()', whose performance is generally not high and needs to be improved. In particular, the 'complex()' function involves the assembly of multiple entity functions, so it has a significant impact on the performance of BEL statements.

Performance on the BC-V test set with/without functions
We evaluate our SBEL-BERT model on the BC-V test set with gold entities, known as Stage 2 BEL evaluation. In this case, the whole training set is used to train the models, and the induced models are then applied to the test set. The results at various evaluation levels by the three methods are shown in Table 6. Similarly, the highest P/R/F1 scores in each row among the three models are highlighted in bold. It can be seen that: (i) at the Relation level, both BERT models perform much better than Att-BiLSTM by a margin of about 8 units in F1 score, though Att-BiLSTM performs better at the RS level due to its loose evaluation. Compared with Liu et al. (14), SBEL-BERT on our training set achieves higher recall but lower precision, probably because our training set is bigger than theirs (cf. Table 1). (ii)-(iii) At the Function level, two observations can be made. The first is that our SBEL training set contains more function instances than that of Liu et al. (14), leading to better precision and recall. The second is that the performance gaps between BERT and Att-BiLSTM on the test set and in 10-fold cross-validation reflect the fact that the distributions of function instances in the training and test sets are quite different, as shown in Table 1. Nevertheless, this conclusion is not statistically evident because of the limited number of functions in the test set. (iv) At the State(MRG) level, our model based on BERT and SBEL achieves the best F1 score of 54.8, outperforming the state-of-the-art Att-BiLSTM model. While for BERT (14) the P/R/F1 scores at State(MRG) are lower than those at State(REL), caused by erroneous functions in BEL statements due to low precision in FD, for our SBEL-BERT model the P/R/F1 scores at State(MRG) are higher than those at State(REL) by 2∼3 units, thanks to the high precision in FD.
We also experiment with the case where gold entities on the test set are not provided (defined as Stage 1), using the same approach as Liu et al. (14) to automatically recognize entity mentions and link them to the corresponding databases. After that, our trained models are applied to entity mentions in the test sentences to identify entity functions and relations between entity pairs. Finally, two entity functions and their relation constitute an SBEL quintuple, which is ultimately transformed into a BEL statement. Table 7 reports the performance of the three models on the test set. Likewise, the highest P/R/F1 scores in each row among the three models are highlighted in bold. Other experimental settings are the same as in Stage 2.
Compared with Stage 2, the lower performance in P/R/F1 at all evaluation levels is apparently due to the noise introduced by automatic named entity recognition and entity normalization. Interestingly, BERT (14) consistently achieves the highest precision at almost all levels, while our SBEL-BERT obtains the best recall and F1 score. This may be due to the relatively larger SBEL training set, with its 'tloc()' and 'complex()' functions and more relation instances, leading to better generalization capability at the expense of lower precision for the BERT model.

Comparison with other systems
We compare our model with other models on the BC-V BEL test set in Stage 1 (the upper part) and Stage 2 (the lower part) evaluation at various levels in Table 8. The four systems compared are (i) the rule-based model (10,11), (ii) the event-based model (13), (iii) NCU-IISR (12) and (iv) the Att-BiLSTM model (14). The best F1 score in each column is shown in boldface in the table.
As shown in Table 8, in Stage 2 our system achieves the best performance at four evaluation levels (including the relation and function levels), particularly at the BEL statement level, where the F1 value reaches 54.8%, outperforming the other systems by at least 8 units. This demonstrates the efficacy of our model based on SBEL and BERT. In Stage 1, we observe that our system still achieves competitive performance, surpassing all other systems except the rule-based one (10,11).

Error analysis
We perform error analysis in order to better understand the complexity and difficulty of the BC-V BEL extraction task and divide the errors into the following five categories: i. Modeling deficiency. Due to the nature of SBEL statements, some complex BEL statements, involving nested relations, nested functions, functions with multiple arguments, self-relations and multiple relations, cannot be recognized. They are discarded during the conversion to SBEL and cannot be reconstructed from SBEL statements. The results in Table 2 suggest that, even if the performance of RE and FD were perfect, i.e. 100%, the F1 score at the BEL statement level would be less than 90%.
ii. Misaligned entity mentions. Aligning entity identifiers in a BEL statement to entity mentions in the corresponding sentence is performed before converting BEL to SBEL. We use the same approach to entity alignment as in Liu et al. (14), a fuzzy matching algorithm based on edit distance. In some cases, entity identifiers may be aligned to erroneous entity mentions or may not be aligned at all. For example, in the sentence 'ClC-3 is activated by Ca(2+)-calmodulin-dependent protein kinase II; however, the magnitude of the Ca(2+)-dependent Cl(-) current was unchanged in the Clcn3(-/-) animals', an entity identifier may be fuzzily matched to the wrong one of several similar mentions.
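A rough sketch of such fuzzy alignment, using difflib's similarity ratio as an illustrative stand-in for an edit-distance score (the real system's algorithm and thresholds are not specified here):

```python
import difflib

def align_mention(entity_name, candidate_mentions, cutoff=0.5):
    """Link an entity identifier's name to the most similar mention in the
    sentence; returns None when nothing clears the similarity cutoff."""
    lowered = [m.lower() for m in candidate_mentions]
    best = difflib.get_close_matches(entity_name.lower(), lowered,
                                     n=1, cutoff=cutoff)
    return candidate_mentions[lowered.index(best[0])] if best else None

# Mentions from the example sentence: both 'ClC-3' and 'Clcn3(-/-)' are
# plausible matches for the gene symbol CLCN3, which is exactly how
# misalignment can arise with string-similarity matching.
mentions = ["ClC-3", "Clcn3(-/-)", "calmodulin"]
print(align_mention("CLCN3", mentions))
```

Because several surface forms of the same gene can compete on string similarity alone, an aligner of this kind can pick the wrong mention or, with a stricter cutoff, none at all.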

Conclusion
Following the work by Liu et al. (14), we apply a similar idea of decomposing BEL statement extraction into the RE and FD subtasks. Differently, an intermediate statement form (the SBEL statement) bridges the gap between BEL statements with rich entity functions and relation instances without entity functions. SBEL enhances the expressivity of relations and thus yields more learning instances than the previous approach, leaving space for further improvement, though at the expense of losing a small fraction of BEL statements. Meanwhile, we employ the BERT model, more powerful than the original Att-BiLSTM model, in order to achieve better performance for RE and FD. Ultimately, experimental results on the BioCreative-V Track 4 corpus demonstrate that our method significantly improves the performance of BEL statement extraction. Our system achieves state-of-the-art results in Stage 2 evaluation with an F1 score of 54.8%. One deficiency is that our system does not achieve a satisfactory level of performance in FD. Therefore, one direction for future research is to use a more effective model or incorporate more features to improve FD performance. On the other hand, we will also explore a joint learning strategy between RE and FD in SBEL statement extraction, aiming to make full use of the dependence between relations and functions.