A gene–phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach
