Using Deep Learning Towards Biomedical Knowledge Discovery



Author(s)

Nadeem N. Rather 1,*, Chintan O. Patel 1, Sharib A. Khan 1

1. Applied Informatics Inc., New York, NY 10001, USA

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2017.02.01

Received: 1 Jan. 2017 / Revised: 31 Jan. 2017 / Accepted: 1 Mar. 2017 / Published: 8 Apr. 2017

Index Terms

Biomedical knowledge, Bioinformatics, Deep learning, Machine learning, Unified Medical Language System, UMLS, Word2vec, Word vectors

Abstract

A vast amount of knowledge exists within biomedical literature, publications, clinical notes and online content. Identifying hidden, interesting or previously unknown biomedical knowledge from free-text resources using an automated approach remains an important challenge. Towards this problem, we investigate the use of deep learning methods, which have shown significant promise in identifying hidden patterns in large text corpora in an unsupervised manner. For example, by performing arithmetic on learned word vectors, such methods can deduce that 'husband' - 'man' + 'woman' = 'wife'. We use the text corpus from the MRDEF file in the Unified Medical Language System (UMLS) dataset as the training set to discover potential relationships. To evaluate our approach, we cross-verify new relationships against the UMLS MRREL dataset and conduct a manual evaluation of a sample from the non-overlapping set. The algorithm found 32% of new relationships not originally represented in the UMLS. Deep learning methods provide a promising approach to discovering potential new biomedical knowledge from free text.
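The two technical steps the abstract describes, answering analogy queries by vector arithmetic and checking candidate relationships against an existing relation set, can be illustrated with a small, self-contained Python sketch. It is not the authors' code: `TOY_VECTORS`, `analogy` and `split_by_known` are hypothetical names, the three-dimensional vectors are hand-picked rather than trained, and treating relations as unordered pairs is an assumption. In practice the vectors would come from a word2vec model trained on the MRDEF text, and the reference set would be built from MRREL.

```python
import math

# HYPOTHETICAL toy vectors, hand-picked for illustration. Real word2vec vectors
# are learned from a corpus (here that would be UMLS MRDEF definitions) and
# usually have hundreds of dimensions.
TOY_VECTORS = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "husband": [1.0, 0.0, 1.0],
    "wife": [1.0, 1.0, 1.0],
    "doctor": [0.2, 0.3, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word closest (by cosine) to vec(a) - vec(b) + vec(c).

    The three query words are excluded from the candidates, as is usual
    for word-analogy queries.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def split_by_known(candidate_pairs, known_pairs):
    """Partition candidate concept pairs into (overlapping, novel).

    ASSUMPTION: known_pairs stands in for relations taken from UMLS MRREL,
    and relations are treated as undirected (order within a pair is ignored).
    """
    known = {frozenset(pair) for pair in known_pairs}
    overlapping = [p for p in candidate_pairs if frozenset(p) in known]
    novel = [p for p in candidate_pairs if frozenset(p) not in known]
    return overlapping, novel


if __name__ == "__main__":
    print(analogy(TOY_VECTORS, "husband", "man", "woman"))  # prints: wife

    # HYPOTHETICAL pairs for illustration only; not taken from the UMLS.
    candidates = [("aspirin", "pain"), ("insulin", "diabetes"), ("caffeine", "alertness")]
    known = [("pain", "aspirin"), ("insulin", "diabetes")]
    overlapping, novel = split_by_known(candidates, known)
    print(novel)  # prints: [('caffeine', 'alertness')]
    print(f"{len(novel) / len(candidates):.0%} not in the reference set")  # prints: 33% not in the reference set
```

Cosine similarity is the usual way to compare word vectors because it ignores vector length; many implementations also normalize every vector to unit length before forming the offset, a step this sketch skips for brevity. The pairs in the "novel" list correspond to the non-overlapping set that the abstract says was sampled for manual evaluation.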

Cite This Paper

Nadeem N. Rather, Chintan O. Patel, Sharib A. Khan, "Using Deep Learning Towards Biomedical Knowledge Discovery", International Journal of Mathematical Sciences and Computing (IJMSC), Vol.3, No.2, pp.1-10, 2017. DOI: 10.5815/ijmsc.2017.02.01
