Document Summarization using TextRank and Semantic Network

Full Text (PDF, 809KB), PP.26-33

Author(s)

Ahmad Ashari 1,*, Mardhani Riasetiawan 1

1. Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.11.04

Received: 16 Mar. 2017 / Revised: 1 May 2017 / Accepted: 8 Jun. 2017 / Published: 8 Nov. 2017

Index Terms

TextRank, Semantic Network, Document Summarization, Rouge-N, F-Score

Abstract

This research implements a document summarization system that uses the TextRank algorithm together with a sentence-similarity measure based on Semantic Networks and Corpus Statistics. TextRank extracts the main phrases of a document, which are then used as the sentences of the output summary. The TextRank pipeline consists of several steps: sentence tokenization, graph construction, edge-weight calculation using the Semantic Networks and Corpus Statistics algorithm, vertex-score calculation, sorting of the vertex scores, and creation of the summary. The system was tested by calculating the recall, precision, and F-Score of each summary with the ROUGE-N method to measure the quality of the system output. The quality of the summaries is influenced by the writing style, the choice of words and symbols in the document, and the length of the summary produced by the system. The largest F-Score values are 0.1635, obtained with a summary length of 10% of the document length, and 0.1623, obtained with a summary length of 150 words.
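The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the edge weight here is a plain Jaccard word overlap standing in for the Semantic Networks and Corpus Statistics similarity, the damping factor 0.85 and the iteration count are assumed values, and all function names (split_sentences, overlap_similarity, textrank, summarize, rouge_n) are hypothetical. The rouge_n function shows how recall, precision, and F-Score can be derived from n-gram overlap against a single reference summary.

```python
import re
from collections import Counter


def split_sentences(text):
    """Sentence tokenization: naive split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def overlap_similarity(a, b):
    """Stand-in edge weight (Jaccard word overlap).

    Assumption: the paper uses a semantic-network and corpus-statistics
    similarity instead; this simpler measure only keeps the sketch runnable.
    """
    sa, sb = set(words(a)), set(words(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def textrank(sentences, d=0.85, iterations=50):
    """Vertex scores on a weighted, undirected sentence graph."""
    n = len(sentences)
    # Graph construction and edge-weight calculation.
    w = [[overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    # Vertex-score calculation by fixed-count power iteration.
    for _ in range(iterations):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Sort vertices by score, keep the top k, emit them in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall, precision and F-Score from clipped n-gram overlap."""
    def ngrams(text):
        t = words(text)
        return Counter(tuple(t[i:i + n]) for i in range(len(t) - n + 1))

    c, r = ngrams(candidate), ngrams(reference)
    overlap = sum((c & r).values())  # Counter '&' takes the minimum counts
    recall = overlap / sum(r.values()) if r else 0.0
    precision = overlap / sum(c.values()) if c else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f
```

A call such as summarize(document_text, k=3) returns the three highest-ranked sentences in their original order, and rouge_n(" ".join(summary), reference_summary, n=1) scores that output against a human-written reference. Swapping overlap_similarity for a semantic similarity function changes only the edge weights; the ranking and evaluation steps stay the same.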

Cite This Paper

Ahmad Ashari, Mardhani Riasetiawan, "Document Summarization using TextRank and Semantic Network", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.11, pp.26-33, 2017. DOI: 10.5815/ijisa.2017.11.04
