A Turkish Wikipedia Text Summarization System for Mobile Devices

Author(s)

Akif Hatipoglu 1,*, Sevinç İlhan Omurca 2

1. Yapı Kredi Bank, Content and Document Management Software Programming Department, Software Developer, Levent/İstanbul, 34330, Turkey

2. Kocaeli University, Computer Engineering Department, Kocaeli, 41000, Turkey

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2016.01.01

Received: 23 Apr. 2015 / Revised: 15 Aug. 2015 / Accepted: 6 Oct. 2015 / Published: 8 Jan. 2016

Index Terms

Turkish text summarization, Turkish Wikipedia, Latent semantic analysis, Helmholtz principle, Mobile application

Abstract

Today Wikipedia provides a very large and reliable domain-independent encyclopedic repository. This study presents a mobile system that summarizes Turkish Wikipedia text. The system selects sentences based on structural features of the Turkish language and semantic features of the sentences. Performance is evaluated against the judgments of human experts. The results are assessed using the precision and recall of a ranked sentence list, and the summarization results are found to be promising.
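
The abstract states that results are measured by the precision and recall of a ranked sentence list. As an illustration only, the following is a minimal sketch of how such scores could be computed at a cutoff k. The function name `precision_recall_at_k`, the cutoff parameter, and the sample data are assumptions made for this example and are not taken from the paper, which may define its evaluation differently.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    ranked: List[int], relevant: Set[int], k: int
) -> Tuple[float, float]:
    """Precision and recall of the top-k sentences of a ranked list.

    ranked   -- sentence indices ordered by system score, best first
    relevant -- sentence indices chosen by human experts (hypothetical data)
    k        -- number of top-ranked sentences kept as the summary
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    # Count how many of the selected sentences the experts also chose.
    hits = sum(1 for sentence in top_k if sentence in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: a system ranking and an expert selection.
system_ranking = [3, 0, 7, 2, 5]
expert_selection = {0, 2, 4}

for k in (1, 3, 5):
    p, r = precision_recall_at_k(system_ranking, expert_selection, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Raising k in this sketch can only keep recall the same or increase it, while precision may fall, which is why both values are usually reported together for a ranked list.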

Cite This Paper

Akif Hatipoglu, Sevinç İlhan Omurca, "A Turkish Wikipedia Text Summarization System for Mobile Devices", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.1, pp.1-10, 2016. DOI: 10.5815/ijitcs.2016.01.01
