Keyphrase Extraction of News Web Pages

Full Text (PDF, 351KB), PP.48-58

Author(s)

Chandrakala Arya 1,*, Sanjay K. Dwivedi 1

1. B.B. Ambedkar University, Lucknow-226025, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2018.01.06

Received: 28 Feb. 2017 / Revised: 22 May 2017 / Accepted: 11 Sep. 2017 / Published: 8 Jan. 2018

Index Terms

Keyphrase extraction, Lexical chain, Web News, TF*IDF, WordNet

Abstract

Keyphrase extraction from news web pages is an important task for news document retrieval and summarization. Keyphrases are like index terms that capture the important information about document content and offer a concise and precise description of it. A keyphrase is a single word or a combination of several words that represents an important concept in a text document. The aim of this paper is to develop and evaluate an automatic keyphrase extraction approach for news web pages. Our approach identifies candidate keyphrases in a document and selects the candidates with the highest weight scores. The weight formula combines a feature set that includes TF*IDF, phrase distance in the document, and a lexical chain based on WordNet that represents semantic relations between words. The experimental results show that our approach performs better than contemporary approaches.
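
The abstract does not state the weight formula itself, so the following is only a minimal sketch of how a combined score of this kind could be computed. It assumes a weighted linear sum, a smoothed IDF, and a "phrase distance" feature defined as the relative position of a phrase's first occurrence; none of these choices is confirmed by the paper. The function names, the weights, and the `chain_score` dictionary (standing in for a WordNet-based lexical chain score computed elsewhere) are hypothetical.

```python
import math


def tf_idf(phrase, doc, corpus):
    """TF*IDF of a phrase; `doc` and every corpus entry are lists of candidate phrases."""
    tf = doc.count(phrase) / len(doc)
    df = sum(1 for d in corpus if phrase in d)
    # Smoothed IDF (an assumption) so that unseen phrases do not divide by zero.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf


def first_position(phrase, doc):
    """Assumed distance feature: 1.0 for a phrase at the start, lower for later ones."""
    return 1.0 - doc.index(phrase) / len(doc)


def score_candidates(doc, corpus, chain_score, weights=(0.5, 0.2, 0.3)):
    """Rank candidates by a weighted sum of the three features (hypothetical weights)."""
    w_tfidf, w_pos, w_chain = weights
    scores = {}
    for phrase in set(doc):
        scores[phrase] = (
            w_tfidf * tf_idf(phrase, doc, corpus)
            + w_pos * first_position(phrase, doc)
            # Lexical chain score is supplied by the caller; 0.0 if the phrase is in no chain.
            + w_chain * chain_score.get(phrase, 0.0)
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    doc = ["election", "prime minister", "election", "weather"]
    corpus = [doc, ["weather", "rain"], ["football", "weather"]]
    chain_score = {"election": 1.0, "prime minister": 0.8}
    for phrase, score in score_candidates(doc, corpus, chain_score):
        print(f"{phrase}: {score:.3f}")
```

In this toy setup a phrase that is frequent in the document, rare in the corpus, appears early, and belongs to a strong lexical chain ranks highest, which is the intuition the abstract describes.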

Cite This Paper

Chandrakala Arya, Sanjay K. Dwivedi, "Keyphrase Extraction of News Web Pages", International Journal of Education and Management Engineering (IJEME), Vol. 8, No. 1, pp. 48-58, 2018. DOI: 10.5815/ijeme.2018.01.06
