Combining Naïve Bayes and Modified Maximum Entropy Classifiers for Text Classification


Author(s)

Hanika Kashyap 1,*, Bala Buksh 1

1. R. N. Modi Institute of Technology, Kota, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2016.09.05

Received: 3 Oct. 2015 / Revised: 27 Feb. 2016 / Accepted: 11 May 2016 / Published: 8 Sep. 2016

Index Terms

Text Classification, Combination of Classifiers, Naïve Bayes Classifier, Maximum Entropy Classifier, Accuracy

Abstract

Text classification is performed mainly through classifiers proposed over the years, Naïve Bayes and Maximum Entropy being the most popular. However, individual classifiers show limited applicability, each within its own domain and scope. Recent research has shown that combinations of classifiers perform better than the individual classifiers alone. This work introduces a modified Maximum Entropy-based classifier. Maximum Entropy classifiers provide a great deal of flexibility in parameter definition and rest on assumptions closer to real-world scenarios. This classifier is then combined with a Naïve Bayes classifier. Naïve Bayes classification is a very simple and fast technique, and its assumption model is the opposite of Maximum Entropy's. The classifiers are combined through operators that linearly combine the outputs of the two classifiers to predict the class of query documents. The seven proposed modifications (four modifications of Maximum Entropy and three combined classifiers) are validated through implementation and experiments on real-life datasets.
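To make the combination step concrete, the following is a minimal, hypothetical sketch in Python using scikit-learn, in which multinomial logistic regression stands in for the Maximum Entropy classifier and the two class posteriors are merged with a weighted-average operator. The dataset, the weight alpha, and the choice of operator are illustrative assumptions, not the modifications proposed in the paper.

# Minimal sketch: linearly combining a Naive Bayes and a Maximum Entropy
# (multinomial logistic regression) classifier over text features.
# All parameter choices here are illustrative assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# A small real-life text dataset: two categories of 20 Newsgroups.
cats = ["rec.autos", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer(stop_words="english")
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

# The two individual classifiers.
nb = MultinomialNB().fit(X_train, train.target)
maxent = LogisticRegression(max_iter=1000).fit(X_train, train.target)

# Linear combination operator: a weighted average of class posteriors.
alpha = 0.5  # assumed equal weighting; the paper's operators may differ
p = alpha * nb.predict_proba(X_test) + (1 - alpha) * maxent.predict_proba(X_test)
pred = p.argmax(axis=1)

print(f"combined accuracy: {(pred == test.target).mean():.3f}")

With alpha = 0.5 the operator reduces to a simple average of the two posterior distributions; the combined classifiers in the paper may weight or transform the individual outputs differently.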

Cite This Paper

Hanika Kashyap, Bala Buksh, "Combining Naïve Bayes and Modified Maximum Entropy Classifiers for Text Classification", International Journal of Information Technology and Computer Science (IJITCS), Vol. 8, No. 9, pp. 32-38, 2016. DOI: 10.5815/ijitcs.2016.09.05
