A Study on Analysis of SMS Classification Using Document Frequency Thresold

PP. 44-50


Author(s)

R. Parimala 1,* R. Nallaswamy 2

1. National Institute of Technology, Tiruchirappalli

2. Department of Mathematics, National Institute of Technology, Tiruchirappalli

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2012.01.06

Received: 14 Nov. 2011 / Revised: 25 Dec. 2011 / Accepted: 3 Jan. 2012 / Published: 8 Feb. 2012

Index Terms

Text Mining, Support Vector Machine, Document Term Matrix, Document frequency threshold

Abstract

In recent years, feature selection has been a chief concern in text classification. A major characteristic of text classification is the high dimensionality of the feature space. Feature selection is therefore considered one of the crucial parts of text document categorization. Selecting the best features to represent documents can reduce the dimensionality of the feature space and hence improve performance. Feature selection is performed here using a Document Frequency Threshold. This paper focuses on SVM-based text message classification using a document frequency threshold. The experiment is performed on the NUS SMS text message data set. The experimental results show that the proposed method is more efficient.
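As an illustration of the feature selection step named in the abstract, the following is a minimal sketch of document frequency thresholding in Python. It is not the authors' implementation. The toy messages, the whitespace tokenizer, the function names, and the threshold value `min_df=2` are all assumptions made for this example. A term is kept only if it occurs in at least `min_df` documents, and the reduced document term matrix is built from the surviving terms.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        # set() ensures a term is counted once per document
        df.update(set(tokenize(doc)))
    return df


def select_terms(docs, min_df):
    """Keep terms whose document frequency is at least min_df."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def document_term_matrix(docs, vocabulary):
    """Build one row of term counts per document, restricted to the vocabulary."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical toy messages, not taken from the NUS SMS data set.
messages = [
    "win a free prize now",
    "free entry win cash",
    "are you free for lunch",
    "see you at lunch",
]

vocabulary = select_terms(messages, min_df=2)  # min_df=2 is an assumed threshold
matrix = document_term_matrix(messages, vocabulary)
```

In this toy case the thirteen distinct terms shrink to the four that appear in at least two messages. The rows of `matrix` could then be passed to any SVM implementation as feature vectors; the choice of classifier library and kernel is left open here.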

Cite This Paper

R. Parimala, R. Nallaswamy, "A Study on Analysis of SMS Classification Using Document Frequency Thresold", International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 4, no. 1, pp. 44-50, 2012. DOI: 10.5815/ijieeb.2012.01.06
