A Fuzzy Approach for Text Mining

Pages: 34-43


Author(s)

Deepa B. Patil 1,*, Yashwant V. Dongre 1

1. Vishwakarma Institute of Information Technology, 3/4 Kondhwa (Bk), Pune-411048, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2015.04.04

Received: 1 Aug. 2015 / Revised: 4 Sep. 2015 / Accepted: 3 Oct. 2015 / Published: 8 Nov. 2015

Index Terms

Fuzzy clustering, fuzzy c-means clustering algorithm, text mining

Abstract

Document clustering is an integral part of text mining. There are two types of clustering: hard clustering and soft clustering. In hard clustering, each data item belongs to exactly one cluster, whereas in soft clustering a data item may belong to more than one cluster. Fuzzy clustering is a form of soft clustering in which each data point is associated with a membership function that expresses the degree to which it belongs to each cluster. Accuracy is essential in information retrieval, and fuzzy clustering can help achieve it. In the work presented here, a fuzzy approach to text classification is used to assign documents to appropriate clusters with the Fuzzy C-Means (FCM) clustering algorithm. The Enron email dataset is used for the experiments, and FCM is applied to group its emails into clusters. The results are compared with the output of the k-means clustering algorithm. The comparative study shows that the fuzzy clusters are more appropriate than the hard clusters.
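The abstract describes FCM only in words. Below is a minimal Python sketch of the textbook FCM iteration (Bezdek's formulation, with fuzzifier m = 2 assumed), not the authors' implementation: the paper's document representation and parameter settings are not given here, so the toy 2-D vectors, the helper names `fuzzy_c_means` and `squared_distance`, and every parameter value are hypothetical. Each iteration recomputes cluster centres as means weighted by u_ij^m, then sets each membership to u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). Keeping only the largest membership per point yields a hard partition of the kind k-means produces (this is an argmax of the FCM result, not k-means itself).

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(
    points: Sequence[Sequence[float]],
    c: int,
    m: float = 2.0,
    max_iter: int = 300,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Plain FCM (hypothetical helper). Returns (centers, u), where u[i][j] is
    the membership of point i in cluster j and every row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers: List[List[float]] = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        # Working with squared distances, the exponent becomes 1 / (m - 1).
        exponent = 1.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            d2 = [squared_distance(points[i], centers[j]) for j in range(c)]
            hits = [j for j in range(c) if d2[j] == 0.0]
            if hits:
                # Point sits exactly on a centre: split membership among those centres.
                row = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d2[j] / d2[k]) ** exponent for k in range(c))
                       for j in range(c)]
            new_u.append(row)

        # Stop once no membership value changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy 2-D vectors standing in for document feature vectors (hypothetical data).
docs = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 4.9], [2.5, 2.5]]
centers, u = fuzzy_c_means(docs, c=2)

# A hard, k-means-style assignment keeps only the highest membership per point.
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
for doc, row, label in zip(docs, u, labels):
    print(doc, [round(v, 2) for v in row], "hard label:", label)
```

The point at (2.5, 2.5) lies between the two groups, so its memberships should come out close to 0.5 for each cluster, whereas the hard assignment must place it wholly in one cluster. That is the soft versus hard distinction the abstract draws.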

Cite This Paper

Deepa B. Patil, Yashwant V. Dongre, "A Fuzzy Approach for Text Mining", International Journal of Mathematical Sciences and Computing (IJMSC), Vol. 1, No. 4, pp. 34-43, 2015. DOI: 10.5815/ijmsc.2015.04.04
