Fuzzy Clustering of Sequential Data

Full Text (PDF, 377KB), PP.43-54

Author(s)

B.K. Tripathy 1,* Rahul Dahiya 2

1. VIT, SITE, Vellore-632014, INDIA

2. VIT, SCOPE, Vellore-632014, INDIA

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.01.05

Received: 24 May 2018 / Revised: 1 Jul. 2018 / Accepted: 15 Jul. 2018 / Published: 8 Jan. 2019

Index Terms

Clustering, Fuzzy Clustering, Sequence mining, Similarity measures, Pattern mining

Abstract

With the growing popularity of the Internet and advances in fields such as bioinformatics, the amount of sequential data is increasing at a tremendous rate. It has therefore become essential to mine useful information from this vast amount of data. The mined information can be used in many areas, from day-to-day web activities such as predicting the next web page and serving better advertisements to biological applications such as genomic data analysis. Kumar et al. recently proposed a rough set based clustering of sequential data. They defined and used a measure called the Sequence and Set Similarity Measure to determine similarity in data. However, we observed that this measure does not reflect some important characteristics of sequential data. In this paper, we therefore use the fuzzy set technique and introduce a similarity measure, which we term the Kernel and Set Similarity Measure, to find the similarity of sequential data and generate overlapping clusters. For this purpose, we use exponential string kernels and Jaccard's similarity index. The new similarity measure takes into account the order of items in the sequence as well as the content of the sequential pattern. To compare our algorithm with that of Kumar et al., we used the MSNBC data set from the UCI repository, which was also used in their paper. To the best of our knowledge, this is the first fuzzy clustering algorithm for sequential data.
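The idea of combining an order-aware kernel with a content-based set index can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of the Kernel and Set Similarity Measure, so everything here is an assumption made for illustration: the simplified substring kernel with exponentially decaying weights (`decay_kernel`), the decay factor `lam`, and the convex combination with weight `p` are hypothetical choices, not the authors' formula.

```python
from collections import Counter
from math import sqrt


def substring_counts(seq):
    """Count every contiguous subsequence of seq, stored as a tuple."""
    counts = Counter()
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n + 1):
            counts[tuple(seq[i:j])] += 1
    return counts


def decay_kernel(x, y, lam=0.5):
    """Assumed kernel: shared substrings weighted by lam ** length."""
    cx, cy = substring_counts(x), substring_counts(y)
    return sum(cx[s] * cy[s] * lam ** len(s) for s in cx if s in cy)


def normalized_kernel(x, y, lam=0.5):
    """Scale the kernel into [0, 1] so it is comparable to Jaccard."""
    denom = sqrt(decay_kernel(x, x, lam) * decay_kernel(y, y, lam))
    return decay_kernel(x, y, lam) / denom if denom else 0.0


def jaccard(x, y):
    """Jaccard index on the sets of items; ignores order entirely."""
    a, b = set(x), set(y)
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(x, y, p=0.5, lam=0.5):
    """Hypothetical convex combination of order and content similarity."""
    return p * normalized_kernel(x, y, lam) + (1 - p) * jaccard(x, y)


# Two page-category sequences with the same items in reverse order.
forward = [1, 2, 3]
backward = [3, 2, 1]
print(jaccard(forward, backward))
print(combined_similarity(forward, backward))
```

In this sketch the two sequences have identical content, so the Jaccard term alone cannot tell them apart, while the kernel term lowers the combined score because only single items, and no longer runs, are shared in the same order.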

Cite This Paper

B.K. Tripathy, Rahul, "Fuzzy Clustering of Sequential Data", International Journal of Intelligent Systems and Applications(IJISA), Vol.11, No.1, pp.43-54, 2019. DOI:10.5815/ijisa.2019.01.05

Reference

[1]Karatzoglou, A., Smola, A., Hornik, K. and Zeileis, A., kernlab - An S4 Package for Kernel Methods in R, Journal of Statistical Software, vol. 11(9), (2004), 1-20.
[2]Anuradha, J., B.K.Tripathy and A. Sinha: Hybrid Clustering algorithm using Possibilistic Rough C-means, International journal of Pharma and Bio-informatics, vol.6, issue 4, (2015), pp.799-810.
[3]Anuradha, J. and Tripathy, B.K.: An optimal rough fuzzy clustering algorithm using PSO, Int. Jour. of Data Mining, Modeling and Management, vol.7, issue 4, (2014). pp. 257-275.
[4]Atanassov, K. T., Intuitionistic Fuzzy Sets, Fuzzy Sets and Systems, 20(1) (1986) 83-96.
[5]Bellman, R., Kalaba, R., and Zadeh, L. A., Abstraction and pattern classification, Journal of Mathematical Analysis and Applications, 2, (1966), 581-586.
[6]Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, (1981).
[7]Bezdek, J. C., Coray, C., Gunderson, R. and Watson, J., Detection and characterisations of cluster substructure: I. Linear structure: Fuzzy C-lines, SIAM J. Appl. Math. 40(2) (1981), 339-357.
[8]Bezdek, J. C., Coray, C., Gunderson, R. and Watson, J., Detection and characterisations of cluster substructure: II. Fuzzy c-varieties and convex combinations thereof, SIAM J. Appl. Math. 40(2) (1981), 358-372.
[9]Bezdek, J. C. and Hathaway, R. J., Dual object-relation clustering models, Int. J. General Systems, 16 (1990), 385-396.
[10]Bezdek, J. C., Hathaway, R. J., Howard, R. E., Wilson, C. E. and Windham, M. P., Local convergence analysis of a grouped variable version of coordinate descent, J. Optimization Theory and Applications, 54 (3), (1986), 471-477.
[11]Bhargava, R., Tripathy, B.K., Tripathy, A., Dhull, R., Verma, E., and Swarnalatha, P., Rough intuitionistic fuzzy c-means algorithm and a comparative analysis, in: Proceedings of the 6th ACM India Computing Convention, (2013), 1-11.
[12]Chaira, T., A novel intuitionistic fuzzy C means clustering algorithm and its application to medical images, Applied Soft Computing, 11(2) (2011), 1711-1717
[13]Dunn, J. C., A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters, J. Cybernetics, 3 (1974) 32-57.
[14]Dunham, M.H., Data Mining: Introductory and Advanced Topics, Prentice Hall, NJ, (2003).
[15]G. Sasikumar and B.K.Tripathy: Classification and Analysis of EEG Brain Signals for Finding Epilepsy Risk Levels Using SVM, World Applied Sciences Journal, 33 (4), (2015), pp. 631-639.
[16]Guralnik, V. and Karypis, G., A scalable algorithm for clustering sequential data, In: Proceedings of the 1st IEEE International Conference on Data Mining- ICDM, (2001), 179- 186.
[17]Hathaway, R. J., Davenport, J. W. and Bezdek, J. C., Relational duals of the c-means clustering algorithms, Pattern Recognition, 22, (1989), 205-212.
[18]Hathaway, R. J. and Bezdek, James C., Nerf c-means: Non-Euclidean relational fuzzy clustering, Pattern Recognition, 27, (1994), 429-437.
[19]Hofmann, T., Schölkopf, B., and Smola, A. J., Kernel methods in machine learning, Annals of Statistics, 36 (3), (2008), 1171-1220.
[20]Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y., Herawan, T. (2014). Big Data Clustering: A Review. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8583. Springer, Cham.
[21]Jayaram Reddy, A. and Tripathy, B.K.: Covering Rough Set Fuzzy C- Medoids (Crfm) Clustering Algorithm For Gene Expression Data, Journal of Advanced Research in Dynamical and Control Systems, vol.9, sp-14, (2017), pp.1702-1714.
[22]Jacques, J., Preda, C. Functional data clustering: a survey. Adv Data Anal Classif 8, 231–255 (2014).
[23]Kalaiselvi, T., Kalaichelvi, N. and Sriramakrishnan, P.: Automatic Brain Tissues Segmentation based on Self Initializing K-Means Clustering Technique, IJISA, vol. 9, no. 11, (2017), pp.52-61.
[24]Kaufman, L. and Rousseeuw, P.J., Clustering by means of Medoids, in Statistical Data Analysis Based on the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland,(1987), 405–416.
[25]Kaufman, L. and Rousseeuw, P.J., Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, (1990).
[26]Kumar, P., Radha Krishna, P., Bapi, R. S., and De, S. K., Rough clustering of sequential data, Data & Knowledge Engineering, 63, (2007), 183-199.
[27]Libert, G. and Roubens, M., Non-metric fuzzy clustering algorithms and their cluster validity, In Fuzzy Information and Decision processes, (Edited by M. Gupta and E. Sanchez), (1982), 417-425, New York.
[28]Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N. and Watkins, C., Text Classification using String Kernels, Journal of Machine Learning Research, 2, (2002), 419-444.
[29]Machler, M., Rousseeuw, P., Struyf, A., Hubert, M. and Hornik, K.: Cluster: Cluster Analysis Basics and Extensions, (2012), https://www.researchgate.net/publication/272176869_Cluster_Cluster_Analysis_Basics_and_Extensions.
[30]Maji, P., and Pal, S. K., RFCM: A hybrid clustering algorithm using rough and fuzzy sets, Fundamenta Informaticae, 80 (4), (2007), 475-496.
[31]Mitra, S., Banka, H., and Pedrycz, W., Rough and Fuzzy Collaborative Clustering, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 36 (4), (2006), 795-805.
[32]P. Prabhavathy, B.K.Tripathy: Sequential clustering: A Study on Covering Based Rough Set Theory, Research Journal of Pharmaceutical, Biological and Chemical Sciences, Volume 7, Issue 2, 2016, pp. 1799-1807
[33]Prabhavathy, P. and Tripathy, B. K.: Covering rough clustering approach for unstructured activity analysis, International Journal of Intelligent Information Technologies, Volume 12, Issue 2, April-June 2016, pp. 1- 11.
[34]Roubens, M., Pattern classification problems with fuzzy sets, Fuzzy sets and systems, 1 (1978), 239-253
[35]Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comp. Appl. Math., 20, (1987), 53–65.
[36]Xu, R. and Wunsch, D., Clustering, Wiley-IEEE Press, (2009).
[37]Ruspini, E. H., A New Approach to Clustering, Information and Control, 15, (1969), 22-32
[38]Ruspini, E. H., Numerical Methods for fuzzy clustering, Information Sciences, 2, (1970), 319-350.
[39]Ruspini, E. H., New experimental results in fuzzy clustering, Information Sciences, 6, (1973), 273 -284.
[40]Sandhu, S.S., Jadhav, A. R. and Tripathy, B.K.: Comparison of centroid-based clustering algorithms in the context of divide and conquer paradigm based FMST framework, IEEE proceedings of ICRCICN 2017, (2017), pp.219-224.
[41]Seetha, H., Tripathy, B.K. and Murthy, M. K.: Modern Technologies for Big Data Classification and Clustering, IGI Edited volume, ISBN-10: 1522528059, (2017).
[42]Swarnalatha, P. and Tripathy, B.K.: A Comparative analysis of Depth computation of Leukemia Images using a refined Bit Plane and Uncertainty based clustering Techniques, Cybernetics and Information Technologies, vol.15, no.1,(2015), pp.126- 146
[43]Tripathy, B.K. and Swarnalatha, P.: A Comparative Study of RIFCM with Other Related Algorithms from Their Suitability in Analysis of Satellite Images using Other Supporting Techniques, Kybernetes, vol.43, no.1,(2014), pp. 53-81
[44]Tripathy, B.K. and P. Prabhavathy: An Integrated Covering based Rough Fuzzy set Clustering Approach for Sequential data, International Journal of Reasoning based Intelligent Systems, vol.7, issues 3-4, (2015), pp.296-304.
[45]Tripathy, B.K., Goyal, A., Chowdhury, R. and Patra, A. S.: MMeMeR: An Algorithm for Clustering Heterogeneous Data using Rough Set Theory, I.J. Intelligent Systems and Applications, (2017), vol.9 (8), pp. 25-33.
[46]Tripathy, B.K., and Sharmila Banu, K.: Soft Computing Techniques for Categorical Data Analysis on Bio-informatics, International journal of Pharma and Bio-informatics, vol.6, issue 4, (2015), pp.642-646.
[47]Yang, M. S. and Yu, K. F., On stochastic convergence theorems for the fuzzy c-means clustering procedure, Int. J. general Systems, 16 (1990), 397-411.
[48]Yang, M. S. and Yu, K. F., On existence and strong consistency of a class of fuzzy c-means clustering procedure, Cybernetics and Systems, 23 (1992), 583-602.
[49]Yang, M. S., On asymptotic normality of a class of fuzzy c-means clustering procedure.
[50]Zadeh, L. A., Fuzzy Sets, Information and Control, 8, (1965), 338 – 353.
[51]Zimmermann, H. J., Fuzzy set theory and its applications, Boston, Kluwer Academic Publishers, (1991).