Analyzing the Performance of Various Clustering Algorithms

Full Text (PDF, 631KB), PP.45-53

Views: 0 Downloads: 0

Author(s)

Bhupesh Rawat 1,* Sanjay Kumar Dwivedi 1

1. BBAU/Department of Computer Science, Lucknow, 226025, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2019.01.06

Received: 4 Nov. 2018 / Revised: 15 Nov. 2018 / Accepted: 25 Nov. 2018 / Published: 8 Jan. 2019

Index Terms

Cluster analysis, k-means algorithm, Hierarchical algorithm, Expectation maximization, Make density-based clustering, Agglomerative clustering, Divisive clustering, Birch, Cure

Abstract

Clustering is one of the extensively used techniques in data mining to analyze a large dataset in order to discover useful and interesting patterns. It partitions a dataset into mutually disjoint groups of data in such a manner that the data points belonging to the same cluster are highly similar and those lying in different clusters are very dissimilar. Furthermore, among a large number of clustering algorithms, it becomes difficult for researchers to select a suitable clustering algorithm for their purpose. Keeping this in mind, this paper aims to perform a comparative analysis of various clustering algorithms such as k-means, expectation maximization, hierarchical clustering and make density-based clustering with respect to different parameters such as time taken to build a model, use of different dataset, size of dataset, normalized and un-normalized data in order to find the suitability of one over other.

Cite This Paper

Bhupesh Rawat, Sanjay Kumar Dwivedi, "Analyzing the Performance of Various Clustering Algorithms", International Journal of Modern Education and Computer Science(IJMECS), Vol.11, No.1, pp. 45-53, 2019.DOI: 10.5815/ijmecs.2019.01.06

Reference

[1] M.S.Chen,J.Han,P.S.Yu,“Data Mining:An Overview from a Database Perspective”, IEEE Transaction on Data and Knowledge Engineering,Vol.8,pp.866-888,1996.

[2] J.Han,J.Pei,M.Kamber,Data Mining: Concepts and Techniques, Morgan Kaufman Publisher,2006.

[3] G.Kesavraj,S.Sukumaran, “A study on classification techniques in data mining”, In 4th International Conference on Computing, Communications and Networking Technologies (ICCCNT),IEEE,pp.1- 7,2013.

[4] A.Gosain,S.Dahiya, “Performance Analysis of various fuzzy clustering algorithm: A Review, In Proceeding of 7th International conference on communication, computing and virtulization,Elsevier,2016.

[5] T.Sajana,S.Rani,K.V.Narayana, “A Survey on Clustering Techniques for Big Data Mining”, Indian Journal of Science and Technology,Vol.9,2016.

[6] L.Rokach, O.Maimon. (2005) Clustering Methods. In: O.Maimon, Rokach L. (eds) Data Mining and Knowledge Discovery Handbook Springer, Boston, MA.

[7] L.Kaufman, P.J.Rousseeuw, “Finding Groups in Data An Introduction to Cluster Analysis”, A Wiley-Science Publication John Wiley & Sons.(1990).

[8] M.P.Veyssieres,R.E.Plant,“Identificaton of vegitation state and transition domain in California’s hardwood rangeland, University of California”,1998.

[9] I.Dhillon,D.Modha, “Concepts decompostion for large sparse text data using cluster machine learning”, Vol.42,pp.143-175,2001.

[10] Z.Huang, “Extension to the k-means algorithm for clustering large datasets with categorical values, Data mining and knowledge Discovery,Vol.2,pp.283-304,1998.

[11] F.Usama,G.Piatetsky-Shapiro,P.Smyth, “The KDD Process for Extracting useful Knowledge from Volumes of Data”, Communicaton of the ACM, Vol.39, pp.27-34,1996.

[12] K.S.Osama,“Data Mining in Sports: A Research Overview, MIS Master Project,2006.

[13] U.Fayyad,G.Piatetsky-Shapiro,P.Smyth, From data mining to knowledge discovery in Databases, AI Magazine, Vol.17,1996.

[14] N.Mehta,S.Dang,“A Review of Clustering Techniques in various Applications for effective data mining, International Journal of Research in Engineering & Applied Science, Vol.1,2011.

[15] H.Edelstein, “Mining Data Warehouses, Information Week,pp.48- 51,1996.

[16] W.K.Loh, Y.H.Park, “A Survey on Density-Based Clustering Algorithms”. In:Y.S.Jeong., Y.H.Park, Hsu C.H.Hsu,J.Park (eds) Ubiquitous Information Technologies and Applications. Lecture Notes in Electrical Engineering, Springer,pp.775-780,2014.

[17] T.Schön,Machine Learning, Lecture.6 Expectation Maximization(EM) and clustering", Available at: http://www.contol.isy.liu.se/student/graduate/MachineLearning/Lecture/ Machine Learning/Lectures/le6.pdf.

[18] E. Frank, M. Hall, and I. Witten, “The weka workbench,” Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th edn. Morgan Kaufman, Burlington, 2016.

[19] E.Frank, M.Hall, G. Holmes, R. Kirkby,B. Pfahringer,I.H. Witten and, L. Trigg, Weka in Data Mining and Knowledge Discovery,Springer(2005) 1305–1314.

[20] M.Hall, E.Frank,G.Holmes, B.Pfahringer, P.Reutemann, I.H.Witten, “The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl.Vol.11pp.10-18,2009.

[21] S.Borman, “The expectation maximization algorithm: A short tutorial, Unpublished paper. Available:http://ftp.csd.uwo.ca/faculty/olga/course s/Fall2006/Papers/EM algorithm.pdf.

[22] A.Hinneburg,D.A.Keim, “An Efficient Approach to Clustering in Large Multimedia Databases with Noise”, In: Proc. Int'l Conf.on Knowledge Discovery and Data Mining (KDD).pp.58-65.1998.

[23] M.Ankerst, M.M.Breunig,H-P.Kriegel,J.Sander: OPTICS: Ordering Points To Identify the Clustering Structure. In: Proc. of Int'l Conf. on Management of Data, ACM SIGMOD,pp.49-60,1999.

[24] A.Hinneburg, H.-H.Gabriel:DENCLUE 2.0:Fast Clustering Based on Kernel Density Estimation. In: M.Berthold, J.Shawe-Taylor,N. Lavrač (eds.) IDA 2007.LNCS. Springer.4723,pp.70-80,2007.

[25] X.XU,M. ESTER, H.-P.KRIEGEL,J.SANDER, “A distribution-based clustering algorithm for mining in large spatial databases”, In Proceedings of the 14th ICDE,IEEE,pp.324-331,1998.

[26] X.Xu, J.Jäger, H.-P.Kriegel: “A Fast Parallel Clustering Algorithm for Large Spatial Databases", Data Mining and Knowledge Discovery (DMKD).Vol.3,pp.263-2990,1999.

[27] T.Zhang, R.Ramakrishnan, M.Linvy, “BIRCH: An efficient data clustering method for very large data sets”, Data Mining and Knowledge Discovery, Vol.1,pp.141-182,1997.

[28] S.Guha,R.Rastogi, K.Shim, “CURE: An efficient clustering algorithm for large data sets”, In Proceeding of ACM SIGMOD Conference,1998.

[29] G.Karypis, E.H.Han,V.Kumar, “CHAMELEON: A hierarchical clustering algorithm using dynamic modeling Computer”, IEEE Computer, Vol.32,pp.68-75,1999.

[30] Z.Huang, “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values”, Data Mining and Knowledge Discovery.pp.283-304,1998.

[31] A.Hinneburg, D.A.Keim, “A General Approach to Clustering in Large Databases with Noise”, Knowledge and Information Systems (KAIS), Vol.5,pp.387-415,2003.

[32] Y.M.Cheung, “K*-means:A new generalized k-means clustering algorithm”,Pattern Recognition Letters, Elsevier. Vol.24,pp.2883- 2893,2003.

[33] P.Berkhin, “Survey of Clustering Data Mining techniques”, Accrue Software, Inc, 2000.

[34] J.Hartigan,M.Wong, “Algorithm AS136:A k-means clustering algorithm”, Applied Statistics, Vol.28, pp-100-108,1979.

[35] A.K.Jain, M.N.Murty, P.J.Flynn, “Data clustering: A review”, ACM Comput. Surv.Vol.31,pp.264-323,1999.

[36] K.Stoffel,A.Belkoniene, “Parallel - means clustering for large data set”, In Proc. Euro Par'99 Parallel Processing, Springer, pp.1451-1454,1999.

[37] G.Ball,D.Hall, “A clustering technique for summarizing multivariate data”,Behaviour Science, Vol.12,pp.153-155,1967.