Efficient and Fast Initialization Algorithm for K-means Clustering

Full Text (PDF, 930 KB), pp. 21-31


Author(s)

Mohammed El Agha 1,*, Wesam M. Ashour 1

1. Islamic University of Gaza, Gaza, Palestine

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2012.01.03

Received: 3 Jun. 2011 / Revised: 6 Sep. 2011 / Accepted: 17 Nov. 2011 / Published: 8 Feb. 2012

Index Terms

Data mining, K-means initialization, pattern recognition

Abstract

The well-known K-means clustering algorithm is sensitive to the selection of the initial centroids and may converge to a local minimum of the criterion function. A new initialization algorithm for K-means clustering is presented. The proposed procedure for choosing the initial starting centroids allows the K-means algorithm to converge to a “better” local minimum. Our results show that refined initial starting centroids indeed lead to improved solutions. A framework for implementing and testing various clustering algorithms is also presented and was used to develop and evaluate the proposed algorithm.
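To make the sensitivity described in the abstract concrete, the sketch below runs plain Lloyd's K-means from two different random starts on a toy two-dimensional data set and reports the criterion value (within-cluster sum of squared distances) reached from each start. It is an illustrative example only, not the initialization procedure proposed in the paper; the data set, the seeds, and the function name kmeans are assumptions invented for the demonstration.

import numpy as np

def kmeans(X, k, init_centroids, n_iters=100):
    # Plain Lloyd's K-means: returns final centroids, labels, and the
    # criterion value (within-cluster sum of squared distances, SSE).
    centroids = init_centroids.copy()
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    sse = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, sse

# Toy data set: three Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])])

# Two different random initializations; different starting centroids
# can drive K-means to different local minima of the criterion.
for seed in (1, 2):
    init = X[np.random.default_rng(seed).choice(len(X), size=3, replace=False)]
    _, _, sse = kmeans(X, 3, init)
    print("seed", seed, "criterion (SSE) =", round(float(sse), 2))

On well-separated data the two runs may land in the same minimum; on less clean data, or with unlucky starting points, the reported criterion values diverge, which is exactly the behavior a refined initialization scheme aims to avoid.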

Cite This Paper

Mohammed El Agha, Wesam M. Ashour, "Efficient and Fast Initialization Algorithm for K-means Clustering", International Journal of Intelligent Systems and Applications (IJISA), vol. 4, no. 1, pp. 21-31, 2012. DOI: 10.5815/ijisa.2012.01.03
