Accelerated K-means Clustering Algorithm

Full Text (PDF, 456KB), PP.39-46

Views: 0 Downloads: 0

Author(s)

Preeti Jain 1,* Bala Buksh 1

1. R. N. Modi Engineering College, Kota, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2016.10.05

Received: 27 Jan. 2016 / Revised: 10 May 2016 / Accepted: 3 Jul. 2016 / Published: 8 Oct. 2016

Index Terms

Clustering, k-means, Optimization of k-means, Distance metrics, Poteras et al's Scheme

Abstract

Optimizing K-means is still an active area of research for purpose of clustering. Recent developments in Cloud Computing have resulted in emergence of Big Data Analytics. There is a fresh need of simple, fast yet accurate algorithm for clustering huge amount of data. This paper proposes optimization of K-means through reduction of the points which are considered for re-clustering in each iteration. The work is generalization of earlier work by Poteras et al who proposed this idea. The suggested scheme has an improved average runtime. The cost per iteration reduces as number of iterations grow which makes the proposal very scalable.

Cite This Paper

Preeti Jain, Bala Buksh, "Accelerated K-means Clustering Algorithm", International Journal of Information Technology and Computer Science(IJITCS), Vol.8, No.10, pp.39-46, 2016. DOI:10.5815/ijitcs.2016.10.05

Reference

[1]E.W.Forgy, “Cluster analysis of multivariate data: efficiency v/s interpretability of classifications”, Biometrics, 21, pp.768–769, 1965

[2]Nameirakpam Dhanachandra, Khumanthem Manglem and Yambem Jina Chanu, “Image Segmentation using K -means Clustering Algorithm and Subtractive Clustering Algorithm”, Proceedings of the Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015), Procedia Computer Science 54, pp. 764 – 771, 2015.

[3]Ramzi A. Haraty, Mohamad Dimishkieh and Mehedi Masud, “An Enhanced k-Means Clustering Algorithm for Pattern Discovery in Healthcare Data”, International Journal of Distributed Sensor Networks, Volume 2015, pp. 1-11, 2015.

[4]M. Bala Krishna and M. N. Doja, “Deterministic K-means secure coverage clustering with periodic authentication for wireless sensor networks”, International Journal of Communication Systems, 2015. DOI: 10.1002/dac.3024.

[5]Suneetha Chittineni, Raveendra Babu Bhogapathi, “Determining Contribution of Features in Clustering Multidimensional Data Using Neural Network”, International Journal of Information Technology and Computer Science(IJITCS), Vol. 4, No. 10, pp.29-36, 2012.

[6]Joao Caldas da Costa, “K-means Clustering for Sleep Spindles Classification”, International Journal of  Information Technology & Computer Science ( IJITCS ), Volume 10  : Issue  No : 3, Proceedings of the 2nd International Conference  on Computer Science,  Information System & Communication Technologies  ( ICCSISCT 2013 )- Sydney , Australia, pp. 77-85, 2013.

[7]Shehroz. S. Khan and Amir Ahmad “Cluster center initialization algorithm for kmeans clustering” Pattern Recognition Letters. 25 (11), Pages 1293–1302, 2004.

[8]D. Arthur and S. Vassilvitskii, “k-means++: The Advantages of Careful Seeding”, Society for Industrial and Applied Mathematics , Philadelphia, 2007. 

[9]T. Su and J. G. Dy, “In Search of Deterministic Methods for Initializing K-Means and Gaussian Mixture Clustering,” Intelligent Data Analysis, vol. 11, no. 4, pp. 319–338,2007

[10]M. Emre Celebi and Hassan A. Kingravi, “Deterministic Initialization of the K-Means Algorithm using Hierarchical Clustering” International Journal of Pattern Recognition and Artificial Intelligence, 26(7):1250018, 2012.

[11]Abdessalam H. Elhabbash, “Enhanced k-means Clustering Algorithm”, Masters Diss., The Islamic University of Gaza, 2010.

[12]A. M. Fahim, A. M. Salem, F. A. Torkey and M. A. Ramadan, “An Efficient Enhanced K-means Clustering Algorithm”, Journal of Zhejiang University, Science A , 7(10), pages 1626 - 1633, 2006.

[13]S. Tiwari and T. Solanki, “An Optimized Approach for k-means Clustering” IJCA Proc.9th International ICST Conf on Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine-2013), pp:5-7, December 2013.

[14]C. Elkan, “Using the triangle inequality to accelerate k-means”, Proc. Twentieth International Conference On Machine Learning (ICML), pp 147–153, 2003.

[15]G. Hamerly, “Making k-means even faster”,  Proc. SIAM Int. Conf. on Data Mining, 2010

[16]J. Drake, G. Hamerly, “Accelerated k-means with adaptive distance bounds”, 5th NIPS Workshop on Optimization for Machine Learning, 2012.

[17]S. J. Phillips, “Acceleration of k-means and related clustering algorithms”, Mount D, Stein C (eds), Algorithm engineering and experiments. Lecture notes in computer science, vol 2409. Springer, Berlin, Heidelberg, pp 61–62, 2002.

[18]G. Hamerly and J. Drake, “Accelerating Lloyd’s Algorithm for k-Means Clustering”, Partitional Clustering Algorithms, Springer, pp 41-78, 2014.

[19]C. M. Poteras , M. C. Mihaescu and M. Mocanu, “An Optimized Version of the K-Means Clustering Algorithm” Proc. 2014 Federated Conf. on Computer Science and Information Systems, pp. 695–699 , DOI: 10.15439/2014F258, 2014.

[20]UCI Machine Learning Repository, http://ics.uci.edu/ mlearn/MLRepository.html