Efficient Clustering Algorithm with Enhanced Cohesive Quality Clusters

pp. 48-57


Author(s)

Anand Khandare 1,*, Abrar Alvi 2

1. SGB Amravati University, Department of CSE, Amravati, India

2. PRMIT&R, Department of CSE, Badnera, Amravati, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2018.07.05

Received: 23 Jun. 2017 / Revised: 4 Aug. 2017 / Accepted: 15 Sep. 2017 / Published: 8 Jul. 2018

Index Terms

Clustering, Cluster, Massive Data, k-means, Cohesive, Quality, Validity Measures

Abstract

Analyzing data is challenging because the volume of data directly affects the results of the analysis, and nearly every application now generates massive amounts of data. Clustering, which groups similar data items together, is a key technique for analyzing such data. Well-known clustering algorithms include k-means, k-medoids, c-means, hierarchical clustering, and DBSCAN. Although k-means and DBSCAN are scalable, they still need improvement because massive data degrades both their cluster quality and their efficiency. These algorithms also require the user to supply appropriate input parameters. For these reasons, this paper presents a modified and efficient clustering algorithm. It enhances cluster quality and makes clusters more cohesive using domain knowledge, spectral analysis, and split-merge-refine techniques, and it also minimizes empty clusters. To date, no single algorithm has integrated all of these requirements as the proposed algorithm does. It also automatically predicts the value of k and the initial centroids, which keeps user intervention to a minimum. The performance of the proposed algorithm is compared with standard clustering algorithms on data sets ranging from small to large. The comparison considers the number of records and dimensions of the data sets and uses clustering accuracy, running time, and various cluster validity measures. The results show that the proposed algorithm improves on the existing algorithms in both efficiency and cluster quality.
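
The abstract does not spell out how k is predicted or how empty clusters are avoided, so the following is only an illustrative sketch of those ideas and not the authors' method. It assumes a plain k-means with farthest-first seeding, reseeds any cluster that becomes empty, and picks k by the mean silhouette width (a common cluster validity measure). All function names (`farthest_first`, `kmeans`, `silhouette`, `choose_k`) and the `k_max` parameter are hypothetical.

```python
import math

def farthest_first(points, k):
    """Assumed seeding rule: start at points[0], then repeatedly add the
    point that is farthest from its nearest already-chosen seed."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=50):
    """Plain k-means on a list of coordinate tuples; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Update step: centroid is the coordinate-wise mean.
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Empty cluster: reseed at the point farthest from every centroid.
                new.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
        if new == centroids:
            break
        centroids = new
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for m, c in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_max=5):
    """Return the k in [2, k_max] with the highest mean silhouette width."""
    best_k, best_score = 2, float("-inf")
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels, _ = kmeans(points, k)
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Calling `choose_k(points)` on a list of coordinate tuples returns the selected k, and `kmeans(points, k)` then returns the labels and centroids. The silhouette step is quadratic in the number of points, so a sketch like this suits small data sets only; the massive-data setting the abstract targets would need sampling or a cheaper validity measure.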

Cite This Paper

Anand Khandare, Abrar Alvi, "Efficient Clustering Algorithm with Enhanced Cohesive Quality Clusters", International Journal of Intelligent Systems and Applications (IJISA), Vol. 10, No. 7, pp. 48-57, 2018. DOI: 10.5815/ijisa.2018.07.05
