Optimized Time Efficient Data Cluster Validity Measures



Author(s)

Anand Khandare 1,*, A. S. Alvi 2

1. Department of CSE, SGB Amravati University, Amravati, India

2. Department of IT, PRMIT&R, Badnera, Amravati, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2018.04.05

Received: 12 Nov. 2017 / Revised: 3 Dec. 2017 / Accepted: 7 Dec. 2017 / Published: 8 Apr. 2018

Index Terms

Clustering Algorithm, Cluster, Validity Measure, Runtime, Compactness, Separation

Abstract

The main task of any clustering algorithm is to produce compact and well-separated clusters, yet perfectly compact and well-separated clusters are rarely achievable in practice. Cluster validity measures are therefore used to evaluate the quality of the clusters an algorithm produces, and they are central to judging whether clustering has succeeded. Different clustering settings call for different validity measures; unsupervised algorithms, for example, require different evaluation measures than supervised ones. Validity measures fall into two categories: external and internal validation. The essential difference is that external measures rely on information from outside the dataset, such as given class labels, whereas internal measures use only information contained in the dataset itself. A well-known external measure is entropy, which quantifies the purity of the clusters with respect to the given class labels. Internal measures validate clustering quality without any external information. External measures also require the correct number of clusters to be known in advance, so they are used mainly to select the clustering algorithm best suited to a specific type of dataset. Internal validation measures are used not only to select the best clustering algorithm but also to choose the optimal number of clusters. External validity measures are often impractical because predefined class labels are unavailable in many applications; in such cases, internal validation measures are the only option.
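To make the entropy example concrete, the following minimal Python sketch computes the label-weighted entropy of a clustering given external class labels; the function name and the example data are illustrative and not taken from the paper.

```python
import numpy as np
from collections import Counter

def cluster_entropy(cluster_assignments, class_labels):
    """External validity: size-weighted entropy of class labels within each cluster.
    Lower values mean purer clusters (0 = every cluster holds a single class)."""
    assignments = np.asarray(cluster_assignments)
    labels = np.asarray(class_labels)
    total = len(labels)
    weighted_entropy = 0.0
    for cluster_id in np.unique(assignments):
        members = labels[assignments == cluster_id]
        counts = np.array(list(Counter(members).values()), dtype=float)
        probs = counts / counts.sum()
        weighted_entropy += (len(members) / total) * -np.sum(probs * np.log2(probs))
    return weighted_entropy

# One pure cluster and one mixed cluster -> entropy between 0 and 1
print(cluster_entropy([0, 0, 0, 1, 1, 1], ['a', 'a', 'a', 'a', 'b', 'b']))
```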

The clustering validity measures in current use are time-consuming and require additional computation after clustering has finished. No existing validity measure can be applied while the clustering process is still running.

This paper surveys existing and improved cluster validity measures and then proposes time-efficient, optimized measures based on cluster representatives and random sampling. It presents optimized measures for cluster compactness, separation, and overall cluster validity. These three measures are simpler and more time-efficient than existing cluster validity measures and can be used to monitor a clustering algorithm on large data while the clustering process is still running.
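The abstract does not reproduce the proposed formulas, so the sketch below only illustrates the general idea of representative- and sampling-based measures: compactness is estimated on a random sample of points against their cluster representatives (here, centroids), separation is taken between the representatives alone, and a simple ratio combines the two. The function name, the sample size, and the combined index are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def sampled_compactness_separation(data, assignments, centroids,
                                   sample_size=100, rng=None):
    """Sketch of representative- and sampling-based validity measures.
    Only a random sample of points and the k centroids are touched, so the
    measures can be recomputed cheaply while clustering is still running."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    assignments = np.asarray(assignments)
    centroids = np.asarray(centroids, dtype=float)

    # Compactness: mean distance from sampled points to their own centroid
    idx = rng.choice(len(data), size=min(sample_size, len(data)), replace=False)
    compactness = float(np.mean(
        np.linalg.norm(data[idx] - centroids[assignments[idx]], axis=1)))

    # Separation: mean pairwise distance between cluster representatives
    k = len(centroids)
    pair_dists = [np.linalg.norm(centroids[i] - centroids[j])
                  for i in range(k) for j in range(i + 1, k)]
    separation = float(np.mean(pair_dists))

    # Combined index (assumed form): larger = more separated and more compact
    validity = separation / (compactness + 1e-12)
    return compactness, separation, validity
```

Because only the sample and the centroids are inspected, the cost per evaluation is O(sample_size + k^2) rather than O(n), which is what makes monitoring during the clustering run practical.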

Cite This Paper

Anand Khandare, A. S. Alvi, "Optimized Time Efficient Data Cluster Validity Measures", International Journal of Information Technology and Computer Science (IJITCS), Vol.10, No.4, pp.46-54, 2018. DOI: 10.5815/ijitcs.2018.04.05
