A Performance of the Scattered Averaging Technique based on the Dataset for the Cluster Center Initialization

Pages: 40-50


Author(s)

Arief Bramanto Wicaksono Putra (1,*), Achmad Fanany Onnilita Gaffar (1), Bedi Suprapty (1), Mulyanto (1)

1. Department of Information Technology, Politeknik Negeri Samarinda, East Kalimantan, Indonesia

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2021.02.05

Received: 6 Dec. 2020 / Revised: 12 Jan. 2021 / Accepted: 18 Feb. 2021 / Published: 8 Apr. 2021

Index Terms

Global Optimum Solution, K-Mean, FCM, SOM, Scattered averaging technique

Abstract

Clustering is one of the primary functions in data mining exploration and statistical data analysis, and it is widely used in various fields. There are two types of clustering algorithms that try to optimize a certain objective function: hierarchical and partitional clustering. This study focuses on achieving the best cluster results from hard and soft clustering algorithms (K-Mean, Fuzzy C-Means (FCM), and Self-Organizing Map (SOM) clustering). A validation index called GOS (Global Optimum Solution) is used to evaluate the cluster results. The GOS index is defined as the ratio of the distance variance within a cluster to the distance variance between clusters. The aim of this study is to produce the best GOS index by using the proposed method, called the scattered averaging technique based on the dataset, for cluster center initialization. The cluster results of the three algorithms are also compared to determine which one achieves the best GOS index. Using annual rainfall data as the dataset, the results show that the proposed method significantly improved the ability of K-Mean clustering to achieve the global optimum solution, with a performance ratio of 69.05% of the total performance of the three algorithms. The next best clustering algorithm is SOM clustering (24.65%), followed by FCM clustering (6.30%). In addition, the results show that all three clustering algorithms achieve their best global optimum solution when the number of clusters is even.
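The abstract defines the GOS index only in words, as a ratio of within-cluster to between-cluster distance variance. The sketch below illustrates one possible reading of that ratio. It is not the authors' published formula. The function name `gos_like_ratio` and the interpretation of "distance variance" as a mean squared Euclidean distance are assumptions made for illustration. The sketch also assumes hard labels, so soft (FCM) memberships would first have to be converted to labels, for example by taking each point's maximum membership.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(pts):
    """Component-wise mean of a non-empty list of points."""
    n = len(pts)
    return [sum(coords) / n for coords in zip(*pts)]


def gos_like_ratio(points, labels):
    """Hypothetical GOS-style index: within-cluster spread / between-cluster spread.

    ASSUMPTION (not the paper's exact definition): "distance variance" is read
    as a mean squared Euclidean distance.
      within  = mean over points of ||x_i - c_(label of i)||^2
      between = mean over points of ||c_(label of i) - grand mean||^2
    Under this reading, a smaller ratio means tighter, better-separated clusters.
    """
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and aligned")

    # Group points by hard cluster label and compute each cluster's centroid.
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    centers = {lab: mean_point(pts) for lab, pts in groups.items()}
    grand = mean_point(points)
    n = len(points)

    # Within-cluster term: how far points sit from their own centroid.
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)) / n
    # Between-cluster term: how far centroids sit from the overall mean,
    # weighted by cluster size (one term per point).
    between = sum(sq_dist(centers[lab], grand) for lab in labels) / n
    if between == 0:
        raise ValueError("between-cluster spread is zero; ratio is undefined")
    return within / between
```

For the one-dimensional points `[[0.0], [2.0], [10.0], [12.0]]`, the natural split `[0, 0, 1, 1]` gives a ratio of 0.04. The mixed split `[0, 1, 0, 1]` gives 25.0. This shows how such a ratio separates a good partition from a poor one.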

Cite This Paper

Arief Bramanto Wicaksono Putra, Achmad Fanany Onnilita Gaffar, Bedi Suprapty, Mulyanto, "A Performance of the Scattered Averaging Technique based on the Dataset for the Cluster Center Initialization", International Journal of Modern Education and Computer Science (IJMECS), Vol. 13, No. 2, pp. 40-50, 2021. DOI: 10.5815/ijmecs.2021.02.05
