Optimal Clustering Algorithms for Data Mining

Author(s)

Omar Y. Alshamesti 1,*, Ismail M. Romi 2

1. Department of Computer Science, Palestine Technical Colleges, Al-Aroub, Hebron, Palestine

2. College of Administrative Sciences and Informatics, Palestine Polytechnic University, Hebron, Palestine

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2013.02.04

Received: 15 Apr. 2013 / Revised: 2 May 2013 / Accepted: 20 Jun. 2013 / Published: 8 Aug. 2013

Index Terms

Data Mining, Clustering, Self-Organizing Map, Support Vector Clustering, Computational Complexity

Abstract

Data mining is the process of analyzing large quantities of heterogeneous data to extract useful information. Among the many data mining techniques, clustering is an important one: it divides data into groups, called clusters, such that objects within a cluster are homogeneous while differing from objects in other clusters. Because many applications depend on clustering, yet there is no single unified clustering method, this study compares k-means, fuzzy c-means, self-organizing maps (SOM), and support vector clustering (SVC) to show how each algorithm solves clustering problems, and then compares the newer method (SVC) with the traditional ones (k-means, fuzzy c-means, and SOM). The main finding is that SVC outperforms k-means, fuzzy c-means, and SOM because it depends on neither the number nor the shape of clusters and it handles outliers and overlapping clusters. Finally, the paper shows that enhancing SVC with gradient descent and a proximity graph reduces its computational complexity from O(n²d) to O(n log n), and that the practical total time of the improved support vector clustering (iSVC) labeling method is better than that of other methods that improve SVC.
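To make the comparison concrete, the sketch below (not the paper's implementation) contrasts k-means, which requires the number of clusters up front, with an SVC-style procedure. Scikit-learn's OneClassSVM with an RBF kernel stands in for the SVDD sphere, and cluster labels come from connected components of a pairwise connectivity test; the gamma, nu, and sample-size values are illustrative assumptions only.

# A minimal, illustrative sketch (not the paper's implementation): it contrasts
# k-means, which needs the cluster count up front, with an SVC-style procedure.
# OneClassSVM with an RBF kernel stands in for the SVDD sphere; gamma, nu and
# the sample size are assumed values chosen only for demonstration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.svm import OneClassSVM

X, _ = make_moons(n_samples=120, noise=0.05, random_state=0)

# Traditional method: k-means must be told k and assumes compact,
# roughly spherical clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# SVC step 1: learn enclosing contours in kernel space; points with a
# non-negative decision value lie inside them.
svdd = OneClassSVM(kernel="rbf", gamma=5.0, nu=0.05).fit(X)

def connected(a, b, model, n_steps=10):
    # Naive SVC labeling test: a and b belong to the same cluster if every
    # sampled point on the segment between them stays inside the contour.
    ts = np.linspace(0.0, 1.0, n_steps)
    segment = np.outer(1.0 - ts, a) + np.outer(ts, b)
    return np.all(model.decision_function(segment) >= 0.0)

# SVC step 2: flood-fill connected components of the pairwise connectivity
# relation. This brute-force labeling is the quadratic step that the paper's
# proximity-graph improvement avoids.
n = len(X)
labels = np.full(n, -1, dtype=int)
n_clusters = 0
for i in range(n):
    if labels[i] != -1:
        continue
    labels[i] = n_clusters
    stack = [i]
    while stack:
        j = stack.pop()
        for k in range(n):
            if labels[k] == -1 and connected(X[j], X[k], svdd):
                labels[k] = n_clusters
                stack.append(k)
    n_clusters += 1

print("k-means clusters (fixed in advance):", len(set(kmeans_labels)))
print("SVC-style clusters found without fixing k:", n_clusters)

With the assumed gamma and nu values the two moon-shaped groups typically come out as separate components without specifying k. The pairwise labeling loop above is the part the paper improves: replacing it with a proximity graph (together with gradient descent) is what brings the cost down toward O(n log n).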

Cite This Paper

Omar Y. Alshamesti, Ismail M. Romi, "Optimal Clustering Algorithms for Data Mining", International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 5, no. 2, pp. 22-27, 2013. DOI: 10.5815/ijieeb.2013.02.04

References

[1]Han, J., Kamber, M., (2006). Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann: USA.

[2]Pawan, K., Pankaj, V. and Rakesh, S., (2010). Comparative analysis of fuzzy c-mean and hard c-mean algorithms. International Journal of Information Technology and Knowledge Management, 2(1): pp. 1-5.

[3]Jain, A.K., Murty, M.N., Flynn, P.J., (2000). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3): pp. 264-323.

[4]Anil, K.J., (2010). Data Clustering: 50 Years Beyond K-Means. Journal of Pattern Recognition Letters, 31(8).

[5]Estivill, V.C and Lee, I. A., (2000). Hierarchical clustering based on spatial proximity using Delaunay diagram. In Proc. of the 9th Int. Symposium on Spatial Data Handling, pp 26 – 41.

[6]Karthikeyani, V., Suguna, J., (2009). K-Means Clustering using Max-min Distance Measure. The 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS2009).

[7]Fedja, H. and Tharam, S.D., (2005). CSOM: Self-Organizing Map for Continuous Data. 3rd IEEE International Conference on Industrial Informatics (INDIN).

[8]Asa, B.H., David, H., Hava, T.S. and Vladimir, V., (2001). Support Vector Clustering. Journal of Machine Learning Research, 2(1): pp. 125-137.

[9]Rajashree, D., Debahuti, M., Amiya, K.R., Milu.A., (2010). A hybridized K-means clustering approach for high dimensional dataset. International Journal of Engineering, Science and Technology. 2(2): pp. 59-66.

[10]Vladimir, E.C., Jianhua, Y. and Stephan, K.C., (2003). Support Vector Clustering Through Proximity Graph Modeling. IEEE, 2: pp. 898-903.

[11]Lee, D., Lee, J., (2007). Domain described support vector classifier for multi-classification problems. Pattern Recognition, 40(1): pp. 41-51.

[12]Ling, P, Zhou, C.G and Zhou, X., (2010). Improved support vector clustering. Engineering Applications of Artificial Intelligence. Elsevier, 23(4): pp. 552-559.

[13]Hsiang, C.L., Jenz, M.Y., Wen, C.L. and Tung, S.L., (2009). Fuzzy c-means algorithm based on PSO and Mahalanobis distance. International Journal of Innovative Computing, Information and Control, 5(12): pp. 5033-5040.

[14]Abbas, O., (2008). Comparison between data clustering algorithms. The International Journal of Information Technology, 5(3): pp. 320-325.

[15]Hanafi, G., Abdulkader, S., (2006). Comparison of clustering algorithms for analog modulation classification. Expert Systems with Applications: An International Journal, 30(4): pp. 642-649.

[16]Satchidanandan, D. Chinmay, M. Ashish, G and Rajib, M., (2006). A comparative study of clustering algorithms. Information technology journal, 5(3): pp 551-559.

[17]Velmurugan, T and Santhanam, T., (2010). Computational complexity between k-mean and k-medoids clustering algorithm for normal and uniform distribution of data points. Journal of computer science, 6(3): pp. 363-368.

[18]Kumar, P., Siri, K.W., (2010). Comparative analysis of k-mean based algorithms. International Journal of Computer Science and Network Security, 10(4): pp. 314-318.

[19]Scholkopf, B., Smola, A.J., (2002). Learning with Kernels. MIT Press: London.

[20]Hartigan, J. A. and M. A. Wong (1979). Algorithm AS 136: A k-means clustering algorithm. Applied Statistics, 28.1: pp. 100-108.

[21]Hartigan, J. A. (1975). Clustering Algorithms (Probability & Mathematical Statistics). John Wiley & Sons Inc.

[22]Hammouda, K.M, (2008). A comparative study of data clustering techniques. International journal of computer science and information technology, 5(2), pp. 220-231.

[23]Sairam, Manikandan and Sowndary, (2011). Performance analysis of clustering algorithms in detecting outliers. International Journal of Computer Science and Information Technology, 2(1): pp. 486-488.

[24]Asa, B.H., David, H. and Vapnik, V., (2001). A support vector method for hierarchical clustering. MIT Press.