Comparative Weka Analysis of Clustering Algorithm's

Full Text (PDF, 707KB), pp. 56-67


Author(s)

Harjot Kaur 1,*, Er. Prince Verma 1

1. CT Group of Institution/CSE, Jalandhar, 144041, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2017.08.07

Received: 29 Mar. 2017 / Revised: 10 Apr. 2017 / Accepted: 19 Apr. 2017 / Published: 8 Aug. 2017

Index Terms

Data Mining, Clustering, Partitioning Algorithm, Hierarchical Clustering Algorithm, CURE, CHAMELEON, BIRCH, Density Based Clustering Algorithm, DENCLUE, OPTICS, WEKA Tool

Abstract

Data mining is the process of extracting a relevant volume of data or information and making it available for understanding and processing. Data analysis is common practice across areas such as computer science, biology, the telecommunication industry, and the retail industry. Data mining encompasses various algorithms, viz. association rule mining, classification, and clustering algorithms. This survey concentrates on clustering algorithms and their comparison using the WEKA tool. Clustering is the splitting of a large dataset into clusters or groups according to two criteria, i.e., high intra-class similarity and low inter-class similarity. Every cluster or group must contain at least one data item, and every data item must belong to exactly one cluster. Clustering is an unsupervised technique that is applicable to large datasets with a large number of attributes. It is a data modelling technique that gives a concise view of data. This survey explains the clustering algorithms and presents their variant analysis using the WEKA tool on various datasets.
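
The two criteria named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the WEKA workflow used in the paper; it is a small k-means style partitioning (one of the partitioning algorithms the survey covers) under stated assumptions. The function names, the sample points, and the choice of initial centroids are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Give each point the index of its nearest centroid (exactly one cluster per point)."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd-style k-means with caller-supplied starting centroids (deterministic)."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign(points, centroids), centroids


def mean_intra_distance(points, labels, centroids):
    """Average distance from each point to its own centroid (lower = higher intra-class similarity)."""
    return sum(dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)


def min_inter_distance(centroids):
    """Smallest distance between two centroids (higher = lower inter-class similarity)."""
    return min(dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:])


# Hypothetical toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
intra = mean_intra_distance(points, labels, centroids)
inter = min_inter_distance(centroids)
print(labels, round(intra, 3), round(inter, 3))
```

On this toy data the mean intra-cluster distance is much smaller than the distance between the two centroids, which is the pattern a good partition is expected to show. Every point receives one label and both clusters are non-empty, matching the membership conditions stated above.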

Cite This Paper

Harjot Kaur, Prince Verma, "Comparative Weka Analysis of Clustering Algorithm's", International Journal of Information Technology and Computer Science (IJITCS), Vol. 9, No. 8, pp. 56-67, 2017. DOI: 10.5815/ijitcs.2017.08.07
