Using an MST based Value for ε in DBSCAN Algorithm for Obtaining Better Result

Full Text (PDF, 361KB), PP.55-60

Views: 0 Downloads: 0

Author(s)

Nirmalya Chowdhury 1,* Preetha Bhattacharjee 1

1. Department of Computer Science and Engineering, Jadavpur University, Kolkata - 700032, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2014.06.08

Received: 1 Aug. 2013 / Revised: 9 Dec. 2013 / Accepted: 23 Feb. 2014 / Published: 8 May 2014

Index Terms

Clustering, Natural Grouping, DBSCAN Algorithm

Abstract

In this paper, an objective function based on minimal spanning tree (MST) of data points is proposed for clustering and a density-based clustering technique has been used in an attempt to optimize the specified objective function in order to detect the “natural grouping” present in a given data set. A threshold based on MST of data points of each cluster thus found is used to remove noise (if any present in the data) from the final clustering.
A comparison of the experimental results obtained by DBSCAN (Density Based Spatial Clustering of Applications with Noise) algorithm and the proposed algorithm has also been incorporated. It is observed that our proposed algorithm performs better than DBSCAN algorithm. Several experiments on synthetic data set in R^2 and R^3 show the utility of the proposed method. The proposed method has also found to provide good results for two real life data sets considered for experimentation. Note thatK-means is one of the most popular methods adopted to solve the clustering problem. This algorithm uses an objective function that is based on minimization of squared error criteria. Note that it may not always provide the “natural grouping” though it is useful in many applications.

Cite This Paper

Nirmalya Chowdhury, Preetha Bhattacharjee, "Using an MST based Value for ε in DBSCAN Algorithm for Obtaining Better Result", International Journal of Information Technology and Computer Science(IJITCS), vol.6, no.6, pp.55-60, 2014. DOI:10.5815/ijitcs.2014.06.08

Reference

[1]Anderberg, M.R., Cluster Analysis for Application, Academic Press, Inc., NewYork, 1973.

[2]Devijver, P.A. and J. Kittler.,Pattern Recognition: A statistical Approach, Prentice-Hall International. HemelHemstead, Hertfordshire, UK, 1982.

[3]Jain, A.K. and R.C. Dubes. Algorithms for Clustering Data.Prentice-Hall, Englewood, Cliffs NJ 1988.

[4]Spath, H., Cluster Analysis Algorithms. Ellis Horwood, Chichester.UK, 1980.

[5]Tou, T.J. and C.R. Gonzalez.,Pattern Recognition Principles. Addison-Wesley Reading, MA, 1974.

[6]Ester, M., Kriegel, H.P., Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. 2nd ACM SIGKDD, Portand, Oregon, pp. 226–231.

[7]M. Ester, H. –P. Kriegel, J. Sander, M. Wimmer, and X. Xu, “Incremental clustering for mining in a Data Warehousing environment,” Proc. 24th Int. Conf. on Very Large Databases (VLDB’98), New York, USA, 1998, pp. 323-333.

[8]A. Hinneburg and D. Keim, “An efficient approach to clustering in large multimedia data sets with noise,” in 4th International Conference on Knowledge Discovery and Data Mining, 1998, pp. 58–65.

[9]M. Ankerst, M.M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: Ordering points to identify the clustering structure, in: Proceedings of ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, 1999, pp. 49–60.

[10]Zhou, A., Zhou, S., Cao, J., Fan, Y., Hu, Y., 2000. Approaches for scaling DBSCAN algorithm to large spatial databases. J. Comput. Sci. Technol. 15 (6), 509–526.

[11]Borah, B., Bhattacharyya, D., 2004. An improved sampling-based DBSCAN for large spatial databases. In: Proc. of Internat. Conf. on Intelligent Sensing and Information Processing, 2004, pp. 92–96.

[12]E. Januzaj, H.-P. Kriegel, M. Pfeifle, Scalable density-based distributed clustering, in: Proceedings of PKDD, Pisa, Italy, Lectures Notes in Computer Science, 3202, Springer, 2004, pp. 231–244.

[13]El-Sonbaty, Y., Ismail, M., Farouk, M., 2004. An efficient density based clustering algorithm for large databases. In: 16th IEEE Internat. Conf. on Tools with Artificial Intelligence, 2004, ICTAI, pp. 673–677.

[14]P-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005.

[15]Tsai, C., Liu, C., 2006. KIDBSCAN: A new efficient data clustering algorithm. In: Artificial Intelligence and Soft Computing – ICAISC 2006. Lecture Notes in Computer Science, 2150. Springer, Berlin/Heidelberg, pp. 02–711.

[16]Birant, Derya, and Alp Kut. "ST-DBSCAN: An algorithm for clustering spatial–temporal data." Data & Knowledge Engineering 60.1 (2007): 208-221.

[17]Viswanath, P., Babu, V.S., 2009. Rough-DBSCAN: A fast hybrid density based clustering method for large data sets. Pattern Recognition Lett. 30 (16), 1477–1488.

[18]Folino, G., Forestiero, A., Spezzano, G., 2009. An adaptive flocking algorithm for performing approximate clustering. Inform. Sci. 179 (18), 3059–3078.

[19]Li Jian; Yu Wei; Yan Bao-Ping; , "Memory effect in DBSCAN algorithm," Computer Science & Education, 2009.ICCSE '09. 4th International Conference on , vol., no., pp.31-36, 25-28 July 2009.

[20]Peter, J. Hencil, and A. Antonysamy. "An Optimised Density Based Clustering Algorithm." International Journal of Computer Applications 6.9 (2010): 20-25.

[21]Chen, Zhenguo, and Yong Fei Li. "Anomaly Detection Based on Enhanced DBScan Algorithm." Procedia Engineering 15 (2011): 178-182.

[22]Mimaroglu, Selim, and Emin Aksehirli. "Improving DBSCAN’s execution time by using a pruning technique on bit vectors." Pattern Recognition Letters 32.13 (2011): 1572-1580.

[23]Jiang, Hua, et al. "A new hybrid method based on partitioning-based DBSCAN and ant clustering." Expert Systems with Applications 38.8 (2011): 9373-9381.

[24]Darong, Huang, and Wang Peng. "Grid-based DBSCAN Algorithm with Referential Parameters." Physics Procedia 24 (2012): 1166-1170.

[25]Smiti, Abir, and Zied Elouedi. "Dbscan-gm: An improved clustering method based on gaussian means and dbscan techniques." Intelligent Engineering Systems (INES), 2012 IEEE 16th International Conference on. IEEE, 2012.

[26]Edla, Damodar Reddy, and Prasanta K. Jana. "A Prototype-Based Modified DBSCAN for Gene Clustering." Procedia Technology 6 (2012): 485-492.

[27]Tran, Thanh N., Klaudia Drab, and Michal Daszykowski. "Revised DBSCAN algorithm to cluster data with dense adjacent clusters." Chemometrics and Intelligent Laboratory Systems (2012).

[28]IRS Data Users Handbook, Document No. IRS/NRSA/NDC/HB-01/86, NRSA, Hyderabad, India, Sept. 1986.

[29]Johnson,R. A., and Wichern, D. W. 1982. Applied Multivariate StatisticalAnalysis. Prentice-Hall, Englewood Cliffs, NJ, 1982.

[30]Yang, Christopher C., and T. Dorbin Ng. "Analyzing and visualizing web opinion development and social interactions with density-based clustering." Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on 41.6 (2011): 1144-1155.