Classification via Clustering for Anonymized Data

Full Text (PDF, 679KB), PP.52-58

Author(s)

Sridhar Mandapati 1,*, Raveendra Babu Bhogapathi 2, M.V.P.C. Sekhara Rao 3

1. Dept. of Computer Applications, R.V.R & J.C College of Engineering, Guntur, India

2. Dept. of Computer Science & Engineering, VNR VJIET, Hyderabad, India

3. Dept. of Computer Science & Engineering, R.V.R & J.C College of Engineering, Guntur, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2014.03.07

Received: 11 May 2013 / Revised: 19 Sep. 2013 / Accepted: 16 Nov. 2013 / Published: 8 Feb. 2014

Index Terms

Privacy Preserving Data Mining (PPDM), Classification, Clustering, K-means, EM, Density based

Abstract

The exponential growth of hardware technology, particularly in electronic data storage and processing, has raised serious ethical, philosophical, and legal issues concerning the protection of individual privacy. Privacy Preserving Data Mining (PPDM) techniques are employed to protect sensitive data and mining results. This study presents classification via clustering on data with and without anonymization, using the WEKA mining tool. It investigates the performance of different clustering methods on a diabetic data set and compares the efficiency of privacy-preserving mining. The accuracy of classification via clustering is evaluated using K-means, Expectation-Maximization (EM), and density-based clustering methods.
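The general idea of classification via clustering can be sketched as follows. This is a minimal illustration, not the authors' WEKA setup: it assumes a plain K-means with a deterministic farthest-first initialization, a majority vote to map each cluster to a class label, and a toy two-class data set invented for the example. The `generalize` function is a hypothetical stand-in for anonymization (it replaces each value with the midpoint of a fixed-width interval) and is not the anonymization method used in the paper.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def init_centroids(points, k):
    """Farthest-first initialization (an assumption made for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain K-means; returns the final centroids."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep empty ones.
        updated = [
            tuple(sum(col) / len(m) for col in zip(*m)) if m else c
            for m, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit(points, labels, k):
    """Cluster the points, then label each cluster by majority class."""
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    cluster_label = [v.most_common(1)[0][0] if v else default for v in votes]
    return centroids, cluster_label


def predict(model, p):
    """Predict the class of p as the label of its nearest cluster."""
    centroids, cluster_label = model
    return cluster_label[nearest(p, centroids)]


def accuracy(model, points, labels):
    """Fraction of points whose predicted class matches the true label."""
    hits = sum(predict(model, p) == y for p, y in zip(points, labels))
    return hits / len(points)


def generalize(p, width):
    """Hypothetical anonymization stand-in: map values to interval midpoints."""
    return tuple((v // width) * width + width / 2 for v in p)


# Toy data invented for illustration (not the diabetic data set).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = ["negative"] * 4 + ["positive"] * 4

model_raw = fit(points, labels, k=2)
acc_raw = accuracy(model_raw, points, labels)

anon_points = [generalize(p, width=4) for p in points]
model_anon = fit(anon_points, labels, k=2)
acc_anon = accuracy(model_anon, anon_points, labels)
```

Comparing `acc_raw` with `acc_anon` mirrors, in miniature, the comparison the study describes between original and anonymized data. On real data the two values would generally differ, since generalization discards detail that the clustering step could otherwise use.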

Cite This Paper

Sridhar Mandapati, Raveendra Babu Bhogapathi, M.V.P.C. Sekhara Rao, "Classification via Clustering for Anonymized Data", International Journal of Computer Network and Information Security (IJCNIS), vol.6, no.3, pp.52-58, 2014. DOI:10.5815/ijcnis.2014.03.07
