Pooja Choudhary; Kanwal Garg

A Novel Privacy Preservation Scheme by Matrix Factorized Deep Autoencoder

pp. 84-98


Author(s)

Pooja Choudhary ¹,*, Kanwal Garg ¹

1. Department of Computer Science and Application, Kurukshetra University, Kurukshetra, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2024.03.07

Received: 5 May 2023 / Revised: 22 Aug. 2023 / Accepted: 10 Oct. 2023 / Published: 8 Jun. 2024

Index Terms

Privacy Preservation, Matrix Factorization, Autoencoder, Deep Learning

Abstract

Data in transit needs strong protection against unauthorized snooping, because data mining yields important and often sensitive information that must be, and can be, secured with one of the many data privacy preservation methods. This study aims to contribute to research on protecting personal information. Its key contributions are an imputation method that fills in missing data before item profiles are learned, and the optimization of the Deep Auto-encoded NMF (non-negative matrix factorization) with a customizable learning rate. Bayesian inference is used to assess imputation on data with 13%, 26%, and 52% of values missing at random. Correcting inherent biases may improve the results of decomposition problems. Mean absolute percentage error (MAPE) serves as the evaluation metric. The proposed approach is evaluated on the Wiki dataset and the traffic dataset against state-of-the-art techniques including BATF, BGCP, BCPF, and modified PARAFAC, all of which use Bayesian Gaussian tensor factorization. With this approach, the MAPE is lower for the privacy-preserved data than for the corresponding original data.
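The evaluation protocol named in the abstract (hide a fraction of values at random, impute them, score with MAPE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the data are synthetic, and the mean imputer is only a placeholder standing in for the paper's Bayesian imputation and Deep Auto-encoded NMF, which the abstract does not specify in enough detail to reproduce.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error, in percent, skipping zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def mask_at_random(values, missing_rate, seed=0):
    """Return sorted indices of the entries hidden at the given rate."""
    rng = random.Random(seed)
    n_missing = round(missing_rate * len(values))
    return sorted(rng.sample(range(len(values)), n_missing))


def mean_impute(values, missing_idx):
    """Placeholder imputer (assumption): fill hidden entries with the observed mean."""
    hidden = set(missing_idx)
    observed = [v for i, v in enumerate(values) if i not in hidden]
    fill = sum(observed) / len(observed)
    return [fill if i in hidden else v for i, v in enumerate(values)]


def score_imputation(values, missing_rate, seed=0):
    """Hide entries at random, impute them, and return MAPE on the hidden entries only."""
    idx = mask_at_random(values, missing_rate, seed)
    imputed = mean_impute(values, idx)
    return mape([values[i] for i in idx], [imputed[i] for i in idx])


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in values; not the Wiki or traffic dataset.
    speeds = [rng.uniform(20.0, 80.0) for _ in range(200)]
    for rate in (0.13, 0.26, 0.52):
        print(f"{rate:.0%} missing: MAPE = {score_imputation(speeds, rate):.2f}%")
```

Scoring only the hidden entries keeps the observed values, which any imputer reproduces exactly, from diluting the error.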

Cite This Paper

Pooja Choudhary, Kanwal Garg, "A Novel Privacy Preservation Scheme by Matrix Factorized Deep Autoencoder", International Journal of Computer Network and Information Security (IJCNIS), Vol.16, No.3, pp.84-98, 2024. DOI: 10.5815/ijcnis.2024.03.07


International Journal of Computer Network and Information Security (IJCNIS)