GNVDF: A GPU-accelerated Novel Algorithm for Finding Frequent Patterns Using Vertical Data Format Approach and Jagged Array

Full Text (PDF, 411KB), PP.28-41

Views: 0 Downloads: 0

Author(s)

P. Sumathi 1,* S.Murugan 1

1. Nehru Memorial College (Affiliated to Bharathidasan University), Puthanampatti, Tiruchirappalli-Dt, Tamil Nadu, India - 621 007

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2021.04.03

Received: 1 Jun. 2021 / Revised: 28 Jun. 2021 / Accepted: 24 Jul. 2021 / Published: 8 Aug. 2021

Index Terms

Frequent Patterns, GNVDF, Graphical Processing Unit, Novel Pattern Formation, Vertical Data Format, and Jagged Array.

Abstract

In the modern digital world, online shopping becomes essential in human lives. Online shopping stores like Amazon show up the "Frequently Bought Together" for their customers in their portal to increase sales. Discovering frequent patterns is a fundamental task in Data Mining that find the frequently bought items together. Many transactional data were collected every day, and finding frequent itemsets from the massive datasets using the classical algorithms requires more processing time and I/O cost. A GPU accelerated Novel algorithm for finding the frequent patterns using Vertical Data Format (GNVDF) has been introduced in this research article. It uses a novel pattern formation. In this, the candidate i-itemsets is divided into two buckets viz., Bucket-1 and Bucket-2. Bucket-1 contain all the possible items to form candidate-(i+1) itemsets. Bucket-2 has the items that cannot include in the candidate-(i+1) itemsets. It compactly employs a jagged array to minimize the memory requirement and remove common transactions among the frequent 1-itemsets. It also utilizes a vertical representation of data for efficiently extracting the frequent itemsets by scanning the database only once. Further, it is GPU-accelerated for speeding up the execution of the algorithm. The proposed algorithm was implemented with and without GPU usage and compared. The comparison result revealed that GNVDF with GPU acceleration is faster by 90 to 135 times than the method without GPU.

Cite This Paper

P. Sumathi, S.Murugan, "GNVDF: A GPU-accelerated Novel Algorithm for Finding Frequent Patterns Using Vertical Data Format Approach and Jagged Array ", International Journal of Modern Education and Computer Science(IJMECS), Vol.13, No.4, pp. 28-41, 2021. DOI:10.5815/ijmecs.2021.04.03

Reference

[1] H. Hamidi and A. Daraei, "Analysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases," International Journal of Engineering (IJE), TRANSACTIONS A: Basics, vol. 29, no. 7, pp. 921-930, 2016.

[2] J. Han, J. Pei and M. Kamber, Data mining: concepts and techniques, Morgan Kaufmann Publishers, 2011.

[3] H. Lisnawati and A. Sinaga, "Data Mining with Associated Methods to Predict Consumer Purchasing Patterns", International Journal of Modern Education and Computer Science(IJMECS), vol. 12, no. 5, pp. 16-28, 2020.

[4] A. Sinha, B. Sahoo, S.S.Rautaray and M. Pandey, "An Optimized Model for Breast Cancer Prediction Using Frequent Itemsets Mining", International Journal of Information Engineering and Electronic Business(IJIEEB), vol.11, no.5, pp. 11-18, 2019.

[5] L. Vu and G. Alaghband, "A self-adaptive method for frequent pattern mining using a CPU-GPU hybrid model," in Proceedings of the Symposium on High Performance Computing, 2015.

[6] D. Albert, K. William, Fayaz and D. Veerabhadra Babu, "Exploiting Parallel Processing Power of GPU for High Speed Frequent Pattern Mining", International Journal of Computer Engineering and Applications, vol. 7, no. 2, pp. 71 - 81, 2014.

[7] W. Fang, M. Lu, X. Xiao, B. He and Q. Luo, "Frequent itemset mining on graphics processors," in Proceedings of International Conference on Network and Parallel Computing, 2009.

[8] S. M. Fakhrahmad and G. Dastghaibyfard, "An Efficient Frequent Pattern Mining Method and its Parallelization in Transactional Databases," Journal of Information Science and Engineering, vol. 27, no. 2, pp. 511-525, 2011.

[9] J. Zhou, K. M. Yu and B. C. Wu, "Parallel frequent patterns mining algorithm on GPU", in Proceedings of International Conference on Systems, 2010.

[10] D. William Albert, K. Fayaz and D. Veerabhadra Babu, "HSApriori: high speed association rule mining using apriori based algorithm for GPU," International Journal of Multidisciplinary and Current Research, vol. 2, pp. 759-763, 2014.

[11] M. Tiwary, A. K. Sahoo and R. Misra, "Efficient implementation of apriori algorithm on HDFS using GPU," in Proceedings of International Conference on High Performance Computing and Applications, 2014.

[12] J. Li, F. Sun, X. Hu and W. Wei, "A multi-GPU implementation of apriori algorithm for mining association rules in medical data," ICIC Express Letters, vol. 9, no. 5, pp. 1303-1310, 2015

[13] L. Vu and G. Alaghband, "A self-adaptive method for frequent pattern mining using a CPU-GPU hybrid model," in Proceedings of the Symposium on High Performance Computing, 2015.

[14] Y. Li, J. Xu, Y. H. Yuan and L. Chen, "A new closed frequent itemset mining algorithm based on GPU and improved vertical structure," Concurrency and Computation Practice and Experience, vol. 29, no. 06, pp. 1-12, 2016.

[15] K.W. Chon, S. H. Hwang and M. S. Kim, "GMiner: A fast gpu-based frequent itemset mining method for large-scale data," Information Sciences, vol. 439-440, pp.19-38, 2018.

[16] Y. Wang, T. Xu, S. Xue and Y. Shen, "D2P-Apriori: A deep parallel frequent itemset mining algorithm with dynamic queue," in Proceedings of 10th International Conference on Advanced Computational Intelligence, 2018.

[17] Y. Djenouri, D. Djenouri, A. Belhadi and A. Cano, "Exploiting GPU and cluster parallelism in single scan frequent itemset mining," Information Sciences, vol. 496, pp. 363-377, 2019.

[18] P. Sumathi, and S. Murugan, A Memory Efficient Implementation of Frequent Itemset Mining with Vertical Data Format Approach, International Journal of Computer Sciences and Engineering. 6(2018) 152-157.

[19] W. Gan, J. C. Lin, P. Fournier-Viger, H. C. Chao and P. S. Yu, "Survey of parallel sequential pattern mining," ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 13, no. 3, pp. 1-34, 2019.

[20] Y. M. Guo and Z. J. Wang, "A vertical format algorithm for mining frequent item sets," in Proceedings of 2nd International Conference on Advanced Computer Control, 2010.

[21] E. Hashemzadeh and H. Hamidi, "Using a Data Mining Tool and FP-growth Algorithm Application for Extraction of the Rules in Two Different Dataset," International Journal of Engineering (IJE), TRANSACTIONS C: Aspects, vol. 29, no. 6, pp. 788-796, 2016.

[22] M. Samoliya and A. Tiwari, "On the Use of Rough Set Theory for Mining Periodic Frequent Patterns", International Journal of Information Technology and Computer Science, vol.8, no.7, pp.53-60, 2016.

[23] P. Prithiviraj and R. Porkodi, "A comparative analysis of association rule mining algorithms in data mining: a study," American Journal of Computer Science and Engineering Survey, vol. 3, pp. 98-119, 2015.

[24] F. Wang, J. Dong and B. Yuan, "Graph-based substructure pattern mining using cuda dynamic parallelism," in Proceedings of International conference on intelligent data engineering and automated learning, 2013.

[25] B. De Alwis, S. Malinga, K. Pradeeban, D. Weerasiri and S. Perera, "Horizontal format data mining with extended bitmaps," in International Conference of Soft Computing and Pattern Recognition,2011.

[26] P. Suresh, K. N. Nithya and K. Murugan, "Improved Generation of Frequent Item Sets using Apriori Algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 10, pp. 25-27, 2015.

[27] A.Subashini and M. Karthikeyan, "Itemset Mining using Horizontal and Vertical Data Format," International Journal for Research in Engineering Application & Management, vol. 05, no.03, pp. 534-539, 2019.