Improved Apriori Algorithm for Mining Association Rules

Full Text (PDF, 559KB), PP.15-23


Author(s)

Darshan M. Tank 1,*

1. Department of Information Technology, L.E.College, Morbi-363642, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2014.07.03

Received: 29 Oct. 2013 / Revised: 11 Mar. 2014 / Accepted: 3 Apr. 2014 / Published: 8 Jun. 2014

Index Terms

Association rule mining, frequent itemset generation, support and confidence

Abstract

Association rule mining is one of the main techniques in data mining, and Apriori is its classical algorithm. Many association rule mining algorithms and their variants have been proposed on the basis of Apriori, but the traditional algorithms are not efficient. Frequent itemset mining has two bottlenecks: the large number of candidate 2-itemsets, and the poor efficiency of counting their support. For the first bottleneck, the proposed algorithm removes the redundant pruning operation on C_2. If the number of frequent 1-itemsets is n, then the join step produces C(n, 2) = n(n-1)/2 candidate 2-itemsets, and the traditional algorithm performs the same number of pruning operations on them. The proposed algorithm cuts these pruning operations on the candidate 2-itemsets, thereby saving time and increasing efficiency. For the second bottleneck, the proposed algorithm optimizes the subset operation, using transaction tags to speed up support counting.
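
The reason pruning of C_2 is redundant can be illustrated with a small sketch. Every 1-subset of a pair drawn from the frequent 1-itemsets is itself frequent by construction, so the prune step can never discard a candidate 2-itemset. The function names, the toy transactions, and the absolute support threshold below are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations
from math import comb


def frequent_1_itemsets(transactions, min_support):
    """Return the sorted items that occur in at least min_support transactions."""
    counts = {}
    for transaction in transactions:
        for item in set(transaction):
            counts[item] = counts.get(item, 0) + 1
    return sorted(item for item, count in counts.items() if count >= min_support)


def candidate_2_itemsets(l1):
    """Join step only: pair up the frequent 1-itemsets.

    No prune step is applied, because both 1-subsets of every pair
    are already known to be frequent.
    """
    return list(combinations(l1, 2))


# Hypothetical toy database (not from the paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"jam"},
]

l1 = frequent_1_itemsets(transactions, min_support=2)
c2 = candidate_2_itemsets(l1)

# With n frequent 1-itemsets there are C(n, 2) = n(n-1)/2 candidates.
print(l1, len(c2), comb(len(l1), 2))
```

With n = 3 frequent items in this toy database, the join yields 3 candidate pairs, matching n(n-1)/2.
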
Apriori is one of the oldest and most versatile algorithms for Frequent Pattern Mining (FPM). Its advantages and its moderate traversal of the search space pay off when mining very large databases. The proposed algorithm improves on Apriori by reducing the pruning operations performed when the apriori-gen operation generates the candidate 2-itemsets. In addition, it adopts a tag-counting method to calculate support quickly. In this way, both bottlenecks are addressed.
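
The abstract does not define the transaction tag precisely, so the following is only a rough sketch of the general idea of counting support from tags rather than rescanning transactions. It assumes that a tag is the set of transaction IDs in which an item occurs (a vertical, tidset-style layout), which may differ from the paper's actual scheme. The names `build_tags` and `support` are hypothetical.

```python
def build_tags(transactions):
    """Map each item to the set of transaction IDs that contain it.

    Assumption: this tidset stands in for the paper's "transaction tag".
    """
    tags = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tags.setdefault(item, set()).add(tid)
    return tags


def support(itemset, tags):
    """Count the transactions containing every item by intersecting tags."""
    tid_sets = [tags.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


# Hypothetical toy database (not from the paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"jam"},
]

tags = build_tags(transactions)
print(support(("bread", "milk"), tags))
print(support(("bread", "butter", "milk"), tags))
```

The database is scanned once to build the tags. After that, the support of any candidate comes from set intersections instead of a subset test against every transaction.
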

Cite This Paper

Darshan M. Tank, "Improved Apriori Algorithm for Mining Association Rules", International Journal of Information Technology and Computer Science (IJITCS), vol. 6, no. 7, pp. 15-23, 2014. DOI: 10.5815/ijitcs.2014.07.03
