Efficient Analysis of Pattern and Association Rule Mining Approaches

Full Text (PDF, 521KB), PP.70-81

Author(s)

Thabet Slimani 1,* Amor Lazzez 2

1. College of Computer Science and Information Technology, Taif University, KSA and LARODEC Lab

2. College of Computer Science and Information Technology, Taif University, KSA

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2014.03.09

Received: 12 Apr. 2013 / Revised: 5 Aug. 2013 / Accepted: 23 Sep. 2013 / Published: 8 Feb. 2014

Index Terms

Association Rule, Frequent Itemset, Sequence Mining, Pattern Mining, Data Mining

Abstract

Data mining extracts various kinds of patterns from a given data source. The most widely recognized data mining tasks are the discovery of frequent itemsets, frequent sequential patterns, frequent sequential rules, and frequent association rules, and numerous efficient algorithms have been proposed for each of them. Frequent pattern mining has been a focal topic of data mining research, with a substantial body of literature. As a result, important progress has been made, ranging from efficient algorithms for frequent itemset mining in transaction databases to more complex tasks such as sequential pattern mining, structured pattern mining, and correlation mining. Association rule mining (ARM) is one of the most popular data mining techniques; it groups objects that occur together in large databases in order to extract interesting correlations and relationships from huge amounts of data. In this article, we provide a brief review and analysis of the current status of frequent pattern mining and discuss some promising research directions. Additionally, the paper includes a comparative study of the performance of the described approaches.
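To make the terms frequent itemset and association rule concrete, the following minimal sketch mines both from a toy transaction database. It is an illustration only and is not one of the algorithms reviewed in the paper: it enumerates itemsets by brute force, and it relies on the standard support and confidence measures, which the abstract does not define. The function names, the toy transactions, and the thresholds are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute force: every subset of every transaction is counted, so this is
    only suitable for tiny examples (assumption: illustrative, not efficient).
    """
    n = len(transactions)
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    # support(X) = fraction of transactions that contain X
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(freq, min_confidence):
    """Derive rules A -> B from frequent itemsets.

    confidence(A -> B) = support(A union B) / support(A).
    Every subset of a frequent itemset is itself frequent, so freq[A] exists.
    """
    rules = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                confidence = support / freq[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical market-basket data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

With these thresholds, five itemsets are frequent and a single rule, butter -> bread, passes the confidence filter, since every transaction containing butter also contains bread. Practical algorithms avoid the exhaustive subset enumeration used here, which grows exponentially with transaction length.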

Cite This Paper

Thabet Slimani, Amor Lazzez, "Efficient Analysis of Pattern and Association Rule Mining Approaches", International Journal of Information Technology and Computer Science (IJITCS), vol. 6, no. 3, pp. 70-81, 2014. DOI: 10.5815/ijitcs.2014.03.09
