Top-k Closed Sequential Graph Pattern Mining

Full Text (PDF, 450KB), PP.1-9

Views: 0 Downloads: 0

Author(s)

K. Vijay Bhaskar 1,* R B V Subramanyam 2 K. Thammi Reddy 1 S. Sumalatha 2

1. GITAM University/CSE, Visakhapatnam, 530045, India

2. NIT Warangal/CSE, Warangal, 506004, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2016.04.01

Received: 17 Apr. 2016 / Revised: 13 May 2016 / Accepted: 10 Jun. 2016 / Published: 8 Jul. 2016

Index Terms

Data mining, graph mining, frequent sequential patterns, closed sequential patterns

Abstract

Graphs have become increasingly important in modeling structures with broad applications like Chemical informatics, Bioinformatics, Web page retrieval and World Wide Web. Frequent graph pattern mining plays an important role in many data mining tasks to find interesting patterns from graph databases. Among different graph patterns, frequent substructures are the very basic patterns that can be discovered in a collection of graphs. We extended the problem of mining frequent subgraph patterns to the problem of mining sequential patterns in a graph database. In this paper, we introduce the concept of Sequential Graph-Pattern Mining and proposed two novel algorithms SFG(Sequential Frequent Graph Pattern Mining) and TCSFG(Top-k Closed Sequential Frequent Graph Pattern Mining). SFG generates all the frequent sequences from the graph database, whereas TCSFG generates top-k frequent closed sequences. We have applied these algorithms on synthetic graph database and generated top-k frequent graph sequences.

Cite This Paper

K. Vijay Bhaskar, R.B.V Subramanyam, K. Thammi Reddy, S. Sumalatha, "Top-k Closed Sequential Graph Pattern Mining", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.8, No.4, pp.1-9, 2016. DOI:10.5815/ijieeb.2016.04.01

Reference

[1]A. Inokuchi, T. Washio, and H. Motoda. "An apriori-based algorithm for mining frequent substructures from graph data," In Proc. PKDD, ser. LNCS, Springer, Vol.1910,pp. 13-23, July 2002. "doi: 10.1007/3-540-45372-5_2".

[2]F. Zhu, X.Yan, J. Han, and P. S.Yu, "GPrune: A constraint pushing framework for graph pattern mining," In Proc. PAKDD, ser. LNCS, Springer, vol. 4426, pp. 388–400, May 2007. "doi: 10.1007/978-3-540-71701-0_38".

[3]G. Sethuraman, and Kavitha Joseph. "Star Coloring Problem:The DNA Solution," International Journal of Information Technology and Computer Science, Vol. 4, No.3, pp.31-37, Apr.2012."doi: 10.5815/ijitcs.2012.03.05".

[4]J. Han, J. Wang, Y. Lu, and P. Tzvetkov. "Mining top-k frequent closed patterns without minimum support,". In Proc. ICDM 2002, Maebashi, Japan, pp. 211-218, Dec. 2002. "doi: 10.1109/ICDM.2002.1183905".

[5]J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U.Dayal, and M.C.Hsu. "Mining sequential patterns by pattern growth: Prefix span approach," IEEE Transactions on Knowledge and data engineering, Vol.16, No.11,pp.1424-1440,Nov2004, "doi:10.1109/TKDE.2004. 77 ".

[6]Jianyong Wang and Jiawei Han, "Bide:Efficient mining of frequent closed sequences," In Proc. of 20th IEEE international conference on Data Engineering ,pp. 79-90, "doi:10.1109/ICDE.2004.1319986".

[7]Jianzhong Li, Yong Liu, and Hong Gao "Efficient Algorithm for Summarizing Graph Patterns," IEEE Transactions on Knowledge and Data Engineering, Vol.23, Issue:9,pp.1388-1405,2011,"doi:10.1109/TKDE. 2010.48".

[8]M. Zaki, "SPADE:An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, Kluwer Academic Publishers, Vol.42, issue:1, pp.31-60, 2001, "doi: 10.1023/A:1007652502315".

[9]M. Kuramochi and G. Karypis, "Frequent Subgraph discovery," In Proc. 2001. Int. conf. Data Mining 2001, pp. 313-320, San Jose, CA, November 2001, "doi: 10.1109/ICDM.2001.989534".

[10]Mohammed Hasan Mahafzah. "An Efficient Graph-coloring Algorithm for processor Allocation," International Journal of Information Technology and Computer Science, Vol.5, No.7, pp. 43-48, June 2013, "doi: 10.5815/ijitcs. 2013.07.05".

[11]Natalia Vanetik. "Mining Graphs with Constraints on Symmetry and Diameter," In WAIM 2010 International Workshops, Vol. 6185,Springer, pp. 1-12, July 2010, "doi: 10.1007/978-3-642-16720-1_1".

[12]Petre Tzvetkov, Xifeng Yan, Jiawei Han. "TSP: Mining Top-K Closed Sequential Patterns," Third IEEE International Conference on Data mining, pp. 347-354, November 2003, "doi:10.1109/ICDM.2003.1250939".

[13]R. Agrawal and R. Srikant." Mining sequential patterns". In ICDE'95, Taipei, Taiwan, pp. 3-14, Mar. 1995.

[14]R. Srikant and R. Agrawal. "Mining sequential patterns: Generalizations and performance improvements". In EDBT'96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology,pp.3-17,"doi:10.1007/ BFb0014140"

[15]Synthetic graph generated by IBM Quest Synthetic Data Generation Code for Associations and Sequential Patterns. [http://www.cse.ust.hk/graphgen/].

[16]X. Yan and J. Han. "CloseGraph: Mining closed frequent graph patterns". In KDD'03, Washington, D.C., August 2003, "doi:10.1145/956750.956784".

[17]X. Yan, J. Han, and R. Afshar. "CloSpan: Mining closed sequential patterns in large datasets," In SDM'03, San Fransisco, CA, pp. 166-177 May 2003.

[18]Xifeng Yan, Jiawei Han. "gSpan: Graph-Based Substructure Pattern Mining," In Proc ICDM 2002, pp. 721-724, "doi:10.1109/ICDM.2002.1184038".

[19]Yan X, Zhou XJ, Han J. "Mining closed relational graphs with connectivity constraints". In Proc of ACM SIGKDD international conference on knowledge discovery in databases (KDD'05), Chicago, IL, pp. 324–333, 2005, "doi:10.1145/1081870.1081908".