A New Classification Algorithm for Data Stream

Full Text (PDF, 178KB), PP.32-39


Author(s)

Li Su 1,* Hong-yan Liu 2 Zhen-Hui Song 3

1. Xi’an University of Technology, Xi’an, China

2. Changqing Oilfield Company, Xi’an, China

3. ShiJiaZhuang Vocational Technology Institute, ShiJiaZhuang, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2011.04.05

Received: 10 May 2011 / Revised: 12 Jun. 2011 / Accepted: 1 Jul. 2011 / Published: 8 Aug. 2011

Index Terms

Data streams, associative classification, frequent itemsets

Abstract

Associative classification (AC), which is based on association rules, has shown great promise over many other classification techniques on static datasets. Meanwhile, the increasing prominence of data streams in a wide range of advanced applications poses a new challenge. This paper describes and evaluates AC-DS, a new associative classification algorithm for data streams, which is based on the estimation mechanism of Lossy Counting (LC) and the landmark window model. AC-DS was applied to several datasets obtained from the UCI Machine Learning Repository, and the results show that the algorithm is effective and efficient.
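As background for the estimation mechanism named in the abstract, the following is a minimal sketch of classic Lossy Counting over a landmark window (all items since the start of the stream). It is not the AC-DS algorithm itself: it counts single items rather than the itemsets or class association rules an associative classifier would need, and the names `LossyCounter`, `add`, `frequent`, `epsilon`, and `support` are illustrative assumptions, not identifiers from the paper.

```python
import math
from typing import Dict, Hashable, List


class LossyCounter:
    """Approximate frequency counts over a stream (illustrative sketch).

    Each tracked item maps to [count, delta], where delta is the maximum
    number of occurrences that may have been missed before tracking began.
    """

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen since the landmark
        self.entries: Dict[Hashable, List[int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new entry may have been missed in all earlier buckets.
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support: float) -> Dict[Hashable, int]:
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0]
            for key, value in self.entries.items()
            if value[0] >= threshold
        }
```

With this scheme a stored count underestimates the true count by at most `epsilon * n`, so reporting every item at or above `(support - epsilon) * n` never misses an item whose true frequency reaches `support * n`, at the cost of possible false positives.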

Cite This Paper

Li Su, Hong-yan Liu, Zhen-Hui Song, "A New Classification Algorithm for Data Stream", International Journal of Modern Education and Computer Science (IJMECS), vol. 3, no. 4, pp. 32-39, 2011. DOI: 10.5815/ijmecs.2011.04.05
