Mining Data Streams using Option Trees

pp. 49-54

Author(s)

B. Reshma Yusuf 1,*, P. Chenna Reddy 1

1. JNTUA College of Engineering, Pulivendula, Andhra Pradesh, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2012.08.06

Received: 29 Dec. 2011 / Revised: 2 Mar. 2012 / Accepted: 11 May 2012 / Published: 8 Aug. 2012

Index Terms

Data streams, Hoeffding trees, option trees, large databases

Abstract

In today's applications, evolving data streams are stored in very large databases that grow without limit at a rate of several million records per day. Data streams are ubiquitous and have become an important research topic in the last two decades. Mining these continuous data streams brings unique opportunities, but also new challenges. For their predictive nonparametric analysis, Hoeffding-based trees, which offer the possibility of any-time predictions, are often the method of choice. However, one of their main problems is the delay in learning progress caused by the presence of equally discriminative attributes. Options are a natural way to deal with this problem. This paper presents option trees, which build upon regular trees by adding splitting options in the internal nodes to improve accuracy and stability and to reduce ambiguity. Results on the accuracy and processing speed of the algorithm under various memory limits are presented. The accuracy of the Hoeffding option tree is compared with that of Hoeffding trees under circumstantial conditions.
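
The delay caused by equally discriminative attributes can be illustrated with the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), which a Hoeffding tree typically compares against the gap in split merit between the two best attributes. The following is a minimal sketch only, not the configuration used in the paper: the function names, the value range R = 1.0 (information gain with two classes), delta = 1e-7, and the example gaps are all assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(gain_gap: float, value_range: float, delta: float) -> int:
    """Smallest n for which the bound drops strictly below the observed gap.

    A leaf is usually split only once the merit gap between the best and
    second-best attribute exceeds the bound, so a small gap forces a long wait.
    """
    n_real = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    return math.floor(n_real) + 1


# Assumed illustrative values (not taken from the paper).
R = 1.0        # range of information gain for a two-class problem
DELTA = 1e-7   # allowed probability of choosing the wrong attribute

clear_winner = examples_needed(gain_gap=0.1, value_range=R, delta=DELTA)
near_tie = examples_needed(gain_gap=0.01, value_range=R, delta=DELTA)

print("examples needed with a clear winner:", clear_winner)
print("examples needed with a near tie:", near_tie)
```

Because the required number of examples grows with the inverse square of the gap, a gap ten times smaller needs roughly one hundred times more examples before a single split can be chosen. An option node sidesteps that wait by keeping several candidate splits instead of committing to one.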

Cite This Paper

B. Reshma Yusuf, P. Chenna Reddy, "Mining Data Streams using Option Trees", International Journal of Computer Network and Information Security (IJCNIS), vol. 4, no. 8, pp. 49-54, 2012. DOI: 10.5815/ijcnis.2012.08.06
