The Comparison of Machine Learning Algorithms on Online Classification of Network Flows

Full Text (PDF, 147KB), PP.7-11


Author(s)

Keji Wei 1 Shaolong Cao 1 Jian Yu 1

1. Xi'an Jiaotong Univ, P.R.China

* Corresponding author.

DOI: https://doi.org/10.5815/ijwmt.2012.02.02

Received: 3 Jan. 2012 / Revised: 6 Feb. 2012 / Accepted: 7 Mar. 2012 / Published: 15 Apr. 2012

Index Terms

Online classification, network flow, statistical feature, feature selection, classifier

Abstract

Online classification of network flows captures the packets generated by network applications and identifies the application type (or flow type) in real time. Three issues are central to online classification: observation window size, feature selection, and the choice of classification algorithm.
In this paper, the authors collect five types of typical network flow data as experimental samples and find that an observation window size of 7 works best for the sample data and for most classifiers. They propose a full feature set, built on the standard feature set, that reflects the statistical features of network flows. Using five commonly used feature selection methods, they show that the 56 original features can be reduced to 11 effective features. Finally, given the particular requirements of online classification, they compare 11 classifiers on classification accuracy, model construction time, and classification speed. The results show that C4.5 and JRip are the two best algorithms for online classification.
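
To make the three evaluation criteria concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a flow is represented only by its packet sizes, computes four illustrative statistics over the first 7 packets (the abstract does not list the paper's 56 features), and times a simple nearest-centroid stand-in classifier, which is not one of the 11 classifiers studied. The flow data, the labels `bulk` and `chat`, and all function names are hypothetical.

```python
import statistics
import time


def window_features(packet_sizes, window=7):
    """Summarise the first `window` packet sizes (bytes) of one flow."""
    head = packet_sizes[:window]
    if not head:
        raise ValueError("flow has no packets")
    # Illustrative features only: min, max, mean, population std deviation.
    return [
        float(min(head)),
        float(max(head)),
        statistics.fmean(head),
        statistics.pstdev(head),
    ]


class NearestCentroid:
    """Stand-in classifier; NOT one of the classifiers studied in the paper."""

    def fit(self, rows, labels):
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(row)
        # One centroid per class: the column-wise mean of its rows.
        self.centroids = {
            label: [statistics.fmean(col) for col in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        return min(self.centroids, key=lambda lb: sq_dist(self.centroids[lb]))


def evaluate(model, train, test):
    """Measure accuracy, model construction time, and per-flow classification time."""
    train_x, train_y = zip(*train)
    start = time.perf_counter()
    model.fit(train_x, train_y)
    build_time = time.perf_counter() - start

    start = time.perf_counter()
    predicted = [model.predict(x) for x, _ in test]
    classify_time = (time.perf_counter() - start) / len(test)

    correct = sum(p == y for p, (_, y) in zip(predicted, test))
    return {
        "accuracy": correct / len(test),
        "build_time": build_time,        # seconds to construct the model
        "classify_time": classify_time,  # average seconds per classified flow
    }


# Made-up flows: (packet sizes in bytes, hypothetical application label).
flows = [
    ([60, 60, 1500, 1500, 1500, 1500, 1500, 1500], "bulk"),
    ([60, 52, 1480, 1500, 1500, 1460, 1500], "bulk"),
    ([80, 90, 85, 100, 95, 88, 92, 81], "chat"),
    ([70, 110, 95, 90, 84, 99, 105], "chat"),
]
samples = [(window_features(sizes), label) for sizes, label in flows]
result = evaluate(NearestCentroid(), samples[::2], samples[1::2])
print(result)
```

Truncating each flow to a fixed window is what lets a classifier of this kind act before the flow ends. Reporting construction time and per-flow classification time alongside accuracy mirrors the comparison criteria named in the abstract.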

Cite This Paper

Keji Wei, Shaolong Cao, Jian Yu, "The Comparison of Machine Learning Algorithms on Online Classification of Network Flows", IJWMT, vol. 2, no. 2, pp. 7-11, 2012. DOI: 10.5815/ijwmt.2012.02.02
