Ensemble Feature Selection and Classification of Internet Traffic using XGBoost Classifier

Pages: 37-44

Author(s)

N Manju 1,*, B S Harish 2, V Prajwal 2

1. Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, 570006, India

2. Department of Information Science and Engineering, JSS Science and Technology University, Mysuru, 570006, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2019.07.06

Received: 4 May 2019 / Revised: 20 May 2019 / Accepted: 28 May 2019 / Published: 8 Jul. 2019

Index Terms

Identification, Classification, Feature Selection, Internet Traffic

Abstract

Identification and classification of internet traffic are essential to network management for ensuring Quality of Service (QoS). However, existing machine learning models tend to produce unsatisfactory results when applied to imbalanced datasets involving multiple classes. There are two reasons for this: the models are biased toward classes with more samples, and they tend to predict only the majority class, because the features of minority classes are often treated as noise and ignored. Minority-class samples are therefore much more likely to be misclassified than majority-class samples. In this paper, we propose a tree-based ensemble feature selection method and an ensemble classification model using XGBoost to improve classification performance. The proposed model achieves higher classification accuracy than other tree-based classifiers.
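The imbalance argument above can be made concrete with a small worked example. The sketch below is illustrative only and is not the authors' code: the class names `WWW`, `P2P` and `GAMES` and their counts are hypothetical, and the "model" is a deliberately degenerate one that always predicts the majority class. It shows how overall accuracy stays high while every minority-class flow is misclassified.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {class: recall}, i.e. correct predictions / true samples per class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # A Counter returns 0 for missing keys, so never-hit classes get recall 0.0.
    return {c: hits[c] / n for c, n in totals.items()}


# Hypothetical, made-up class counts (not from the paper): one dominant
# traffic class and two small minority classes.
y_true = ["WWW"] * 900 + ["P2P"] * 80 + ["GAMES"] * 20

# Degenerate classifier that always predicts the majority class.
y_pred = ["WWW"] * len(y_true)

acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
macro_recall = sum(recall.values()) / len(recall)

print(f"accuracy     = {acc:.2f}")        # 0.90
print(f"recall       = {recall}")         # minority classes get 0.0
print(f"macro recall = {macro_recall:.2f}")  # 0.33
```

Overall accuracy is 0.90 even though recall on both minority classes is 0.0, so a single accuracy figure can hide exactly the minority-class misclassification the abstract describes; per-class or macro-averaged metrics expose it.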

Cite This Paper

N Manju, B S Harish, V Prajwal, "Ensemble Feature Selection and Classification of Internet Traffic using XGBoost Classifier", International Journal of Computer Network and Information Security (IJCNIS), Vol. 11, No. 7, pp. 37-44, 2019. DOI: 10.5815/ijcnis.2019.07.06