An Efficient Machine Learning Based Classification Scheme for Detecting Distributed Command & Control Traffic of P2P Botnets

Full Text (PDF, 232KB), PP.9-18

Author(s)

Pijush Barthakur 1,*, Manoj Dahal 2, Mrinal Kanti Ghose 1

1. Department of Computer Science and Engineering, Sikkim Manipal Institute of Technology, Sikkim, India

2. Novell IDC, Bagmane Tech Park, C V Ramannagar, Bangalore, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2013.10.02

Received: 1 Jun. 2013 / Revised: 10 Jul. 2013 / Accepted: 2 Sep. 2013 / Published: 8 Oct. 2013

Index Terms

Botnet, Peer-to-Peer (P2P), WEKA, Linear support vector machine, J48, BayesNet, ROC curve, AUC

Abstract

The biggest Internet security threat is the rise of botnets with modular and flexible structures. The combined power of thousands of remotely controlled computers increases the speed and severity of attacks. In this paper, we provide a comparative analysis of machine-learning-based classification of botnet command and control (C&C) traffic for proactive detection of Peer-to-Peer (P2P) botnets. We combine selected flow features of botnet C&C traffic with carefully selected behavioral features of botnets to improve classification by machine learning algorithms. Our simulation results show that the method is effective, achieving high test accuracy with short training time. We compare the performance of Decision Tree (C4.5), Bayesian Network and Linear Support Vector Machine classifiers using accuracy, sensitivity, positive predictive value (PPV) and F-Measure. We also compare the predictive models using AUC (area under the ROC curve). Finally, we propose a rule induction algorithm derived from Quinlan's original C4.5 algorithm. The proposed algorithm produces better accuracy than the original decision tree classifier.
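The performance metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions computed from confusion-matrix counts, treating botnet C&C flows as the positive class. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp, fn: botnet flows classified correctly / missed
    tn, fp: benign flows classified correctly / flagged as botnet
    """
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    ppv = _ratio(tp, tp + fp)          # positive predictive value (precision)
    # F-Measure: harmonic mean of PPV and sensitivity
    f_measure = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    for name, value in classification_metrics(tp=90, fp=10, tn=880, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Sensitivity and PPV are reported alongside accuracy because accuracy alone can look high when benign flows greatly outnumber botnet flows.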

Cite This Paper

Pijush Barthakur, Manoj Dahal, Mrinal Kanti Ghose, "An Efficient Machine Learning Based Classification Scheme for Detecting Distributed Command & Control Traffic of P2P Botnets", International Journal of Modern Education and Computer Science (IJMECS), vol.5, no.10, pp.9-18, 2013. DOI:10.5815/ijmecs.2013.10.02
