Detecting Remote Access Network Attacks Using Supervised Machine Learning Methods

PP. 48-61

Author(s)

Samuel Ndichu 1,*, Sylvester McOyowo 1, Henry Okoyo 1, Cyrus Wekesa 2

1. Maseno University, School of Computing and Informatics, Private Bag, Maseno, Kenya

2. University of Eldoret, School of Engineering, Eldoret, Kenya

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2023.02.04

Received: 4 Mar. 2022 / Revised: 7 Jul. 2022 / Accepted: 14 Sep. 2022 / Published: 8 Apr. 2023

Index Terms

Remote Access, Virtual Private Network, Encrypted Network Traffic, Network Attacks, Machine Learning

Abstract

Remote access technologies encrypt data to enforce policies and ensure protection. Attackers exploit this encryption to launch carefully crafted evasion attacks that introduce malware and other unwanted traffic into the internal network. Traditional security controls such as anti-virus software, firewalls, and intrusion detection systems (IDS) decrypt network traffic and employ signature-based and heuristic-based approaches for malware inspection. Machine learning (ML) approaches have also been proposed for specific malware detection and traffic type characterization. However, decryption introduces computational overhead and undermines the privacy goal of encryption, while the existing ML approaches employ limited features and are not designed specifically for remote access security. This paper presents a novel ML-based approach to encrypted remote access attack detection using a weighted random forest (W-RF) algorithm. Key features are determined using feature importance scores. Class weighting is used to address the imbalanced data distribution common in remote access network traffic, where attacks comprise only a small proportion of the traffic. The approach is evaluated on benign virtual private network (VPN) and attack network traffic datasets that comprise verified normal hosts and common attacks in real-world network traffic. With recall and precision of 100%, the approach demonstrates effective performance. The results for k-fold cross-validation and the mean area under the curve (AUC) of the receiver operating characteristic (ROC) demonstrate that the approach effectively detects attacks in encrypted remote access network traffic, helping to avert attackers and network intrusions.
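The class weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which weighting scheme the W-RF uses, so the sketch assumes the common "balanced" heuristic, n_samples / (n_classes * class_count), which is also what scikit-learn applies for class_weight="balanced". All function names and the 95/5 toy split are hypothetical and are not taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Assumed heuristic (not confirmed by the paper):
    weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity of a tree node when each sample counts with its class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


def precision_recall(y_true, y_pred, positive="attack"):
    """Precision and recall for the positive (attack) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical imbalanced sample: attacks are a small share of the traffic.
labels = ["benign"] * 95 + ["attack"] * 5
weights = balanced_class_weights(labels)

# Impurity with and without class weights for the same node.
unweighted = weighted_gini(labels, {"benign": 1.0, "attack": 1.0})
weighted = weighted_gini(labels, weights)
print(weights, unweighted, weighted)
```

Without weights, a node holding this sample looks nearly pure, so a tree has little incentive to split off the few attack flows. With the assumed balanced weights, both classes contribute equally to the impurity, and the minority class influences split selection. The precision_recall helper shows why these two metrics are reported instead of accuracy: a classifier that labels everything benign would be 95% accurate on this sample but would have a recall of zero for attacks.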

Cite This Paper

Samuel Ndichu, Sylvester McOyowo, Henry Okoyo, Cyrus Wekesa, "Detecting Remote Access Network Attacks Using Supervised Machine Learning Methods", International Journal of Computer Network and Information Security (IJCNIS), Vol.15, No.2, pp.48-61, 2023. DOI: 10.5815/ijcnis.2023.02.04
