Ensem_SLDR: Classification of Cybercrime using Ensemble Learning Technique

PP. 81-90

Author(s)

Hemakshi Pandey 1,*, Riya Goyal 1, Deepali Virmani 1, Charu Gupta 1

1. Department of Computer Science Engineering, Bhagwan Parshuram Institute of Technology, New Delhi-110089, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2022.01.07

Received: 8 Jun. 2021 / Revised: 21 Aug. 2021 / Accepted: 14 Oct. 2021 / Published: 8 Feb. 2022

Index Terms

Cybercrime, Bag of Words, Ensemble Learning, Machine Learning, Natural Language Processing

Abstract

With the advancement of technology, cybercrimes are surging at an alarming rate as miscreants exploit the modern world's reliance on virtual platforms. The accumulation of an enormous quantity of cybercrime data creates huge potential to analyze and segregate that data with the help of Machine Learning. The focus of this research is to construct a model, Ensem_SLDR, that predicts the relevant sections of the IT Act 2000 from complaint text/subjects with the aid of Natural Language Processing, Machine Learning, and Ensemble Learning methods. The objective of this paper is to implement a robust technique that categorizes cybercrime into two sections of the IT Act 2000, 66 and 67, with high precision using an ensemble learning technique. In the proposed methodology, the Bag of Words approach is applied for feature engineering, and the resulting features are given as input to the hybrid model Ensem_SLDR. The proposed model is implemented by model stacking, comprising Support Vector Machine (SVM), Logistic Regression, Decision Tree, and Random Forest, and achieves 96.55% accuracy, which is higher and more reliable than past models implemented using a single learning algorithm and some of the existing hybrid models. Ensemble learning techniques enhance model performance and robustness. This research is beneficial for cybercrime cells in India, which hold a repository of detailed information on cybercrime, including complaints and investigations. Hence, there is a need for models and automated systems, empowered by artificial intelligence technologies, that analyze cybercrimes and classify them by section.
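The pipeline described in the abstract (Bag of Words features fed to a stack of SVM, Logistic Regression, Decision Tree, and Random Forest) can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the authors' implementation: the toy complaint texts, the hyperparameters, the 3-fold setting, and the choice of Logistic Regression as the meta-learner are all assumptions, since the abstract does not specify them.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy complaints (NOT the paper's dataset), labelled with the
# two target sections of the IT Act 2000.
texts = [
    "someone hacked my email account and changed the password",
    "my computer was infected and files were stolen",
    "unauthorized access to my bank login and data theft",
    "hacker broke into our server and deleted records",
    "identity stolen through phishing and account hacked",
    "malware damaged my system and stole credentials",
    "obscene pictures were posted on my social media profile",
    "someone is sending vulgar videos to my phone",
    "an obscene message was published on a public website",
    "explicit images were shared online without consent",
    "vulgar content was transmitted to me in a group chat",
    "lewd material was published under my name online",
]
labels = ["66"] * 6 + ["67"] * 6

# The four base learners named in the abstract; hyperparameters are assumed.
base_learners = [
    ("svm", SVC(kernel="linear", random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Bag of Words (raw token counts) followed by the stacked ensemble.
# The meta-learner is trained on cross-validated outputs of the base learners.
model = make_pipeline(
    CountVectorizer(lowercase=True, stop_words="english"),
    StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
        cv=3,
    ),
)
model.fit(texts, labels)

new_complaints = [
    "my account was hacked and my data was stolen",
    "vulgar images were published on a website",
]
predictions = model.predict(new_complaints)
print(list(zip(new_complaints, predictions)))
```

With so few examples the predictions carry no statistical weight; the sketch is meant only to show how the Bag of Words step and the stacking step fit together.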

Cite This Paper

Hemakshi Pandey, Riya Goyal, Deepali Virmani, Charu Gupta, "Ensem_SLDR: Classification of Cybercrime using Ensemble Learning Technique", International Journal of Computer Network and Information Security (IJCNIS), Vol.14, No.1, pp.81-90, 2022. DOI: 10.5815/ijcnis.2022.01.07
