Evaluation of Machine Learning Techniques for Email Spam Classification

Full Text (PDF, 754KB), PP.35-42

Author(s)

Mahmoud Jazzar 1,*, Rasheed F. Yousef 1, Derar Eleyan 1

1. Palestine Technical University – Kadoorie, Faculty of Graduate Studies, Tulkarem, P.O. Box 7, Palestine

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2021.04.04

Received: 3 Feb. 2021 / Revised: 17 Feb. 2021 / Accepted: 7 Mar. 2021 / Published: 8 Aug. 2021

Index Terms

Spam, spam filtering, machine learning algorithms, email classification.

Abstract

Electronic mail (email) is one of the most common official means of exchanging data and information across digital and electronic devices. Millions of users worldwide rely on it, with messages passing between email servers. At the same time, unwanted email, or spam, has become a challenge for major companies and organizations because its volume increases dramatically every year. Spam is annoying and may carry harmful content. It also consumes computer, server, and network resources, causes bottlenecks, and degrades the memory and speed of digital devices. Moreover, users spend a great deal of time removing unwanted emails. Many methods have been developed to filter spam, such as keyword matching, blacklists/whitelists, and header information processing. However, classical methods such as blocking the source are not effective at preventing spam. This study reviews and evaluates the performance of the most popular and effective machine learning techniques and algorithms for email spam classification and filtering, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), J48, and Naïve Bayes. In conclusion, the support vector machine performs better than any other individual algorithm in terms of accuracy. This research contributes to the development of methods and techniques for better detection and prevention of spam.
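As a rough illustration of what evaluating a spam classifier by accuracy involves, the sketch below trains a small multinomial Naïve Bayes model and scores it on held-out messages. It is not the authors' experimental setup: the toy messages, the add-one smoothing, and the names `train_naive_bayes`, `predict`, and `accuracy` are all assumptions made for this example, and the tiny dataset says nothing about how the algorithms compare in the paper.

```python
import math
from collections import Counter


def train_naive_bayes(messages, labels):
    """Count words per class; returns (word_counts, class_counts, vocab)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the message."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Log prior: fraction of training messages in this class.
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, messages, labels):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(predict(model, m) == y for m, y in zip(messages, labels))
    return correct / len(labels)


# Hypothetical toy data, invented for this sketch only.
train_messages = [
    "win free money now",
    "free prize claim now",
    "cheap loans win cash",
    "meeting agenda for monday",
    "project report attached",
    "lunch on monday with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

test_messages = ["claim free cash", "monday project meeting"]
test_labels = ["spam", "ham"]

model = train_naive_bayes(train_messages, train_labels)
test_accuracy = accuracy(model, test_messages, test_labels)
print(f"accuracy on held-out messages: {test_accuracy:.2f}")
```

In a realistic comparison, the same accuracy function would be applied to each classifier on a common held-out split of a real corpus, so that the reported figures are directly comparable.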

Cite This Paper

Mahmoud Jazzar, Rasheed F. Yousef, Derar Eleyan, "Evaluation of Machine Learning Techniques for Email Spam Classification", International Journal of Education and Management Engineering (IJEME), Vol.11, No.4, pp. 35-42, 2021. DOI: 10.5815/ijeme.2021.04.04
