E-Mail Spam Detection Using Refined MLP with Feature Selection

Pages: 42-52


Author(s)

Harjot Kaur 1,*, Er. Prince Verma 1

1. CT Group of Institution/CSE, Jalandhar, 144041, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2017.09.05

Received: 28 Mar. 2016 / Revised: 20 Jun. 2016 / Accepted: 22 Aug. 2017 / Published: 8 Sep. 2017

Index Terms

Data Mining, Knowledge Discovery (KDD) Process, E-Mail, Spam, Ham, Spam Filter, N-Gram-based feature selection, Multi-Layer Perceptron Neural Network (MLP-NN) and Support Vector Machine (SVM) classification algorithms.

Abstract

Electronic Mail (E-mail) has established a significant place in information users' lives. E-mails are a major mode of information sharing because they are a fast and effective means of communication, and email plays an important role in both the personal and professional aspects of one's life. The rapid increase in the number of account holders over the last few decades, together with the growing volume of emails, has also created serious problems. Emails are categorised into ham (legitimate) and spam emails, and spam has been spreading at a tremendous rate over the past decades. Spam emails are illegitimate, unwanted messages that may contain junk, viruses, malicious code, advertisements or threats aimed at authenticated account holders. This has created a need for efficient and effective anti-spam filters that classify each email as spam or ham, preventing spam from reaching the user's inbox. Email spam filters can work on the message content or on the message header, and they fall into two categories: machine learning and non-machine-learning techniques. This paper discusses the process of filtering emails into spam and ham using various techniques.
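The abstract separates filters along two axes: header-based versus content-based, and non-learning versus learning. The Python sketch below illustrates both axes on toy data; it is not the paper's refined MLP or its feature-selection procedure. As assumptions for illustration only: `BLOCKED_SENDERS`, `SPAM_PHRASES`, `word_ngrams`, `rule_based_is_spam` and `Perceptron` are hypothetical names, the rule-based filter uses a sender blocklist (header) plus a bigram phrase match (content), and the learning filter is a single-layer perceptron (the building block of an MLP) trained on word N-gram counts from four invented example messages.

```python
from collections import Counter

def word_ngrams(text, n=2):
    """Return a Counter of lowercased, whitespace-tokenised word n-grams."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# --- Non-learning filter (hypothetical blocklist and phrase list) ---
BLOCKED_SENDERS = {"promo@example.com"}
SPAM_PHRASES = {"free money", "click here"}

def rule_based_is_spam(sender, body):
    if sender.lower() in BLOCKED_SENDERS:          # header-based check
        return True
    grams = word_ngrams(body, 2)                   # content-based check on bigrams
    return any(p in grams for p in SPAM_PHRASES)

# --- Learning filter: single-layer perceptron, label 1 = spam, 0 = ham ---
class Perceptron:
    def __init__(self, n=1, epochs=10, lr=1.0):
        self.n, self.epochs, self.lr = n, epochs, lr
        self.w = {}      # sparse weight per n-gram
        self.b = 0.0     # bias term

    def _features(self, text):
        return word_ngrams(text, self.n)

    def score(self, text):
        return self.b + sum(self.w.get(g, 0.0) * c for g, c in self._features(text).items())

    def predict(self, text):
        return 1 if self.score(text) > 0 else 0

    def fit(self, texts, labels):
        for _ in range(self.epochs):
            for text, y in zip(texts, labels):
                err = y - self.predict(text)       # -1, 0 or +1
                if err:                            # update only on mistakes
                    for g, c in self._features(text).items():
                        self.w[g] = self.w.get(g, 0.0) + self.lr * err * c
                    self.b += self.lr * err
        return self

# Toy usage (invented messages, not the paper's dataset)
train = ["win free money now", "click here to claim your prize",
         "meeting agenda for monday", "please review the attached report"]
labels = [1, 1, 0, 0]
clf = Perceptron(n=1).fit(train, labels)
print(clf.predict("claim your free prize now"))      # 1 (spam) on this toy data
print(clf.predict("agenda for the review meeting"))  # 0 (ham) on this toy data
```

The rule-based filter needs no training data but only catches what its lists anticipate, while the perceptron learns weights from labelled examples; an MLP stacks such units with non-linear activations to learn more complex decision boundaries.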

Cite This Paper

Harjot Kaur, Er. Prince Verma, "E-Mail Spam Detection Using Refined MLP with Feature Selection", International Journal of Modern Education and Computer Science (IJMECS), Vol.9, No.9, pp. 42-52, 2017. DOI:10.5815/ijmecs.2017.09.05
