Seeja K. R

Work place: Department of Computer Science & Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India

E-mail: krseeja@gmail.com

Research Interests: Bioinformatics, Data Mining, Data Structures and Algorithms, Algorithm Design

Biography

Seeja K. R received her Ph.D. degree in Computer Science from Jamia Hamdard University, New Delhi, India, in July 2010. She is currently working as an associate professor in the Department of Computer Science & Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India. Her research interests include data mining, algorithm design, bioinformatics, and NP-complete problems.

Author Articles
Feature Extraction or Feature Selection for Text Classification: A Case Study on Phishing Email Detection

By Masoumeh Zareapoor, Seeja K. R

DOI: https://doi.org/10.5815/ijieeb.2015.02.08, Pub. Date: 8 Mar. 2015

Dimensionality reduction is generally performed when high-dimensional data such as text are classified. It can be done using either feature extraction or feature selection techniques. This paper analyses which dimensionality reduction technique is better for classifying text data such as emails. Email classification is difficult because its high-dimensional, sparse features affect the generalization performance of classifiers. In phishing email detection, dimensionality reduction techniques are used to keep the most informative and discriminative features from a collection of emails, consisting of both phishing and legitimate messages, for better detection. Two feature selection techniques (Chi-Square and Information Gain Ratio) and two feature extraction techniques (Principal Component Analysis and Latent Semantic Analysis) are used for the analysis. It is found that feature extraction techniques offer better classification performance, give stable results across different numbers of selected features, and robustly maintain performance over time.
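
To illustrate the feature selection side of this comparison, the following is a minimal sketch of Chi-Square term scoring on binary term presence, written in plain Python. The toy emails, the labels, and the function names `chi_square_scores` and `select_top_k` are invented for illustration; they are not taken from the paper, whose preprocessing and implementation are not described in the abstract.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term with the chi-square statistic of its 2x2
    (term present/absent) x (phishing/legitimate) contingency table."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)  # phishing emails
    n_neg = n - n_pos                         # legitimate emails

    present_pos = Counter()  # term -> number of phishing emails containing it
    present_neg = Counter()  # term -> number of legitimate emails containing it
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # presence only, not frequency
        (present_pos if y == 1 else present_neg).update(terms)

    scores = {}
    for term in set(present_pos) | set(present_neg):
        a = present_pos[term]  # phishing, term present
        b = present_neg[term]  # legitimate, term present
        c = n_pos - a          # phishing, term absent
        d = n_neg - b          # legitimate, term absent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term (or a class) carries no contrast.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: 1 = phishing, 0 = legitimate.
docs = [
    "verify your account password now",
    "urgent verify your bank account",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 3))
```

Feature selection of this kind keeps a subset of the original terms, so the retained features stay interpretable. Feature extraction methods such as Principal Component Analysis and Latent Semantic Analysis instead build new features as combinations of the original ones.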

Other Articles