An Effective Text Classifier using Machine Learning for Identifying Tweets’ Polarity Concerning Terrorist Connotation

Full Text (PDF, 333KB), PP.19-29

Author(s)

Norah AL-Harbi 1,*, Amirrudin Bin Kamsin 2

1. Faculty of Computer Science and Information Technology, Taif University, Taif, Saudi Arabia

2. Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2021.05.02

Received: 14 Apr. 2021 / Revised: 5 Jun. 2021 / Accepted: 27 Jun. 2021 / Published: 8 Oct. 2021

Index Terms

Twitter, machine learning, Terrorism, Arabic, messaging

Abstract

Terrorist groups in the Arab world have been using social networking sites such as Twitter and Facebook to spread terror rapidly over the past few years. Detecting and suspending such accounts is one way to control the menace to some extent. This research aims to build an effective text classifier that uses machine learning to identify the polarity of tweets automatically. Five classifiers were chosen: AdB_SAMME, AdB_SAMME.R, Linear SVM, NB, and LR. These classifiers were applied to three feature sets, namely S1 (single word, unigram), S2 (word pair, bigram), and S3 (word triplet, trigram). All five classifiers were evaluated on S1, S2, and S3 drawn from 346 preprocessed tweets. Feature extraction used tf-idf (term frequency-inverse document frequency), one of the most widely applied weighting schemes. The results were validated by four experts in the Arabic language (three teachers and an educational supervisor in Saudi Arabia) through a questionnaire. The study found that the Linear SVM classifier yielded the best result among all classifiers used, with 99.7% classification accuracy on S3. When both classification accuracy and time were considered, the NB classifier on S1 reached 99.4% accuracy, which was comparable with Linear SVM. The Arab world has faced massive terrorist attacks in the past; the research is therefore significant and relevant because of its specific focus on detecting terrorism messages in Arabic. State-of-the-art methods for tweet classification mostly focus on English text, so there was a need for machine learning algorithms that detect Arabic terrorism messages. The innovative aspect of the model presented in this study is that the five best classifiers were selected and applied to three language models, S1, S2, and S3. The comparative analysis, based on classification accuracy and time constraints, identified the best classifiers for sentiment analysis in the Arabic language.
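
The feature extraction step described in the abstract (word unigrams, bigrams, and trigrams weighted by tf-idf) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English corpus, the whitespace tokenizer, the function names `ngrams` and `tfidf`, and the idf variant ln(N/df) are all assumptions made for illustration. The paper's own preprocessing of Arabic tweets is not described here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Compute tf-idf weights for word n-grams.

    docs: list of strings, tokenized on whitespace (an assumption).
    Returns one dict per document mapping n-gram -> weight, using raw
    term frequency and idf = ln(N / df), one common textbook variant.
    """
    grams_per_doc = [ngrams(doc.split(), n) for doc in docs]

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    total = len(docs)
    weights = []
    for grams in grams_per_doc:
        tf = Counter(grams)
        weights.append({g: tf[g] * math.log(total / df[g]) for g in tf})
    return weights


# Hypothetical toy corpus standing in for preprocessed tweets.
docs = ["peace talks resume today", "peace talks fail", "markets open today"]

s1 = tfidf(docs, 1)  # S1: unigram features
s2 = tfidf(docs, 2)  # S2: bigram features
s3 = tfidf(docs, 3)  # S3: trigram features
```

In a setup like the one the abstract describes, each of the three weighted feature sets would then be passed to every classifier, and accuracy and running time compared across the resulting combinations.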

Cite This Paper

Norah AL-Harbi, Amirrudin Bin Kamsin, "An Effective Text Classifier using Machine Learning for Identifying Tweets’ Polarity Concerning Terrorist Connotation", International Journal of Information Technology and Computer Science (IJITCS), Vol.13, No.5, pp.19-29, 2021. DOI: 10.5815/ijitcs.2021.05.02
