Marina Azer; Mohamed Taha; Hala H. Zayed; Mahmoud Gadallah

Credibility Detection on Twitter News Using Machine Learning Approach

Author(s)

Marina Azer ¹,*, Mohamed Taha ², Hala H. Zayed ², Mahmoud Gadallah ¹

1. Modern Academy for Computer Science and Management Technology, Computer Science Department, Cairo, 11434, Egypt

2. Benha University, Faculty of Computers and Artificial Intelligence, Computer Science Department, Benha, 13518, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2021.03.01

Received: 24 Jan. 2021 / Revised: 11 Feb. 2021 / Accepted: 16 Mar. 2021 / Published: 8 Jun. 2021

Index Terms

Twitter, Credibility Detection, Machine Learning, Content-Based Features, User-Based Features

Abstract

Social media has become a crucial part of our lives. It is considered a more important source of information than traditional sources. Twitter has become one of the most prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for detecting false news. One challenge in credibility detection is finding new, highly predictive features that improve classifier performance. Both content-based and user-based features are used. The importance of the features and their impact on performance are examined. The reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used: Naïve Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Maximum Entropy (ME), and conditional random forest (CRF). The models were trained and tested on the Pheme dataset. An analysis of the features is presented, and the user-based features are compared with the content-based features as decisive factors in determining credibility. Random forest achieves the highest performance both with user-based features only (82.2% accuracy) and with a combination of content-based and user-based features. The best overall result, 83.4% accuracy, is obtained with the random forest classifier using both feature types. In contrast, logistic regression performs best when only content-based features are used. Performance is measured using accuracy, precision, recall, and F1-score. We compared our feature set with the features used in other studies and assessed the impact of our new features. We found that our results show a substantial improvement in detecting and verifying false news compared with existing results.
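
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `binary_metrics`, the label encoding (1 = non-credible, 0 = credible), and the toy labels are assumptions made purely for illustration and are not taken from the Pheme dataset.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for illustration (assumed encoding: 1 = non-credible, 0 = credible).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all four measures come out to 0.75. On real data the measures generally differ, which is why reporting all four is more informative than accuracy alone.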

Cite This Paper

Marina Azer, Mohamed Taha, Hala H. Zayed, Mahmoud Gadallah, "Credibility Detection on Twitter News Using Machine Learning Approach", International Journal of Intelligent Systems and Applications (IJISA), Vol. 13, No. 3, pp. 1-10, 2021. DOI: 10.5815/ijisa.2021.03.01

International Journal of Intelligent Systems and Applications (IJISA)