Work place: Modern Academy for Computer Science and Management Technology, Computer Science Department, Cairo, 11434, Egypt
E-mail: mgadallah1956@gmail.com
Research Interests: Image Processing, Pattern Recognition, Natural Language Processing, Computer Vision
Biography
Mahmoud Gadallah received the B.Sc. in electrical engineering in 1979 and the M.Sc. in 1984 from the Faculty of Engineering, Cairo University, and the Ph.D. in 1991 from Cranfield Institute of Technology (now Cranfield University), United Kingdom. He is now a professor at the Modern Academy for Computer Science and Management Technology, Cairo, Egypt. He has worked on several research topics, including image processing, pattern recognition, computer vision, and natural language processing.
By Marina Azer, Mohamed Taha, Hala H. Zayed, Mahmoud Gadallah
DOI: https://doi.org/10.5815/ijisa.2021.03.01, Pub. Date: 8 Jun. 2021
Social media is a crucial part of our lives and is now considered a more important source of information than traditional sources. Twitter has become one of the most prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for detecting false news. One of the challenges in credibility detection is finding new features that are most predictive, in order to improve classifier performance. Both content-based features and user-based features are used. The importance of the features and their impact on performance are examined, and the reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used: Naïve Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Maximum Entropy (ME), and conditional random forest (CRF). The models were trained and tested on the Pheme dataset. The feature analysis is presented and compared with the content-based features as the decisive factors in determining validity. Random forest shows the highest performance both when using user-based features only (accuracy 82.2%) and when using a mixture of content-based and user-based features (accuracy 83.4%), the latter being the highest result overall. In contrast, logistic regression performed best when using content-based features. Performance is measured by accuracy, precision, recall, and F1-score. We compared our feature set with the features used in other studies and assessed the impact of our new features. We found that our results show a considerable improvement in discovering and verifying false news compared with current results.
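For readers unfamiliar with the four measures named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a binary true/false news classifier. The function name `binary_metrics`, the `positive` parameter, and the toy labels are illustrative assumptions; they are not taken from the paper, and the numbers are unrelated to its reported results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (assumed): 1 = false news, 0 = credible.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters when the two classes are imbalanced, since accuracy alone can look high for a classifier that rarely predicts the minority class.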