Work place: Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
E-mail: hanan.hindy@cis.asu.edu.eg
Research Interests: Intrusion Detection System
Biography
Hanan Hindy is a Lecturer in the Computer Science Department at the Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt. She completed her Ph.D. in the Division of Cyber-Security at Abertay University, Scotland, UK. She received her bachelor’s degree with honors (2012) and her master’s degree (2016) in Computer Science from the Faculty of Computer and Information Sciences at Ain Shams University. Her research interests include Machine and Deep Learning, Intrusion Detection Systems, and Cyber Security.
By Zeyad Ahmed, Mariam Zeyada, Youssef Amin, Donia Gamal, Hanan Hindy
DOI: https://doi.org/10.5815/ijeme.2023.05.03, Pub. Date: 8 Oct. 2023
Machine Reading Comprehension (MRC), the ability of computers to read and understand unstructured text and then answer questions about it, is still an open research field. MRC is considered one of the most research-demanding sub-tasks in Natural Language Processing (NLP) and Natural Language Understanding (NLU), and it introduces multiple research challenges. One challenge is that models should be trained both to answer questions and to abstain from answering when the answer is not covered in the given context. Another challenge lies in dataset availability. These challenges are amplified for non-Latin-based languages, with Arabic as an example. Currently available Arabic MRC datasets are either small, high-quality collections or large, low-quality datasets. Additionally, they do not include unanswerable questions. This lack of resources leaves models unsuited to real-world deployment. To tackle these challenges, this paper proposes a novel large, high-quality Arabic MRC dataset that includes unanswerable questions, named “Arabic-SQuAD v2.0”. The dataset consists of 96,051 triplets {question, context, answer}, in an attempt to help enrich the field of Arabic MRC. Furthermore, a Machine Learning (ML)-based model is introduced that is capable of effectively solving Arabic MRC with unanswerable questions. The results of the proposed model are satisfactory and comparable with those of Latin-based language models. The results also show a significant improvement over the current state of the art in Arabic MRC: the model achieves an F1-score of 71.49 and an Exact Match (EM) score of 65.12. The proposed dataset and implementation pave the way for further Arabic MRC research, with the aim of reaching a state in which MRC models can mimic human text reasoning.
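The abstract reports Exact Match (EM) and F1 scores on {question, context, answer} triplets that may be unanswerable. The sketch below shows how such metrics are commonly computed per example in SQuAD-v2.0-style evaluation. It is an illustration only, not the paper's evaluation code: the function names and the whitespace-based `normalize` step are assumptions, and a real Arabic pipeline would likely need additional normalization (for example, handling diacritics and punctuation). An empty string stands for "no answer".

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Simplified normalization (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the predicted and gold answers."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Unanswerable case: if either side is empty, the score is 1.0 only
    # when both are empty (the model correctly abstained).
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical triplet, for illustration only (not taken from the dataset).
example = {
    "question": "Where is Ain Shams University?",
    "context": "Ain Shams University is located in Cairo, Egypt.",
    "answer": "Cairo Egypt",
}
print(exact_match("in Cairo Egypt", example["answer"]))
print(f1_score("in Cairo Egypt", example["answer"]))
```

Corpus-level EM and F1, such as the figures quoted above, are typically the averages of these per-example scores, expressed as percentages.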