Closed Domain Question Answering System Tailored for Crime Events Using Deep Learning for Both Statistical and Contextualized Responses



Author(s)

Dipti Pawade 1,*, Sonali Patil 1, Chaitanya Bandiwdekar 1, Siddhesh Bagwe 1, Pooja Kaulgud 1, Aditi Kulkarni 1

1. Department of Information Technology, K J Somaiya College of Engineering, Mumbai, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2024.04.04

Received: 6 Apr. 2024 / Revised: 24 May 2024 / Accepted: 22 Jun. 2024 / Published: 8 Aug. 2024

Index Terms

Question Answer System, Statistical Questions, Contextual Questions, Criminal Activity, RoBERTa, GPT-3

Abstract

A question-answering system responds to user queries expressed in natural language. Unlike search engines, closed-domain question-answering systems specialize in a single domain and provide concise, precise answers, often derived from structured data. This paper presents a question-answering system tailored to crime events that handles both statistical and contextual inquiries. For crime statistics, the fine-tuned GPT-3 model outperforms the USE, TAPAS, TAPEX, and base GPT-3 models; for context-based crime-related queries, the fine-tuned RoBERTa model surpasses the BERT and base RoBERTa models. The system returns responses in natural language, supplemented with relevant data visualizations. The models are trained on the Q2A and NewsQA datasets and tested on the NCRB and NewsTimes datasets: Q2A and NCRB are used for statistical queries, while NewsQA and NewsTimes are used for contextual inquiries. The paper analyzes the various models and presents results for sample case studies. Such a system can be valuable to users who want to study criminal cases or gather pertinent insights for a specific case. It can also help in understanding patterns and trends in criminal events, particularly with respect to geospatial information. Linking crime-event question-answering systems to geospatial information makes it possible to explore niche areas and to obtain precise details about local crime with minimal hype, which makes the approach worth exploring.

Cite This Paper

Dipti Pawade, Sonali Patil, Chaitanya Bandiwdekar, Siddhesh Bagwe, Pooja Kaulgud, Aditi Kulkarni, "Closed Domain Question Answering System Tailored for Crime Events Using Deep Learning for Both Statistical and Contextualized Responses", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.16, No.4, pp. 47-65, 2024. DOI:10.5815/ijieeb.2024.04.04
