Work place: Hindustan Institute of Technology and Science, Chennai, India
E-mail: angelinag@hindustanuniv.ac.in
Research Interests: Computer Architecture and Organization, Information Security, Information Systems, Data Mining, Information Retrieval, Data Structures and Algorithms
Biography
Dr. Angelina Geetha is a Professor and Dean (Engineering and Technology) at Hindustan Institute of Technology and Science, Padur, Chennai, India. She received her PhD from Anna University, Chennai, Tamil Nadu, in 2008. Her research interests are machine learning methods, data mining, and information retrieval.
By M. Rajasekar, Angelina Geetha
DOI: https://doi.org/10.5815/ijmsc.2023.01.02, Pub. Date: 8 Feb. 2023
Information extraction is an essential task in Natural Language Processing (NLP). It is the process of extracting useful information from unstructured text. Information extraction supports many NLP applications, such as sentiment analysis, named entity recognition, medical data extraction, and feature extraction from research articles and agricultural texts. Most information extraction applications are built on machine learning models. Much research has been carried out on machine-learning-based information extraction from English texts in various domains, such as biomedicine, the share market, weather, business, social media, agriculture, engineering, and tourism. However, domain-specific information extraction for a particular regional language remains a challenge. Many classification algorithms exist, and selecting the most appropriate one for a given domain is difficult. In this paper, three widely used classification algorithms are applied to information extraction by classifying gynecological domain data in the Tamil language. The main objective of this research is to determine which machine learning methods are suitable for domain-specific Tamil text documents. A total of 1635 documents are used in the classification task to extract features with the three selected algorithms. Evaluation of each model shows that the Naïve Bayes classifier achieves the highest accuracy (84%) on the gynecological domain data. The F1-score, error rate, and execution time were also evaluated for the selected models, and this performance evaluation shows that the Naïve Bayes model gives the best results. It is concluded that Naïve Bayes is the best of the three models for classifying gynecological domain text in Tamil.
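The abstract reports accuracy, F1-score, error rate, and execution time but gives no implementation details. As a rough illustration only, the following Python sketch implements a textbook multinomial Naive Bayes text classifier with Laplace smoothing and computes those four measures. Everything here is an assumption: the whitespace tokenizer, the toy Tamil sentences, the class names `pregnancy` and `menstruation`, and the helper names `MultinomialNaiveBayes` and `evaluate` are hypothetical and are not the authors' dataset, preprocessing, or code. Macro-averaged F1 is also an assumption, since the abstract does not say how F1 was averaged.

```python
import math
import time
from collections import Counter


class MultinomialNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class name per document
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        # Log prior P(c) = fraction of training documents in class c
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, tokens, c):
        # log P(c) + sum of log P(token | c), smoothed over the vocabulary
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for t in tokens:
            if t in self.vocab:  # tokens unseen in training are ignored
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, docs):
        return [max(self.classes, key=lambda c: self._log_posterior(toks, c)) for toks in docs]


def evaluate(y_true, y_pred):
    """Accuracy, error rate and macro-averaged F1 (averaging is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return {"accuracy": accuracy, "error_rate": 1.0 - accuracy,
            "macro_f1": sum(f1_scores) / len(f1_scores)}


# Hypothetical toy corpus; whitespace tokenization stands in for real Tamil preprocessing.
train_texts = ["கர்ப்பம் ஸ்கேன் குழந்தை", "கர்ப்பம் குழந்தை வலி",
               "மாதவிடாய் சுழற்சி வலி", "மாதவிடாய் இரத்தப்போக்கு சுழற்சி"]
train_labels = ["pregnancy", "pregnancy", "menstruation", "menstruation"]
test_texts = ["குழந்தை ஸ்கேன்", "சுழற்சி இரத்தப்போக்கு"]
test_labels = ["pregnancy", "menstruation"]

start = time.perf_counter()  # execution time covers training plus prediction
model = MultinomialNaiveBayes().fit([t.split() for t in train_texts], train_labels)
predictions = model.predict([t.split() for t in test_texts])
elapsed = time.perf_counter() - start

print(predictions, evaluate(test_labels, predictions), f"{elapsed:.6f} s")
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and smoothing keeps a class from scoring zero just because one token never appeared in its training documents.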