Anjali Mohapatra

Work place: International Institute of Information Technology, Bhubaneswar, Odisha, India

E-mail: anjali@iiit-bh.ac.in

Research Interests: Pattern Recognition, Bioinformatics

Biography

Dr. Anjali Mohapatra received an M.Sc. degree in Computer Science Engineering from Utkal University, Bhubaneswar, India, in 2001, and a Ph.D. in Computer Science from the same university in 2008. She is currently Head of the Computer Science Department and an assistant professor at the International Institute of Information Technology, Bhubaneswar, India. Her research focuses on computational biology, bioinformatics, and pattern recognition.

Author Articles
An Integrated Pipeline with Internal Image Processing for Efficient Image to Text to Speech Conversion

By Shreyas Reddy, Rashmi Ranjan Das, Anjali Mohapatra

DOI: https://doi.org/10.5815/ijem.2023.06.01, Pub. Date: 8 Dec. 2023

Optical Character Recognition (OCR) systems are tools that help computers read text from images of documents, so that machines can interpret the words without a person reading them aloud. They allow easy digitization of historical documents, archival material, and medical records, thereby reducing retrieval times. However, the accuracy of OCR systems relies heavily on the quality of the input images. To reduce this dependence, in this paper we propose an image pre-processing pipeline, integrated with the OCR system, that enhances the quality of input images for efficient image-to-text conversion. This method produces easily understandable text output with a lower Character Error Rate (CER) than current methods. In addition, we explore a technique for converting text from a document or image into machine-readable form and then converting it to audio output using gTTS, a Python library that interfaces with Google Translate's text-to-speech API. We assess the effectiveness of this approach and show that it substantially improves OCR accuracy compared with other existing methods. The paper also presents a clear overview of the development phases and the significant obstacles encountered, along with comparisons of the results achieved by the various methods.

Other Articles