Analysis of the Error Pattern of HMM based Bangla ASR

Full Text (PDF, 696KB), PP.1-9

Author(s)

Shourin R. Aura 1,*, Md. Jakaria Rahimi 1, Oli Lowna Baroi 1

1. Ahsanullah University of Science and Technology, Dhaka, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2020.01.01

Received: 4 Nov. 2019 / Revised: 12 Nov. 2019 / Accepted: 19 Nov. 2019 / Published: 8 Feb. 2020

Index Terms

ASR, HMM, HTK, MFCC, Speech Recognition, Dictionary

Abstract

Speech recognition research has been ongoing for more than 80 years, and many attempts have been made around the world to develop and improve the recognition process. Automatic speech recognition (ASR) has attracted much attention over the last few decades. Bengali is widely spoken around the world, yet much remains to be explored in research on offline automatic Bangla speech recognition systems. In this work, a moderate-size speech corpus and an HMM-based speech recognizer were built to analyze the error pattern. Audio recordings were collected from different speakers in both quiet and noisy environments. A live test was also carried out to check the performance of each model individually. This paper presents the percentage of error and the percentage of correct recognition obtained with the created models, along with the results of the live test. Finally, the results are analyzed to identify the error pattern needed for future development.
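
The abstract reports a percentage of error and a percentage of correction but does not define them. As a minimal sketch, assuming the scoring conventions of HTK's HResults tool (HTK is listed in the index terms), the two figures could be derived from deletion, substitution, and insertion counts as shown below. The function name, the definition of percent error as 100 minus accuracy, and the counts in the usage note are illustrative assumptions, not values or definitions taken from the paper.

```python
def htk_style_scores(n_ref, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy, percent_error).

    Assumes HResults-style scoring:
      hits             = N - D - S
      percent correct  = hits / N * 100
      percent accuracy = (hits - I) / N * 100
    Percent error is taken here as 100 - accuracy, which is an assumption
    about how the paper defines it.
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    hits = n_ref - deletions - substitutions
    percent_correct = 100.0 * hits / n_ref
    percent_accuracy = 100.0 * (hits - insertions) / n_ref
    percent_error = 100.0 - percent_accuracy
    return percent_correct, percent_accuracy, percent_error
```

For example, with hypothetical counts of 200 reference words, 6 deletions, 14 substitutions, and 4 insertions, `htk_style_scores(200, 6, 14, 4)` gives 90.0 percent correct, 88.0 percent accuracy, and 12.0 percent error. Percent correct ignores insertions, so it can look better than accuracy when the recognizer outputs extra words.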

Cite This Paper

Shourin R. Aura, Md. J. Rahimi, Oli L. Baroi, "Analysis of the Error Pattern of HMM based Bangla ASR", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 12, No. 1, pp. 1-9, 2020. DOI: 10.5815/ijigsp.2020.01.01
