Extricate Features Utilizing Mel Frequency Cepstral Coefficient in Automatic Speech Recognition System

Full Text (PDF, 575KB), PP.14-21


Author(s)

Gaurav D. Saxena 1, Nafees A. Farooqui 2,*, Saquib Ali 2

1. Department of Computer Science, Kamla Nehru Mahavidyalaya, Nagpur, India

2. Department of Computer Science, Era University, Lucknow, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijem.2022.06.02

Received: 5 Aug. 2022 / Revised: 20 Sep. 2022 / Accepted: 31 Oct. 2022 / Published: 8 Dec. 2022

Index Terms

Features, Mel filter bank processing, Mel-frequency cepstral coefficients, Sound file, Speech recognition.

Abstract

In recent years, automatic speech recognition has advanced thanks to tools such as natural language processing and deep learning, among others. It is a framework, or put another way, a device, that converts a raw speech signal into computer-readable text. The physical production of speech consists of changes in air pressure that result in pressure waves our ears and brain interpret. Human speech is generated by the vocal tract and shaped by the teeth, tongue, and lips. Speech recognition refers to a machine's ability to perceive human speech and transform it into computer-readable text, and it is an excellent example of interaction between humans and computers. In this paper, we present the process of extracting features from a speech signal using Mel-frequency cepstral coefficients (MFCCs). MFCCs are a widely used and efficient approach to feature extraction from a sound file. The technique improves the speech recognition process and reduces distortion in the voice. In this manuscript we apply Mel-frequency filtering to enhance the speech and suppress background noise; the proposed methodology therefore gives better performance in an automatic speech recognition system.
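To make the extraction pipeline concrete, the following is a minimal sketch of the standard MFCC steps the abstract refers to (pre-emphasis, framing, windowing, FFT, Mel filter bank, log, and DCT). It is written in Python with NumPy/SciPy; the parameter values (0.97 pre-emphasis coefficient, 25 ms frames with a 10 ms step, 26 filters, 13 coefficients, 512-point FFT) and the example file name are common defaults assumed for illustration, not values taken from the paper.

```python
# Minimal MFCC sketch: pre-emphasis -> framing -> Hamming window -> power
# spectrum -> triangular Mel filter bank -> log energies -> DCT-II.
# Parameter defaults are illustrative assumptions, not the paper's settings.
import numpy as np
from scipy.fft import dct
from scipy.io import wavfile

def mfcc(signal, sample_rate, n_filters=26, n_coeffs=13, n_fft=512):
    # 1. Pre-emphasis boosts high frequencies attenuated during speech production.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Split into 25 ms frames with a 10 ms step.
    frame_len = int(round(0.025 * sample_rate))
    frame_step = int(round(0.010 * sample_rate))
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_step)
    idx = (np.arange(frame_len)[None, :] +
           np.arange(n_frames)[:, None] * frame_step)

    # 3. Apply a Hamming window to each frame to reduce spectral leakage.
    frames = emphasized[idx] * np.hamming(frame_len)

    # 4. Power spectrum of each windowed frame.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # 5. Triangular Mel filter bank between 0 Hz and the Nyquist frequency.
    low_mel = 0.0
    high_mel = 2595.0 * np.log10(1.0 + (sample_rate / 2) / 700.0)
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # 6. Log filter-bank energies (epsilon floor avoids log(0)).
    energies = np.log(np.maximum(power @ fbank.T, np.finfo(float).eps))

    # 7. DCT-II decorrelates the energies; keep the first n_coeffs as the MFCCs.
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_coeffs]

# Example usage with a hypothetical recording:
# rate, audio = wavfile.read("speech.wav")
# features = mfcc(audio.astype(float), rate)  # shape: (num_frames, 13)
```

The resulting matrix of per-frame coefficients is the kind of feature representation typically fed to a speech recognition back end.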

Cite This Paper

Gaurav D. Saxena, Nafees A. Farooqui, Saquib Ali, "Extricate Features Utilizing Mel Frequency Cepstral Coefficient in Automatic Speech Recognition System", International Journal of Engineering and Manufacturing (IJEM), Vol.12, No.6, pp. 14-21, 2022. DOI:10.5815/ijem.2022.06.02
