IJIGSP Vol. 12, No. 2, 8 Apr. 2020
IEMOCAP, MFCC, MMSE, Noisy Signal, SNR, Speech
Speech is one of the most natural and fundamental means of human-computer interaction, and the emotional state of the speaker is important in various domains. Recognizing human emotion has become essential in real-world applications, but the speech signal is corrupted by various noises from real-world environments, and recognition performance is degraded by these additional noise components. This paper therefore focuses on developing an emotion recognition system for noisy signals in real-world environments. Minimum Mean Square Error (MMSE) estimation is used as the enhancement technique, Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from the speech signals, and state-of-the-art classifiers are used to recognize the emotional state of the signals. To show the robustness of the proposed system, experiments are carried out on the standard speech emotion database, IEMOCAP, under various SNR levels, from 0 dB to 15 dB, of real-world background noise. The results are evaluated for seven emotions, and comparisons are presented and discussed across the various classifiers and emotions. The results indicate which classifier is best for which emotion in real-world environments, especially in the noisiest conditions, such as sports events.
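The evaluation adds real-world background noise to clean IEMOCAP utterances at SNR levels from 0 dB to 15 dB. As a minimal sketch of that mixing step (not the authors' code; the signal names and the NumPy-based implementation are assumptions for illustration), the noise can be scaled so the speech-to-noise power ratio matches a target SNR:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals
    `snr_db` decibels, then add it to the clean speech signal."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)) for gain
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Stand-ins for a clean utterance and recorded background noise (1 s at 16 kHz)
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)

noisy_versions = {snr: mix_at_snr(clean, noise, snr) for snr in (0, 5, 10, 15)}
```

In practice the same clean utterance is mixed at each SNR level, so recognition accuracy can be compared across noise conditions on otherwise identical data.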
Htwe Pa Pa Win, Phyo Thu Thu Khine, "Emotion Recognition System of Noisy Speech in Real World Environment", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.12, No.2, pp. 1-8, 2020. DOI: 10.5815/ijigsp.2020.02.01