Vocal Emotion Recognition Based on HMM and GMM for Mandarin Speech

Full Text (PDF, 218 KB), pp. 25-31

Author(s)

Sun Menghan 1,*, Jiang Baochen 1, Yuan Jing 1

1. School of Mechanical, Electrical & Information Engineering, Shandong University at Weihai, Weihai, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2012.03.04

Received: 15 Dec. 2011 / Revised: 19 Jan. 2012 / Accepted: 23 Feb. 2012 / Published: 29 Mar. 2012

Index Terms

Speech Emotion Recognition, HMM, GMM

Abstract

Recognizing emotions from speech is a challenging problem. In this paper, two vocal emotion classifiers, one based on Hidden Markov Models (HMM) and one on Gaussian Mixture Models (GMM), are trained and evaluated on an emotional Mandarin speech corpus using Mel-Frequency Cepstral Coefficient (MFCC) features. Models for up to six basic emotions (angry, fear, happy, sad, neutral and surprise) are built under different parameter settings, and the influence of the parameter set is investigated. A statistical comparison of the two emotion recognition methods is discussed as well. Overall, the results show that the GMM classifier outperforms the HMM classifier when both computational complexity and recognition rate are taken into account, with a highest recognition rate of 72.34%.
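
As a rough illustration of how a GMM-based emotion classifier of this kind usually works, the sketch below fits one diagonal-covariance GMM per emotion with plain EM and labels an utterance with the emotion whose model gives its frames the highest total log-likelihood. It is not the authors' implementation: the function names are hypothetical, the mixture count and iteration count are arbitrary, and synthetic 2-D vectors stand in for real MFCC frames.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_terms(x, gmm):
    """Per-component log(weight * density) for one feature frame."""
    weights, means, variances = gmm
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def fit_gmm(frames, n_mix, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from random frames and variances from the global variance.
    means = [list(f) for f in rng.sample(frames, n_mix)]
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(n_mix)]
    weights = [1.0 / n_mix] * n_mix
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        resp = []
        for x in frames:
            logp = frame_log_terms(x, (weights, means, variances))
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(n_mix):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                        for d in range(dim)]
            variances[k] = [
                max(sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, frames)) / nk, var_floor)
                for d in range(dim)]
    return weights, means, variances


def utterance_loglik(frames, gmm):
    """Total log-likelihood of all frames of one utterance under one GMM."""
    return sum(logsumexp(frame_log_terms(x, gmm)) for x in frames)


def classify(frames, models):
    """Return the emotion label whose GMM scores the utterance highest."""
    return max(models, key=lambda label: utterance_loglik(frames, models[label]))


def synthetic_frames(center, n, seed):
    """Hypothetical stand-in for MFCC frames: Gaussian noise around a center."""
    rng = random.Random(seed)
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


# One model per emotion, trained on that emotion's frames only (toy data).
models = {
    "angry": fit_gmm(synthetic_frames((3.0, 3.0), 200, seed=1), n_mix=2),
    "sad": fit_gmm(synthetic_frames((-3.0, -3.0), 200, seed=2), n_mix=2),
}
print(classify(synthetic_frames((3.0, 3.0), 30, seed=3), models))
```

In a real system the synthetic frames would be replaced by MFCC vectors extracted from each recording, and the number of mixture components would be one of the parameters whose influence is studied.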

Cite This Paper

Sun Menghan, Jiang Baochen, Yuan Jing, "Vocal Emotion Recognition Based on HMM and GMM for Mandarin Speech", IJEME, vol. 2, no. 3, pp. 25-31, 2012. DOI: 10.5815/ijeme.2012.03.04
