Speaker Recognition in Mismatch Conditions: A Feature Level Approach

Full Text (PDF, 672KB), pp. 37-43


Author(s)

Sharada V. Chougule 1,*, Mahesh S. Chavan 2

1. Finolex Academy of Management & Technology, Ratnagiri, Maharashtra, India

2. KIT’s College of Engineering, Kolhapur, Maharashtra, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2017.04.05

Received: 2 Dec. 2016 / Revised: 25 Jan. 2017 / Accepted: 7 Mar. 2017 / Published: 8 Apr. 2017

Index Terms

Feature extraction, Speaker recognition, Segmental features

Abstract

Mismatch in speech data is one of the major factors limiting the use of speaker recognition technology in real-world applications. Extracting speaker-specific features is a crucial issue in the presence of noise and distortion. The performance of a speaker recognition system depends on the characteristics of the extracted features: the device used to acquire the speech, as well as the surrounding conditions in which the speech is collected, affect the extracted features and hence degrade recognition rates. In view of this, a feature-level approach is used to analyze the effect of sensor and environment mismatch on speaker recognition performance. The goal is to investigate the robustness of segmental features under speech data mismatch and degradation. A set of features derived from filter bank energies, namely Mel Frequency Cepstral Coefficients (MFCCs), Linear Frequency Cepstral Coefficients (LFCCs), Log Filter Bank Energies (LOGFBs), and Spectral Subband Centroids (SSCs), is used to evaluate robustness in mismatch conditions. A novel feature extraction technique, named Normalized Dynamic Spectral Features (NDSF), is proposed to compensate for sensor and environment mismatch. A significant improvement in recognition results is obtained with the proposed feature extraction method.
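The abstract names four baseline features derived from filter bank energies (MFCC, LFCC, LOGFB, SSC) and the proposed NDSF. As a rough illustration only, the Python sketch below computes log filter bank energies, MFCCs, and spectral subband centroids (LFCCs follow the same pipeline with linearly spaced filters), and encodes one plausible reading of NDSF as per-utterance mean- and variance-normalized delta (dynamic) spectral features. All function names, parameter values, and the NDSF formulation here are assumptions for illustration, not the authors' reference implementation; the actual derivation is given in the paper's full text.

import numpy as np
from scipy.fft import dct

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping, Hamming-windowed frames
    (25 ms frames with a 10 ms hop at 16 kHz)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters=26, n_fft=512, fs=16000):
    """Triangular mel-spaced filters over the rFFT bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def fbank_features(x, fs=16000, n_fft=512, n_filters=26, n_ceps=13):
    """Return LOGFB, MFCC, and SSC features, one row per frame."""
    frames = frame_signal(x)
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2               # power spectrum
    fb = mel_filterbank(n_filters, n_fft, fs)
    energies = spec @ fb.T + 1e-10                               # filter bank energies
    logfb = np.log(energies)                                     # LOGFB
    mfcc = dct(logfb, type=2, axis=1, norm='ortho')[:, :n_ceps]  # MFCC
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    ssc = (spec @ (fb * freqs).T) / energies                     # subband centroids (Hz)
    return logfb, mfcc, ssc

def ndsf(logfb, width=2):
    """Assumed NDSF: regression-style delta of the log filter bank
    energies, then mean/variance normalization per utterance, so that
    a fixed sensor/channel offset (additive in the log domain) and
    scale differences are compensated."""
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    pad = np.pad(logfb, ((width, width), (0, 0)), mode='edge')
    T = len(logfb)
    delta = sum(k * (pad[width + k:T + width + k] - pad[width - k:T + width - k])
                for k in range(1, width + 1)) / denom
    return (delta - delta.mean(axis=0)) / (delta.std(axis=0) + 1e-10)

# Example: one second of a stand-in signal at 16 kHz.
fs = 16000
x = np.random.randn(fs)
logfb, mfcc, ssc = fbank_features(x, fs)
feats = ndsf(logfb)   # (n_frames, 26) normalized dynamic features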

Cite This Paper

Sharada V. Chougule, Mahesh S. Chavan, "Speaker Recognition in Mismatch Conditions: A Feature Level Approach", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 9, No. 4, pp. 37-43, 2017. DOI: 10.5815/ijigsp.2017.04.05

References

[1] Amirreza Shirani and Ahmad Reza Naghsh Nilchi, "Speech Emotion Recognition based on SVM as Both Feature Selector and Classifier", I.J. Image, Graphics and Signal Processing, no. 4, pp. 39-45, 2016.

[2] Tushar Sahoo and Sabyasachi Patra, "Silence Removal and Endpoint Detection of Speech Signal for Text Independent Speaker Identification", I.J. Image, Graphics and Signal Processing, no. 6, pp. 27-35, 2014.

[3] Qi Li, Jinsong Zheng, Augustine Tsai, and Qiru Zhou, "Robust endpoint detection and energy normalization for real-time speech and speaker recognition", IEEE Transactions on Speech and Audio Processing, vol. 10, no. 3, pp. 146-157, March 2002.

[4] Sharada V. Chougule and Mahesh S. Chavan, "Channel Robust MFCCs for Continuous Speech Speaker Recognition", Springer Book Series: Advances in Signal Processing and Intelligent Recognition Systems, vol. 264, pp. 557-568, 2014.

[5] I. Ali and G. Saha, "A Robust Iterative Energy Based Voice Activity Detector", Proceedings, International Conference on Emerging Trends in Engineering and Technology (ICETET), IEEE, 2010.

[6] Steven B. Davis and Paul Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 4, pp. 357-366, August 1980.

[7] Wai Nang Chan, Nengheng Zheng, and Tan Lee, "Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 6, pp. 1884-1892, August 2007.

[8] Tomi Kinnunen and Haizhou Li, "An overview of text-independent speaker recognition: From features to supervectors", Speech Communication, vol. 52, no. 1, pp. 12-40, 2010.

[9] Joseph P. Campbell, Douglas A. Reynolds, and Robert B. Dunn, "Fusing High- and Low-Level Features for Speaker Recognition", EUROSPEECH, pp. 2665-2668, 2003.

[10] Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of Speech Signals", Prentice Hall International, 2011.

[11] Sadaoki Furui, "Cepstral analysis technique for automatic speaker verification", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-29, no. 2, pp. 254-272, April 1981.

[12] Xinhui Zhou, Daniel Garcia-Romero, Ramani Duraiswami, Carol Espy-Wilson, and Shihab Shamma, "Linear versus Mel frequency cepstral coefficients for speaker recognition", Proc. IEEE ASRU, pp. 559-564, 2011.

[13] Homayoon Beigi, "Speaker Recognition: Advancements and Challenges", in New Trends and Developments in Biometrics, Chapter 1, InTech.

[14] Jingdong Chen, Yiteng (Arden) Huang, Qi Li, and Kuldip K. Paliwal, "Recognition of noisy speech using dynamic spectral subband centroids", IEEE Signal Processing Letters, vol. 11, no. 2, pp. 258-261, February 2004.

[15] K. K. Paliwal, "Spectral centroid features for speech recognition", Proc. ICASSP, vol. 2, pp. 617-620, 1998.

[16] Sharada V. Chougule and Mahesh S. Chavan, "Robust spectral features for automatic speaker recognition in mismatch condition", Second International Symposium on Computer Vision and the Internet (VisionNet'15), Procedia Computer Science, vol. 58, pp. 272-279, 2015.

[17] Chen Yang, Frank K. Soong, and Tan Lee, "Static and dynamic spectral features: Their noise robustness and optimal weights for ASR", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1087-1097, March 2007.

[18] Haris B. C., G. Pradhan, A. Misra, S. R. M. Prasanna, R. K. Das, and R. Sinha, "Multivariability speaker recognition database in Indian scenario", National Conference on Communications (NCC), IEEE, 2011.

[19] Haris B. C., G. Pradhan, A. Misra, S. Shukla, R. Sinha, and S. R. M. Prasanna, "Multi-Variability Speech Database for Robust Speaker Recognition", National Conference on Communications (NCC), IEEE, 2011.

[20] H. S. Jayanna and S. R. M. Prasanna, "An experimental comparison of modeling techniques for speaker recognition under limited data condition", Sadhana - Academy Proceedings in Engineering Sciences (Springer), vol. 34, no. 3, pp. 717-728, October 2009.