Significance of Source Information for Text Dependent Speaker Verification

Full Text (PDF, 675KB), pp. 42-49


Author(s)

Archita Hore 1, S. R. Nirmala 1,*, Rohan K. Das 2, Sarfaraz Jelil 2, S. R. M. Prasanna 2

1. Gauhati University Institute of Science and Technology, Guwahati-781014, India

2. Indian Institute of Technology Guwahati, Guwahati-781039, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2018.06.05

Received: 22 Feb. 2018 / Revised: 15 Mar. 2018 / Accepted: 20 Apr. 2018 / Published: 8 Jun. 2018

Index Terms

Mel frequency cepstral coefficients (MFCC), residual Mel frequency cepstral coefficients (RMFCC), dynamic time warping (DTW), source features

Abstract

This work focuses on a text dependent speaker verification system in which a source feature, namely residual Mel frequency cepstral coefficients (RMFCC), is extracted in addition to a vocal tract system feature, namely Mel frequency cepstral coefficients (MFCC). The RMFCC features are derived from the LP residual, whereas the MFCC features are derived from cepstral analysis of the speech signal; the two features therefore carry different information about the speaker. A four-cohort speaker set is prepared using these two features, and dynamic time warping (DTW) is used as the classifier. The performance of the text dependent speaker verification model is compared for the MFCC and RMFCC features. Experimental results show that the RMFCC feature alone does not give satisfactory results in comparison to MFCC, and that the performance obtained using the MFCC features alone is also not optimum. To improve the performance of the system, the two features are therefore combined using different combination algorithms. The proposed lowest ranking method yields good performance with an equal error rate (EER) of 7.50%. To further improve the system, the proposed method is combined with the strength voting and weighted ranking methods in a hierarchical combination scheme, reducing the EER to 3.75%.
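The two building blocks named in the abstract, the LP residual (from which RMFCC features are computed) and the DTW classifier, can be sketched in Python. This is an illustrative sketch, not the authors' implementation: the LP order, regularisation constant, and distance normalisation are assumptions, and a full RMFCC front end would additionally apply a mel filterbank and DCT to the residual spectrum.

```python
import numpy as np

def lp_residual(frame, order=10):
    # Autocorrelation method: solve the Toeplitz normal equations for the
    # LP coefficients, then subtract the linear prediction from the frame
    # to obtain the residual (an estimate of the excitation source).
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Small scale-aware regularisation keeps the solve stable (assumed value).
    a = np.linalg.solve(R + 1e-6 * r[0] * np.eye(order), r[1:order + 1])
    pred = np.convolve(frame, np.concatenate(([0.0], a)))[:n]
    return frame - pred

def dtw_distance(X, Y):
    # Classic dynamic time warping between two feature sequences
    # (rows = frames, columns = feature dimensions), with a
    # length-normalised alignment cost as the verification score.
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)
```

In a verification trial, the DTW score of a claimant utterance would be compared against the claimed speaker's template and against the cohort set, separately for the MFCC and residual-based feature streams, before the scores are combined.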

Cite This Paper

Archita Hore, S. R. Nirmala, Rohan K. Das, Sarfaraz Jelil, S. R. M. Prasanna, "Significance of Source Information for Text Dependent Speaker Verification", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 10, No. 6, pp. 42-49, 2018. DOI: 10.5815/ijigsp.2018.06.05

References

[1] A. Larcher, K. A. Lee, B. Ma and H. Li, “Imposture classification for text-dependent speaker verification,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, 2014, pp. 739-743.

[2] R. K. Das, S. Jelil and S. R. M. Prasanna, “Development of Multi-Level Speech based Person Authentication System,” Journal of Signal Processing Systems, 2016, pp. 1-13.

[3] H. Khemiri and D. Petrovska-Delacretaz, “Cohort selection for text-dependent speaker verification score normalization,” 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Monastir, 2016, pp. 689-692.

[4] A. Larcher, K. A. Lee, B. Ma and H. Li, “Text-dependent speaker verification: Classifiers, databases and RSR2015,” Journal of Speech Communication, vol. 60, pp. 56-77, May 2014.

[5] A. K. Sarkar and Z. H. Tan, “Text Dependent Speaker Verification Using Un-supervised HMM-UBM and Temporal GMM-UBM,” INTERSPEECH 2016, San Francisco, USA, September 2016.

[6] A. Revathi, R. Ganapathy and Y. Venkataramani, “Text Independent Speaker Recognition and Speaker Independent Speech Recognition Using Iterative Clustering Approach,” International Journal of Computer Science and Information Technologies, vol. 1, no. 2, November 2009.

[7] S. P. Choudhury, T. K. Das, P. Saha, R. Hussain and U. Baruah, “Comparative analysis of two different system’s framework for text dependent speaker verification,” International Conference on Circuit, Power and Computing Technologies (ICCPCT), Nagercoil, 2015, pp. 1-5.

[8] B. Yegnanarayana, S. R. M. Prasanna, J. M. Zachariah and C. S. Gupta, “Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 4, pp. 575-582, July 2005.

[9] N. Dave, “Feature Extraction Methods LPC, PLP and MFCC in Speech Recognition,” International Journal for Advance Research in Engineering and Technology, vol. 1, issue VI, July 2013.

[10] E. Variani, X. Lei, E. McDermott, I. L. Moreno and J. Gonzalez-Dominguez, “Deep neural networks for small footprint text-dependent speaker verification,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, 2014, pp. 4052-4056.

[11] T. K. Das, S. Misra, S. P. Choudhury, D. K. Sah, U. Baruah and R. H. Laskar, “Comparison of DTW score and warping path for text dependent speaker verification system,” International Conference on Circuit, Power and Computing Technologies (ICCPCT), Nagercoil, 2015, pp. 1-4.

[12] S. Dey, S. Madikeri, M. Ferras and P. Motlicek, “Deep neural network based posteriors for text-dependent speaker verification,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 2016, pp. 5050-5054.

[13] D. A. Reynolds, “Gaussian Mixture Models,” Encyclopedia of Biometric Recognition, Springer, February 2008.

[14] R. K. Das, Abhiram B., S. R. M. Prasanna and A. G. Ramakrishnan, “Combining Source and System Information for Limited Data Speaker Verification,” INTERSPEECH 2014, Singapore, 2014, pp. 1836-1840.

[15] H. S. Jayanna, “Limited data speaker recognition,” Ph.D. thesis, Indian Institute of Technology Guwahati, India, 2009.

[16] D. Hosseinzadeh and S. Krishnan, “Combining Vocal Source and MFCC Features for Enhanced Speaker Recognition Performance Using GMMs,” IEEE 9th Workshop on Multimedia Signal Processing, Crete, 2007, pp. 365-368.

[17] D. Pati and S. R. M. Prasanna, “Speaker Information from Subband Energies of Linear Prediction Residual,” National Conference on Communications (NCC), Chennai, 2010, pp. 1-4.

[18] R. K. Das and S. R. M. Prasanna, “Exploring different attributes of source information for speaker verification with limited test data,” The Journal of the Acoustical Society of America, vol. 140, no. 1, pp. 184-190, 2016.

[19] B. B. Andersen, “The mel frequency scale and coefficients,” Available FTP: http://kom.aau.dk Directory: group/04gr742/pdf File: MFCC worksheet.pdf, 2004.

[20] H. Rahali, Z. Hajaiej and N. Ellouze, “Robust Features for Speech Recognition using Temporal Filtering Technique in the Presence of Impulsive Noise,” I.J. Image, Graphics and Signal Processing, vol. 11, pp. 17-24, 2014.

[21] I. Trabelsi, D. B. Ayed and N. Ellouze, “Improved Frame Level Features and SVM Supervectors Approach for The Recognition of Emotional States from Speech: Application to Categorical and Dimensional States,” I.J. Image, Graphics and Signal Processing, vol. 9, pp. 8-13, 2013.

[22] Saloni, R. K. Sharma and Anil K. Gupta, “Estimation and Statistical Analysis of Physical Task Stress on Human Speech Signal,” I.J. Image, Graphics and Signal Processing, vol. 10, pp. 29-34, 2016.