Silence Removal and Endpoint Detection of Speech Signal for Text Independent Speaker Identification

pp. 27-35

Author(s)

Tushar Ranjan Sahoo 1,*, Sabyasachi Patra 1

1. International Institute of Information Technology, Bhubaneswar, Odisha, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2014.06.04

Received: 17 Jan. 2014 / Revised: 5 Mar. 2014 / Accepted: 5 Apr. 2014 / Published: 8 May 2014

Index Terms

Endpoint detection, short-time energy, Gaussian distribution, signal-to-noise ratio, speaker identification, mel-frequency cepstral coefficient, Gaussian mixture model

Abstract

In this paper we propose a composite silence removal technique that combines short-time energy with a statistical method. We compare the performance of the proposed algorithm with that of the Short-Time Energy (STE) algorithm and the statistical method alone over a range of Signal-to-Noise Ratios (SNRs). At low SNR, the proposed algorithm performs markedly better than both STE and the statistical method. We apply the proposed algorithm in the preprocessing stage of a speaker identification system. Comparing identification rates with and without silence removal shows an increase of around 20% when the proposed algorithm is applied.
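The abstract does not spell out the individual detectors, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that (a) the STE detector marks a 10 ms frame as speech when its energy exceeds a fixed fraction of the maximum frame energy (`ratio = 0.1` is an arbitrary choice); (b) the statistical detector models the background noise as Gaussian, estimates its mean and standard deviation from an initial segment assumed to be silence (for example the first 200 ms), and labels a sample voiced when it lies more than `k = 3` standard deviations from that mean; and (c) the composite rule, hypothetical here, keeps a frame only when both detectors call it speech. `add_white_noise` shows one way to generate test material at a chosen SNR. All function names and parameters are illustrative.

```python
import math
import random
from statistics import mean, pstdev


def add_white_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) is about snr_db."""
    rng = random.Random(seed)
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def frames(signal, frame_len):
    """Split into non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def ste_mask(signal, frame_len, ratio=0.1):
    """Short-time energy detector (assumed threshold rule):
    a frame is speech if its energy exceeds ratio * (maximum frame energy)."""
    energies = [sum(x * x for x in f) for f in frames(signal, frame_len)]
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]


def statistical_mask(signal, frame_len, noise_len, k=3.0):
    """Gaussian statistical detector (assumed variant): the first noise_len
    samples are treated as background noise; a sample is voiced if
    |x - mu| / sigma > k, and a frame is speech if more than half of its
    samples are voiced."""
    noise = signal[:noise_len]
    mu, sigma = mean(noise), pstdev(noise)
    sigma = sigma or 1e-12  # guard against a perfectly constant noise segment
    mask = []
    for f in frames(signal, frame_len):
        voiced = sum(1 for x in f if abs(x - mu) / sigma > k)
        mask.append(voiced > len(f) / 2)
    return mask


def composite_mask(signal, frame_len, noise_len):
    """Hypothetical combination: keep a frame only if both detectors agree."""
    a = ste_mask(signal, frame_len)
    b = statistical_mask(signal, frame_len, noise_len)
    return [x and y for x, y in zip(a, b)]


def remove_silence(signal, frame_len, mask):
    """Concatenate the frames that the mask marks as speech."""
    out = []
    for f, keep in zip(frames(signal, frame_len), mask):
        if keep:
            out.extend(f)
    return out
```

For example, at a sampling rate of 8000 Hz with 80-sample (10 ms) frames and a 1600-sample (200 ms) noise segment, a 0.5 s tone padded with 0.3 s of silence on each side and corrupted at 20 dB SNR should yield 50 speech frames out of 110. Requiring agreement (a logical AND) favours rejecting noise; an OR rule would instead favour keeping weak speech, so the choice of combination rule is itself a tuning decision.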

Cite This Paper

Tushar Ranjan Sahoo, Sabyasachi Patra, "Silence Removal and Endpoint Detection of Speech Signal for Text Independent Speaker Identification", IJIGSP, vol. 6, no. 6, pp. 27-35, 2014. DOI: 10.5815/ijigsp.2014.06.04
