Robust Voice Activity Detection Algorithm based on Long Term Dominant Frequency and Spectral Flatness Measure

pp. 50-58


Author(s)

Naorem Karline Singh (1, *), Yambem Jina Chanu (1)

1. Department of Computer Science and Engineering, National Institute of Technology Manipur, 795004, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2017.08.06

Received: 18 Apr. 2017 / Revised: 1 May 2017 / Accepted: 13 May 2017 / Published: 8 Aug. 2017

Index Terms

Voice activity detection, dominant frequency component, spectral flatness measure

Abstract

In this paper, a robust voice activity detection (VAD) algorithm based on a long-term metric using the dominant frequency and the spectral flatness measure is proposed. The proposed algorithm exploits the discriminating power of both features to derive its decision rule, which reduces the average number of speech detection errors. We evaluate its performance using 15 additive noises at SNRs from -10 dB to 10 dB and compare it with several recent standard VAD algorithms. Experiments show that the proposed algorithm achieves the best accuracy rate averaged over all SNRs and noises.
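The abstract names the two features but does not define the long-term metric or the decision rule, so the following is only a minimal sketch of how such features can be computed and combined, not the authors' method. Everything specific here is an assumption: the 8 kHz sample rate, 256-sample non-overlapping frames, the 8-frame long-term window, averaging of per-frame features over that window (a long-term metric could instead measure the flatness of a spectrum averaged over the window), the 0.3 flatness threshold, the 80 to 1000 Hz dominant-frequency band, and the AND combination of the two tests. The function names (`long_term_features`, `vad_decisions`, etc.) are hypothetical.

```python
import cmath
import math
import random

# Hypothetical parameters (assumptions, not taken from the paper):
# 256-sample non-overlapping frames and an 8-frame long-term window.
FRAME_LEN = 256
WINDOW = 8


def magnitude_spectrum(frame):
    """One-sided magnitude spectrum of a real frame via a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(spectrum, sample_rate, frame_len):
    """Frequency in Hz of the strongest spectral bin, ignoring the DC bin."""
    k = max(range(1, len(spectrum)), key=lambda i: spectrum[i])
    return k * sample_rate / frame_len


def spectral_flatness(spectrum, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Higher for noise-like (flat) spectra, near 0 for a pure tone.
    """
    power = [s * s + eps for s in spectrum]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def long_term_features(signal, sample_rate, frame_len=FRAME_LEN, window=WINDOW):
    """Return (long-term dominant frequency, long-term flatness) per frame.

    Assumption: each long-term value is the mean of the per-frame values over
    the current frame and up to window - 1 previous frames.
    """
    dfs, sfms, results = [], [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spec = magnitude_spectrum(signal[start:start + frame_len])
        dfs.append(dominant_frequency(spec, sample_rate, frame_len))
        sfms.append(spectral_flatness(spec))
        recent_df, recent_sfm = dfs[-window:], sfms[-window:]
        results.append((sum(recent_df) / len(recent_df),
                        sum(recent_sfm) / len(recent_sfm)))
    return results


def vad_decisions(signal, sample_rate, sfm_threshold=0.3, df_band=(80.0, 1000.0),
                  frame_len=FRAME_LEN, window=WINDOW):
    """Hypothetical rule: speech if the spectrum is not flat AND the dominant
    frequency lies in an assumed speech band."""
    lo, hi = df_band
    return [ltsfm < sfm_threshold and lo <= ltdf <= hi
            for ltdf, ltsfm in long_term_features(signal, sample_rate, frame_len, window)]


if __name__ == "__main__":
    fs = 8000  # assumed sample rate
    tone = [math.sin(2 * math.pi * 250 * t / fs) for t in range(2048)]  # 250 Hz tone
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]  # white Gaussian noise
    print(vad_decisions(tone, fs))
    print(vad_decisions(noise, fs))
```

Frames whose long-term flatness is low (tonal or harmonic content) and whose long-term dominant frequency falls inside the assumed band are flagged as speech: the 250 Hz tone passes both tests, while white noise fails the flatness test. A practical detector would use an FFT, overlapping windowed frames, and thresholds tuned on labeled noisy speech.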

Cite This Paper

Naorem Karline Singh, Yambem Jina Chanu, "Robust Voice Activity Detection Algorithm based on Long Term Dominant Frequency and Spectral Flatness Measure", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 9, No. 8, pp. 50-58, 2017. DOI: 10.5815/ijigsp.2017.08.06
