Robust Features for Speech Recognition using Temporal Filtering Technique in the Presence of Impulsive Noise

Full Text (PDF, 578 KB), pp. 17-24


Author(s)

Hajer Rahali 1,*, Zied Hajaiej 1, Noureddine Ellouze 1

1. Laboratory of Systems and Signal Processing (LSTS), BP 37, Le Belvédère, 1002 Tunis, Tunisia

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2014.11.03

Received: 27 Jun. 2014 / Revised: 2 Aug. 2014 / Accepted: 29 Aug. 2014 / Published: 8 Oct. 2014

Index Terms

Auditory filter, impulsive noise, MFCC, RASTA filter, ARMA filter, HMM/GMM

Abstract

In this paper we introduce a robust feature extractor, dubbed Modified Function Cepstral Coefficients (MODFCC), based on a gammachirp filterbank, Relative Spectral (RASTA) filtering and Autoregressive Moving-Average (ARMA) filtering. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. Mel-Frequency Cepstral Coefficients (MFCC) and their RASTA- and ARMA-filtered variants (RASTA-MFCC and ARMA-MFCC) are the three main techniques used in speech recognition systems. This paper presents several modifications to the original MFCC method. The effectiveness of the proposed changes was tested and compared against the original RASTA-MFCC and ARMA-MFCC features. Prosodic features such as jitter and shimmer are added to the baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions using the AURORA database.
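As a rough illustration of the temporal filtering the abstract refers to, the sketch below applies the classic RASTA band-pass filter of Hermansky and Morgan and an MVA-style ARMA smoother in the spirit of Chen, Bilmes and Kirchhoff to a matrix of cepstral-coefficient trajectories. This is an illustrative reconstruction, not the authors' MODFCC implementation; the function names and the (frames x coefficients) matrix layout are assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(cepstra):
    """Apply the classic RASTA band-pass filter (Hermansky & Morgan, 1994)
    along the time axis of a (num_frames, num_coeffs) cepstral matrix.
    Transfer function: H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1),
    which suppresses slow (convolutional) and very fast spectral changes."""
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    denom = np.array([1.0, -0.98])
    return lfilter(numer, denom, cepstra, axis=0)

def arma_filter(cepstra, order=2):
    """MVA-style ARMA smoothing along time: each smoothed frame averages the
    `order` previous smoothed frames with the current and `order` future raw
    frames, i.e. y[t] = (sum y[t-m..t-1] + sum x[t..t+m]) / (2m + 1)."""
    m = order
    out = np.copy(cepstra)
    for t in range(m, len(cepstra) - m):
        out[t] = (out[t - m:t].sum(axis=0)            # AR part: past smoothed frames
                  + cepstra[t:t + m + 1].sum(axis=0)  # MA part: current + future raw frames
                  ) / (2 * m + 1)
    return out
```

Both filters operate on each cepstral coefficient's trajectory over time rather than on a single frame, which is why a slowly varying channel offset (a constant added to every frame) is attenuated by RASTA while frame-rate modulations carrying phonetic information pass through.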

Cite This Paper

Hajer Rahali, Zied Hajaiej, Noureddine Ellouze, "Robust Features for Speech Recognition using Temporal Filtering Technique in the Presence of Impulsive Noise", IJIGSP, vol. 6, no. 11, pp. 17-24, 2014. DOI: 10.5815/ijigsp.2014.11.03

References

[1] J. O. Smith III and J. S. Abel, "Bark and ERB Bilinear Transforms," IEEE Trans. on Speech and Audio Processing, vol. 7, no. 6, Nov. 1999.

[2] H. G. Musmann, "Genesis of the MP3 audio coding standard," IEEE Trans. on Consumer Electronics, vol. 52, pp. 1043-1049, Aug. 2006.

[3] H. G. Hirsch and D. Pearce, "The AURORA Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions," ISCA ITRW ASR2000, Automatic Speech Recognition: Challenges for the Next Millennium, France, 2000.

[4] M. Brookes, "VOICEBOX: Speech Processing Toolbox for MATLAB," software, available [Mar. 2011].

[5] E. Ambikairajah, J. Epps, and L. Lin, "Wideband speech and audio coding using gammatone filter banks," Proc. ICASSP'01, Salt Lake City, USA, May 2001, vol. 2, pp. 773-776.

[6] M. N. Viera, F. R. McInnes, and M. A. Jack, "Robust F0 and jitter estimation in pathological voices," Proc. ICSLP'96, Philadelphia, pp. 745-748, 1996.

[7] L. Salhi, "Design and implementation of the cochlear filter model based on a wavelet transform as part of speech signals analysis," Research Journal of Applied Sciences, vol. 2, no. 4, pp. 512-521, Medwell Journals, 2007.

[8] F. Weber, L. Manganaro, B. Peskin, and E. Shriberg, "Using prosodic and lexical information for speaker identification," Proc. ICASSP, Orlando, FL, May 2002.

[9] J. W. Pitton, K. Wang, and B. H. Juang, "Time-frequency analysis and auditory modeling for automatic recognition of speech," Proc. IEEE, vol. 84, pp. 1199-1214, Sept. 1996.

[10] E. Loweimi and S. M. Ahadi, "A new group delay-based feature for robust speech recognition," in Proc. IEEE Int. Conf. on Multimedia & Expo, Barcelona, pp. 1-5, July 2011.

[11] T. Irino, E. Okamoto, R. Nisimura, H. Kawahara, and R. D. Patterson, "A Gammachirp Auditory Filterbank for Reliable Estimation of Vocal Tract Length from both Voiced and Whispered Speech," The 4th Annual Conference of the British Society of Audiology, Keele, UK, 4-6 Sept. 2013.

[12] C. Kim and R. M. Stern, "Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 4574-4577, March 2010.

[13] D. P. W. Ellis and B. S. Lee, "Noise robust pitch tracking by subband autocorrelation classification," in Proc. 13th Annual Conference of the International Speech Communication Association (Interspeech), 2012.

[14] D. Povey and L. Burget, "The Subspace Gaussian Mixture Model – A Structured Model for Speech Recognition," Computer Speech & Language, vol. 25, no. 2, pp. 404-439, April 2011.

[15] C.-P. Chen, J. Bilmes, and K. Kirchhoff, "Low-Resource Noise-Robust Feature Post-Processing on Aurora 2.0," Proc. ICSLP 2002, pp. 2445-2448.

[16] H. Hermansky and N. Morgan, "RASTA Processing of Speech," IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, Oct. 1994.