A Survey on Statistical Based Single Channel Speech Enhancement Techniques

Full Text (PDF, 729KB), PP.69-85


Author(s)

Sunnydayal. V 1,*, N. Sivaprasad 1, T. Kishore Kumar 1

1. National Institute of Technology Warangal, Warangal-506004, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2014.12.10

Received: 14 Feb. 2014 / Revised: 1 Jun. 2014 / Accepted: 22 Aug. 2014 / Published: 8 Nov. 2014

Index Terms

Speech Enhancement, Wiener Filtering, MMSE Estimator, Bayesian Estimators, Maximum A Posteriori (MAP) Estimators

Abstract

Speech enhancement is a long-standing problem with applications such as hearing aids, automatic speech recognition, and speech coding. Single-channel speech enhancement techniques are used to enhance speech degraded by additive background noise. Background noise can make it hard to converse smoothly in very noisy environments, such as busy streets, a car, or the cockpit of an airplane, and it can degrade both the quality and the intelligibility of speech. This survey paper provides an overview of algorithms for enhancing a noisy speech signal corrupted by additive noise. The algorithms covered are mainly statistical approaches. Different estimators are compared, and the challenges and opportunities of speech enhancement are also discussed. The paper helps in choosing the best statistical technique for speech enhancement.

Cite This Paper

Sunnydayal. V, N. Sivaprasad, T. Kishore Kumar, "A Survey on Statistical Based Single Channel Speech Enhancement Techniques", International Journal of Intelligent Systems and Applications (IJISA), vol. 6, no. 12, pp. 69-85, 2014. DOI: 10.5815/ijisa.2014.12.10
