Fundamental Frequency Extraction by Utilizing Accumulated Power Spectrum based Weighted Autocorrelation Function in Noisy Speech

pp. 52-60


Author(s)

Nargis Parvin (1,*), Moinur Rahman (2), Irana Tabassum Ananna (2), Md. Saifur Rahman (2)

1. Department of Computer Science and Engineering, Bangladesh Army International University of Science and Technology (BAIUST), Cumilla, Bangladesh

2. Department of Information and Communication Technology, Comilla University, Cumilla, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2024.03.05

Received: 16 Sep. 2023 / Revised: 8 Nov. 2023 / Accepted: 21 Jan. 2024 / Published: 8 Jun. 2024

Index Terms

Accumulated Power Spectrum, Fundamental Frequency Extraction, Power Spectrum, Weighted Autocorrelation

Abstract

This research proposes an efficient method, suited to speech processing applications, for extracting accurate pitch from speech signals in noisy conditions. To this end, we present a fundamental frequency extraction algorithm that is tolerant to non-stationary changes in the amplitude and frequency of the input signal. Moreover, instead of the conventional power spectrum, we use an accumulated power spectrum computed from shorter sub-frames of the input signal, which reduces the effect of noise on the speech signal. To improve the accuracy of fundamental frequency extraction, we focus on preserving the speech harmonics while suppressing the noise components present in the noisy speech signal. The proposed approach consists of two stages: computing the accumulated power spectrum of the speech signal, and weighting it with the average magnitude difference function (AMDF). Experimental results suggest that the proposed technique performs better in noisy conditions than existing state-of-the-art methods such as the Weighted Autocorrelation Function (WAF), PEFAC, and BaNa.
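The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sub-frame length (256 samples), hop (64 samples), Hann window, 60-400 Hz search range and the constant k are hypothetical choices, and the weighting follows the general WAF form of Shimamura and Kobayashi [11], in which the autocorrelation is divided by the AMDF plus a small constant. Here the autocorrelation is assumed to be obtained as the inverse FFT of the accumulated power spectrum (Wiener-Khinchin theorem).

```python
import numpy as np


def accumulated_power_spectrum(frame, sub_len, hop, n_fft):
    """Sum the power spectra of overlapping Hann-windowed sub-frames of one frame."""
    window = np.hanning(sub_len)
    acc = np.zeros(n_fft // 2 + 1)
    for start in range(0, len(frame) - sub_len + 1, hop):
        sub = frame[start:start + sub_len] * window
        acc += np.abs(np.fft.rfft(sub, n_fft)) ** 2
    return acc


def amdf(frame, max_lag):
    """Average magnitude difference function D(tau) for tau = 0..max_lag."""
    n = len(frame)
    return np.array([np.mean(np.abs(frame[:n - tau] - frame[tau:]))
                     for tau in range(max_lag + 1)])


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0,
                sub_len=256, hop=64, k=1e-3):
    """Hypothetical F0 estimator: accumulated-spectrum ACF weighted by 1 / (AMDF + k)."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    lag_min = int(np.floor(fs / f0_max))
    lag_max = int(np.ceil(fs / f0_min))
    # Every candidate lag must fit inside one sub-frame.
    if lag_max >= sub_len or sub_len > len(frame):
        raise ValueError("need fs / f0_min < sub_len <= len(frame)")
    # Zero-pad to 2 * sub_len so the inverse FFT yields a linear, not circular, ACF.
    n_fft = 2 * sub_len
    aps = accumulated_power_spectrum(frame, sub_len, hop, n_fft)
    # Wiener-Khinchin: the ACF is the inverse FFT of the power spectrum.
    acf = np.fft.irfft(aps, n_fft)
    # WAF-style weighting: large where the ACF is high and the AMDF is low.
    weighted = acf[:lag_max + 1] / (amdf(frame, lag_max) + k)
    tau = lag_min + int(np.argmax(weighted[lag_min:]))
    return fs / tau


if __name__ == "__main__":
    fs = 8000
    t = np.arange(1024) / fs
    rng = np.random.default_rng(0)
    # Synthetic voiced frame: 200 Hz fundamental with two harmonics, plus white noise.
    x = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in (1, 2, 3))
    x = x + 0.3 * rng.standard_normal(t.size)
    print(f"estimated F0: {estimate_f0(x, fs):.1f} Hz")
```

Running the module prints the estimate for a synthetic 200 Hz frame. The ratio ACF / (AMDF + k) is used because the ACF peak and the AMDF valley both mark the pitch period, so dividing one by the other tends to sharpen the true peak relative to spurious ones; the accumulation over sub-frames is the step the abstract credits with reducing the influence of noise.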

Cite This Paper

Nargis Parvin, Moinur Rahman, Irana Tabassum Ananna, Md. Saifur Rahman, "Fundamental Frequency Extraction by Utilizing Accumulated Power Spectrum based Weighted Autocorrelation Function in Noisy Speech", International Journal of Information Technology and Computer Science (IJITCS), Vol.16, No.3, pp.52-60, 2024. DOI:10.5815/ijitcs.2024.03.05

Reference

[1]X. Zhang, H. Zhang, S. Nie, G. Gao and W. Liu, "A Pairwise Algorithm Using the Deep Stacking Network for Speech Separation and Pitch Estimation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 6, pp. 1066-1078, 2016, doi: 10.1109/ICASSP.2015.7177969.
[2]J. Stahl and P. Mowlaee, "A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, No. 2, pp. 436-450, 2018, doi: 10.1109/TASLP.2017.2779405.
[3]L. Rabiner, M. Cheng, A. Rosenberg and C. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 24, No. 5, pp. 399-418, 1976, doi: 10.1109/TASSP.1976.1162846.
[4]K. A. Oh and C. K. Un, "A performance comparison of pitch extraction algorithms for noisy speech," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 18B4.1–18B4.4, 1984, doi: 10.1109/ICASSP.1984.1172551.
[5]L. Sukhostat and Y. Imamverdiyev, "A comparative analysis of pitch detection methods under the influence of different noise conditions," Journal of Voice, Vol. 29, No. 4, pp. 410-417, 2015, doi: 10.1016/j.jvoice.2014.09.016.
[6]W. J. Hess, "Pitch Determination of Speech Signals," Berlin, Germany: Springer-Verlag, 1983, doi: 10.1007/978-3-642-81926-1.
[7]L. R. Rabiner, "On the use of autocorrelation analysis for pitch detection," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, No. 1, pp. 24–33, 1977, doi: 10.1109/TASSP.1977.1162905.
[8]M. Ross, H. Shaffer, A. Cohen, R. Freudberg and H. Manley, "Average magnitude difference function pitch extractor," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 22, No. 5, pp. 353-362, 1974, doi: 10.1109/TASSP.1974.1162598.
[9]C. K. Un and S. Yang, "A pitch extraction algorithm based on LPC inverse filtering and AMDF," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 25, No. 6, pp. 353-362, 1977, doi: 10.1109/TASSP.1977.1163005.
[10]R. Chakraborty, D. Sengupta and S. Sinha, "Pitch tracking of acoustic signals based on average squared mean difference function," Signal, Image and Video Processing, Vol. 3, No. 4, pp. 319–327, 2009, doi: 10.1007/s11760-008-0072-5.
[11]T. Shimamura and H. Kobayashi, "Weighted autocorrelation for pitch extraction of noisy speech," IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 7, pp. 727-730, 2001, doi: 10.1109/89.952490.
[12]A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," The Journal of the Acoustical Society of America, Vol. 111, No. 4, pp. 1917–1930, 2002, doi: 10.1121/1.1458024.
[13]A. M. Noll, "Short-time spectrum and cepstrum techniques for vocal-pitch detection," The Journal of the Acoustical Society of America, Vol. 36, No. 2, pp. 296–302, 1964, doi: 10.1121/1.1918949.
[14]S. Ahmadi and A. S. Spanias, "Cepstrum-based pitch detection using a new statistical v/uv classification algorithm," IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 3, pp. 333–338, 1999, doi: 10.1109/89.759042.
[15]H. Kobayashi and T. Shimamura, "A modified cepstrum method for pitch extraction," Proceedings of the IEEE Asia-Pacific International Conference on Circuits and Systems Microelectronics and Integrating Systems (APCCAS), 1998, doi: 10.1109/APCCAS.1998.743751.
[16]N. Kunieda, T. Shimamura and J. Suzuki, "Pitch extraction by using autocorrelation function on the log spectrum," Electronics and Communications in Japan, Part 3, Vol. 83, No. 1, pp. 90–98, 2000, doi: 10.1002/(SICI)1520-6440(200001)83.
[17]M. Lahat, R. J. Niederjohn and D. A. Krubsack, "A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 35, No. 6, pp. 741-750, 1987, doi: 10.1109/TASSP.1987.1165224.
[18]M. A. F. M. R. Hasan, M. S. Rahman and T. Shimamura, "Windowless autocorrelation-based cepstrum method for pitch extraction of noisy speech," Journal of Signal Processing, Vol. 16, No. 3, pp. 231-239, 2012, doi: 10.2299/jsp.16.231.
[19]S. Gonzalez and M. Brookes, "PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 2, pp. 518-530, 2014, doi: 10.1109/TASLP.2013.2295918.
[20]N. Yang, H. Ba, W. Cai, I. Demirkol and W. Heinzelman, "BaNa: A Noise Resilient Fundamental Frequency Detection Algorithm for Speech and Music," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 12, pp. 1833-1848, 2014, doi: 10.1109/TASLP.2014.2352453.
[21]D. J. Hermes, "Measurement of pitch by subharmonic summation," The Journal of the Acoustical Society of America, Vol. 83, No. 1, pp. 257–264, 1988, doi: 10.1121/1.396427.
[22]D. Wang, C. Yu and J. H. Hansen, "Robust harmonic features for classification-based pitch estimation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 5, pp. 952–964, 2017, doi: 10.1109/TASLP.2017.2667879.
[23]Y. Liu and D. Wang, "Speaker-dependent multi-pitch tracking using deep neural networks," The Journal of the Acoustical Society of America, Vol. 141, No. 2, pp. 710–721, 2017, doi: 10.1121/1.4973687.
[24]S. Lin, "Robust Pitch Estimation and Tracking for Speakers Based on Subband Encoding and The Generalized Labeled Multi-Bernoulli Filter," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 27, No. 4, pp. 827-841, 2019, doi: 10.1109/TASLP.2019.2898818.
[25]S. Lin, "A new frequency coverage metric and a new subband encoding model, with an application in pitch estimation," Proceedings of Annual Conference of the International Speech Communication Association, pp. 2147–2151, 2018, doi: 10.21437/Interspeech.2018-2590.
[26]M. S. Rahman, Y. Sugiura and T. Shimamura, "Utilization of windowing effect and accumulated autocorrelation function and power spectrum for pitch detection in noisy environments," IEEJ Transactions on Electrical and Electronic Engineering, Vol. 15, No. 11, pp. 1681–1690, 2020, doi: 10.1002/tee.23238.
[27]F. Plante, G. Meyer and W. Ainsworth, "A fundamental frequency extraction reference database," Proceedings of Eurospeech, pp. 837–840, 1995, doi: 10.21437/Eurospeech.1995-191.
[28]"20 Countries Language Database," NTT Advanced Technology Corp., Japan, 1988.
[29]WCNG, "Wireless Communication Networking Group," [Online]. Available: http://www.ece.rochester.edu/projects/wcng/code.html
[30]M. Brookes, "Voicebox toolkit," [Online]. Available: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html