Workplace: Technology Innovation Institute (TII), Abu Dhabi, U.A.E.
E-mail: sivaprasad.nandyala@tii.ae
Website: https://orcid.org/0000-0002-0806-8064
Research Interests:
Biography
Sivaprasad Nandyala is currently working as a Senior Machine Learning Engineer at the Technology Innovation Institute (TII), Abu Dhabi, U.A.E. Dr. Nandyala's career includes roles at Eaton Research Labs, Tata Elxsi, Wipro Technologies, Analog Devices, and Ikanos Communications, covering various technology domains. He has contributed significantly to the field with over 35 research publications and 3 granted patents. Dr. Nandyala earned his Ph.D. in speech processing from NIT Warangal, India, and was an ERASMUS MUNDUS scholarship recipient for postdoctoral research at Politecnico di Milano, Italy.
By Ravi Kumar Kandagatla, V. Jayachandra Naidu, P. S. Sreenivasa Reddy, Sivaprasad Nandyala
DOI: https://doi.org/10.5815/ijigsp.2024.05.02, Pub. Date: 8 Oct. 2024
Deep learning based speech enhancement approaches provide better perceptual quality and better intelligibility. However, most speech enhancement methods in the literature estimate the enhanced speech from a processed amplitude, energy, or MFCC spectrum together with the noisy phase. Because estimating the clean speech phase from noisy speech is difficult, the noisy phase is still used in reconstructing the enhanced speech. Some methods have been developed to estimate the clean speech phase, but they are observed to be computationally complex. To avoid this difficulty and achieve better performance, convolutional neural networks based on the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST), rather than the Discrete Fourier Transform (DFT), are proposed for better intelligibility and improved performance. However, such algorithms work on either time-domain or frequency-domain features. To gain the advantages of both the time domain and the frequency domain, a fusion of the DCT and time-domain approaches is proposed here. In this work, a DCT Dense Convolutional Recurrent Network (DCTDCRN), a DST Convolutional Gated Recurrent Unit network (DSTCGRU), a DST Convolutional Long Short-Term Memory network (DSTCLSTM), and a DST Dense Convolutional Recurrent Network (DSTDCRN) are proposed for speech enhancement. These methods provide superior performance and lower processing complexity compared to state-of-the-art methods. The proposed DCT-based methods are further used to develop a joint time- and magnitude-domain speech enhancement method. Simulation results show superior performance over baseline methods for joint time- and frequency-domain processing. Results are also analyzed using objective performance measures such as Signal-to-Noise Ratio (SNR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI).
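To make the DCT-domain idea concrete, the following is a minimal, hypothetical Python sketch of masking-based enhancement on real-valued DCT frames. It is not the paper's DCTDCRN/DSTCGRU/DSTCLSTM/DSTDCRN architecture; the frame_signal helper, the TinyDCTEnhancer module, and all layer sizes are illustrative assumptions, chosen only to show how a convolutional-recurrent mask estimator can operate on DCT coefficients instead of a complex DFT spectrum.

# Minimal, illustrative sketch (not the authors' exact model): frame a noisy
# waveform, take a real-valued DCT per frame, estimate a per-bin mask with a
# small convolutional-recurrent network, and invert the DCT on the masked
# coefficients. All names and sizes below are assumptions for illustration.
import numpy as np
import torch
import torch.nn as nn
from scipy.fft import dct, idct

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

class TinyDCTEnhancer(nn.Module):
    """Conv front-end + GRU + sigmoid mask, operating on real DCT frames."""
    def __init__(self, n_bins=512, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.gru = nn.GRU(n_bins, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, dct_frames):            # (batch, time, n_bins)
        b, t, f = dct_frames.shape
        z = self.conv(dct_frames.reshape(b * t, 1, f)).reshape(b, t, f)
        h, _ = self.gru(z)
        return self.mask(h) * dct_frames      # masked DCT coefficients

# Example: random data stands in for a real noisy utterance at 16 kHz.
noisy = np.random.randn(16000).astype(np.float32)
frames = frame_signal(noisy)                                    # (time, 512)
dct_frames = dct(frames, type=2, norm='ortho', axis=1).astype(np.float32)
model = TinyDCTEnhancer()
with torch.no_grad():
    enhanced_dct = model(torch.from_numpy(dct_frames)[None]).squeeze(0).numpy()
enhanced_frames = idct(enhanced_dct, type=2, norm='ortho', axis=1)
# Overlap-adding enhanced_frames would yield the enhanced waveform; no phase
# estimation is needed because the DCT representation is real-valued.

Because the DCT of a real frame is itself real, the mask is applied directly to the transform coefficients, which is the practical appeal of DCT/DST-domain processing over DFT magnitude-plus-noisy-phase reconstruction described in the abstract.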