Workplace: Department of ECE, Sri Venkateswara College of Engineering & Technology, Chittoor, India
E-mail: jayachandra.velur@gmail.com
ORCID: https://orcid.org/0009-0006-4965-7793
Research Interests: Signal Processing, Computer Vision, Machine Learning
Biography
V. Jayachandra Naidu received his M.Tech in Communication Engineering from Vellore Institute of Technology, Vellore, Tamil Nadu, India in 2004. Currently, he is working as an Associate Professor in the Department of ECE at Sri Venkateswara College of Engineering and Technology (Autonomous), Chittoor, Andhra Pradesh, India. His research interests include Computer Vision, Machine Learning, Signal Processing, Embedded Systems and IoT.
By Ravi Kumar Kandagatla, V. Jayachandra Naidu, P. S. Sreenivasa Reddy, and Sivaprasad Nandyala
DOI: https://doi.org/10.5815/ijigsp.2024.05.02, Pub. Date: 8 Oct. 2024
Deep learning based speech enhancement approaches provide better perceptual quality and intelligibility. However, most speech enhancement methods in the literature estimate the enhanced speech from processed amplitude, energy, or MFCC spectra combined with the noisy phase. Because the clean-speech phase is difficult to estimate from noisy speech, the noisy phase is still used when reconstructing the enhanced signal. Some methods have been developed for estimating the clean-speech phase, but this estimation is observed to be complex. To avoid this difficulty and achieve better performance, Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) based convolutional neural networks are proposed in place of the Discrete Fourier Transform (DFT), giving better intelligibility and improved performance. However, such algorithms operate on either time-domain or frequency-domain features alone. To gain the advantages of both, a fusion of the DCT and time-domain approaches is proposed here. In this work, the DCT Dense Convolutional Recurrent Network (DCTDCRN), DST Convolutional Gated Recurrent Unit network (DSTCGRU), DST Convolutional Long Short-Term Memory network (DSTCLSTM), and DST Dense Convolutional Recurrent Network (DSTDCRN) are proposed for speech enhancement. These methods provide superior performance and lower processing complexity compared with state-of-the-art methods. The proposed DCT-based methods are further used to develop a joint time- and magnitude-domain speech enhancement method. Simulation results show performance superior to baseline methods for joint time- and frequency-domain processing. The results are also analyzed using objective performance measures such as Signal-to-Noise Ratio (SNR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI).
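The motivation for replacing the DFT with the DCT can be illustrated in a few lines: the DCT of a real signal frame is itself real-valued, so a network can enhance the coefficients directly and reconstruct the signal with no phase to estimate or reuse. The sketch below (an illustrative assumption, not the paper's implementation; the DCT-II pair is written out with NumPy for self-containedness) shows a single "speech" frame transformed and perfectly reconstructed without any phase term.

```python
import numpy as np

def dct2(x):
    """Type-II DCT: purely real coefficients, so no phase needs estimating."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]  # one row per DCT coefficient
    return (x * np.cos(np.pi * (n + 0.5) * k / N)).sum(axis=1)

def idct2(X):
    """Inverse of dct2 above (standard DCT-III with DC scaling)."""
    N = len(X)
    n = np.arange(N)
    k = np.arange(1, N)
    return X[0] / N + (2.0 / N) * (
        X[1:] * np.cos(np.pi * (n[:, None] + 0.5) * k / N)
    ).sum(axis=1)

# One windowed frame standing in for speech samples.
frame = np.sin(2 * np.pi * 5 * np.arange(64) / 64) * np.hanning(64)
coeffs = dct2(frame)   # real-valued; a DNN could enhance these directly
rec = idct2(coeffs)    # reconstruction with no noisy phase reused
assert np.allclose(rec, frame)
```

By contrast, `np.fft.rfft(frame)` returns complex coefficients, and any DFT-domain enhancer must either carry the noisy phase into the inverse transform or attempt the difficult clean-phase estimation the abstract describes.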