Work place: Department of ECE, Lakireddy Bali Reddy College of Engineering, Mylavaram, India
E-mail: 2k6ravi@gmail.com
Research Interests: Speech Synthesis, Speech Recognition
Biography
Ravi Kumar Kandagatla was born in Markapur, India, in 1988. He received the Bachelor of Technology degree from Jawaharlal Nehru Technological University, Kakinada, in 2009 and the Master of Technology degree in Digital Electronics and Communication Systems from the same university in 2011. He is currently a Research Scholar at JNTUK, Kakinada, and an Assistant Professor at Lakireddy Bali Reddy College of Engineering, Mylavaram, India. He has 7 years of teaching experience and 4 international publications. His research interest is speech processing.
By Ravi Kumar Kandagatla, V. Jayachandra Naidu, P. S. Sreenivasa Reddy, Sivaprasad Nandyala
DOI: https://doi.org/10.5815/ijigsp.2024.05.02, Pub. Date: 8 Oct. 2024
Deep learning based speech enhancement approaches provide better perceptual quality and intelligibility. However, most speech enhancement methods in the literature estimate the enhanced speech from a processed amplitude, energy, or MFCC spectrum combined with the noisy phase. Because estimating the clean speech phase from noisy speech is difficult, the noisy phase is still used to reconstruct the enhanced speech. Some methods have been developed to estimate the clean speech phase, but this estimation is observed to be complex. To avoid this difficulty and improve performance, convolutional neural networks based on the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST), rather than the Discrete Fourier Transform (DFT), are proposed for better intelligibility. However, these algorithms operate on either time-domain features or frequency-domain features. To gain the advantages of both domains, a fusion of DCT and time-domain approaches is proposed here. In this work, the DCT Dense Convolutional Recurrent Network (DCTDCRN), DST Convolutional Gated Recurrent Neural Network (DSTCGRU), DST Convolutional Long Short-Term Memory (DSTCLSTM), and DST Dense Convolutional Recurrent Network (DSTDCRN) are proposed for speech enhancement. These methods provide superior performance with less processing difficulty than state-of-the-art methods. The proposed DCT based methods are then used to develop a joint time and magnitude based speech enhancement method. Simulation results show that joint time and frequency based processing outperforms the baseline methods. The results are analyzed using objective performance measures such as Signal to Noise Ratio (SNR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI).
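The point about avoiding phase estimation can be illustrated with a small sketch. The code below is not the authors' network; it only shows that the DCT of a real frame is itself real-valued, so a real-valued gain (here a hypothetical `mask` standing in for a network output) can be applied and inverted without any separate phase term. The function names `dct_matrix` and `enhance_frame_dct` are assumptions introduced for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    k = np.arange(n)[:, None]   # coefficient index
    t = np.arange(n)[None, :]   # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    mat[0, :] /= np.sqrt(2.0)   # scale the DC row so the matrix is orthonormal
    return mat

def enhance_frame_dct(noisy_frame, mask):
    """Apply a real-valued gain per DCT coefficient and transform back.

    `mask` is a hypothetical stand-in for whatever a trained model would
    predict; no phase is needed because the DCT coefficients are real.
    """
    C = dct_matrix(len(noisy_frame))
    coeffs = C @ noisy_frame        # real-valued coefficients
    return C.T @ (mask * coeffs)    # the inverse of an orthonormal matrix is its transpose

rng = np.random.default_rng(0)
frame = rng.standard_normal(64)
restored = enhance_frame_dct(frame, np.ones(64))  # an all-ones mask leaves the frame unchanged
```

With a DFT, the same frame would yield complex coefficients, and a magnitude-only gain would have to be recombined with some phase, typically the noisy one.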
By Ravi Kumar Kandagatla, P.V. Subbaiah
DOI: https://doi.org/10.5815/ijigsp.2019.07.02, Pub. Date: 8 Jul. 2019
Super-Gaussian based Bayesian estimators play a significant role in noise reduction. However, traditional Bayesian estimators process only the DFT spectral amplitude of noisy speech and leave the phase unprocessed. Considering phase information when deriving Bayesian estimators provides improved results. The objective of this paper is twofold. Firstly, Super-Gaussian based Bayesian estimators of Complex speech coefficients given Uncertain Phase (CUP) are compared under different noise conditions: White, Babble, Pink, Modulated Pink, Factory, Car, Street, F16, and M109 noise. Secondly, a novel speech enhancement method is proposed that combines CUP estimators with different NMF approaches and online bases update. Statistical estimators are less effective under completely non-stationary conditions, whereas Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noises. The drawback of NMF is that it requires training and/or clean speech and noise signals. This drawback can be overcome by combining the advantages of statistical and NMF approaches. Approaches such as Posteriori Regularized NMF (PR-NMF), Weibull Rayleigh NMF (WR-NMF), Nakagami Rayleigh NMF (NR-NMF), CUP estimator with Gamma and Generalized Gamma distributions + NMF + Online bases Update (CUP-GG + NMF + OU), and CUP-GG + WR-NMF / NR-NMF + OU are considered for comparison. The paper analyzes the performance of speech enhancement methods using Bayesian estimators, NMF approaches, and combinations of statistical and NMF approaches. The objective performance measures Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal to Noise Ratio (SNR), Signal to Distortion Ratio (SDR), and Segmental SNR (Seg SNR) are considered for comparison.
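For readers unfamiliar with NMF, the following is a minimal sketch of the plain factorization using the standard multiplicative updates for the Euclidean cost. It is not one of the variants named in the abstract (PR-NMF, WR-NMF, NR-NMF) and includes no online bases update; the function name `nmf` and its parameters are assumptions chosen for illustration. The matrix `V` stands for a non-negative magnitude spectrogram.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    W holds `rank` spectral bases (columns) and H their per-frame
    activations. Uses multiplicative updates for the squared-error cost.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((20, 30))      # toy stand-in for a magnitude spectrogram
W, H = nmf(V, rank=5)
error = np.linalg.norm(V - W @ H)
```

In a supervised enhancement setting, speech and noise bases would typically be trained separately and concatenated, which is the training requirement the abstract describes as a drawback.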