Trisiladevi C Nagavi

Work place: S. J. College of Engineering, JSS Science and Technology University, Mysore, Karnataka, India

E-mail: trisiladevi@sjce.ac.in

Research Interests: Signal Processing, Image and Sound Processing, Image Processing

Biography

Dr. Trisiladevi C. Nagavi is an assistant professor in the Department of Computer Science & Engineering, Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru. She graduated from Karnataka University, Dharwad, and obtained her Master's and doctoral degrees from Visvesvaraya Technological University (VTU), Belgaum. She secured second rank in M.Tech Software Engineering at VTU Belgaum. Her expertise spans audio, music, speech and image signal processing, information retrieval and machine learning. Her research resulted in an Android application that plays favorite tunes by retrieving songs from a user's humming, based on a music melody representation. She has also worked on a Real Time Text to Speech Conversion and Translation System in collaboration with the All India Institute of Speech and Hearing (AIISH), Mysuru, and the IEEE Standards Association. Dr. Nagavi has published research papers in national and international journals and conference proceedings, as well as book chapters. Dr. Nagavi is an active member of the IEEE India Special Interest Group on Communications Disability and

Author Articles
Speaker Diarization Using Bi-LSTM and Spectral Clustering

By Trisiladevi C Nagavi, Samanvitha Sateesha, Shreya Sudhanva, Sukirth Shivakumar, Vibha Hullur

DOI: https://doi.org/10.5815/ijem.2024.03.03, Pub. Date: 8 Jun. 2024

Speaker diarization is the task of comparing, recognizing and segregating different sound waves on the basis of speaker identity, that is, determining who spoke when in an audio recording. This work accomplishes this by segmenting the speech sample, embedding the extracted features and clustering the embeddings. Mel-Frequency Cepstral Coefficients (MFCC) are extracted and fed into a Bi-Directional Long Short-Term Memory (Bi-LSTM) model for segmentation. d-vectors are then extracted using pre-trained models from the pyannote libraries, and spectral clustering is used to group the audio of each speaker and separate it from that of the other. Experiments are carried out on two-speaker speech audio files, and the results indicate that the diarization is successful: a diarization error rate (DER) of 9.4% on a 2-speaker audio file is the lowest DER achieved on the given data set. This indicates the efficiency of the system and supports the combination of methods chosen at each step. We believe the work presented in this paper is a valuable contribution to the community, documenting recent developments in Bi-LSTM and spectral clustering methods that can support future work on speaker diarization.
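The abstract names the clustering stage and the evaluation metric without giving their details. The following is a minimal sketch of both under stated assumptions: a generic normalized spectral clustering (Ng-Jordan-Weiss style) over per-segment embeddings, and a simplified frame-level DER with no forgiveness collar and a brute-force speaker mapping. The function names `spectral_cluster` and `frame_der`, the clipped-cosine affinity, the farthest-point k-means initialisation and the synthetic vectors standing in for pyannote d-vectors are all hypothetical choices, not the authors' implementation. MFCC extraction and the Bi-LSTM segmentation step are not shown.

```python
import itertools

import numpy as np


def spectral_cluster(embeddings, n_speakers, n_iter=50):
    """Group segment embeddings into n_speakers clusters (NJW-style spectral clustering).

    Hypothetical choices (not stated in the abstract): clipped cosine affinity,
    symmetric normalisation, row-normalised top eigenvectors, and k-means with
    deterministic farthest-point initialisation.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine similarity with negatives clipped to 0; the diagonal (self-similarity 1)
    # is kept so every node has a non-zero degree.
    affinity = np.maximum(x @ x.T, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    norm_aff = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the last columns are the top ones.
    _, vecs = np.linalg.eigh(norm_aff)
    rows = vecs[:, -n_speakers:]
    rows = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    # Farthest-point initialisation, then standard k-means iterations on the rows.
    centers = [rows[0]]
    while len(centers) < n_speakers:
        nearest = np.min([((rows - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(rows[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            rows[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_speakers)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def frame_der(reference, hypothesis):
    """Frame-level DER = (missed speech + false alarm + speaker confusion) / reference speech.

    Labels are per-frame speaker ids, with None meaning non-speech. Hypothesis ids are
    mapped one-to-one onto reference ids by brute force (fine for two speakers).
    """
    ref_spk = sorted({r for r in reference if r is not None})
    hyp_spk = sorted({h for h in hypothesis if h is not None})
    # Extra targets let a surplus hypothesis speaker stay unmatched.
    targets = ref_spk + [("unmatched", i) for i in range(len(hyp_spk))]
    speech = sum(r is not None for r in reference)
    best = None
    for perm in itertools.permutations(targets, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is not None and h is None:
                errors += 1  # missed speech
            elif r is None and h is not None:
                errors += 1  # false alarm
            elif r is not None and mapping[h] != r:
                errors += 1  # speaker confusion
        best = errors if best is None else min(best, errors)
    return best / speech if speech else 0.0


# Toy stand-ins for d-vectors: two synthetic "speakers" pointing in different directions.
rng = np.random.default_rng(0)
emb = np.vstack([
    np.eye(8)[0] + rng.normal(0.0, 0.05, (6, 8)),
    np.eye(8)[1] + rng.normal(0.0, 0.05, (6, 8)),
])
labels = spectral_cluster(emb, n_speakers=2)
reference = ["A"] * 6 + ["B"] * 6
print(labels.tolist(), frame_der(reference, labels.tolist()))
```

On well-separated toy data like this, the clustering should recover the two groups. Standard scoring tools such as pyannote.metrics compute DER on time segments rather than frames and may apply a collar around speaker boundaries, so figures from this sketch are not directly comparable to the 9.4% reported in the abstract.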
