Vibha Hullur

Work place: S. J. College of Engineering, JSS Science and Technology University, Mysore, Karnataka, India

E-mail: hullurvibha@gmail.com

Biography

Vibha Hullur is a software engineer at Aris Global. She graduated from Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru, in 2022. She has experience in machine learning and web development. As an undergraduate, she worked on projects including an automated telephone customer system, Twitter sentiment analysis, digit recognition, and driver drowsiness detection. Her recent work was an internship at SJCE-STEP, where she was part of the front-end development team and built a website named “JEEVAN BINDU”. She likes taking up new ventures and building projects that she hopes will make a difference in the world, tackling one problem at a time. She was part of Make A Difference (MAD), where she worked as a fundraiser, and she served as Creative Lead at the Linux Campus Club, Sri Jayachamarajendra College of Engineering, Mysuru, for the year 2021-22.

Author Articles
Speaker Diarization Using Bi-LSTM and Spectral Clustering

By Trisiladevi C Nagavi, Samanvitha Sateesha, Shreya Sudhanva, Sukirth Shivakumar, Vibha Hullur

DOI: https://doi.org/10.5815/ijem.2024.03.03, Pub. Date: 8 Jun. 2024

Speaker diarization is the ability to compare, recognize, comprehend, and segregate different sound waves on the basis of the identity of the speaker. This work aims to accomplish this by segmenting, embedding, and clustering features extracted from the speech sample. Mel-Frequency Cepstral Coefficients (MFCC) are extracted and fed into a Bi-Directional Long Short-Term Memory (Bi-LSTM) model for segmentation. Then d-vectors are extracted using pre-trained models from the pyannote libraries. Spectral clustering is used to group and segregate the audio of one speaker from another. The experiments are carried out on two-speaker speech audio files, and the results indicate that the diarization is successful. A diarization error rate (DER) of 9.4% for a two-speaker audio file is the lowest achieved on the given data set. This indicates the efficiency of the system and supports the combination of methods chosen at each step. We believe the work presented in the paper is a valuable contribution to the community, presenting recent developments using Bi-LSTM and spectral clustering methods that enable future work on speaker diarization.
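
To make the reported metric concrete, the following is a minimal sketch of how a diarization error rate can be computed. It is not the authors' evaluation code. It assumes a simplified frame-level setting: each audio frame carries one speaker label or None for non-speech, there is no overlapping speech, and no forgiveness collar is applied. The function name diarization_error_rate and the labels in the usage note are hypothetical. DER is taken as missed speech plus false-alarm speech plus speaker confusion, divided by the total reference speech.

```python
from itertools import permutations


def diarization_error_rate(reference, hypothesis):
    """Frame-level DER for two equal-length label sequences.

    Each element is a speaker label, or None for a non-speech frame.
    Simplifying assumptions: no overlapping speech, no collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    # Speech in the reference that the system labelled as non-speech.
    missed = sum(r is not None and h is None for r, h in pairs)
    # Non-speech in the reference that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both sides contain speech; cluster labels are arbitrary,
    # so search for the one-to-one label mapping with the most matches.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    ref_speakers = sorted({r for r, _ in both} | {r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Pad with None so every hypothesis speaker can be assigned something,
    # including "no reference speaker" when the system found extra clusters.
    size = max(len(ref_speakers), len(hyp_speakers))
    candidates = ref_speakers + [None] * (size - len(ref_speakers))

    best_correct = 0
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech
```

For example, with reference frames A A A A B B B B followed by two non-speech frames, and a hypothesis of s1 s1 s1 s2 s2 s2 s2, then two non-speech frames, then s1, the sketch counts one missed frame, one false-alarm frame, and one confused frame over eight reference speech frames, giving a DER of 0.375. The brute-force search over label mappings is adequate for the two-speaker case discussed in the abstract; evaluation toolkits typically work on time durations and use an optimal assignment algorithm instead.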
