Shreya Sudhanva

Work place: S. J. College of Engineering, JSS Science and Technology University, Mysore, Karnataka, India

E-mail: shreya.sudhanva@gmail.com

Research Interests: Computer Vision

Biography

Shreya Sudhanva is a graduate student pursuing her master's degree in Computer Science at Northeastern University, Boston. She graduated from Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru in 2022. She has interned as a developer at Toshiba Software India Pvt Ltd, Reap Benefit (NGO), Mysuru Consulting Group and ExcelSoft Technologies. She has experience in machine learning and artificial intelligence, and has worked on natural language processing and computer vision projects including depression detection using textual social media data and MRI images, object detection using YOLOv5, Twitter sentiment analysis, and an automatic resume filtering and matching system. She also co-authored a paper at ICEECCOT-2021 based on her findings in the depression detection project. Ms. Shreya likes taking up new ventures and building projects that she hopes will make a difference in the world, tackling one problem at a time. She served as the secretary of the Linux Campus Club, Sri Jayachamarajendra College of Engineering, Mysuru for the year 2021-22. She was also elected as one of the placement secretaries of her class, which allowed her to act as an interface between her peers and the companies. She enjoys taking up responsibility and keeping herself busy.

Author Articles
Speaker Diarization Using Bi-LSTM and Spectral Clustering

By Trisiladevi C Nagavi, Samanvitha Sateesha, Shreya Sudhanva, Sukirth Shivakumar, Vibha Hullur

DOI: https://doi.org/10.5815/ijem.2024.03.03, Pub. Date: 8 Jun. 2024

Speaker diarization is the ability to compare, recognize, comprehend and segregate different sound waves on the basis of the identity of the speaker. This work aims to accomplish this process by segmenting, embedding and clustering the features extracted from the speech sample. Mel-Frequency Cepstral Coefficients (MFCC) are extracted and fed into a Bi-Directional Long Short-Term Memory (Bi-LSTM) model for segmentation. Then d-vectors are extracted using pre-trained models from pyannote libraries. Spectral clustering is used to group and segregate the audio of one speaker from another. The experimentation is carried out on two-speaker speech audio files and the results indicate that the diarization is successful. A diarization error rate (DER) of 9.4% for a 2-speaker audio file is the lowest DER achieved on the given data set. This indicates the efficiency of the system and also justifies the combination of methods chosen at each step. We believe the work presented in the paper is a valuable contribution to the community: it reports recent developments using Bi-LSTM and spectral clustering methods and supports future work on speaker diarization.
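The abstract reports its result as a diarization error rate. As a rough illustration of what that metric measures, the sketch below computes a simplified frame-level DER: missed speech, false-alarm speech and speaker confusion, divided by the total reference speech. It is not the evaluation code used in the paper. The function name `frame_der`, the use of `None` for non-speech frames, and the toy labels are all assumptions made for this example, and it ignores overlapping speech and forgiveness collars that standard scoring tools handle.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level diarization error rate (hypothetical helper).

    Each list holds one label per frame: None for non-speech,
    any other value for a speaker identity.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    frames = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in frames)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    # Speech in the reference that the system labelled as non-speech.
    missed = sum(r is not None and h is None for r, h in frames)
    # Non-speech in the reference that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in frames)

    # Frames where both sides say "speech": the speaker labels may disagree.
    both = [(r, h) for r, h in frames if r is not None and h is not None]
    ref_speakers = list(dict.fromkeys(r for r, _ in both))
    hyp_speakers = list(dict.fromkeys(h for _, h in both))

    # System labels are arbitrary, so try every one-to-one mapping from
    # system speakers to reference speakers and keep the best one.
    # Padding with None lets a system speaker stay unmatched.
    size = max(len(ref_speakers), len(hyp_speakers))
    padded = ref_speakers + [None] * (size - len(ref_speakers))
    best_correct = 0
    for perm in permutations(padded):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)
    confusion = len(both) - best_correct

    return (missed + false_alarm + confusion) / total_speech


# Toy two-speaker example: 10 frames, 8 of them reference speech.
reference = ["A"] * 5 + [None] * 2 + ["B"] * 3
hypothesis = [1] * 4 + [2] + [None] + [2] * 4
print(frame_der(reference, hypothesis))  # 0.25
```

In the toy example one frame is a false alarm and one frame is attributed to the wrong speaker, so the error is 2 of 8 speech frames. The brute-force search over mappings is only practical for a handful of speakers, which is enough for the two-speaker setting described above.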
