Samanvitha Sateesha

Work place: S. J. College of Engineering, JSS Science and Technology University, Mysore, Karnataka, India

E-mail: samanvitha.sateesha@gmail.com

Biography

Samanvitha Sateesha is a graduate student pursuing a master's degree in Computer Science at the University of California, San Diego. She graduated from Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru, in 2022. During her undergraduate studies, she initially worked on full-stack development and designed websites for various clubs at her college. During her first internship, she assisted a team in revamping a website and managed a software development project. Her research focuses primarily on artificial intelligence and machine learning, and she has worked with text, image and speech data. Her major projects include identifying and classifying depression from textual data and MRI images, and extracting information from resumes to match them against a job description. Ms. Samanvitha co-authored the paper “Naive Bayes Classifier for depression detection using text data”, which appeared at ICEECCOT-2021. She has interned at Blueyonder, Reap Benefit and Sitex Digitech. She aims to leverage technology to solve problems in society. She believes that true learning happens when one steps out of one's comfort zone and is willing to embrace the unknown. Ms. Samanvitha has also served as a student coordinator and SPOC (single point of contact) of the Institution’s Innovation Council, JSSSTU.

Author Articles
Speaker Diarization Using Bi-LSTM and Spectral Clustering

By Trisiladevi C Nagavi, Samanvitha Sateesha, Shreya Sudhanva, Sukirth Shivakumar, Vibha Hullur

DOI: https://doi.org/10.5815/ijem.2024.03.03, Pub. Date: 8 Jun. 2024

Speaker diarization is the ability to compare, recognize, comprehend and separate different portions of an audio signal according to the identity of the speaker. This work accomplishes the task by segmenting the speech sample, extracting embeddings from it, and clustering them. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted and fed into a Bi-Directional Long Short-Term Memory (Bi-LSTM) model for segmentation. Next, d-vectors are extracted using pre-trained models from the pyannote library, and spectral clustering is used to group the audio of each speaker and separate it from that of the other. Experiments carried out on two-speaker audio files indicate that the diarization is successful: a diarization error rate (DER) of 9.4% on a 2-speaker audio file is the lowest DER achieved on the given data set. This indicates the efficiency of the system and justifies the combination of methods chosen at each step. We believe the work presented in this paper is a valuable contribution to the community, as it reports recent developments using Bi-LSTM and spectral clustering methods that can support future work on speaker diarization.
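The clustering and scoring stages described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the helpers `cosine_affinity`, `spectral_bipartition` and `frame_der` are hypothetical, toy 2-D vectors stand in for the pyannote d-vectors, and the clustering shown is the simple two-way spectral split (sign of the second eigenvector of the normalized affinity matrix), which may differ from the spectral clustering configuration used in the paper. The DER function assumes one label per frame, at most one speaker per frame, and no forgiveness collar.

```python
import math
from itertools import permutations


def cosine_affinity(embeddings):
    """Pairwise cosine similarity, clipped at 0 so every edge weight is non-negative.
    Assumes no embedding is the zero vector."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    n = len(embeddings)
    return [
        [max(0.0, sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / (norms[i] * norms[j]))
         for j in range(n)]
        for i in range(n)
    ]


def spectral_bipartition(affinity, iterations=500):
    """Two-way spectral split: label each node by the sign of the second eigenvector
    of the normalized affinity matrix D^-1/2 A D^-1/2 (found by power iteration)."""
    n = len(affinity)
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in affinity]
    # Normalized affinity shifted by +I, so all eigenvalues are non-negative and
    # power iteration converges to the largest remaining one.
    m = [[inv_sqrt_deg[i] * affinity[i][j] * inv_sqrt_deg[j] + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: proportional to sqrt(degree).
    top = [1.0 / s for s in inv_sqrt_deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        # Deflation: remove the component along the leading eigenvector, then multiply by m.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x >= 0 else 1 for x in v]


def frame_der(reference, hypothesis):
    """Frame-level DER = (false alarm + missed speech + speaker confusion) / reference speech.
    Each list holds one label per frame; None marks non-speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech")
    ref_speakers = list(dict.fromkeys(r for r in reference if r is not None))
    hyp_speakers = list(dict.fromkeys(h for h in hypothesis if h is not None))
    best_errors = None
    # Cluster IDs are arbitrary, so score the best one-to-one mapping onto reference
    # speakers (brute force; a cluster mapped to None matches no reference speaker).
    for targets in permutations(ref_speakers + [None] * len(hyp_speakers), len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, targets))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is None and h is not None:
                errors += 1  # false alarm
            elif r is not None and h is None:
                errors += 1  # missed speech
            elif r is not None and mapping[h] != r:
                errors += 1  # speaker confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_speech


if __name__ == "__main__":
    # Toy 2-D "embeddings" for six segments: three near one direction, three near another.
    segments = [[1.0, 0.1], [0.9, 0.05], [1.0, 0.0], [0.1, 1.0], [0.0, 0.9], [0.05, 1.0]]
    labels = spectral_bipartition(cosine_affinity(segments))
    print(labels)
    reference = ["A", "A", "A", "B", "B", "B"]
    print(frame_der(reference, labels))  # 0.0 when the two groups are separated cleanly
```

Because cluster IDs carry no meaning, DER is computed under the best one-to-one mapping of clusters to reference speakers. The brute-force search above is only practical for a handful of speakers; for larger speaker counts an assignment algorithm such as the Hungarian method is the usual choice.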
