Work place: S. J. College of Engineering, JSS Science and Technology University, Mysore, Karnataka, India
E-mail: sukirth.shivakumar304@gmail.com
Biography
Sukirth Shivakumar is a graduate student pursuing a master's degree in Computer Science at Northeastern University, Boston. He graduated from Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru, in 2022. He is interested in game design, virtual and augmented reality, and application development. He has worked as a team lead at STEP-SJCE on a mobile application project and as a virtual and augmented reality developer at Celestasi Technologies. Most recently, he interned at Hewlett Packard Enterprise, where he was part of the UI development team for a networking product. An extrovert at heart, he works well in teams. He was the anchor for the college's annual fest, JAYCIANA. He also enjoys adventure and often goes trekking with friends. He has been a member of Sahas, the adventure club of JSS Science and Technology University, since 2018.
By Trisiladevi C Nagavi, Samanvitha Sateesha, Shreya Sudhanva, Sukirth Shivakumar, Vibha Hullur
DOI: https://doi.org/10.5815/ijem.2024.03.03, Pub. Date: 8 Jun. 2024
Speaker diarization is the process of comparing, recognizing, and segregating segments of audio on the basis of the identity of the speaker. This work approaches the task by segmenting the speech sample, extracting embeddings, and clustering them. Mel-Frequency Cepstral Coefficients (MFCC) are extracted and fed into a Bi-Directional Long Short-Term Memory (Bi-LSTM) model for segmentation. d-vectors are then extracted using pre-trained models from the pyannote libraries, and spectral clustering is used to separate the audio of one speaker from another. Experiments are carried out on two-speaker speech audio files, and the results indicate that the diarization is successful. A diarization error rate (DER) of 9.4% for a 2-speaker audio file is the lowest DER achieved on the given data set, which indicates the efficiency of the system and supports the combination of methods chosen at each step. The work contributes to the community by presenting recent developments using Bi-LSTM and spectral clustering methods, which can support future work on speaker diarization.
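To make the diarization error rate mentioned in the abstract concrete, here is a minimal sketch of a frame-level DER calculation. It is not the authors' evaluation code. The function name `frame_der`, the per-frame label lists, and the use of `None` for non-speech are all assumptions made for illustration. The sketch also ignores overlapping speech and forgiveness collars, which a time-based DER tool would normally handle.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER: (missed + false alarm + confusion) / reference speech frames.

    Both inputs are equal-length lists of per-frame speaker labels (strings);
    None marks a non-speech frame. This is a hypothetical, simplified helper.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    # Speech in the reference that the system labelled as non-speech.
    missed = sum(r is not None and h is None for r, h in pairs)
    # Non-speech in the reference that the system labelled as speech.
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    ref_ids = sorted({r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})

    # Pad the reference labels with None so that every hypothesis label
    # can be assigned to a distinct target, even if there are extra clusters.
    size = max(len(ref_ids), len(hyp_ids))
    targets = ref_ids + [None] * (size - len(ref_ids))

    # Frames where both sides contain speech; confusion is counted here
    # under the best one-to-one mapping of hypothesis to reference labels.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    best_correct = 0
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)
    confusion = len(both) - best_correct

    return (missed + false_alarm + confusion) / ref_speech


# Toy two-speaker example: one missed frame, one false alarm, one confused frame.
reference = ["A", "A", "A", "B", "B", "B", None, "A", "A", "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", "y", None, "x", "y"]
print(f"DER = {frame_der(reference, hypothesis):.3f}")
```

The exhaustive search over label mappings is only practical for a handful of speakers, which is enough for the two-speaker setting described above. For more speakers, an assignment solver would be the usual choice.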