Information Technology for Gender Voice Recognition Based on Machine Learning Methods

pp. 65-87


Author(s)

Victoria Vysotska (1, 2), Denys Shavaiev (1), Michal Greguš (3), Yuriy Ushenko (4, *), Zhengbing Hu (5), Dmytro Uhryn (4)

1. Department of Information Systems and Networks, Lviv Polytechnic National University, Lviv, 79013, Ukraine

2. Osnabrück University, Osnabrück, 49076, Germany

3. Faculty of Management, Comenius University Bratislava, Bratislava 25, 82005, Slovakia

4. Department of Computer Science, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

5. School of Computer Science, Hubei University of Technology, Wuhan, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2024.05.05

Received: 12 Apr. 2024 / Revised: 10 Jun. 2024 / Accepted: 24 Jul. 2024 / Published: 8 Oct. 2024

Index Terms

Voice recognition, Machine learning, Gender recognition, Information technology, Cybersecurity, Authentication, Natural language processing, Neural networks, Gender classification models, Voice-by-sound recognition

Abstract

The growing use of social networks and the steady popularity of online communication make automatic gender detection from user posts necessary for a variety of applications, including modern education, political research, public opinion analysis, personalized advertising, cybersecurity and biometric systems, and marketing research. This study aims to develop information technology for recognizing a speaker's gender from voice recordings by means of supervised machine learning. We propose a model, methods, and tools for recognizing and classifying the gender of voice speech samples based on their acoustic properties. In our voice gender recognition project, we used a neural network model built with the TensorFlow library and Keras. The speaker's voice was analysed for various acoustic features, such as frequency, spectral characteristics, amplitude, and modulation. The basic model we created is a typical neural network for text classification, consisting of an input layer, hidden layers, and an output layer. For text processing, we use a pre-trained word vector space such as Word2Vec or GloVe. We also used dropout to prevent overfitting, the ReLU (Rectified Linear Unit) activation function for non-linearity, and a softmax function in the last layer to obtain class probabilities. To train the model, we used the Adam optimizer, a popular gradient-descent optimization method, and the sparse categorical cross-entropy loss function, which suits classification into mutually exclusive classes given as integer labels. After training, we saved the model to a file for later use and for evaluation on new data. Using neural networks allowed us to build a model that recognizes a speaker's gender by voice with high accuracy.
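
The network described above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. It feeds a vector of acoustic features into the layer types the abstract names: dense ReLU layers, dropout, a softmax output, the Adam optimizer, and sparse categorical cross-entropy. The word-vector input is not used. Synthetic, well-separated data stands in for real recordings. The layer sizes, dropout rate, epoch count, the 20-feature input width, and the file name `gender_model.keras` are all hypothetical choices.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


def build_gender_model(n_features: int, n_classes: int = 2) -> keras.Model:
    """Feed-forward classifier: input -> ReLU hidden layers with dropout -> softmax."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),  # zero 30% of units, during training only
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(n_classes, activation="softmax"),  # per-class probabilities
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",  # expects integer labels 0..n_classes-1
        metrics=["accuracy"],
    )
    return model


# Hypothetical stand-in data: 400 samples x 20 acoustic features, labels 0/1.
rng = np.random.default_rng(0)
n_samples, n_features = 400, 20
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + 1.5 * y[:, None]  # shift class 1 away from class 0

# Standardize each feature to zero mean and unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

keras.utils.set_random_seed(0)  # reproducible weight initialization and dropout masks
model = build_gender_model(n_features)
model.fit(X, y, epochs=20, batch_size=32, verbose=0)

# Save to a file and reload it, as one would before scoring new recordings.
path = os.path.join(tempfile.mkdtemp(), "gender_model.keras")
model.save(path)
restored = keras.models.load_model(path)

probs = restored.predict(X, verbose=0)  # shape (n_samples, 2); each row sums to 1
accuracy = float((probs.argmax(axis=1) == y).mean())  # accuracy on the training data
print(f"training accuracy: {accuracy:.3f}")
```

Sparse categorical cross-entropy pairs with integer labels (0 and 1 here), so no one-hot encoding is needed. With only two classes, a single sigmoid output unit trained with binary cross-entropy would be an equivalent alternative.
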
The intelligent system was trained with several machine learning methods, and the accuracy of each was measured: K-Nearest Neighbours (98.10%), Decision Tree (96.69%), Logistic Regression (98.11%), Random Forest (96.65%), Support Vector Machine (98.26%), and neural networks (98.11%). Additional techniques such as regularization and optimization can be used to improve model performance and prevent overfitting.
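
A comparison of the classical classifiers listed above can be sketched with scikit-learn. This assumes a tabular dataset of acoustic features with a binary gender label. Here `make_classification` generates synthetic data instead, so the printed accuracies only illustrate the workflow and will not reproduce the figures reported above. The hyperparameters (k = 5 neighbours, 200 trees, an RBF kernel) are illustrative assumptions, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a table of 20 acoustic features with a binary label.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Support Vector Machine": SVC(kernel="rbf", C=1.0),
}

scores = {}
for name, clf in models.items():
    # The scaler is fit on the training split only, inside the pipeline.
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

# Report held-out accuracy, best first.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{acc:.2%}")
```

Placing `StandardScaler` inside the pipeline keeps test-set statistics out of the scaling step. Distance- and margin-based methods such as KNN and SVM are especially sensitive to feature scale.
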

Cite This Paper

Victoria Vysotska, Denys Shavaiev, Michal Greguš, Yuriy Ushenko, Zhengbing Hu, Dmytro Uhryn, "Information Technology for Gender Voice Recognition Based on Machine Learning Methods", International Journal of Modern Education and Computer Science (IJMECS), Vol.16, No.5, pp. 65-87, 2024. DOI:10.5815/ijmecs.2024.05.05