Automatic Spoken Language Recognition with Neural Networks

Author(s)

Valentin Gazeau 1,*, Cihan Varol 1

1. Department of Computer Science at Sam Houston State University, Huntsville, TX, USA

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2018.08.02

Received: 22 Jun. 2018 / Revised: 4 Jul. 2018 / Accepted: 15 Jul. 2018 / Published: 8 Aug. 2018

Index Terms

Hidden Markov Model, Language Identification, Language Translation, Neural Networks, Support Vector Machine

Abstract

Translation has become very important in our generation, as people with completely different cultures and languages are networked together through the Internet. Nowadays one can easily communicate with anyone in the world using Google Translate or other translation applications. Humans can already recognize languages they have previously been exposed to: even if they cannot translate what is said, they can often tell which language is being spoken. This paper demonstrates how different Neural Network models can be trained to recognize languages such as French, English, Spanish, and German. For the training dataset, voice samples were chosen from Shtooka, VoxForge, and YouTube. For testing, not only data from these sources but also personally recorded voices were used. Finally, this research reports the accuracy and confidence level of multiple Neural Network architectures, a Support Vector Machine, and a Hidden Markov Model, with the Hidden Markov Model yielding the best results, reaching almost 70 percent accuracy across all languages.
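
To make the approach more concrete, the sketch below shows one way an HMM-based spoken language identifier of this kind can be assembled: one Gaussian HMM is trained per language on MFCC features, and a test clip is labeled with the language whose model scores it highest. The directory layout (data/<language>/*.wav), the 16 kHz sample rate, and all hyperparameters are illustrative assumptions (using librosa and hmmlearn), not the authors' implementation.

# A minimal sketch, not the paper's exact pipeline: per-language Gaussian HMMs
# over MFCC features, classification by maximum log-likelihood.
import glob
import os

import librosa
import numpy as np
from hmmlearn import hmm

LANGUAGES = ["english", "french", "german", "spanish"]

def mfcc_frames(path, n_mfcc=13):
    """Return a (frames x n_mfcc) matrix of MFCC features for one clip."""
    signal, sr = librosa.load(path, sr=16000)  # 16 kHz is an assumed sample rate
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

# Train one HMM per language on the concatenated frames of its training clips.
models = {}
for lang in LANGUAGES:
    clips = [mfcc_frames(p) for p in glob.glob(os.path.join("data", lang, "*.wav"))]
    X = np.vstack(clips)
    lengths = [len(c) for c in clips]  # number of frames in each clip
    model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
    model.fit(X, lengths)
    models[lang] = model

def predict_language(path):
    """Score the clip under every language model and pick the most likely one."""
    frames = mfcc_frames(path)
    scores = {lang: m.score(frames) for lang, m in models.items()}
    return max(scores, key=scores.get)

print(predict_language("test_clip.wav"))  # hypothetical test file

Scoring a clip under every per-language model and taking the highest log-likelihood is the standard generative recipe for HMM-based language identification; the Neural Network and Support Vector Machine classifiers compared in the paper instead learn the decision boundary between languages directly.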

Cite This Paper

Valentin Gazeau, Cihan Varol, "Automatic Spoken Language Recognition with Neural Networks", International Journal of Information Technology and Computer Science (IJITCS), Vol.10, No.8, pp.11-17, 2018. DOI: 10.5815/ijitcs.2018.08.02
