Comparative Analysis of Three Improved Deep Learning Architectures for Music Genre Classification

Full Text (PDF, 680KB), pp. 1-14

Author(s)

Quazi Ghulam Rafi 1,*, Mohammed Noman 1, Sadia Zahin Prodhan 1, Sabrina Alam 1, Dip Nandi 1

1. American International University-Bangladesh, Dhaka, 1229, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2021.02.01

Received: 13 Sep. 2020 / Revised: 17 Oct. 2020 / Accepted: 5 Nov. 2020 / Published: 8 Apr. 2021

Index Terms

Music information retrieval, music genre classification, deep learning, Convolutional Neural Network, Recurrent Neural Network, Convolutional-Recurrent Neural Network

Abstract

Among the many music information retrieval (MIR) tasks, music genre classification is noteworthy. A music genre is a category of music that came into existence through a complex interplay of cultures, musicians, and various market forces; genres characterize similarities between compositions and are used to organize music collections. Earlier researchers extracted various hand-crafted features and developed classifiers based on them, but the major drawback of this approach was its requirement of field expertise. In recent times, however, researchers have turned to deep learning models for MIR tasks because of their remarkable classification accuracy. The Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and their hybrid, the Convolutional-Recurrent Neural Network (CRNN), are the deep learning models most prominently used for music genre classification and other MIR tasks, and various architectures of these models have achieved state-of-the-art results. In this study, we review and discuss three such architectures that have been used for genre classification of music tracks 29-30 seconds in length. In particular, we analyze improved CNN, RNN, and CRNN architectures named the Bottom-up Broadcast Neural Network (BBNN) [1], the Independent Recurrent Neural Network (IndRNN) [2], and the CRNN in Time and Frequency dimensions (CRNN-TF) [3], respectively; almost all of these architectures achieved the highest classification accuracy among the variants of their base deep learning model. Hence, this study presents a comparative analysis of the three most impressive architectural variants of the main deep learning models prominently used to classify music genres, and covers the three architectures, and thus the three base models (CNN, RNN, and CRNN), in one study. We also propose two ways to improve the performance of the RNN (IndRNN) and CRNN (CRNN-TF) architectures.
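
To make the IndRNN component concrete: its defining change over a vanilla RNN is that the recurrent weight is a per-unit vector rather than a full hidden-to-hidden matrix, i.e. h_t = ReLU(W x_t + u ⊙ h_{t-1} + b) [2, 43]. The minimal Python/NumPy sketch below illustrates only that recurrence on spectrogram-like input; all shapes, sizes, and names here are illustrative assumptions, not taken from the papers' code.

import numpy as np

# Sketch of one IndRNN layer's forward pass (after Li et al. [43]):
# h_t = relu(W x_t + u * h_{t-1} + b), where u is a vector, so each
# hidden unit recurs only over its own previous state.
def indrnn_forward(x, W, u, b):
    # x: (T, input_dim); W: (hidden, input_dim); u, b: (hidden,)
    h = np.zeros(W.shape[0])
    outputs = []
    for t in range(x.shape[0]):
        # Element-wise (Hadamard) recurrence replaces the full
        # hidden-to-hidden matrix of a vanilla RNN.
        h = np.maximum(0.0, W @ x[t] + u * h + b)
        outputs.append(h)
    return np.stack(outputs)  # (T, hidden)

# Toy usage: 10 time frames of a 40-band mel-spectrogram (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 40))
W = 0.1 * rng.standard_normal((64, 40))
u = rng.uniform(0.0, 1.0, 64)  # bounding |u| keeps the recurrent gradient well-behaved
b = np.zeros(64)
print(indrnn_forward(x, W, u, b).shape)  # (10, 64)

Because units within a layer are independent, cross-unit mixing comes from stacking layers, which is what lets IndRNNs be built deeper and trained over longer sequences than standard RNNs [43].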

Cite This Paper

Quazi Ghulam Rafi, Mohammed Noman, Sadia Zahin Prodhan, Sabrina Alam, Dip Nandi, "Comparative Analysis of Three Improved Deep Learning Architectures for Music Genre Classification", International Journal of Information Technology and Computer Science (IJITCS), Vol. 13, No. 2, pp. 1-14, 2021. DOI: 10.5815/ijitcs.2021.02.01

References

[1] Caifeng Liu, Lin Feng, Guochao Liu, Huibing Wang, and Shenglan Liu. Bottom-up broadcast neural network for music genre classification, 2019.
[2] W. Wu, F. Han, G. Song, and Z. Wang. Music genre classification using independent recurrent neural network. In 2018 Chinese Automation Congress (CAC), pages 192–195, 2018.
[3] Z. Wang, S. Muknahallipatna, M. Fan, A. Okray, and C. Lan. Music classification using an improved CRNN with multi-directional spatial dependencies in both time and frequency dimensions. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2019.
[4] M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. Content-based music information retrieval: Current directions and future challenges. Proceedings of the IEEE, 96(4):668–696, 2008.
[5] P. Mermelstein. Distance measures for speech recognition - psychological and instrumental. In Pattern Recognition and Artificial Intelligence (C. H. Chen, ed.), pages 374–388, 1976.
[6] T. Ojala, M. Pietikainen, and D. Harwood. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of the 12th International Conference on Pattern Recognition, volume 1, pages 582–585, 1994.
[7] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, 2002.
[8] Y. M. G. Costa, L. S. Oliveira, A. L. Koerich, and F. Gouyon. Music genre recognition using spectrograms. In 2011 18th International Conference on Systems, Signals and Image Processing, pages 1–4, 2011.
[9] Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. Recurrent models of visual attention. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2204–2212. Curran Associates, Inc., 2014.
[10] Jimmy Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. CoRR, abs/1412.7755, 2015.
[11] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015.
[12] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning (ICML'15), pages 2048–2057. JMLR.org, 2015.
[13] Nianyin Zeng, Hong Zhang, Baoye Song, Weibo Liu, Yurong Li, and Abdullah M. Dobaie. Facial expression recognition via learning deep sparse autoencoders. Neurocomputing, 273(C):643–649, January 2018.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[15] P. Chiliguano and G. Fazekas. Hybrid music recommender using content-based and social information. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2618–2622, 2016.
[16] Keunwoo Choi, György Fazekas, and Mark B. Sandler. Automatic tagging using deep convolutional neural networks. CoRR, abs/1606.00298, 2016.
[17] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.
[18] Tom L. H. Li, Antoni B. Chan, and Andy H. W. Chun. Automatic musical pattern feature extraction using convolutional neural network. In Proceedings of the International MultiConference of Engineers and Computer Scientists 2010 (IMECS 2010), pages 546–550, 2010.
[19] Thomas Lidy. Parallel convolutional neural networks for music genre and mood classification. 2016.
[20] Christine Senac, Thomas Pellegrini, Florian Mouret, and Julien Pinquier. Music feature maps with convolutional neural networks for music genre classification. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (CBMI '17), New York, NY, USA, 2017. Association for Computing Machinery.
[21] Hareesh Bahuleyan. Music genre classification using machine learning techniques. CoRR, abs/1804.01149, 2018.
[22] Hansi Yang and Wei-Qiang Zhang. Music genre classification using duplicated convolutional layers in neural networks. In Proc. Interspeech 2019, pages 3382–3386, 2019.
[23] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2017.
[24] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR, abs/1406.1078, 2014.
[25] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[26] Scott Zhang, Huaping Gu, and Rongbin Li. Music genre classification: Near-realtime vs sequential approach. 2019.
[27] Jia Dai, Shan Liang, Wei Xue, Chongjia Ni, and Wenju Liu. Long short-term memory recurrent neural network based segment features for music genre classification. In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pages 1–5. IEEE, 2016.
[28] Jan Jakubik. Evaluation of gated recurrent neural networks in music classification tasks. In International Conference on Information Systems Architecture and Technology, pages 27–37. Springer, 2017.
[29] Duyu Tang, Bing Qin, and Ting Liu. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1422–1432, 2015.
[30] Keunwoo Choi, György Fazekas, Mark Sandler, and Kyunghyun Cho. Convolutional recurrent neural networks for music classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2392–2396. IEEE, 2017.
[31] Sharaj Panwar, Arun Das, Mehdi Roopaei, and Paul Rad. A deep learning approach for mapping music genres. In 2017 12th System of Systems Engineering Conference (SoSE), pages 1–5. IEEE, 2017.
[32] Nicolas Scaringella, Giorgio Zoia, and Daniel Mlynek. Automatic genre classification of music content: A survey. IEEE Signal Processing Magazine, 23(2):133–141, 2006.
[33] Débora C. Corrêa and Francisco Ap. Rodrigues. A survey on symbolic data-based music genre classification. Expert Systems with Applications, 60:190–210, 2016.
[34] Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. A survey of audio-based music classification and annotation. IEEE Transactions on Multimedia, 13(2):303–319, 2010.
[35] Snigdha Chillara, A. S. Kavitha, Shwetha A. Neginhal, Shreya Haldia, and K. S. Vidyullatha. Music genre classification using machine learning algorithms: A comparison. 2019.
[36] D. Pradeep Kumar, B. J. Sowmya, K. G. Srinivasa, et al. A comparative study of classifiers for music genre classification based on feature extractors. In 2016 IEEE Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), pages 190–194. IEEE, 2016.
[37] George Tzanetakis and Perry Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, 2002.
[38] Pedro Cano, Emilia Gómez Gutiérrez, Fabien Gouyon, Herrera Boyer, Markus Koppenberger, Bee Suan Ong, Xavier Serra, Sebastian Streich, Nicolas Wack, et al. ISMIR 2004 audio description contest. 2006.
[39] Ugo Marchand and Geoffroy Peeters. The extended ballroom dataset. 2016.
[40] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, volume 8, pages 18–25, 2015.
[41] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[42] Joakim Andén and Stéphane Mallat. Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16):4114–4128, 2014.
[43] Shuai Li, Wanqing Li, Chris Cook, Ce Zhu, and Yanbo Gao. Independently recurrent neural network (IndRNN): Building a longer and deeper RNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5457–5466, 2018.
[44] Nal Kalchbrenner, Ivo Danihelka, and Alex Graves. Grid long short-term memory. arXiv preprint arXiv:1507.01526, 2015.
[45] Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, and Xavier Serra. Timbre analysis of music audio signals with convolutional neural networks. In 2017 25th European Signal Processing Conference (EUSIPCO), pages 2744–2748. IEEE, 2017.
[46] Sander Dieleman and Benjamin Schrauwen. End-to-end learning for music audio. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6964–6968. IEEE, 2014.
[47] Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. FMA: A dataset for music analysis. arXiv preprint arXiv:1612.01840, 2016.
[48] Athanasios Lykartsis and Alexander Lerch. Beat histogram features for rhythm-based musical genre classification using multiple novelty functions. DOI 10.14279/depositonce-9530, 2015.