Information Technology for Generating Lyrics for Song Extensions Based on Transformers

Author(s)

Oleksandr Mediakov 1; Victoria Vysotska 1,2; Dmytro Uhryn 3; Yuriy Ushenko 3,*; Cennuo Hu 4

1. Lviv Polytechnic National University, Lviv, 79013, Ukraine

2. Osnabrück University, Osnabrück, 49076, Germany

3. Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

4. Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2024.01.03

Received: 23 Oct. 2023 / Revised: 12 Nov. 2023 / Accepted: 16 Dec. 2023 / Published: 8 Feb. 2024

Index Terms

Transformers, T5 language model, recurrent networks, text generation, author's style

Abstract

The article develops a technology for generating song-lyric extensions with large language models, in particular the T5 model, to speed up, supplement, and make more flexible the process of writing song lyrics, with or without taking into account the style of a particular author. To build the dataset, 10 different artists were selected and their lyrics collected, giving 626 unique songs in total. After splitting each song into several input-output pairs, 1874 training instances and 465 test instances were obtained. Two language models, NSA and SA, were fine-tuned for the task of generating song lyrics; for both, t5-base, a version of T5 with 223 million parameters, was chosen as the base model. Analysis of the generation results showed that the NSA model produces less degraded output, while the SA model requires the amount of text per author to be balanced. Several text metrics, namely BLEU, ROUGE-L, and ROUGE-N, were calculated to quantitatively compare the results of the models and generation strategies. The BLEU metric is the most variable, changing significantly depending on the strategy, whereas the ROUGE metrics show less variability and a smaller range of values. In total, 8 decoding methods for text generation supported by the transformers library were compared: greedy search, beam search, diverse beam search, multinomial sampling, beam-search multinomial sampling, top-k sampling, top-p sampling, and contrastive search. The comparison of the generated lyrics shows that the best-performing methods are beam search and its variations, including beam-search multinomial sampling. Contrastive search usually outperformed the plain greedy approach. The top-p and top-k methods show no clear advantage over each other and produced different results in different situations.
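
To make the metric comparison concrete, below is a minimal scoring sketch. The Hugging Face "evaluate" library is an assumption of this sketch: the article does not say which BLEU/ROUGE implementation produced its scores, and the prediction/reference strings are placeholders rather than data from the paper.

import evaluate

# Load metric implementations; "evaluate" is an assumed choice, any
# BLEU/ROUGE implementation would serve the same purpose.
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Placeholder strings, not lyrics from the paper's dataset.
predictions = ["a generated lyric continuation"]
references = [["the reference lyric continuation"]]

bleu_result = bleu.compute(predictions=predictions, references=references)
rouge_result = rouge.compute(predictions=predictions, references=references)

print(bleu_result["bleu"])                             # corpus-level BLEU in [0, 1]
print(rouge_result["rougeL"])                          # ROUGE-L
print(rouge_result["rouge1"], rouge_result["rouge2"])  # ROUGE-N for N = 1, 2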
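
All eight decoding methods named in the abstract are selected through keyword arguments of the generate() method of the transformers library [3, 4]. The sketch below shows one plausible configuration per method; the t5-base checkpoint matches the paper, while the prompt and the hyperparameter values (beam width, k, p, penalty_alpha) are illustrative defaults from the library documentation, not settings reported in the article.

# Requires the transformers and sentencepiece packages.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical prompt; the article's actual input format is not reproduced here.
inputs = tokenizer("continue the lyrics: ...", return_tensors="pt")

strategies = {
    "greedy search":                    dict(do_sample=False, num_beams=1),
    "beam search":                      dict(do_sample=False, num_beams=4),
    "diverse beam search":              dict(do_sample=False, num_beams=4,
                                             num_beam_groups=2, diversity_penalty=1.0),
    "multinomial sampling":             dict(do_sample=True, num_beams=1, top_k=0),
    "beam-search multinomial sampling": dict(do_sample=True, num_beams=4),
    "top-k sampling":                   dict(do_sample=True, top_k=50),
    "top-p sampling":                   dict(do_sample=True, top_k=0, top_p=0.92),
    "contrastive search":               dict(do_sample=False, penalty_alpha=0.6, top_k=4),
}

for name, kwargs in strategies.items():
    ids = model.generate(**inputs, max_new_tokens=64, **kwargs)
    print(name, "->", tokenizer.decode(ids[0], skip_special_tokens=True))

Note that top_k=0 disables top-k filtering so that multinomial and top-p sampling draw from the full distribution, while combining do_sample=True with num_beams > 1 yields the beam-search multinomial sampling variant.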

Cite This Paper

Oleksandr Mediakov, Victoria Vysotska, Dmytro Uhryn, Yuriy Ushenko, Cennuo Hu, "Information Technology for Generating Lyrics for Song Extensions Based on Transformers", International Journal of Modern Education and Computer Science (IJMECS), Vol. 16, No. 1, pp. 23-36, 2024. DOI: 10.5815/ijmecs.2024.01.03

References

[1] Ashish Vaswani et al. Attention Is All You Need. URL: https://arxiv.org/abs/1706.03762
[2] Colin Raffel et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. URL: https://arxiv.org/abs/1910.10683
[3] Text generation strategies. URL: https://huggingface.co/docs/transformers/v4.29.0/en/generation_strategies
[4] How to generate text: using different decoding methods for language generation with Transformers. URL: https://huggingface.co/blog/how-to-generate
[5] T5. Overview. URL: https://huggingface.co/docs/transformers/model_doc/t5
[6] Ashwin K. Vijayakumar et al. Diverse Beam Search: Decoding Diverse Solutions From Neural Sequence Models. URL: https://arxiv.org/abs/1610.02424
[7] T5v1.1. URL: https://huggingface.co/docs/transformers/model_doc/t5v1.1
[8] Lacoste A., Luccioni A., Schmidt V., Dandres T. Quantifying the Carbon Emissions of Machine Learning. URL: https://arxiv.org/abs/1910.09700
[9] Google's SentencePiece. URL: https://github.com/google/sentencepiece
[10] Subword Neural Machine Translation. URL: https://github.com/rsennrich/subword-nmt
[11] Yonghui Wu et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. URL: https://arxiv.org/pdf/1609.08144.pdf
[12] Transformers. State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. URL: https://huggingface.co/docs/transformers/index
[13] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. URL: https://www.tensorflow.org/
[14] Song Lyrics Dataset. URL: https://www.kaggle.com/datasets/deepshah16/songlyrics-dataset
[15] Swift T., Dessner A. "long story short". Taylor Swift Music, 2020.
[16] Fan A., Lewis M., Dauphin Y. Hierarchical Neural Story Generation. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia. Stroudsburg, PA, USA, 2018. URL: https://doi.org/10.18653/v1/p18-1082
[17] Chiang T.-R., Chen Y.-N. Relating Neural Text Degeneration to Exposure Bias. Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Punta Cana, Dominican Republic. Stroudsburg, PA, USA, 2021. URL: https://doi.org/10.18653/v1/2021.blackboxnlp-1.16
[18] Yixuan Su et al. A Contrastive Framework for Neural Text Generation. URL: https://arxiv.org/abs/2202.06417
[19] Ihor Tereikovskyi, Zhengbing Hu, Denys Chernyshev, Liudmyla Tereikovska, Oleksandr Korystin, Oleh Tereikovskyi, "The Method of Semantic Image Segmentation Using Neural Networks", International Journal of Image, Graphics and Signal Processing, Vol. 14, No. 6, pp. 1-14, 2022.
[20] Chandra Shekhar Tiwari, Vijay Kumar Jha, "Enhancing Security of Medical Image Data in the Cloud Using Machine Learning Technique", International Journal of Image, Graphics and Signal Processing, Vol. 14, No. 4, pp. 13-31, 2022.
[21] Ramesh M. Kagalkar, "Methodology for Translation of Video Content Activates into Text Description: Three Object Activities Action", International Journal of Image, Graphics and Signal Processing, Vol. 14, No. 4, pp. 58-69, 2022.