Text to Speech Synthesis for Bangla Language

Full Text (PDF, 931KB), PP.1-9

Author(s)

Khandaker Mamun Ahmed 1,*, Prianka Mandal 2, B M Mainul Hossain 3

1. Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh

2. Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh

3. Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2019.02.01

Received: 13 Sep. 2018 / Revised: 5 Nov. 2018 / Accepted: 14 Dec. 2018 / Published: 8 Mar. 2019

Index Terms

Synthesis, normalization, dialect, diphone, concatenation, tokenization, romanization

Abstract

Text-to-speech (TTS) synthesis is a rapidly growing field of research. Speech synthesis systems are applicable to several areas, such as robotics, education, and embedded systems, and integrating a TTS system can improve the correctness and efficiency of an application. Although Bangla is the seventh most spoken language in the world, TTS is rarely used in Bangla applications because existing Bangla TTS systems are neither simple nor lightweight. In this paper, we therefore propose a simple and lightweight TTS system for the Bangla language. We convert Bangla text to romanized text using the Bangla grapheme set and a set of romanization rules that we developed. In addition, the system provides an XML-based data representation, which gives users the flexibility to modify the data representation, parse the data, and create speech in their own dialect. The proposed system is lightweight, requires little processing time, and produces good, understandable speech.
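
The romanization step described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical XML grapheme table: the element names, the `kind` attribute, the romanized spellings, and the inherent-vowel rule below are illustrative assumptions and are not taken from the paper's data files or rule set.

```python
import xml.etree.ElementTree as ET

# Hypothetical grapheme table: the schema, the entries, and the romanized
# spellings are illustrative assumptions, not the paper's actual data.
RULES_XML = """
<graphemes inherent="o">
  <g bn="ব" roman="b" kind="consonant"/>
  <g bn="ল" roman="l" kind="consonant"/>
  <g bn="ক" roman="k" kind="consonant"/>
  <g bn="আ" roman="a" kind="vowel"/>
  <g bn="া" roman="a" kind="sign"/>
  <g bn="ি" roman="i" kind="sign"/>
  <g bn="ং" roman="ng" kind="modifier"/>
  <g bn="্" roman="" kind="hasanta"/>
</graphemes>
"""


def load_rules(xml_text):
    """Parse the XML table into {grapheme: (roman, kind)} plus the inherent vowel."""
    root = ET.fromstring(xml_text.strip())
    table = {g.get("bn"): (g.get("roman"), g.get("kind")) for g in root.iter("g")}
    return table, root.get("inherent", "")


def romanize(text, table, inherent):
    """Romanize text one grapheme at a time.

    Simplified rule (an assumption): a consonant carries the inherent vowel
    unless a vowel sign or a hasanta immediately follows it.
    """
    out = []
    for i, ch in enumerate(text):
        if ch not in table:
            out.append(ch)  # pass unknown characters through unchanged
            continue
        roman, kind = table[ch]
        out.append(roman)
        if kind == "consonant":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            next_kind = table.get(nxt, ("", None))[1]
            if next_kind not in ("sign", "hasanta"):
                out.append(inherent)
    return "".join(out)


table, inherent = load_rules(RULES_XML)
print(romanize("বাংলা", table, inherent))  # bangla (under these toy rules)
```

Because the mapping lives in XML rather than in code, a dialect-specific pronunciation could in principle be supported by editing the `roman` attributes alone. A real rule set would also need to handle conjuncts, word-final vowel deletion, and text normalization, none of which this sketch attempts.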

Cite This Paper

Khandaker Mamun Ahmed, Prianka Mandal, B M Mainul Hossain, "Text to Speech Synthesis for Bangla Language", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 11, No. 2, pp. 1-9, 2019. DOI: 10.5815/ijieeb.2019.02.01
