Work place: Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, Rion-Patras 26500, Greece
E-mail: alaza@upatras.gr
Research Interests: Computer systems and computational processes, Speech Recognition, Speech Synthesis
Biography
Alexandros Lazaridis was born in Thessaloniki in 1981. He graduated in September 2005 from the Department of Electrical & Computer Engineering at the Aristotle University of Thessaloniki, Greece. He received his PhD from the Department of Electrical and Computer Engineering at the University of Patras in February 2011. He is currently a post-doctoral researcher at the University of Patras, a non-tenured Lecturer at the University of Western Macedonia, and a non-tenured Assistant Professor at the Technological Educational Institute of Serres. He is the author or co-author of more than 20 papers. His fields of research include Speech Processing, Voice Conversion, Speech Synthesis and Speech Prosody.
By Alexandros Lazaridis, Iosif Mporas
DOI: https://doi.org/10.5815/ijitcs.2013.04.03, Pub. Date: 8 Mar. 2013
This paper describes and evaluates four different HSMM (hidden semi-Markov model) training methods for HMM-based synthesis of emotional speech. The first method, emotion-dependent modelling, uses individual models trained for each emotion separately. In the second method, emotion adaptation modelling, a model is first trained on neutral speech, and adaptation is then performed to each emotion of the database. The third method, the emotion-independent approach, is based on an average emotion model initially trained using data from all the emotions of the speech database; an adaptive model is then built for each emotion. In the fourth method, emotion adaptive training, the average emotion model is trained with simultaneous normalization of the output and state duration distributions. To evaluate these training methods, a Modern Greek speech database consisting of four categories of emotional speech (anger, fear, joy and sadness) was used. Finally, a subjective emotion recognition test was performed in order to measure and compare the ability of each of the four approaches to synthesize emotional speech. The evaluation results showed that emotion adaptive training achieved the highest emotion recognition rates among the four evaluated methods, across all four emotions of the database.
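The per-emotion recognition rates from a subjective test such as the one described above can be tallied as in this minimal sketch; the listener responses below are hypothetical, purely for illustration:

```python
from collections import Counter

def recognition_rates(responses):
    # responses: list of (intended_emotion, perceived_emotion) pairs,
    # one per listener judgement of a synthesized stimulus
    totals = Counter(intended for intended, _ in responses)
    correct = Counter(intended for intended, perceived in responses
                      if intended == perceived)
    # fraction of stimuli of each intended emotion that listeners identified correctly
    return {emotion: correct[emotion] / totals[emotion] for emotion in totals}

# hypothetical listener judgements for the four emotions of the database
responses = [
    ("anger", "anger"), ("anger", "fear"),
    ("fear", "fear"), ("fear", "fear"),
    ("joy", "joy"), ("joy", "sadness"),
    ("sadness", "sadness"), ("sadness", "sadness"),
]
rates = recognition_rates(responses)
```

Comparing the resulting rate dictionaries across the four training methods, emotion by emotion, gives the comparison reported in the paper.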
By Alexandros Lazaridis, Iosif Mporas, Todor Ganchev
DOI: https://doi.org/10.5815/ijisa.2012.08.01, Pub. Date: 8 Jul. 2012
In speech synthesis, accurate modeling of prosody is important for producing high-quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing natural-sounding emotional speech. In this work, ten phone duration models are evaluated. These models belong to well-known and widely used categories of algorithms, such as decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR) in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness) plus neutral speech. The experimental results demonstrate that SVR-based modeling outperforms the other ten models across all four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE) across all emotional categories.
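The RMSE metric and the relative reduction used to compare the duration models can be computed as in this minimal sketch; the phone durations and predictions below are toy values, not data from the paper:

```python
import math

def rmse(actual, predicted):
    # root mean square error between actual and predicted phone durations
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# toy phone durations in milliseconds (illustrative only)
actual = [80.0, 120.0, 95.0, 60.0, 110.0]
svr_pred = [82.0, 118.0, 97.0, 58.0, 108.0]       # hypothetical SVR predictions
baseline_pred = [85.0, 112.0, 100.0, 55.0, 104.0]  # hypothetical baseline model

err_svr = rmse(actual, svr_pred)
err_base = rmse(actual, baseline_pred)

# relative reduction of the SVR model over the baseline, as a percentage;
# the paper reports an average of 8% across all emotional categories
relative_reduction = 100.0 * (err_base - err_svr) / err_base
```

Averaging this relative reduction over the emotion categories of the database yields the figure quoted in the abstract.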