Iosif Mporas

Work place: Dept. of Electrical & Computer Engineering, University of Patras, 26500 Patras, Greece

E-mail: imporas@upatras.gr

Research Interests: Pattern Recognition, Speech Recognition, Speech Synthesis

Biography

Iosif Mporas is a senior researcher at the University of Patras and a non-tenured Assistant Professor at the Technological Educational Institute of Patras. His research interests include speech and audio signal processing, pattern recognition, automatic speech recognition, automatic speech segmentation, and spoken language/dialect identification.

Author Articles
An Overview of Automatic Audio Segmentation

By Theodoros Theodorou, Iosif Mporas, Nikos Fakotakis

DOI: https://doi.org/10.5815/ijitcs.2014.11.01, Pub. Date: 8 Oct. 2014

In this report we present an overview of the approaches and techniques used in the task of automatic audio segmentation. Audio segmentation aims to find changing points in the content of an audio stream. Initially, we present the basic steps of an automatic audio segmentation procedure. Afterwards, the basic categories of segmentation algorithms, specifically the unsupervised, the data-driven and the mixed algorithms, are presented. For each category, the segmentation analysis is followed by details about proposed architectural parameters, such as the audio descriptor set, the mathematical functions used in unsupervised algorithms, and the machine learning algorithms of data-driven modules. Finally, we review architectures proposed in the automatic audio segmentation literature, with details about the experimental audio environment (the database used and the list of audio events of interest), the basic modules of each procedure (algorithm category, audio descriptor set, architectural parameters and potential optional modules), and the maximum accuracy achieved.
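The pipeline the abstract outlines (descriptor extraction followed by change-point detection) can be sketched with a minimal unsupervised example. The descriptors (log energy, zero-crossing rate), window sizes, and threshold below are illustrative choices for the sketch, not parameters taken from the survey:

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=200):
    """Per-frame audio descriptors: log short-time energy and zero-crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = np.log(np.mean(frame ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        feats.append([energy, zcr])
    return np.array(feats)

def detect_change_points(feats, win=10, threshold=2.0):
    """Unsupervised segmentation: Euclidean distance between the mean descriptors
    of adjacent frame windows; local maxima above the threshold are change points."""
    dists = []
    for i in range(win, len(feats) - win):
        left = feats[i - win:i].mean(axis=0)
        right = feats[i:i + win].mean(axis=0)
        dists.append(np.linalg.norm(left - right))
    dists = np.array(dists)
    return [i + win for i in range(1, len(dists) - 1)
            if dists[i] > threshold
            and dists[i] >= dists[i - 1] and dists[i] >= dists[i + 1]]
```

On a synthetic stream whose loudness changes halfway through, the detector flags the frame index at the boundary; real systems would use richer descriptor sets (e.g. MFCCs) and statistical distances such as BIC.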

Integration of Temporal Contextual Information for Robust Acoustic Recognition of Bird Species from Real-Field Data

By Iosif Mporas, Todor Ganchev, Otilia Kocsis, Nikos Fakotakis, Olaf Jahn, Klaus Riede

DOI: https://doi.org/10.5815/ijisa.2013.07.02, Pub. Date: 8 Jun. 2013

We report on the development of an automated acoustic bird recognizer with improved noise robustness, which is part of a long-term project aiming at the establishment of an automated biodiversity monitoring system at the Hymettus Mountain near Athens, Greece. In particular, a typical audio processing strategy, which has proved quite successful in various audio recognition applications, was extended with a simple and effective mechanism for integrating temporal contextual information into the decision-making process. In the present implementation, we integrate temporal contextual information by jointly post-processing the recognition results of a number of preceding and subsequent audio frames. In order to evaluate the usefulness of the proposed scheme on the task of acoustic bird recognition, we experimented with six widely used classifiers and a set of real-field audio recordings for two bird species present at the Hymettus Mountain. The highest recognition accuracy obtained on the real-field data was approximately 93%, while experiments with additive noise showed significant robustness in low signal-to-noise ratio setups. In all cases, the integration of temporal contextual information was found to improve the overall accuracy of the recognizer.
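One simple instantiation of the joint post-processing the abstract describes is a majority vote over a symmetric window of neighboring frame decisions; the paper's exact scheme may differ, so this is a sketch of the general idea rather than the authors' implementation:

```python
from collections import Counter

def smooth_decisions(frame_labels, context=2):
    """Replace each per-frame label with the majority label over the
    `context` preceding and `context` subsequent frames (inclusive of itself)."""
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - context)
        hi = min(len(frame_labels), i + context + 1)
        window = frame_labels[lo:hi]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed
```

For example, an isolated misclassified frame in the middle of an otherwise consistent run is overruled by its temporal context, which is exactly the noise-robustness effect the evaluation reports.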

Evaluation of Hidden Semi-Markov Models Training Methods for Greek Emotional Text-to-Speech Synthesis

By Alexandros Lazaridis, Iosif Mporas

DOI: https://doi.org/10.5815/ijitcs.2013.04.03, Pub. Date: 8 Mar. 2013

This paper describes and evaluates four different HSMM (hidden semi-Markov model) training methods for HMM-based synthesis of emotional speech. The first method, emotion-dependent modelling, uses individual models trained for each emotion separately. In the second method, emotion adaptation modelling, a model is first trained on neutral speech and then adapted to each emotion of the database. The third method, the emotion-independent approach, is based on an average emotion model initially trained on data from all the emotions of the speech database; subsequently, an adapted model is built for each emotion. In the fourth method, emotion adaptive training, the average emotion model is trained with simultaneous normalization of the output and state duration distributions. To evaluate these training methods, a Modern Greek speech database consisting of four categories of emotional speech, anger, fear, joy and sadness, was used. Finally, a subjective emotion recognition test was performed in order to measure and compare the ability of each of the four approaches to synthesize emotional speech. The evaluation results showed that emotion adaptive training achieved the highest emotion recognition rates among the four evaluated methods, across all four emotions of the database.

Phone Duration Modeling of Affective Speech Using Support Vector Regression

By Alexandros Lazaridis, Iosif Mporas, Todor Ganchev

DOI: https://doi.org/10.5815/ijisa.2012.08.01, Pub. Date: 8 Jul. 2012

In speech synthesis, accurate modeling of prosody is important for producing high-quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing natural-sounding emotional speech. In this work, ten phone duration models are evaluated. These models belong to well-known and widely used categories of algorithms, such as decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR) in phone duration modeling in the context of emotional speech. The evaluation of the resulting eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness) plus neutral speech. The experimental results demonstrated that SVR-based modeling outperforms the other ten models across all four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE) across all emotional categories.
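The core regression setup can be sketched with scikit-learn's `SVR`. The phone-level features and synthetic durations below are invented for illustration; the paper's actual feature set, data, and hyperparameters are not reproduced here:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
# Hypothetical phone-level features in [0, 1]: vowel-ness, lexical stress,
# position in word (illustrative only, not the paper's feature set).
X = rng.random((300, 3))
# Synthetic durations in seconds: vowels and stressed phones last longer.
y = 0.06 + 0.08 * X[:, 0] + 0.04 * X[:, 1] + 0.005 * rng.standard_normal(300)

# RBF-kernel SVR; epsilon is set small relative to the duration scale so the
# epsilon-insensitive tube does not swallow the target variation.
model = SVR(kernel="rbf", C=10.0, epsilon=0.005)
model.fit(X[:250], y[:250])

pred = model.predict(X[250:])
rmse = np.sqrt(np.mean((pred - y[250:]) ** 2))
```

Comparing this RMSE against a trivial mean-duration predictor mirrors how the paper reports relative RMSE reductions between models.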

Other Articles