Work place: Department of Computer Science, University of Karachi, Karachi, Pakistan
E-mail: humera@uok.edu.pk
Biography
Dr. Humera Tariq received the B.E. (Electrical) degree from NED University of Engineering and Technology in 1999 and then continued her studies at the University of Karachi for the Master of Computer Science (MCS) in 2001. She stood First Class First among the evening batch of MCS 2001-2003. She joined the MS leading to PhD program in 2009 and completed the MS coursework with a CGPA of 4.0. She started her PhD work in the field of image processing in 2011 under the supervision of Meritorious Professor Dr. S.M. Aqil Burney.
By Muhammad Hazique Khatri, Humera Tariq, Maryam Feroze, Ebad Ali, Zeeshan Anjum Junaidi
DOI: https://doi.org/10.5815/ijitcs.2024.03.04, Pub. Date: 8 Jun. 2024
The Urdu language ranks tenth and is continuously progressing. This unique PRISMA-driven review investigates the Urdu speech recognition literature in depth and places it alongside frameworks for English, Mandarin Chinese, and Hindi to give a wider global perspective. The main objective is to unify progress on the classical Artificial Intelligence (AI) and recent Deep Neural Network (DNN) based speech recognition pipeline, encompassing dataset challenges, feature extraction methods, experimental design, and smooth integration with both acoustic models (AM) and language models (LM) using transcriptions. A total of 176 articles were extracted from the Google Scholar database for each language using a custom query design. Applying the inclusion criteria and quality assessment left 5 review articles and 42 research articles. Comparative research questions are addressed, and the findings are organized by four speech types: isolated, connected, continuous, and spontaneous. The findings show that English, Mandarin, and Hindi used spontaneous speech corpora of 300, 200, and 1108 hours respectively, which is remarkable compared with the Urdu spontaneous speech data size of only 9.5 hours. For the same reason of data size, the Word Error Rate (WER) for English falls below 5%, while for Mandarin Chinese the alternative metric, Character Error Rate (CER), is mostly used and lies below 25%. The success of English and Chinese speech recognition, with accuracy beyond comparison, is attributed to the wide use of DNNs such as Conformer, Transformers, and E2E-attention, in contrast to the conventional feature extraction and AI models (LSTM, TDNN, RNN, HMM, GMM-HMM) frequently used for both Hindi and Urdu.