Muhammad Hazique Khatri

Workplace: Department of Computer Science, University of Karachi, Karachi, Pakistan

E-mail: haziqiqbal2101@gmail.com

Research Interests: Machine Learning, Natural Language Processing

Biography

Muhammad Hazique Khatri is a software engineer at Techwards. He graduated from the University of Karachi, Pakistan, in 2023 with a Bachelor's degree, ranking second in his class with a CGPA of 3.7. His research interests span machine learning, deep neural networks, and natural language processing (NLP).

Author Articles
A PRISMA-driven Review of Speech Recognition based on English, Mandarin Chinese, Hindi and Urdu Language

By Muhammad Hazique Khatri, Humera Tariq, Maryam Feroze, Ebad Ali, Zeeshan Anjum Junaidi

DOI: https://doi.org/10.5815/ijitcs.2024.03.04, Pub. Date: 8 Jun. 2024

Urdu ranks tenth among the world's languages and is continuously progressing. This PRISMA-driven review investigates the Urdu speech recognition literature in depth and compares it with English, Mandarin Chinese, and Hindi frameworks to give a wider global perspective. The main objective is to unify progress on classical artificial intelligence (AI) and recent deep neural network (DNN) based speech recognition pipelines, covering dataset challenges, feature extraction methods, experimental design, and the integration of acoustic models (AM) and language models (LM) using transcriptions. A total of 176 articles were retrieved from the Google Scholar database using a custom query for each language. After applying inclusion criteria and quality assessment, 5 review articles and 42 research articles remained. Comparative research questions were addressed, and the findings were organized by four speech types: isolated, connected, continuous, and spontaneous. The findings show that English, Mandarin, and Hindi studies used spontaneous speech datasets of 300, 200, and 1108 hours respectively, which is far larger than the only 9.5 hours of spontaneous Urdu speech available. Largely because of these dataset sizes, the word error rate (WER) for English falls below 5%, while Mandarin Chinese is mostly evaluated with the alternative metric, character error rate (CER), which lies below 25%. English and Chinese speech recognition have reached unmatched accuracy owing to the wide use of DNNs such as Conformer, Transformer, and end-to-end (E2E) attention models. By contrast, Hindi and Urdu work frequently relies on conventional feature extraction and on models such as LSTM, TDNN, RNN, HMM, and GMM-HMM.
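The abstract reports English results as word error rate (WER) and Mandarin results as character error rate (CER). Under the standard definition, both metrics are the Levenshtein edit distance (substitutions + deletions + insertions) between a reference transcript and a recognizer's hypothesis, divided by the reference length. WER counts this at the word level and CER at the character level. The sketch below is a minimal illustration under that definition. It is not the review's own evaluation code, and the function names are hypothetical. It also assumes a non-empty reference and drops spaces for CER as a simplifying choice, whereas real toolkits differ in how they normalize text.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning token list ref into hyp."""
    # prev[j] = distance between the first (i - 1) ref tokens and the first j hyp tokens
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # curr[0]: delete all i ref tokens
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref token r
                curr[j - 1] + 1,         # insertion of hyp token h
                prev[j - 1] + (r != h),  # substitution (costs 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference is assumed to be non-empty."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are dropped (a simplifying assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    # One substituted character out of four reference characters.
    print(cer("你好世界", "你好视界"))
```

CER suits Mandarin because written Chinese has no spaces between words, so a word-level count would first require a separate segmentation step.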
