IJISA Vol.7, No.1, Dec. 2014

A Stochastic Prediction Interface for Urdu

Qaiser Abbas

Index Terms

Urdu Prediction Interface, N-Gram Language Model, QASKU, Word and Sequence Prediction, Corpus-Based Application


This work lays down a foundation for text prediction in Urdu, an inflected and under-resourced language. The interface developed is not limited to a T9 (Text on 9 keys) application of the kind used in embedded devices, which can only predict a word after its initial characters have been typed. Like T9, it can predict a word, but it can also continuously predict a sequence of words following a given word, enabling fast document typing. It is based on an N-gram language model. This stochastic interface handles three N-gram levels, from unigram to trigram, independently. The unigram mode serves T9-like applications, while the bigram and trigram modes are used for sentence prediction. The evaluation measures include the percentage of keystrokes saved, the keystrokes until completion, and the percentage of time saved during typing. Two different corpora are merged to obtain a sufficient amount of data. For the experiments, the test data is divided equally into a test set and a held-out set. With this setup, the QASKU system outperforms FastType, with almost 15% more keystrokes saved.
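The abstract describes the three N-gram modes only at a high level. The following is a minimal Python sketch of the general idea, not the QASKU implementation. It assumes whitespace-tokenized text and plain frequency ranking with no smoothing. The unigram mode completes a partially typed word (T9-like), while the bigram and trigram modes, used independently rather than backing off to each other, suggest the next word from the previous one or two words. The class `NGramPredictor`, the functions `keystrokes_with_completion` and `keystroke_saving`, the cost model (one keystroke per typed character plus one to accept a suggestion) and the romanized toy sentences are all hypothetical. The saving is computed with a common formulation, 100 × (plain − used) / plain, which may differ from the paper's exact definition.

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Hypothetical frequency-based predictor with independent unigram,
    bigram and trigram tables (no smoothing, no backoff between levels)."""

    def __init__(self, sentences):
        self.unigrams = Counter()               # word -> count
        self.bigrams = defaultdict(Counter)     # w1 -> Counter of following w2
        self.trigrams = defaultdict(Counter)    # (w1, w2) -> Counter of following w3
        for sentence in sentences:
            words = sentence.split()            # assumes whitespace tokenization
            self.unigrams.update(words)
            for w1, w2 in zip(words, words[1:]):
                self.bigrams[w1][w2] += 1
            for w1, w2, w3 in zip(words, words[1:], words[2:]):
                self.trigrams[(w1, w2)][w3] += 1

    @staticmethod
    def _top(counts, k):
        # Highest count first; ties broken alphabetically for stable output.
        ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in ranked[:k]]

    def complete(self, prefix, k=3):
        """Unigram mode: T9-like completion of a partially typed word."""
        matches = {w: c for w, c in self.unigrams.items() if w.startswith(prefix)}
        return self._top(matches, k)

    def next_words(self, history, order=2, k=3):
        """Bigram (order=2) or trigram (order=3) mode: rank the words that
        followed the last one or two words of `history` in the corpus."""
        if order == 2 and len(history) >= 1:
            counts = self.bigrams.get(history[-1], Counter())
        elif order == 3 and len(history) >= 2:
            counts = self.trigrams.get((history[-2], history[-1]), Counter())
        else:
            raise ValueError("order must be 2 or 3, with enough history words")
        return self._top(counts, k)


def keystrokes_with_completion(predictor, word, k=3):
    """Keystrokes to enter `word` in unigram mode, assuming one keystroke per
    typed character and one more to accept a suggestion from the top k."""
    for typed in range(1, len(word)):
        if word in predictor.complete(word[:typed], k):
            return typed + 1  # characters typed so far plus one selection key
    return len(word)  # no useful suggestion: the word is typed in full


def keystroke_saving(predictor, words, k=3):
    """Percentage of keystrokes saved relative to typing every character."""
    plain = sum(len(w) for w in words)
    used = sum(keystrokes_with_completion(predictor, w, k) for w in words)
    return 100.0 * (plain - used) / plain


if __name__ == "__main__":
    # Made-up romanized Urdu toy sentences, not data from the paper.
    corpus = ["main ghar ja raha hun", "main school ja raha hun", "wo ghar ja rahi hai"]
    p = NGramPredictor(corpus)
    print(p.complete("ra"))                        # ['raha', 'rahi']
    print(p.next_words(["ja"], order=2))           # ['raha', 'rahi']
    print(p.next_words(["ghar", "ja"], order=3))   # ['raha', 'rahi']
    print(round(keystroke_saving(p, "main ghar ja raha hun".split(), k=1), 1))  # 41.2
```

In practice the counts would come from a training corpus and the saving would be measured on separate test text; scoring the training sentences themselves, as this toy run does, overstates the saving.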

Cite This Paper

Qaiser Abbas, "A Stochastic Prediction Interface for Urdu", International Journal of Intelligent Systems and Applications (IJISA), vol.7, no.1, pp.94-100, 2015. DOI: 10.5815/ijisa.2015.01.09


