Spelling Error Patterns in Typed Yorùbá Text Documents



Author(s)

Asahiah Franklin Oladiipo 1,*, Onifade Mary Taiwo 1, Adegunlehin Abayomi Emmanuel 1

1. Obafemi Awolowo University, Ile-Ife, Nigeria

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2020.06.03

Received: 27 Mar. 2020 / Revised: 13 May 2020 / Accepted: 24 Jun. 2020 / Published: 8 Dec. 2020

Index Terms

Yorùbá, diacritics, misspellings, patterns, spellchecking

Abstract

While writing in most of the world’s major languages has a long history, Yorùbá is a relatively young language as far as writing is concerned. It is therefore under-resourced in terms of tools for processing it in digital form, and a spell checker is one such tool. An analysis of spelling error patterns is fundamental to producing a good spell checker. This article presents such an analysis. Our findings show that spelling error patterns in Yorùbá broadly follow those of other languages, but depart from the norm in specific respects. Diacritic-related misspellings accounted for more than 80% of all errors, and the proportion of misspelled words containing a single edit error fell below the generally expected minimum of 80%. In addition, most errors were vowel-related, with consonants accounting for less than 15% of all errors. Word length does not appear to have any direct bearing on the number of errors in a word. The research shows that diacritics have a greater impact on spelling errors in Yorùbá, where they are used mainly for tone marking and account for more than 80% of spelling errors, than in languages such as Brazilian Portuguese and Spanish, where they are used to differentiate characters and account for less than 60% of all errors. We thus conclude that while the character set used in a language determines, to a significant extent, the distribution of spelling errors, the purpose for which diacritics are employed also affects that distribution.
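
The two measurements the abstract relies on, whether a misspelling is diacritic-related and how many edits separate it from the intended word, can be illustrated with a short Python sketch. This is not the authors' procedure. It assumes that every Unicode combining mark (tone marks and under-dots alike) counts as a diacritic, that edits are counted over NFC-normalized code points, and that the optimal string alignment variant of Damerau-Levenshtein distance is an acceptable edit measure. The function names are hypothetical.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove every combining mark (assumption: tone marks and under-dots both count)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_diacritic_only_error(typed: str, intended: str) -> bool:
    """True when the two words differ, but only in their diacritics."""
    typed_nfc = unicodedata.normalize("NFC", typed)
    intended_nfc = unicodedata.normalize("NFC", intended)
    return (typed_nfc != intended_nfc
            and strip_diacritics(typed) == strip_diacritics(intended))


def edit_distance(typed: str, intended: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute, adjacent swap.

    Assumption: edits are counted over NFC code points, so a letter that has no
    precomposed form (for example, one with both an under-dot and a tone mark)
    is counted as more than one unit.
    """
    a = unicodedata.normalize("NFC", typed)
    b = unicodedata.normalize("NFC", intended)
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transposition
    return dist[-1][-1]


if __name__ == "__main__":
    typed, intended = "Yoruba", "Yor\u00f9b\u00e1"  # "Yorùbá" with both tone marks
    print(is_diacritic_only_error(typed, intended))
    print(edit_distance(typed, intended))
```

Under these assumptions, omitting both tone marks from "Yorùbá" is a diacritic-only error that nonetheless counts as two edits, which suggests one way diacritic omissions could lower the share of single-edit misspellings.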

Cite This Paper

Asahiah Franklin Oladiipo, Onifade Mary Taiwo, Adegunlehin Abayomi Emmanuel, "Spelling Error Patterns in Typed Yorùbá Text Documents", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 12, No. 6, pp. 28-38, 2020. DOI: 10.5815/ijieeb.2020.06.03
