Graph Modeling based Segmentation of Handwritten Arabic Text into Constituent Sub-words


Author(s)

Hashem Ghaleb 1,*, P. Nagabhushan 1, Umapada Pal 2

1. Department of Studies in Computer Science, University of Mysore, Mysore, India

2. CVPR Unit, Indian Statistical Institute, Kolkata, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2016.12.02

Received: 20 Aug. 2016 / Revised: 5 Oct. 2016 / Accepted: 9 Nov. 2016 / Published: 8 Dec. 2016

Index Terms

Arabic Handwriting Recognition, Arabic Sub-words, Sub-word Segmentation, Connected Component Extraction, Graph Theoretic Modeling

Abstract

Segmentation of Arabic text is a major challenge that must be addressed by any recognition system. The cursive nature of Arabic writing makes it necessary to handle segmentation at several levels. An Arabic text line can be viewed as a sequence of words, each of which can in turn be viewed as a sequence of sub-words. Sub-words frequently share the same vertical space, which makes segmentation techniques based on vertical projection ineffective. This paper takes up the task of segmenting handwritten Arabic text at the sub-word level. The proposed algorithm pulls the connected components apart, overcoming the impossibility of separating them with a vertical-projection-based approach. Graph-theoretic modeling is proposed to solve the problem of connected component extraction. These components are then analyzed thoroughly to obtain the constituent sub-words, where a sub-word may consist of several components. The proposed algorithm was tested on a variety of handwritten Arabic samples taken from different databases, and the results obtained are encouraging.
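The contrast between vertical projection and graph-based component extraction can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a binary image stored as a list of rows, treats each foreground pixel as a graph node with edges to its 8-connected foreground neighbours, and labels components by breadth-first search. The names `connected_components`, `empty_columns`, `column_extent`, and the toy `image` are hypothetical and chosen only for illustration.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected foreground components of a binary image.

    Pixels are graph nodes; two foreground pixels share an edge when
    they are 8-neighbours. Returns (label grid, number of components).
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first traversal of one component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def empty_columns(image):
    """Columns with no foreground pixel, i.e. candidate vertical-projection cuts."""
    cols = len(image[0]) if image else 0
    return [c for c in range(cols) if not any(row[c] for row in image)]


def column_extent(labels, label):
    """Leftmost and rightmost column occupied by one labelled component."""
    occupied = [c for row in labels for c, v in enumerate(row) if v == label]
    return min(occupied), max(occupied)


# Toy example (assumed data): two strokes that do not touch but whose
# column ranges overlap, as sub-words often do in handwriting.
image = [
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]

labels, count = connected_components(image)
cuts = empty_columns(image)
```

In this toy image, every column contains ink, so the vertical projection offers no cut point, while the graph traversal still separates the two strokes. Grouping the extracted components into sub-words (for example, attaching dots and diacritics to their main body) is the later analysis stage the abstract mentions and is not covered by this sketch.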

Cite This Paper

Hashem Ghaleb, P. Nagabhushan, Umapada Pal, "Graph Modeling based Segmentation of Handwritten Arabic Text into Constituent Sub-words", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 8, No. 12, pp. 8-20, 2016. DOI: 10.5815/ijigsp.2016.12.02
