Automating Text Simplification Using Pictographs for People with Language Deficits

PP. 26-34

Author(s)

Mai Farag Imam 1,*, Amal Elsayed Aboutabl 1, Ensaf H. Mohamed 1

1. Computer Science Department, Faculty of Computers and Information, Helwan University, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2019.07.04

Received: 21 Feb. 2019 / Revised: 28 Apr. 2019 / Accepted: 7 Jun. 2019 / Published: 8 Jul. 2019

Index Terms

Natural language processing, pictographic communication, social inclusion, text simplification, text summarization, word sense disambiguation

Abstract

Automating text simplification is a challenging research area because of the compound structures present in natural languages. The social involvement of people with language deficits can be enhanced by giving them means to communicate with the outside world, for instance by using the internet independently. Using pictographs instead of text is one such means. This paper presents a system that performs text simplification by translating text into pictographs. The proposed system consists of several phases. First, a simple summarization technique reduces the number of sentences before they are converted to pictures. Then, the text is preprocessed, including tokenization and lemmatization. The resulting text passes through a spelling checker, followed by a word sense disambiguation (WSD) algorithm that identifies the word senses best suited to the context in order to increase the accuracy of the result. Using WSD improves the results, and the system yields its best results when a support vector machine (SVM) is used for WSD. Finally, the text is translated into a list of images. For testing and evaluation, a test corpus of 37 Basic English sentences was manually constructed. Experiments were conducted by presenting the generated images to ten typically developing children, who were asked to reproduce the input sentences from the pictographs. The reproduced sentences were evaluated using precision, recall, and F-score. Results show that the proposed system enhances pictograph understanding and converts text to pictographs with precision, recall, and F-score of over 90% when SVM is used for WSD. These techniques have not previously been combined, and their combination gives the system higher accuracy than previous studies.
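
The evaluation step described above can be illustrated with a minimal sketch. It assumes that precision, recall, and F-score are computed over word overlap between the original input sentence and the sentence a participant reproduced from the pictographs; the abstract does not state the exact matching scheme, so the tokenization and the function names `tokenize` and `precision_recall_f1` are hypothetical choices made for illustration only.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Assumption: simple word-level matching; the paper may match differently.
    """
    tokens = (word.strip(".,!?;:\"'") for word in sentence.lower().split())
    return [token for token in tokens if token]


def precision_recall_f1(reference, reproduced):
    """Score a reproduced sentence against the reference input sentence.

    precision = overlapping words / words in the reproduced sentence
    recall    = overlapping words / words in the reference sentence
    F-score   = harmonic mean of precision and recall
    """
    ref_counts = Counter(tokenize(reference))
    rep_counts = Counter(tokenize(reproduced))
    # Multiset intersection counts each shared word at most as often
    # as it appears in both sentences.
    overlap = sum((ref_counts & rep_counts).values())
    precision = overlap / sum(rep_counts.values()) if rep_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Illustrative sentences, not taken from the paper's test corpus.
    p, r, f = precision_recall_f1("The boy eats an apple.", "The boy eats apple")
    print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```

In this sketch a participant who omits a word lowers recall but not precision, while a participant who adds a word not in the input lowers precision but not recall; the F-score balances the two.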

Cite This Paper

Mai Farag Imam, Amal Elsayed Aboutabl, Ensaf H. Mohamed, "Automating Text Simplification Using Pictographs for People with Language Deficits", International Journal of Information Technology and Computer Science (IJITCS), Vol. 11, No. 7, pp. 26-34, 2019. DOI: 10.5815/ijitcs.2019.07.04
