Work place: Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462003, India
E-mail: sujdas@gmail.com
Research Interests: Natural Language Processing, Information Systems, Information Retrieval
Biography
Sujoy Das was born in Patna, India, in 1969. He received his Master's in Computer Applications in 1991 and his Ph.D. in 2009 from MANIT, Bhopal.
He is an Associate Professor in the Department of Mathematics & Computer Applications, Maulana Azad National Institute of Technology (MANIT), Bhopal, India. He has 20 years of teaching experience, and his research interests include Cross Language Information Retrieval, Query Expansion, and Cross Language Text Reuse.
Dr. Das is a Member of the Information Retrieval Society of India (IRSI) and a Professional Member of the Association for Computing Machinery (ACM). His published works include:
DOI: https://doi.org/10.5815/ijitcs.2016.08.09, Pub. Date: 8 Aug. 2016
Linking and tracking news stories that cover the same events but are written in different languages is a challenging task. In natural languages, the same information may be expressed in multiple ways, and newspapers exploit this to make news stories more appealing. The same news story is often presented in different ways, both within one language and across languages, but the gist normally remains the same. This diversity of linguistic expression is a major challenge in identifying and tracking news stories covering the same events across languages, but doing so may provide rich and valuable resources, since comparable and parallel corpora can be generated from the linked stories. For Indian languages, resources for Natural Language Processing and Information Retrieval tasks are limited, and identifying comparable and parallel documents would offer a potential source for deriving bilingual dictionaries and training statistical Machine Translation systems. Paraphrasing is the most common way of reproducing news stories, and translated text is also a type of paraphrase. Before monolingual or bilingual news stories can be linked, these paraphrase types need to be identified and classified so that researchers can devise techniques to solve these challenging problems. English and Hindi differ not only in their scripts but also in their grammar and vocabulary. A number of paraphrase typologies have been built from the perspective of Natural Language Processing or for specific applications, but to the best of the authors' knowledge, no typology has been reported for English-Hindi cross-language text reuse. In this paper, a typology is formulated for cross-lingual journalistic text reuse in English-Hindi. The typology reveals the levels of difficulty in English-Hindi mapping and should help in devising techniques for linking and tracking English-Hindi stories.