Aarti Kumar

Work place: Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462003, India

E-mail: aartikumar01@gmail.com


Research Interests: Information Systems, Information Retrieval, Information Storage Systems, Multimedia Information System

Biography

Aarti Kumar was born in Patna, India, in 1963. She received her Master's degree in Botany from Patna University, India, in 1983 and her Master's degree in Computer Applications from Indira Gandhi National Open University, India, in 2005. She was also the university topper in the Bachelor of Education program (1999) at Barkatullah University, India.

She is currently pursuing her Ph.D. in Computer Applications at Maulana Azad National Institute of Technology (MANIT), Bhopal, India. Her area of research is Cross-Language Information Retrieval, more specifically English-Hindi journalistic text reuse. She has 18 years of teaching experience and more than three years of research experience. Her published works include:

  • “Query Formulation for Heuristic Retrieval in Obfuscated and Translated Partially Derived Text”, Journal of Information Science Theory and Practice (JISTaP), Korea Institute of Science and Technology Information, Vol. 3, No. 1, March 2013 issue, pp. 24-39, pISSN 2287-9099, eISSN 2287-4577, DOI prefix 10.1633.
  • “An evolutionary survey from Monolingual Text Reuse to Cross Lingual Text Reuse in context to English-Hindi”, International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February 2015, ISSN 2229-5518, pp. 996-1003.
  • “Pre-Retrieval based Strategies for Cross Language News Story Search”, accepted in ACM Journal as Post-Proceedings of the 2013 Forum for Information Retrieval Evaluation (FIRE), December 04-06, 2013, New Delhi, India.

Mrs. Kumar is an Associate Member of the Information Retrieval Society of India (IRSI) and a Professional Member of the Association for Computing Machinery (ACM).

Author Articles
Typology for Linguistic Pattern in English-Hindi Journalistic Text Reuse

By Aarti Kumar, Sujoy Das

DOI: https://doi.org/10.5815/ijitcs.2016.08.09, Pub. Date: 8 Aug. 2016

Linking and tracking news stories covering the same events written in different languages is a challenging task. In natural languages the same information may be expressed in multiple ways, and newspapers exploit this feature to make news stories more appealing. The same news story is often presented in different ways, both within one language and across languages, while the gist normally remains the same. This diversity of linguistic expression presents a major challenge in identifying and tracking news stories covering the same events across languages, but doing so can provide rich and valuable resources, since comparable and parallel corpora can be generated from the linked stories. For Indian languages, limited language resources exist for Natural Language Processing and Information Retrieval tasks, and identifying comparable and parallel documents would offer a potential source for deriving bilingual dictionaries and training statistical Machine Translation systems. Paraphrasing is the most common way of reproducing news stories, and translated text is also a type of paraphrase. Before monolingual or bilingual news stories can be linked, these paraphrase types need to be identified and classified so that researchers can devise techniques to solve these challenging problems. The English-Hindi language pair differs not only in script but also in grammar and vocabulary. A number of paraphrase typologies have been built from the perspective of Natural Language Processing or for other specific applications but, to the best of the authors' knowledge, no typology has been reported for English-Hindi cross-language text reuse. In this paper a typology is formulated for cross-lingual journalistic text reuse in English-Hindi. The typology reveals the levels of difficulty in English-Hindi mapping and should help in devising techniques for linking and tracking English-Hindi stories.

Other Articles