Work place: LITIO laboratory, University of Oran, BP 1524, El-M'Naouer, 31000, Oran, Algeria
E-mail: nait-bahloul.safia@univ-oran.dz
Research Interests: World Wide Web
Biography
Nait Bahloul Safia is an associate professor in the Department of Computer Science, University of Oran Es Senia, Algeria. Since 2011, she has led a team working on data engineering and Web technology. Her research covers advanced aspects of databases, Web technology, and unsupervised classification.
By Saidi Imene, Nait Bahloul Safia
DOI: https://doi.org/10.5815/ijitcs.2014.09.07, Pub. Date: 8 Aug. 2014
Web information sources such as forums, blogs, and news articles are becoming increasingly large and diverse. Although advances in technology are improving techniques for handling the large amounts of data generated, these sources are heterogeneous in structure (semi-structured or unstructured) and in nature (text or images). Software solutions are therefore needed to prepare the data and to access these sources in a homogeneous way. In this paper we present an approach for indexing heterogeneous data sources. Our objective is to offer techniques for efficient indexing of web sources by storing only the necessary information. We propose automatic indexing for semi-structured or unstructured sources (e.g., XML files, HTML files) and annotation for other sources (e.g., images and videos embedded in a page). We present our indexing algorithms and propose using the MapReduce model to build a scalable inverted index. Experiments on a real-world corpus show that our approach achieves good performance.
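To make the idea of building an inverted index with MapReduce more concrete, the following is a minimal single-process sketch in Python. It is not the authors' algorithm: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`), the simple tokenizer, and the sample documents are all assumptions made for illustration. A real deployment would run the map and reduce steps on a distributed framework.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step (hypothetical): emit one (term, doc_id) pair per token."""
    # Naive tokenizer, assumed for illustration: lowercase alphanumeric runs.
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted pairs by term, as a MapReduce framework would do."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step (hypothetical): build a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


# Sample documents, invented for this sketch.
docs = {
    "d1": "Indexing web sources",
    "d2": "Web forums and blogs",
}
index = build_inverted_index(docs)
```

Here `index["web"]` lists both documents, while `index["blogs"]` lists only `d2`. Because each map call depends on a single document and each reduce call on a single term, both steps can be run in parallel across machines, which is what makes this style of index construction scale.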