Workplace: Department of Computer Science & Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, India
E-mail: bhupinderdhani@gmail.com
Research Interests: Computer systems and computational processes, Computer Architecture and Organization, Data Mining, Data Structures and Algorithms, Analysis of Algorithms
Biography
Bhupinderjit Singh received his M.Tech. degree in Computer Science and Engineering from Dr. B R Ambedkar National Institute of Technology, Jalandhar. His research interests include Web Mining and Data Structures and Algorithms.
By Bhupinderjit Singh, Deepak Kumar Gupta, Raj Mohan Singh
DOI: https://doi.org/10.5815/ijmecs.2017.11.04, Pub. Date: 8 Nov. 2017
The World Wide Web is a vast, dynamic, and continuously growing collection of web documents. Because of its size, users find it difficult to locate relevant information on a particular topic of interest. This paper proposes an improved focused-crawler architecture that is a hybrid of several earlier techniques. The main goal of a focused crawler is to fetch web documents related to a predefined set of topics or domains and to ignore irrelevant web pages. To check the relevance of a web page, a Page Score is computed from the content similarity between the page and the topic keywords. A URL Priority Queue is implemented by calculating the Link Score of each extracted URL from the URL's attributes. The URL queue is also optimized by removing duplicate content. The Topic Keywords Weight Table is expanded by extracting additional keywords from the relevant-pages database and recalculating the keyword weights. The experimental results show that the proposed crawler is more efficient than earlier crawlers.
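The two scoring ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes the Page Score is a cosine similarity restricted to the topic keywords, that the Link Score is supplied by the caller, and that duplicates are detected by exact URL match. The names `page_score` and `UrlFrontier` are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def page_score(text, keyword_weights):
    """Assumed Page Score: cosine similarity between the page's term
    counts and the topic keyword weights, over the topic keywords only."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    dot = sum(w * counts[k] for k, w in keyword_weights.items())
    page_norm = math.sqrt(sum(counts[k] ** 2 for k in keyword_weights))
    topic_norm = math.sqrt(sum(w * w for w in keyword_weights.values()))
    if page_norm == 0 or topic_norm == 0:
        return 0.0  # page shares no keywords with the topic
    return dot / (page_norm * topic_norm)


class UrlFrontier:
    """Priority queue of URLs ordered by Link Score, highest first.
    Duplicate handling here is an assumption: exact URL match only."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that keeps insertion order

    def push(self, url, link_score):
        if url in self._seen:
            return False  # already queued or crawled, so skip it
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the best first
        heapq.heappush(self._heap, (-link_score, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In a crawl loop built on this sketch, a fetched page would be kept only if `page_score` exceeds some chosen threshold, and its outgoing links would be pushed onto the `UrlFrontier` with whatever Link Score the crawler assigns them.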