Discussion on Damping Factor Value in PageRank Computation

Full Text (PDF, 696KB), PP.19-28

Views: 0 Downloads: 0

Author(s)

Atul Kumar Srivastava 1,* Rakhi Garg 2 P. K. Mishra 1

1. Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi -221005, India

2. Department of Computer Science, MMV, Banaras Hindu University, Varanasi -221005, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.09.03

Received: 17 Jan. 2017 / Revised: 2 Mar. 2017 / Accepted: 10 Apr. 2017 / Published: 9 Sep. 2017

Index Terms

PageRank method, Power method, Web matrix, Markov model, Eigenvalue problem

Abstract

Web search engines use various ranking methods to determine the order of web pages displayed on the Search Engine Result Page (SERP). PageRank is one of the popular and widely used ranking method. PageRank of any web page can be defined as a fraction of time a random web surfer spends on that web page on average. The PageRank method is a stationary distribution of a stochastic method whose states are web pages of the Web graph. This stochastic method is acquired by combining the hyperlink matrix of the web graph and a trivial uniform process. This combination is needed to make primitive so that stationary distribution is well defined. The combination depends on the value of damping factor α∈[0,1] in the computation of PageRank. The damping factor parameter state that how much time random web surfer follow hyperlink structure than teleporting. The value of α is exceptionally empirical and in current scenario α = 0.85 is considered as suggested by Brin and Page. If we take α =0.8 then we can say that out of total time, 80% of time is taken by the random web surfer to follow the hyperlink structure and 20% time they teleport to new web pages randomly. Today web surfer gets worn out too early on the web because of non-availability of relevant information and they can easily teleport to new web pages rather than following hyperlink structure. So we have to choose some value of damping factor other than 0.85. In this paper, we have given an experimental analysis of PageRank computation for different value of the damping factor. We have observed that for value of α=0.7, PageRank method takes fewer numbers of iterations to converge than α=0.85, and for these values of α the top 25 web pages returned by PageRank method in the SERP are almost same, only some of them exchange their positions. From the experimental results it is observed that value of damping factor α=0.7 takes approximate 25-30% fewer numbers of iterations than α=0.85 to get closely identical web pages in top 25 result pages for personalized web search, selective crawling, intra-web search engine.

Cite This Paper

Atul Kumar Srivastava, Rakhi Garg, P. K. Mishra, "Discussion on Damping Factor Value in PageRank Computation", International Journal of Intelligent Systems and Applications(IJISA), Vol.9, No.9, pp.19-28, 2017. DOI:10.5815/ijisa.2017.09.03

Reference

[1]Sun, H. and Wei, Y., (2006) “A note on the PageRank algorithm” Applied Mathematics and computation, Vol. 179 No.2, pp.799-806.
[2]Bianchini, M., Gori, M. and Scarselli, F., (2005) “Inside pagerank” ACM Transactions on Internet Technology (TOIT), Vol. 5 No. 1, pp.92-128.
[3]F. Crestani, M. Girolami, and C. J. van Rijsbergen (2002) “Advances in Information Retrieval” number 2291 in LNCS, pp. 73–85.
[4]Rajaraman, A. and Ullman, J.D., (2012) “Mining of massive datasets” Cambridge: Cambridge University Press, Vol. 1.
[5]Muhammad Mahbubur Rah-man,"Mining Social Data to Extract Intellectual Knowledge", International Journal of Intelligent Systems and Applica-tions(IJISA), vol.4, no.10, pp.15-24, 2012. DOI: 10.5815/ijisa.2012.10.02.
[6]Henzinger, M. R. (2001) “Hyperlink analysis for the web”, Internet Computing, IEEE, Vol.5 Issue.1, pp. 45-50.
[7]Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (1999) “The PageRank citation ranking: Bringing order to the web” Technical Report 66, Stanford University, Available at http://dbpubs.stanford.edu/pub/1999-66.
[8]Brin, S., and Page, L. (2012) “Reprint of: The anatomy of a large-scale hyper-textual web search engine” Computer networks, Vol. 56 No.18, pp. 3825-3833.
[9]Sargolzaei, P. and Soleymani, F., (2010) “PageRank problem, survey and future research directions” In International Mathematical Forum, Vol. 5 No. 19, pp. 937-956.
[10]Langville, A. N., and Meyer, C. D. (2004) “Deeper inside pagerank” Internet Mathematics, Vol.1 No.3, pp. 335-380.
[11]Avrachenkov, K., Litvak, N., and Pham, K. S. (2008) “A singular perturbation approach for choosing the PageRank damping factor” Internet Mathematics}, Vol.5 No. (1-2), pp. 47-69.
[12]Boldi, P., Santini, M., and Vigna, S. (2005) “PageRank as a function of the damping factor” In Proceedings of the 14th international conference on World Wide Web, ACM, pp. 557-566.
[13]Langville, A.N., Meyer, C.D. (2006) “Google’s PageRank and Beyond: The Science of Search Engine Rankings” Princeton University Press, Princeton.
[14]Brinkmeier, M. (2006) “PageRank revisited” ACM Transactions on Internet Technology (TOIT), Vol. 6 No. 3, pp. 282-301.
[15]Arasu, A., Novak, J., Tomkins, A., and Tomlin, J. (2002) “PageRank computation and the structure of the web: Experiments and algorithms” In Proceedings of the Eleventh International World Wide Web Conference, Poster Track pp. 107-117.
[16]Avrachenkov, K., and Litvak, N. (2006) “The effect of new links on Google PageRank” Stochastic Models, Vol. 22 No. 2, pp. 319-331.
[17]Borodin, A., Roberts, G. O., Rosenthal, J. S., and Tsaparas, P. (2005) “Link analysis ranking: algorithms, theory, and experiments” ACM Transactions on Internet Technology (TOIT), Vol. 5 No. 1, pp. 231-297.
[18]Baeza-Yates, R., Boldi, P., and Castillo, C. (2006) “Generalizing pagerank: Damping functions for link-based ranking algorithms” In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval ACM, pp. 308-315.
[19]Bressan, M., and Peserico, E. (2010) “Choose the damping, choose the ranking” Journal of Discrete Algorithms, Vol. 8 No. 2, pp. 199-213.
[20]Melucci, M., and Pretto, L. (2007) “PageRank: When order changes” Springer Berlin Heidelberg, pp. 581-588.
[21]Engström, C. and Silvestrov, S., (2014) “Generalisation of the Damping Factor in PageRank for Weighted Networks” In Modern Problems in Insurance Mathematics, Springer International Publishing, pp. 313-333.
[22]Pretto, L. (2002) “A theoretical analysis of Google’s PageRank” In String Processing and Information Retrieval, Springer Berlin Heidelberg, pp. 131-144.
[23]Ma, N., Guan, J. and Zhao, Y., (2008) “Bringing PageRank to the citation analysis” Information Processing & Management, Vol. 44 No.2, pp.800-810.
[24]Liu, B., (2007) “Web data mining: exploring hyperlinks, contents, and usage data” Springer Science & Business Media.
[25]Berkhin, Pavel (2005) “A survey on pagerank computing” Internet Mathematics, Vol. 2 No. 1 pp. 73-120.
[26]Berry, M. W., Drmac, Z., and Jessup, E. R. (1999) “Matrices, vector spaces, and information retrieval” SIAM review, Vol. 41 No. 2, pp. 335-362.
[27]Donato, D., Laura, L., Leonardi, S., and Millozzi, S. (2004) “Large scale properties of the web graph” The European Physical Journal B-Condensed Matter and Complex Systems, Vol. 38 No. 2, pp. 239-243.
[28]Kamvar, S., Haveliwala, T., Manning, C., and Golub, G. (2003) “Exploiting the block structure of the web for computing pagerank” Stanford University Technical Report.
[29]Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. L. (2000) “Graph structure in the web” Computer networks, Vol. 33 No. (1-6), pp. 309-320.
[30]Kim, S. J., & Lee, S. H. (2002) “An improved computation of the pagerank algorithm, Advances in Information Retrieval, Springer Berlin Heidelberg, pp. 73-85.
[31]Lempel, Ronny, and Shlomo Moran (2001) “SALSA: the stochastic approach for link-structure analysis” ACM Transactions on Information Systems (TOIS), Vol. 19 No.2, pp. 131-160.
[32]Gleich, David Francis, and Michael Saunders (2009) “Models and algorithms for pagerank sensitivity” Stanford University, 2009.
[33]Serra Capizzano, S. (2007) “Google PageRanking problem: The model and the analysis” In Dagstuhl Seminar Proceedings, Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
[34]Varga, Richard S. (2009) “Matrix iterative analysis” Springer Science and Business Media, Vol. 27.
[35]Datasets for Experiments on Link Analysis Ranking Algorithm shttp://www.cs.toronto.edu/experiments/datasets/index.html
[36]Jure Leskovec and Andrej Krevl (2014) “Stanford Large Network Dataset Collection” http://snap.stanford.edu/data.
[37]“Guava: Google Core Libraries for Java” https://github.com/google/guava.