A Review on Large Scale Graph Processing Using Big Data Based Parallel Programming Models

Full Text (PDF, 679KB), PP.49-57

Views: 0 Downloads: 0

Author(s)

Anuraj Mohan 1,* Remya G 1

1. Dept of Computer Science & Engineering, NSS College of Engineering, Palakkad, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.02.07

Received: 2 May 2016 / Revised: 11 Aug. 2016 / Accepted: 25 Oct. 2016 / Published: 8 Feb. 2017

Index Terms

Graph algorithm, big data, mapreduce, bulk synchronous parallel, parallel programming

Abstract

Processing big graphs has become an increasingly essential activity in various fields like engineering, business intelligence and computer science. Social networks and search engines usually generate large graphs which demands sophisticated techniques for social network analysis and web structure mining. Latest trends in graph processing tend towards using Big Data platforms for parallel graph analytics. MapReduce has emerged as a Big Data based programming model for the processing of massively large datasets. Apache Giraph, an open source implementation of Google Pregel which is based on Bulk Synchronous Parallel Model (BSP) is used for graph analytics in social networks like Facebook. This proposed work is to investigate the algorithmic effects of the MapReduce and BSP model on graph problems. The triangle counting problem in graphs is considered as a benchmark and evaluations are made on the basis of time of computation on the same cluster, scalability in relation to graph and cluster size, resource utilization and the structure of the graph.

Cite This Paper

Anuraj Mohan, Remya G,"A Review on Large Scale Graph Processing Using Big Data Based Parallel Programming Models", International Journal of Intelligent Systems and Applications(IJISA), Vol.9, No.2, pp.49-57, 2017. DOI:10.5815/ijisa.2017.02.07

Reference

[1]Jim, L. and Michael, S. (2010), ‘ Design Patterns for Efficient Graph Algorithms in MapReduce’ Proceedings of the Eighth Workshop on Mining and Learning with Graphs, MLG 2010, New York, USA, pp. 78-85.
[2]Siddharth, S. and Sergei, V. (2011), ‘Counting Triangles and the Curse of the Last Reducer’. Proceedings of the 20th International Conference on World Wide Web, New York, USA, pp. 607-614
[3]Valiant, L.G. (1990) ‘A bridging model for parallel computation’, Communications of the ACM, vol. 33, no. 8, pp. 103–111.
[4]Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N. and Czajkowski, G. (2010), ‘Pregel: a system for large-scale graph processing’. Proceedings of the 2010 International Conference on Management of Data, SIGMOD ’10, New York, USA, pp. 135-146
[5]David, E. and David, A. E. (2013), ‘Investigating Graph Algorithms in the BSP Model on the Cray XMT’. Proceedings of the IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum, Cambridge, MA, pp. 1638-1645.
[6]Sherif, S. (2013), ‘Processing large scale graph `data. A guide to current technology’. IBM Developer Works [Online]. http://www.ibm.com/developerworks/opensource/library/os-giraph/index.html (Accessed 15 November 2015)
[7]Apache Giraph Project. [Online] http://giraph.apache.org/ (Accessed 21 September 2015.
[8]Thomasz, K., Przemyslaw, K. and Wojciech, I. (2014), ‘Parallel processing of large graphs’. Future Generation Computer Systems, Vol 32, pp. 324–337.
[9]Zhang, Y., Gao, Q., Gao, L. and Wang, C. (2012) ‘iMapReduce: A Distributed Computing Framework for Iterative Computation’. Journal of Grid Computing, Vol 10, No 1, pp. 47-68.
[10]SNAP.[Online]htps://snap.stanford.edu/data/(Accessed 18 March 2015)
[11]Jeffrey, D. and Sanjay, G. (2004), ‘MapReduce:Simpliļ¬ed data processing on large Clusters’.Proceedings of the Sixth Symposium of Operating Systems Design and Implementation(OSDI 2004), CA, USA, pp. 137-150
[12]Jonathan, C.H. (2009) ‘Graph twiddling in a mapreduce world’. Computing in Science &Engineering, Vol 11, No 4, pp. 29 –41
[13]Kang, U., Tsourakakis, C.E. and Faloutsos, C. (2010)PEGASUS:mining peta-scale graphs’.Knowledge and Information Systems, Vol 27, No 2, pp. 303-325.
[14]Louise, Q., Paul, W. and David, H. (2012), ‘Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis’. Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Istanbul, Turkey, pp. 457-463
[15]Rahman, Muhammad Mahbubur. "Mining social data to extract intellectual knowledge."International Journal of Intelligent Systems and Applications(IJISA), vol.4, no.10, pp.15-24, 2012
[16]Thomas, S.(2007) Algorithmic Aspects of Triangle-Based Network Analysis. PhD thesis, Computer Science, University of Karlsruhe, Germany.
[17]Apache Hadoop Project.[Online] http://hadoop.apache.org/(Accessed 11 June 2015)
[18]Apache Hama Project. [Online] http://hama.apache.org/(Accessed24June2015)
[19]Apache Spark Project. [Online] http://spark.apache.org/graphx (Accessed 02 August 2015)
[20]Facebook.[Online] http://code.facebook.com/posts (Accessed 18 December 2015) GraphLab. [Online]
[21]http://graphlab.org/(Accessed 21 July 2015)