Analyzing Cost Parameters Affecting Map Reduce Application Performance

Author(s)

N.K. Seera 1,*, S. Taruna 1

1. Banasthali Vidyapeeth, Jaipur, INDIA

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2016.08.06

Received: 3 Aug. 2015 / Revised: 11 Dec. 2015 / Accepted: 17 Feb. 2016 / Published: 8 Aug. 2016

Index Terms

Map-Reduce, Hadoop, Cost Parameters, Cost-Optimizer

Abstract

Big data analysis has become an essential task for many large companies. Map-Reduce, an emerging distributed computing paradigm, is regarded as a promising architecture for big data analytics on commodity hardware. Map-Reduce and its open source implementation, Hadoop, have been widely adopted by companies because of salient features such as scalability, elasticity, fault tolerance and the flexibility to handle big data. However, these benefits come at a considerable performance cost. The performance of a Map-Reduce application depends on various factors, including the size of the input data set and the cluster resource settings. A clear understanding of the factors that affect Map-Reduce application performance, and of the cost associated with each factor, is therefore required. In this paper, we study different performance parameters and an existing Cost Optimizer that computes the cost of Map-Reduce job execution. The cost-based optimizer also considers various Hadoop configuration parameters that affect the performance of these programs. This paper attempts to analyze Map-Reduce application performance and to identify the key factors affecting the cost and performance of executing Map-Reduce applications.

Cite This Paper

N.K. Seera, S. Taruna, "Analyzing Cost Parameters Affecting Map Reduce Application Performance", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.8, pp.50-58, 2016. DOI: 10.5815/ijitcs.2016.08.06
