A Proposal for High Availability of HDFS Architecture based on Threshold Limit and Saturation Limit of the Namenode

Full Text (PDF, 573KB), PP.27-34

Views: 0 Downloads: 0

Author(s)

Sabyasachi Chakraborty 1,* Kashyap Barua 1 Manjusha Pandey 1 Siddharth Rautaray 1

1. School of Computer Engineering, KIIT University, Bhubaneswar, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2017.06.04

Received: 2 Jun. 2017 / Revised: 14 Jul. 2017 / Accepted: 7 Aug. 2017 / Published: 8 Nov. 2017

Index Terms

Hadoop, Namenode, Datanode, Big Data, Architecture

Abstract

Big Data which is one of the newest technologies in the present field of science and technology has created an enormous drift of technology to a salient data architecture. The next thing that comes right after big data is Hadoop which has motivated the complete Big Data Environment to its jurisdiction and has reinforced the complete storage and analysis of big data. This paper discusses a hierarchical architecture of Hadoop Nodes namely Namenodes and Datanodes for maintaining a High Availability Hadoop Distributed File System. The High Availability Hadoop Distributed File System architecture establishes itself onto the two fundamental model of Hadoop that is Master-Slave Architecture and elimination of single point node failure. The architecture will be of such utilization that there will be an optimum load on the data nodes and moreover there will be no loss of any data in comparison to the size of data.

Cite This Paper

Sabyasachi Chakraborty, Kashyap Barua, Manjusha Pandey, Siddharth Rautaray, "A Proposal for High Availability of HDFS Architecture based on Threshold Limit and Saturation Limit of the Namenode", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.9, No.6, pp. 27-34, 2017. DOI:10.5815/ijieeb.2017.06.04

Reference

[1]Azzedin, Farag. "Towards a scalable HDFS architecture." Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE, 2013.
[2]Wang, Xin, and Jianhua Su. "Research of distributed data store based on hdfs." Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on. IEEE, 2013.
[3]Apache Software Foundation, “Quorum Journal Manager “, https://hadoop.apache.org/docs/r2.7.1/hadoop-project- dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
[4]Kim, Yonghwan, et al. "A Distributed NameNode Cluster for a Highly-Available Hadoop Distributed File System." Reliable Distributed Systems (SRDS), 2014 IEEE 33rd International Symposium on. IEEE, 2014.
[5]Apache Software Foundation, “Centralized Cache Management in HDFS,” http://hadoop.apache.org/docs/r2.3.0/hadoop-projectdist/ hadoop-hdfs/CentralizedCacheManagement.html.
[6]Tantisiriroj, Wittawat, et al. "On the duality of data- intensive file system design: reconciling HDFS and PVFS." Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis.ACM,2011.
[7]Zaharia, Matei, et al. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association,2012.
[8]Demchenko, Yuri, Cees De Laat, and Peter Membrey. "Defining architecture components of the Big Data Ecosystem." Collaboration Technologies and Systems (CTS), 2014 International Conference on. IEEE, 2014.
[9]Stamatakis, Dimokritos, et al. "A General-Purpose Architecture for Replicated Metadata Services in Distributed File Systems." IEEE Transactions on Parallel and Distributed Systems (2017).
[10]Islam, Nusrat Sharmin, et al. "Triple-H: a hybrid approach to accelerate HDFS on HPC clusters with heterogeneous storage architecture." Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on. IEEE, 2015.
[11]Jena, Bibhudutta, et al. "A Survey Work on Optimization Techniques Utilizing Map Reduce Framework in Hadoop Cluster." International Journal of Intelligent Systems and Applications 9.4 (2017): 61.
[12]Stoica, Ion, et al. "Chord: a scalable peer-to-peer lookup protocol for internet applications." IEEE/ACM Transactions on Networking (TON) 11.1 (2003): 17-32.
[13]Karger, David, et al. "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web." Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. ACM, 1997.
[14]Ananthanarayanan, Ganesh, et al. "PACMan: Coordinated memory caching for parallel jobs." Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012.
[15]Zulkarnain, Novan, and Muhammad Anshari. "Big data: Concept, applications, & challenges." Information Management and Technology (ICIMTech), International Conference on. IEEE, 2016.
[16]Fetjah, Laila, et al. "Toward a Big Data Architecture for Security Events Analytic." Cyber Security and Cloud Computing (CSCloud), 2016 IEEE 3rd International Conference on. IEEE, 2016.
[17]Demchenko, Yuri, et al. "Addressing big data issues in scientific data infrastructure." Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE, 2013.
[18]Ramaprasath, Abhinandan, Anand Srinivasan, and Chung- Horng Lung. "Performance optimization of big data in mobile networks", 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE), 2015.
[19]Mayank Bhushan , Monica Singh , Sumit K Yadav ," Big Data query optimization by using Locality Sensitive Bloom Filter ",IJCT, 2015.
[20]E. Yildirim, J. Kim, and T. Kosar, “Optimizing the sample size for a cloud-hosted data scheduling service,” in Proc. 2nd Int. Workshop Cloud Computing. Sci. Appl., 2012.
[21]Shvachko, Konstantin V. "HDFS Scalability: The limits to growth." ; login:: the magazine of USENIX & SAGE 35.2 (2010): 6-16 .
[22]The Apache Software Foundation, “The Apache Hadoop Project,” http://hadoop.apache.org/.
[23]Fischer, Michael J., Nancy A. Lynch, and Michael S. Paterson. "Impossibility of distributed consensus with one faulty process." Journal of the ACM (JACM) 32.2 (1985): 374-382.
[24]Zaharia, Matei, et al. "Resilient distributed datasets: A fault- tolerant abstraction for in-memory cluster computing." Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012.
[25]Islam, Nusrat Sharmin, et al. "In-memory i/o and replication for hdfs with memcached: Early experiences." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.
[26]Gray, Cary, and David Cheriton. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. Vol. 23. No. 5. ACM, 1989.
[27]D. Liben-Nowell, H. Balakrishnan, and D. R. Karger, ― Analysis of the evolution of peer-to-peer systems,‖ in Proc. 21st ACM Symp. Principles of Distributed Computing (PODC), Monterey, CA, July 2002, pp. 233 – 242.
[28]The Forrester Wave: Big Data Predictive Analytics Solutions, Q1 2013. Mike Gualtieri, January 13, 2013. [Online]. Available: http://www.forrester.com/pimages/rws/reprints/document/8 5601/oid/1-LTEQDI
[29]Morton, Guy M. A computer oriented geodetic data base and a new technique in file sequencing. New York: International Business Machines Company, 1966.
[30]Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
[31]Raji, R. Pillai. "MapReduce: Simplified Data Processing On Large Clusters." (2009).
[32]Jens Dittrich Jorge-Arnulfo Quian´eRuiz, ”Efficient Big Data Processing in Hadoop MapReduce.”