Md. Nowraj Farhan

Workplace: Department of Computer Science & Engineering, University of Liberal Arts Bangladesh, Dhaka, 1209, Bangladesh

E-mail: nowraj.farhan@gmail.com

Research Interests: Software Engineering, Data Mining, Data Structures and Algorithms

Biography

Md. Nowraj Farhan received his Bachelor of Science in Computer Science and Engineering from the University of Liberal Arts Bangladesh in 2015. He is pursuing an M.Sc. in the area of Software Engineering/Information & Communication Systems. His main research interests include Software Engineering, Data Mining, and Big Data Analytics.

Author Articles
A study and Performance Comparison of MapReduce and Apache Spark on Twitter Data on Hadoop Cluster

By Md. Nowraj Farhan, Md. Ahsan Habib, Md. Arshad Ali

DOI: https://doi.org/10.5815/ijitcs.2018.07.07, Pub. Date: 8 Jul. 2018

We explore Apache Spark, the newest tool for analyzing big data, which lets programmers perform in-memory computation on large data sets in a fault-tolerant manner. MapReduce is a high-performance distributed big data programming framework that is preferred by most big data analysts, has been available for a long time, and is very well documented. The purpose of this project was to assess the scalability of open-source distributed data management systems such as Apache Hadoop on small and medium data sets, and to compare its performance against Apache Spark, a scalable distributed in-memory data processing engine. For this comparison, experiments were run on data sets ranging in size from 5 GB to 43 GB, on both a single machine and a Hadoop cluster. The results show that the cluster outperforms a single machine by a wide margin. Apache Spark outperforms MapReduce by a dramatic margin, and as the data grows Spark becomes more reliable and fault tolerant. We also found that increasing the number of blocks on the Hadoop Distributed File System increases the run time of both the MapReduce and Spark programs; even in this case, Spark performs far better than MapReduce. This demonstrates that Spark is a possible replacement for MapReduce in the near future.
