The Obstacles in Big Data Process

Pages 28-35

Author(s)

Rasim M. Alguliyev 1,* Rena T. Gasimova 2 Rahim N. Abbasli 2

1. Institute of Information Technology of Azerbaijan National Academy of Sciences, 9, B. Vahabzade str., Baku, AZ1141, Azerbaijan

2. Institute of Information Technology of Azerbaijan National Academy of Sciences 9, B. Vahabzade str., Baku, AZ1141, Azerbaijan, GoEasy LTD, Canada, Mississauga L5B2N5

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2017.03.04

Received: 10 Sep. 2016 / Revised: 18 Oct. 2016 / Accepted: 16 Jan. 2017 / Published: 8 Mar. 2017

Index Terms

Big data, big data analytics, database, management, NoSQL, MapReduce, Hadoop, cloud, data scientists

Abstract

The increasing amount of data, and the need to analyze it in a timely manner for multiple purposes, have created a serious barrier in the big data analysis process. This article describes the challenges that big data creates at each step of that process. These include typical analytical problems as well as less common challenges that are specific to big data. The article breaks the problems down by step of the analysis process and discusses them separately at each stage. It also offers some simple ways to solve them.

Cite This Paper

Rasim M. Alguliyev, Rena T. Gasimova, Rahim N. Abbaslı, "The Obstacles in Big Data Process", International Journal of Modern Education and Computer Science (IJMECS), Vol. 9, No. 3, pp. 28-35, 2017. DOI: 10.5815/ijmecs.2017.03.04
