Performance Framework for HPC Applications on Homogeneous Computing Platform

Full Text (PDF, 646 KB), pp. 28-39


Author(s)

Chandrashekhar B. N. 1,*, Sanjay H. A. 1

1. Advanced Computing Research, Nitte Meenakshi Institute of Technology, Bangalore-560064, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2019.08.03

Received: 23 Apr. 2019 / Revised: 30 May 2019 / Accepted: 24 Jun. 2019 / Published: 8 Aug. 2019

Index Terms

Central Processing Unit (CPU), Compute Unified Device Architecture (CUDA), Graphics Processing Units (GPUs), High Performance Computing (HPC), Message Passing Interface (MPI), Giga Floating Point Operations Per Second (GFLOPs)

Abstract

In scientific fields, solving large and complex computational problems using central processing units (CPUs) alone is not enough to meet computational requirements. In this work we consider a homogeneous cluster in which every node has CPUs and graphics processing units (GPUs) of the same capability. Normally, CPUs are used only to control GPUs and to transfer data between the CPU and the GPUs. Here we combine the computational power of the CPU with that of the GPU to run high performance computing (HPC) applications. The framework adopts a pinned memory technique to overcome the overhead of data transfer between CPU and GPU. To program the homogeneous platform we adopt a hybrid strategy combining the Message Passing Interface (MPI), OpenMP (Open Multi-Processing), and the Compute Unified Device Architecture (CUDA). The key challenge on the homogeneous platform is the allocation of workload among CPU and GPU cores. To address this challenge we propose a novel analytical workload division strategy that predicts an effective division of work between the CPU and the GPU. Using our hybrid programming model and workload division strategy, we observe average performance improvements of 76.06% and 84.11% in giga floating point operations per second (GFLOPs) on an NVIDIA Tesla M2075 cluster and on NVIDIA Quadro K2000 nodes of a cluster, respectively, for N-dynamic vector addition, compared with the performance models of Simplice Donfack et al. [5]. Furthermore, using the pinned memory technique with the hybrid programming model, we observe average performance improvements of 33.83% on the NVIDIA Tesla M2075 and 39.00% on the NVIDIA Quadro K2000 for SAXPY applications, compared with the pageable memory technique.
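To make the abstract's two central ideas concrete, the sketch below is a minimal, illustrative example (not the authors' framework) of a SAXPY computation that combines pinned host memory with a static CPU/GPU workload split. The split ratio gpuFraction is a hypothetical stand-in for the paper's analytical workload-division prediction, and the MPI layer that would distribute work across cluster nodes is omitted for brevity.

    // saxpy_split.cu -- minimal sketch; gpuFraction is a hypothetical
    // stand-in for the paper's analytical workload-division prediction.
    #include <cuda_runtime.h>
    #include <omp.h>
    #include <stdio.h>

    // GPU side of the split: y[i] = a * x[i] + y[i] for its share of elements.
    __global__ void saxpyKernel(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main(void) {
        const int n = 1 << 24;
        const float a = 2.0f;
        const float gpuFraction = 0.8f;           // hypothetical split ratio
        const int nGpu = (int)(n * gpuFraction);  // elements handled by the GPU

        // Pinned (page-locked) host buffers: transfers can use DMA and
        // overlap with computation, unlike pageable malloc'd memory.
        float *x, *y;
        cudaMallocHost((void **)&x, n * sizeof(float));
        cudaMallocHost((void **)&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        // Device buffers only for the GPU's share of the vectors.
        float *dx, *dy;
        cudaMalloc((void **)&dx, nGpu * sizeof(float));
        cudaMalloc((void **)&dy, nGpu * sizeof(float));

        // Asynchronous copies and kernel launch on a stream, so the host
        // thread falls through to work on the CPU share immediately.
        cudaStream_t s;
        cudaStreamCreate(&s);
        cudaMemcpyAsync(dx, x, nGpu * sizeof(float), cudaMemcpyHostToDevice, s);
        cudaMemcpyAsync(dy, y, nGpu * sizeof(float), cudaMemcpyHostToDevice, s);
        saxpyKernel<<<(nGpu + 255) / 256, 256, 0, s>>>(nGpu, a, dx, dy);
        cudaMemcpyAsync(y, dy, nGpu * sizeof(float), cudaMemcpyDeviceToHost, s);

        // CPU side of the split, run on all cores via OpenMP while the GPU works.
        #pragma omp parallel for
        for (int i = nGpu; i < n; ++i) y[i] = a * x[i] + y[i];

        cudaStreamSynchronize(s);  // wait for the GPU share to arrive back
        printf("y[0] = %f, y[n-1] = %f\n", y[0], y[n - 1]);

        cudaStreamDestroy(s);
        cudaFree(dx); cudaFree(dy);
        cudaFreeHost(x); cudaFreeHost(y);
        return 0;
    }

Built with something like nvcc -Xcompiler -fopenmp saxpy_split.cu, the sketch lets the GPU's transfers and kernel overlap with the OpenMP loop; pinned memory is what makes the asynchronous copies genuinely overlap rather than silently serialize.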

Cite This Paper

Chandrashekhar B. N., Sanjay H. A., "Performance Framework for HPC Applications on Homogeneous Computing Platform", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 11, No. 8, pp. 28-39, 2019. DOI: 10.5815/ijigsp.2019.08.03

References

[1] N. P. Karunadasa and D. N. Ranasinghe, "Accelerating High Performance Applications with CUDA and MPI", 4th International Conference on Industrial and Information Systems, University of Colombo School of Computing, 2009.

[2] Qing-kui Chen and Jia-kang Zhang, "A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA", 1st International Conference on Information Science and Engineering, 2007, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai.

[3] Rong Shi, Khaled Hamidouche, Xiaoyi Lu, Karen Tomko, and Dhabaleswar K. Panda (Ohio State University), "A Scalable and Portable Approach to Accelerate Hybrid HPL on Heterogeneous CPU-GPU Clusters", 978-1-4799-0898-1/13, IEEE, 2013.

[4] Takuro Udagawa and Masakazu Sekijima, "GPU Accelerated Molecular Dynamics with Method of Heterogeneous Load Balancing", 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 978-1-4673-7684-6/15, IEEE Computer Society, 2015.

[5] Simplice Donfack, Stanimire Tomov, and Jack Dongarra, "Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs", 2014 IEEE 28th International Parallel and Distributed Processing Symposium Workshops, 978-1-4799-4116-2/14, IEEE Computer Society, 2014. Innovative Computing Laboratory, University of Tennessee, Knoxville, USA.

[6] Mohammed Sourouri, Johannes Langguth, Filippo Spiga, Scott B. Baden, and Xing Cai (Simula), "CPU+GPU Programming of Stencil Computations for Resource-Efficient Use of GPU Clusters", 2015 IEEE 18th International Conference on Computational Science and Engineering, 978-1-4673-8297-7/15, IEEE Computer Society, 2015.

[7] Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Karthik Murthy, Milind Chabbi, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, and Rajeev Thakur, "MPI-ACC: Accelerator-Aware MPI for Scientific Applications", IEEE Transactions on Parallel and Distributed Systems, Vol. 27, No. 5, May 2016.

[8] Tarun Beri, Sorav Bansal, and Subodh Kumar (Indian Institute of Technology Delhi), "A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators", 29th International Parallel and Distributed Processing Symposium, 1530-2075/15, IEEE Computer Society, 2015.

[9] Gurung A., Das B., and Rajarsh, "Simultaneous Solving of Linear Programming Problems in GPU", in Proc. of IEEE HiPC 2015 Conference: Student Research Symposium on HPC, Vol. 8, Bengaluru, India, pp. 1-5.

[10] Lang J. and Rünger G., "Dynamic distribution of workload between CPU and GPU for a parallel conjugate gradient method in an adaptive FEM", ICCS 2013 Conference, Procedia Computer Science, 18, pp. 299-308.

[11] Lee J., Samadi M., Park Y., and Mahlke S., "Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems", in Proc. of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT '13), pp. 245-256, 2013.

[12] Rabenseifner R., Hager G., and Jost G., "Hybrid MPI and OpenMP Parallel Programming", Supercomputing 2013 Conference, Nov. 17-22, Denver, USA, Tutorial, http://openmp.org/wp/sc13-tutorial-hybrid-mpi-and-openmp-parallel-programming.

[13] Yang C. T., Huang C. L., and Lin C. F., "Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters", Computer Physics Communications, 182, pp. 266-269, 2011.

[14] Lu, Fengshun, et al., "Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters", Computer Physics Communications 183.6 (2012): 1172-1181.

[15] Yang, Chao-Tung, Chih-Lin Huang, and Cheng-Fang Lin, "Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters", Computer Physics Communications 182.1 (2011): 266-269.

[16] Noaje, Gabriel, Michael Krajecki, and Christophe Jaillet, "MultiGPU computing using MPI or OpenMP", Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on, IEEE, 2010.