IJCNIS Vol. 4, No. 7, 8 Jul. 2012
Keywords: Fail-over, high availability, node recovery, multilevel checkpointing, checkpointing ratio
The main objective of this research work is to improve the checkpoint efficiency of integrated multilevel checkpointing algorithms and to prevent checkpointing from becoming the bottleneck of cloud data centers. To find an efficient checkpoint interval, this paper also considers checkpointing overheads. Traditional checkpointing methods persistently store snapshots of the current job state and use them to resume execution at a later time. This research focuses on strategies for deciding when and whether a checkpoint should be taken, and evaluates them with respect to minimizing the induced monetary costs. Performance comparisons obtained by varying the rerun time of checkpoints are used to evaluate the optimal checkpoint interval.
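The abstract does not state its interval formula, so the following is only a minimal sketch of how a checkpoint interval can be chosen by trading checkpoint overhead against rework, assuming a simple first-order failure model rather than the paper's own. Here `checkpoint_cost` (C) is the time to write one checkpoint, `mtbf` (M) the mean time between failures, and `rerun_time` (R) the restart time after a failure; all function names and parameter values are hypothetical. With R = 0 and C much smaller than M, the optimum in this model reduces approximately to Young's classic first-order interval sqrt(2CM).

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimum checkpoint interval, sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def expected_runtime(work: float, interval: float, checkpoint_cost: float,
                     rerun_time: float, mtbf: float) -> float:
    """Expected wall-clock hours to finish `work` hours of computation.

    Assumed first-order model (not taken from the paper): work is split into
    segments of `interval` hours, each followed by a checkpoint costing
    `checkpoint_cost` hours. Failures arrive at rate 1/mtbf; each one costs
    `rerun_time` hours to restart plus, on average, half a segment of rework.
    """
    segment = interval + checkpoint_cost      # compute one segment and save it
    failures = segment / mtbf                 # expected failures per segment
    penalty = rerun_time + segment / 2.0      # restart time + average lost work
    return (work / interval) * (segment + failures * penalty)


def best_interval(work: float, checkpoint_cost: float, rerun_time: float,
                  mtbf: float, price_per_hour: float) -> float:
    """Grid-search (0.01 h steps, up to 50 h) for the cheapest interval."""
    candidates = [0.01 * k for k in range(1, 5001)]
    return min(candidates,
               key=lambda t: price_per_hour * expected_runtime(
                   work, t, checkpoint_cost, rerun_time, mtbf))


if __name__ == "__main__":
    # Hypothetical job: 100 h of work, 0.1 h per checkpoint, 24 h MTBF, $0.50/h.
    print(f"Young's interval: {young_interval(0.1, 24.0):.2f} h")
    for rerun in (0.0, 0.5, 2.0):                # vary the rerun time
        t = best_interval(100.0, 0.1, rerun, 24.0, 0.50)
        cost = 0.50 * expected_runtime(100.0, t, 0.1, rerun, 24.0)
        print(f"rerun={rerun:.1f} h  interval={t:.2f} h  cost=${cost:.2f}")
```

Under a flat hourly price the monetary cost is proportional to runtime, so the price does not move the optimum. In this model a longer rerun time pushes the optimum out to roughly sqrt(2(1 + R/M + C/(2M))CM), meaning slightly fewer checkpoints are worth taking when each restart is expensive.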
The proposed fail-over strategy works at the application layer and provides high availability for the Platform as a Service (PaaS) feature of cloud computing.
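The abstract does not describe the fail-over mechanism in detail. As a hedged illustration only, the sketch below shows generic application-level checkpoint/restart: a job periodically saves its state, a simulated node failure stops it, and a standby invocation resumes from the last snapshot instead of starting over. `CheckpointStore`, `run_job`, and the JSON file format are hypothetical; a real PaaS deployment would keep the checkpoint on storage reachable from the standby node.

```python
import json
import os
import tempfile


class CheckpointStore:
    """Hypothetical application-level checkpoint store backed by one JSON file."""

    def __init__(self, path: str):
        self.path = path

    def save(self, state: dict) -> None:
        # Write to a temp file, then atomically replace the old checkpoint,
        # so a crash mid-write never leaves a half-written snapshot behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)

    def load(self):
        # Return the last saved state, or None if no checkpoint exists yet.
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return json.load(f)


def run_job(store, total_steps, checkpoint_every, fail_at=None):
    """Sum 0..total_steps-1, checkpointing every `checkpoint_every` steps.

    `fail_at` simulates a node crash at that step; the function then
    returns None. Otherwise it returns the final sum.
    """
    state = store.load() or {"step": 0, "acc": 0}   # resume if possible
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            return None                             # simulated node failure
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            store.save(state)
    return state["acc"]


if __name__ == "__main__":
    store = CheckpointStore(os.path.join(tempfile.mkdtemp(), "job.ckpt"))
    print(run_job(store, 100, 10, fail_at=57))   # primary "crashes": prints None
    print(run_job(store, 100, 10))               # standby resumes: prints 4950
```

After fail-over, steps 50 to 56 are recomputed because the last snapshot was taken at step 50; that lost work is what a shorter checkpoint interval reduces, at the price of more checkpoint overhead.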
Dilbag Singh, Jaswinder Singh, Amit Chhabra, "UML Based Integrated Multilevel Checkpointing Algorithms for Cloud Computing Environment", International Journal of Computer Network and Information Security (IJCNIS), vol.4, no.7, pp.1-11, 2012. DOI:10.5815/ijcnis.2012.07.01