Comparative Analysis of Multiple Sequence Alignment Tools

Full Text (PDF, 382 KB), pp. 24-30

Author(s)

Eman M. Mohamed 1,*, Hamdy M. Mousa 1, Arabi E. Keshk 1

1. Faculty of Computers and Information, Menoufia University, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2018.08.04

Received: 24 Apr. 2018 / Revised: 11 Jun. 2018 / Accepted: 7 Jul. 2018 / Published: 8 Aug. 2018

Index Terms

Multiple Sequence Alignment, Accuracy, Progressive Alignment, Iterative Alignment, Bioinformatics

Abstract

Accurately aligning three or more protein, RNA, or DNA sequences is a very difficult task in bioinformatics. Many techniques exist for multiple sequence alignment (MSA). Some maximize speed at the expense of the accuracy of the resulting alignment; others maximize accuracy at the expense of speed. The goal of any technique is to reduce memory and execution time requirements while increasing alignment accuracy on large-scale datasets. This paper presents a comparative analysis of six of the most well-known programs: CLUSTAL-OMEGA, MAFFT, PROBCONS, KALIGN, RETALIGN, and MUSCLE. The programs are tested and evaluated on benchmark protein datasets, with execution time and alignment quality as the two metrics. The results show that no single MSA tool always achieves the best alignment for all datasets.
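The two metrics named in the abstract can be illustrated with a small sketch. The abstract does not say how alignment quality was scored, so the sum-of-pairs (SP) score below is an assumption: it is a measure commonly used with reference-alignment benchmarks, defined as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names (`aligned_pairs`, `sp_score`, `timed`) and the toy sequences are hypothetical and are not taken from the paper.

```python
import time
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    Each residue is identified as (sequence index, residue index).
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_idx, char in enumerate(column):
            if char != "-":
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Toy data (assumption): the same two sequences, aligned two different ways.
reference = ["ACGT", "A-GT"]
test = ["ACGT", "AG-T"]

score, seconds = timed(sp_score, test, reference)
print(f"SP score: {score:.3f} (computed in {seconds:.6f} s)")
```

In a real comparison, `timed` would wrap the call that runs an alignment program rather than the scoring function, and `reference` would come from the benchmark's curated alignment.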

Cite This Paper

Eman M. Mohamed, Hamdy M. Mousa, Arabi E. Keshk, "Comparative Analysis of Multiple Sequence Alignment Tools", International Journal of Information Technology and Computer Science (IJITCS), Vol. 10, No. 8, pp. 24-30, 2018. DOI: 10.5815/ijitcs.2018.08.04
