A Novel Distance Metric for Aligning Multiple Sequences Using DNA Hybridization Process

Full Text (PDF, 732KB), PP.40-47

Author(s)

Jayapriya J 1,*, Michael Arock 1

1. Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2016.06.05

Received: 11 Sep. 2015 / Revised: 20 Dec. 2015 / Accepted: 4 Feb. 2016 / Published: 8 Jun. 2016

Index Terms

Multiple sequence alignment, DNA Hybridization, Sequence alignment, Distance matrix, DNA structure

Abstract

This paper presents a new approach to aligning multiple sequences using DNA operations. It proposes a new distance metric, based on the melting temperature of DNA hybridization, that gives approximate solutions to the multiple sequence alignment (MSA) problem. The paper proves that the proposed measure is a metric by verifying the distance function properties. The metric is used to construct a distance matrix, from which a guide tree for the alignment is generated. Producing an accurate solution in little computational time is a challenging task for the MSA problem, so designing an MSA algorithm is essentially a trade-off between accuracy and running time. To reduce the time complexity, the bio-inspired technique of DNA computing is applied to calculate the distances between sequences. The main application of this MSA approach is to identify sub-sequences for the functional study of whole genome sequences. A detailed theoretical study of the approach is given in this paper.
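
The pipeline summarized in the abstract (pairwise distance, distance matrix, guide tree) can be illustrated with a minimal Python sketch. The abstract does not give the formula for the melting-temperature metric, so the sketch substitutes the Hamming distance as a stand-in; `mismatch_distance`, `distance_matrix`, `is_metric`, and `upgma` are hypothetical names chosen for illustration, and the average-linkage (UPGMA) tree construction is an assumption rather than the procedure confirmed by the paper. The `is_metric` helper only checks the distance function properties on a finite sample, which is weaker than the proof the paper refers to.

```python
from itertools import combinations, permutations


def mismatch_distance(a: str, b: str) -> int:
    """Stand-in distance (NOT the paper's metric): Hamming distance."""
    if len(a) != len(b):
        raise ValueError("this stand-in needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs, dist):
    """Full symmetric matrix of pairwise distances."""
    n = len(seqs)
    return [[dist(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


def is_metric(seqs, dist):
    """Check the distance function properties on a finite sample."""
    if any(dist(s, s) != 0 for s in seqs):
        return False  # d(x, x) must be 0
    for a, b in permutations(seqs, 2):
        if dist(a, b) < 0 or dist(a, b) != dist(b, a):
            return False  # non-negativity and symmetry
        if (dist(a, b) == 0) != (a == b):
            return False  # d(x, y) = 0 only when x = y
    for a, b, c in permutations(seqs, 3):
        if dist(a, c) > dist(a, b) + dist(b, c):
            return False  # triangle inequality
    return True


def upgma(names, matrix):
    """Average-linkage guide tree, returned as nested tuples."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    d = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        merged = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # size-weighted average of the distances to the merged pair
            merged[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        d.update(merged)
        clusters[next_id] = ((ti, tj), si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0]


seqs = ["ACGT", "ACGA", "TGCA"]
matrix = distance_matrix(seqs, mismatch_distance)
tree = upgma(["s1", "s2", "s3"], matrix)
```

In a progressive alignment, the nested tuples in `tree` would give the order in which sequences are merged; replacing `mismatch_distance` with the paper's hybridization-based metric would leave the rest of the sketch unchanged.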

Cite This Paper

Jayapriya J, Michael Arock, "A Novel Distance Metric for Aligning Multiple Sequences Using DNA Hybridization Process", International Journal of Intelligent Systems and Applications (IJISA), Vol.8, No.6, pp.40-47, 2016. DOI:10.5815/ijisa.2016.06.05
