Integrated Model of DNA Sequence Numerical Representation and Artificial Neural Network for Human Donor and Acceptor Sites Prediction

Author(s)

Mohammed Abo-Zahhad Abo-Zeid 1,*, Sabah M. Ahmed 1, Shimaa A. Abd-Elrahman 1

1. Electrical and Electronics Engineering Department, Faculty of Engineering, Assiut University, Assiut, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2014.08.07

Received: 15 Nov. 2013 / Revised: 3 Feb. 2014 / Accepted: 12 May 2014 / Published: 8 Jul. 2014

Index Terms

Artificial Neural Network, Exons, Introns, DNA Sequence, Circular Mapping, Donor Site, Acceptor Site

Abstract

The Human Genome Project has produced a large inflow of genomic data. Since the completion of human genome sequencing, increasing effort has gone into identifying the splice sites that separate exons and introns (donor and acceptor sites). This calls on bioinformatics to analyze genome sequences and locate exon-intron boundaries, in other words to predict splice sites. Predicting splice sites in genic regions of a DNA sequence is one of the most challenging aspects of gene structure recognition. Over the last two decades, artificial neural networks have gradually become one of the essential tools in bioinformatics. In this paper, artificial neural networks are combined with different numerical mapping techniques to build an integrated model for splice site prediction in genes. An artificial neural network is trained and then used to find splice sites in human genes. The mapping methods are compared, using the trained neural network, in terms of their precision in predicting donor and acceptor sites. Training and performance measurement are carried out using sequences from chromosome 21 of the human genome (GRCh37/hg19, chr21). Simulation results indicate that the Electron-Ion Interaction Potential numerical mapping method, combined with the neural network, yields the best prediction performance.
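
As an illustration of the numerical mapping step described in the abstract, the following minimal sketch encodes a DNA string with the commonly cited Electron-Ion Interaction Potential (EIIP) values and cuts fixed-width windows around candidate donor (GT) and acceptor (AG) dinucleotides. The function names, the flank width, and the idea of feeding such windows to a classifier are assumptions made for illustration only; the abstract does not specify the window size or the network architecture used by the authors.

```python
# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_map(seq):
    """Map a DNA string to a list of EIIP values (hypothetical helper).

    Raises KeyError on symbols other than A, C, G, T (for example N).
    """
    return [EIIP[base] for base in seq.upper()]


def candidate_windows(seq, motif, flank=4):
    """Return (position, encoded window) pairs for each motif occurrence.

    The flank width is an illustrative assumption. Occurrences that are
    too close to either end of the sequence to have full flanks are skipped.
    """
    seq = seq.upper()
    windows = []
    start = seq.find(motif)
    while start != -1:
        lo = start - flank
        hi = start + len(motif) + flank
        if lo >= 0 and hi <= len(seq):
            windows.append((start, eiip_map(seq[lo:hi])))
        start = seq.find(motif, start + 1)
    return windows


# Candidate donor sites use the canonical GT dinucleotide,
# candidate acceptor sites the canonical AG dinucleotide.
donors = candidate_windows("CCAAGGTAAGTCC", "GT", flank=4)
acceptors = candidate_windows("CCAAGGTAAGTCC", "AG", flank=3)
```

Each encoded window is a fixed-length numeric vector, which is the kind of input a feed-forward neural network could be trained on to separate true splice sites from decoy dinucleotides.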

Cite This Paper

Mohammed Abo-Zahhad, Sabah M. Ahmed, Shimaa A. Abd-Elrahman, "Integrated Model of DNA Sequence Numerical Representation and Artificial Neural Network for Human Donor and Acceptor Sites Prediction", International Journal of Information Technology and Computer Science (IJITCS), vol. 6, no. 8, pp. 51-57, 2014. DOI: 10.5815/ijitcs.2014.08.07
