A Novel Circular Mapping Technique for Spectral Classification of Exons and Introns in Human DNA Sequences

Pages: 19-29

Author(s)

Mohammed Abo-Zahhad Abo-Zeid (1, *), Sabah M. Ahmed (1), Shimaa A. Abd-Elrahman (1)

1. Electrical and Electronics Engineering Department, Faculty of Engineering, Assiut University, Assiut, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2014.04.02

Received: 21 Jun. 2013 / Revised: 15 Nov. 2013 / Accepted: 13 Jan. 2014 / Published: 8 Mar. 2014

Index Terms

Genome, Codon, Exons, Introns, DNA sequence, Circular Mapping

Abstract

Signals that carry information can be classified into two forms: numeric and symbolic. Symbolic signals, such as DNA sequences, cannot be processed directly with digital signal processing (DSP) techniques. To apply DSP in genomics, the symbolic DNA sequence must first be mapped to a numerical sequence, so that its biological properties are reflected in the numerical domain. This opens the way to a set of signal-processing tools for solving genomic problems. Many techniques for the numerical representation of DNA sequences have been developed in the literature. Their main drawback is that each nucleotide is assigned a numerical value that depends only on the nucleotide type, ignoring its position within the codon and the DNA sequence. In this paper, a new approach to the symbolic-to-numeric representation of DNA, called Circular Mapping (CM), is introduced. It is based on a graphical representation of the DNA sequence and maps each nucleotide to a complex value that depends not only on the nucleotide type but also on its position within the codon. The main applications of this method are gene prediction, which aims to locate the protein-coding regions, and the classification of exons and introns in DNA sequences. The proposed approach showed a significant improvement in exon and intron classification compared with existing techniques. As the power spectral analysis results over sequences of the human genome (GRCh37/hg19) indicate, the classification efficiency of this method depends on the right choice of the mapping angle (θ).
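The abstract describes a mapping in which each nucleotide becomes a complex value that depends on both its type and its codon position, followed by power spectral analysis. The exact CM formulas are not given here, so the sketch below is only an illustration under stated assumptions: the base angles in `BASE_ANGLE` and the rule "phase = base angle + codon position × θ" are hypothetical, and `circular_map`, `power_spectrum` and `period3_ratio` are illustrative helper names, not the authors' code. The period-3 measure (power at k = N/3 relative to the mean spectral power) is the standard spectral signature of protein-coding regions, used here as a stand-in for the paper's classification criterion.

```python
import cmath
import math

# HYPOTHETICAL base angles (not taken from the paper): the four
# nucleotides are spread evenly around the unit circle.
BASE_ANGLE = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}


def circular_map(seq: str, theta: float) -> list[complex]:
    """Map a DNA string to unit-magnitude complex values.

    ASSUMED rule: phase = base angle of the nucleotide
    + (codon position 0, 1 or 2) * theta, so each value depends on both
    the nucleotide type and its position within the codon.
    """
    values = []
    for i, base in enumerate(seq.upper()):
        if base not in BASE_ANGLE:
            raise ValueError(f"unsupported symbol {base!r} at index {i}")
        values.append(cmath.exp(1j * (BASE_ANGLE[base] + (i % 3) * theta)))
    return values


def power_spectrum(x: list[complex]) -> list[float]:
    """Direct O(N^2) DFT power spectrum |X[k]|^2 for k = 0 .. N-1."""
    n = len(x)
    return [
        abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) ** 2
        for k in range(n)
    ]


def period3_ratio(seq: str, theta: float) -> float:
    """Power at k = N/3 divided by the mean power over k = 1 .. N-1.

    The sequence is truncated to a multiple of 3 so that k = N/3 is an
    exact DFT bin. Values well above 1 indicate a period-3 component.
    """
    seq = seq[: len(seq) - len(seq) % 3]
    if len(seq) < 6:
        raise ValueError("sequence too short")
    spec = power_spectrum(circular_map(seq, theta))
    n = len(spec)
    background = sum(spec[1:]) / (n - 1)  # mean power, DC bin excluded
    return spec[n // 3] / background if background > 0 else 0.0


if __name__ == "__main__":
    # A strictly codon-periodic toy sequence concentrates its power at N/3 and 2N/3.
    print(period3_ratio("ATG" * 30, 0.0))
```

Because the abstract reports that performance depends on θ, one could evaluate `period3_ratio` over a grid of θ values for labelled exon and intron sets and keep the angle that separates the two groups best; the grid and the separation criterion would be further assumptions. For long sequences, an FFT would replace the quadratic DFT used here for readability.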

Cite This Paper

Mohammed Abo-Zahhad, Sabah M. Ahmed, Shimaa A. Abd-Elrahman, "A Novel Circular Mapping Technique for Spectral Classification of Exons and Introns in Human DNA Sequences", International Journal of Information Technology and Computer Science (IJITCS), vol. 6, no. 4, pp. 19-29, 2014. DOI: 10.5815/ijitcs.2014.04.02
