A Hybrid Dimensionality Reduction Model for Classification of Microarray Dataset

Author(s)

Micheal O. Arowolo 1,*, Sulaiman O. Abdulsalam 1, Rafiu M. Isiaka 1, Kazeem A. Gbolagade 1

1. Kwara State University, Department of Computer Science, Malete, Nigeria

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2017.11.06

Received: 17 Jun. 2017 / Revised: 6 Aug. 2017 / Accepted: 18 Sep. 2017 / Published: 8 Nov. 2017

Index Terms

Dimensionality Reduction, One-Way-ANOVA, PCA, PLS, SVM

Abstract

This paper proposes a combination of dimensionality reduction techniques to address two problems in gene classification: highly correlated data and the selection of significant variables from a large set of features. One-way ANOVA is employed for feature selection to obtain an optimal number of genes. Principal Component Analysis (PCA) and Partial Least Squares (PLS) are then applied separately as feature extraction methods to further reduce the features selected from the microarray dataset. Experiments on a colon cancer dataset use a Support Vector Machine (SVM) as the classifier. Combining feature selection and feature extraction into a generalized model yields a robust and efficient low-dimensional space. Redundant and irrelevant features are removed at each step, and classification reaches an accuracy of about 98%, exceeding the state of the art.
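The feature selection step described above can be illustrated with a minimal sketch of one-way ANOVA gene ranking. The abstract does not give the paper's exact selection criterion, so the function names (`anova_f`, `select_top_genes`), the `top_k` cutoff, and the toy expression values below are all assumptions made for illustration only. The PCA/PLS extraction and SVM classification steps are not shown.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single feature across class groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_genes(X, labels, top_k):
    """Rank the columns of X by F-statistic and return the top_k indices."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k], scores


# Toy expression matrix (rows = samples, columns = genes); values are made up.
X = [
    [1.0, 5.0, 2.0],
    [2.0, 5.5, 1.0],
    [3.0, 4.5, 3.0],
    [7.0, 5.2, 2.0],
    [8.0, 4.8, 1.0],
    [9.0, 5.0, 3.0],
]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
selected, scores = select_top_genes(X, labels, top_k=1)
```

In this toy data only the first gene separates the two classes, so it receives the highest F-statistic and is the one retained. In a hybrid model of the kind the abstract describes, the retained columns would then be passed to a feature extraction method before classification.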

Cite This Paper

Micheal O. Arowolo, Sulaiman O. Abdulsalam, Rafiu M. Isiaka, Kazeem A. Gbolagade, "A Hybrid Dimensionality Reduction Model for Classification of Microarray Dataset", International Journal of Information Technology and Computer Science (IJITCS), Vol. 9, No. 11, pp. 57-63, 2017. DOI: 10.5815/ijitcs.2017.11.06
