Ferrer diagram based partitioning technique to decision tree using genetic algorithm

Full Text (PDF, 201KB), PP.25-32


Author(s)

Pavan Sai Diwakar Nutheti 1, Narayan Hasyagar 1, Rajashree Shettar 1, Shankru Guggari 2, Umadevi V 2

1. Rashtreeya Vidyalaya College of Engineering, Bengaluru, India

2. B.M.S College of Engineering, Bengaluru, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2020.01.03

Received: 24 Jun. 2019 / Revised: 20 Jul. 2019 / Accepted: 11 Aug. 2019 / Published: 8 Feb. 2020

Index Terms

Data mining, Decision tree, Ferrer diagram, Vertical partitioning.

Abstract

The decision tree (DT) is a well-known classification technique in machine learning. It is easy to understand and interpret, and it is widely used in real-world applications. Decision trees nevertheless face several challenges, such as class imbalance, overfitting, and the curse of dimensionality. The current study addresses the curse-of-dimensionality problem using a partitioning technique in which the features are divided into multiple mutually exclusive blocks. A genetic algorithm selects the features and assigns them to blocks whose sizes follow a Ferrer diagram, and a CART decision tree is built on each block. A majority voting technique then combines the classes predicted by the individual classifiers and outputs the majority class. The method is evaluated on 4 datasets from the UCI repository and shows approximately 9%, 3%, and 5% improvement over the CART, Bagging, and AdaBoost techniques, respectively.
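The two combinatorial ingredients described above, the Ferrer-diagram block sizes and the majority vote, can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the genetic-algorithm feature selection and the CART training are omitted, and the seeded shuffle used to assign features to blocks is a stand-in assumption for the GA.

```python
from collections import Counter
import random

def ferrer_partitions(n):
    """Generate the integer partitions of n (the row lengths of a Ferrer
    diagram) in non-increasing order, e.g. 4 -> [4], [3,1], [2,2], ..."""
    def helper(remaining, max_part):
        if remaining == 0:
            yield []
            return
        for first in range(min(remaining, max_part), 0, -1):
            for rest in helper(remaining - first, first):
                yield [first] + rest
    yield from helper(n, n)

def assign_features(features, block_sizes, seed=0):
    """Split the feature list into mutually exclusive blocks whose sizes
    follow one Ferrer-diagram row pattern. (In the paper a genetic
    algorithm chooses this assignment; a seeded shuffle stands in here.)"""
    rng = random.Random(seed)
    pool = list(features)
    rng.shuffle(pool)
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(pool[start:start + size])
        start += size
    return blocks

def majority_vote(predictions):
    """Return the class predicted by the most per-block classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Example: partition 4 features into blocks of sizes 2, 1, 1, then
# combine the predictions of the three per-block classifiers.
blocks = assign_features(["f1", "f2", "f3", "f4"], [2, 1, 1])
combined = majority_vote(["yes", "no", "yes"])
```

In a full pipeline, each block would feed one CART classifier (e.g. scikit-learn's `DecisionTreeClassifier` restricted to that block's columns), and `majority_vote` would combine their per-sample predictions.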

Cite This Paper

Pavan Sai Diwakar Nutheti, Narayan Hasyagar, Rajashree Shettar, Shankru Guggari, Umadevi V, "Ferrer diagram based partitioning technique to decision tree using genetic algorithm", International Journal of Mathematical Sciences and Computing (IJMSC), Vol.6, No.1, pp.25-32, 2020. DOI: 10.5815/ijmsc.2020.01.03
