Ensembles of Classification Methods for Data Mining Applications

Full Text (PDF, 910KB), PP.6-21


Author(s)

M. Govindarajan 1,*

1. Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar – 608002, Tamil Nadu, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2013.06.02

Received: 14 Sep. 2013 / Revised: 2 Oct. 2013 / Accepted: 8 Nov. 2013 / Published: 8 Dec. 2013

Index Terms

Data Mining, Ensemble, Intrusion Detection, Direct Marketing, Signature Verification, Radial Basis Function, Support Vector Machine, Accuracy

Abstract

One of the major developments in machine learning in the past decade is the ensemble method, which finds a highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed: homogeneous ensemble classifiers built with bagging and heterogeneous ensemble classifiers built with the arcing classifier. Their performance is analyzed in terms of accuracy. A classifier ensemble is designed using Radial Basis Function (RBF) networks and Support Vector Machines (SVM) as base classifiers. The feasibility and benefits of the proposed approaches are demonstrated by means of real and benchmark data sets from data mining applications such as intrusion detection, direct marketing and signature verification. The main originality of the proposed approach lies in its three main parts: a preprocessing phase, a classification phase and a combining phase. A wide range of comparative experiments is conducted on real and benchmark data sets for direct marketing. The accuracy of the base classifiers is compared with that of the homogeneous and heterogeneous models for the data mining problems considered. The proposed ensemble methods provide a significant improvement in accuracy over the individual classifiers, and the heterogeneous models exhibit better results than the homogeneous models on the real and benchmark data sets of these data mining applications.
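The abstract contrasts two ways of building an ensemble: bagging, where every member is the same kind of learner trained on a bootstrap resample, and arcing, where resampling weights shift toward examples that earlier members misclassified. The following is a minimal, stdlib-only sketch of the classification and combining phases under stated assumptions; it is not the author's implementation. The decision stump and nearest-centroid learners (`fit_stump`, `fit_centroid`) are hypothetical stand-ins for the paper's SVM and RBF base classifiers, the ensemble size of 11 and the toy data are arbitrary, and cycling two learner types inside arcing is only one assumed way to make an arcing ensemble heterogeneous. Arcing is shown as Breiman's arc-x4 rule, which resamples example i with probability proportional to 1 + m(i)^4, where m(i) counts how often earlier members misclassified it. The preprocessing phase is not shown.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], int]


def fit_stump(X: Sequence[Point], y: Sequence[int]) -> Model:
    """Decision stump (hypothetical stand-in for an SVM): one feature, one cut."""
    labels = sorted(set(y))
    best = None  # (errors, feature, cut, label_above, label_below)
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        # Candidate cuts: one below every value, plus midpoints between neighbours.
        cuts = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in cuts:
            for above in labels:
                for below in labels:
                    errs = sum((above if x[j] > t else below) != c for x, c in zip(X, y))
                    if best is None or errs < best[0]:
                        best = (errs, j, t, above, below)
    _, feat, cut, hi, lo = best
    return lambda x: hi if x[feat] > cut else lo


def fit_centroid(X: Sequence[Point], y: Sequence[int]) -> Model:
    """Nearest class mean (hypothetical, crude stand-in for an RBF network)."""
    groups = {}
    for x, c in zip(X, y):
        groups.setdefault(c, []).append(x)
    means = {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in groups.items()}

    def predict(x: Point) -> int:
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))

    return predict


def vote(models: List[Model], x: Point) -> int:
    """Combining phase: unweighted majority vote over the ensemble members."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def bagging(X, y, fit, rounds: int = 11, seed: int = 0) -> List[Model]:
    """Homogeneous ensemble: one learner type, each member fit on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def arc_x4(X, y, fits, rounds: int = 11, seed: int = 0) -> List[Model]:
    """Arc-x4: resample example i with probability proportional to 1 + m(i)**4,
    where m(i) counts earlier members that misclassified it. Cycling through
    several learner types in `fits` is an assumed way to make it heterogeneous."""
    rng = random.Random(seed)
    n = len(X)
    misses = [0] * n
    models = []
    for r in range(rounds):
        weights = [1 + m ** 4 for m in misses]
        idx = rng.choices(range(n), weights=weights, k=n)
        model = fits[r % len(fits)]([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
        for i in range(n):
            if model(X[i]) != y[i]:
                misses[i] += 1
    return models


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated 2-D clusters (not from the paper).
    rng = random.Random(1)
    X = [(rng.random(), rng.random()) for _ in range(10)]
    X += [(10 + rng.random(), 10 + rng.random()) for _ in range(10)]
    y = [0] * 10 + [1] * 10
    ensembles = {
        "bagging (stumps)": bagging(X, y, fit_stump),
        "arc-x4 (centroid + stump)": arc_x4(X, y, [fit_centroid, fit_stump]),
    }
    for name, members in ensembles.items():
        acc = sum(vote(members, x) == c for x, c in zip(X, y)) / len(X)
        print(f"{name}: training accuracy {acc:.2f}")
```

The design difference lies entirely in how each round's training sample is drawn: bagging samples uniformly, so members differ only by chance, while arc-x4 raises the weight of repeatedly misclassified examples so later members concentrate on hard cases. On the separable toy data both ensembles should classify every training point correctly; the interesting contrast only appears on overlapping classes such as those in the application data sets.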

Cite This Paper

M. Govindarajan, "Ensembles of Classification Methods for Data Mining Applications", International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 5, no. 6, pp. 6-21, 2013. DOI: 10.5815/ijieeb.2013.06.02
