IJMECS Vol. 2, No. 2, 8 Dec. 2010
Cover page and Table of Contents: PDF (size: 333KB)
Full Text (PDF, 333KB), PP.25-31
Views: 0 Downloads: 0
Pattern discovery, software communication, decision tree, tri-training, semi-supervised learning
Smaller time loss and smoother communication pattern is the urgent pursuit in the software development enterprise. However, communication is difficult to control and manage and demands on technical support, due to the uncertainty and complex structure of data appeared in communication. Data mining is a well established framework aiming at intelligently discovering knowledge and principles hidden in massive amounts of original data. Data mining technology together with shared repositories results in an intelligent way to analyze data of communication in software development environment. We propose a data mining based algorithm to tackle the problem, adopting a co-training styled algorithm to discover pattern in software development environment. Decision tree is trained as based learners and a majority voting procedure is then launched to determine labels of unlabeled data. Based learners are then trained again with newly labeled data and such iteration stops when a consistent state is reached. Our method is naturally semi-supervised which can improve generalization ability by making use of unlabeled data. Experimental results on data set gathered from productive environment indicate that the proposed algorithm is effective and outperforms traditional supervised algorithms.
Gang Zhang, Caixian Ye, Chunru Wang, Xiaomin He, "Data Mining based Software Development Communication Pattern Discovery", International Journal of Modern Education and Computer Science(IJMECS), vol.2, no.2, pp.25-31, 2010. DOI:10.5815/ijmecs.2010.02.04
[1]Vineeth Mekkat, Ragavendra Natarajan. Performance characterization of data mining benchmarks. In Proceedings of TKDD, 2010, 3.
[2]Ted E. Senator, On the efficacy of data mining for security applications. In Proceedings of International Conference on Knowledge Discovery and Data Mining, 2009, 6.
[3]Huang Ming, Niu Wenying, Liang Xu. An improved decision tree classification algorithm based on ID3 and the application in score analysis, Chinese Control and Decision conference, 2009 , 1876-1878.
[4]Carson Kai-Sang Leung, Efficient algorithms for mining constrained frequent patterns fromuncertain data, ACM 2009, 6.
[5]Damon Fenacci, Björn Franke, John Thomson. Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining, SCOPES, 2010, 5.
[6]Hilderman, R. J., Carter, C. L., Hamilton, H. J., Cercone, N. Mining market basket data using share measures and characterized itemsets. In Proceedings of PAKDD-98. Melbourne, Australia. 72–86.
[7]Hung, S., Yen, D., Wang, H. Applying data mining to telecom churn management. In: Expert Systems and Applications 31, pp.515-524, 2006.
[8]Abascal, E., Lautre, I., Mallor, F. Data mining in a bicriteria clustering problem. 17. In Proceedings of European Journal of Operational Research 173, pp.705-716, 2006.
[9]Shirish Tatikonda, Srinivasan Parthasarathy. Mining tree-structured data on multicore systems. In Proceedings of the VLDB Endowment, 2009.
[10]Sumathi, S., Sivanandam, S. Introduction to Data Mining and its Applications. Springer-Verlag Berlin Heidelberg, 2006.
[11]Berry, M., Linoff, G. Data Mining Techniques. For marketing, sales and customer. Relationship Management. Wiley Publishing Inc., 2004.
[12]Jiawei Han, Michelin Kamber, Data Mining Concepts and Techniques, Second Edition.
[13]Zhi-Hua Zhou, Ming Li. Tri-training: Exploiting Unlabeled Data Using Three Classifiers. IEEE Trans on knowledge and Data Engineering, 2005.17(11).
[14]TheWinPcapTeam,WimPcnp. http://www.wimpcap.org/docs/docs31/html/main.html,2005.12
[15]Thomas Bernecker, Hans-Peter Kriegel, Matthias Renz, Florian Verhein, and Andreas Zuefle. Probabilistic frequent itemset mining in uncertain databases. In KDD, 2009.
[16]Chen Cheny, Xifeng Yanz, Feida Zhuy, and Jiawei Hany. gapprox: Mining frequent approximate patterns from a massive network. In ICDM, 2007.
[17]Maitreya Natu, Vaishali Sadaphal, Sangameshwar Patil, and Ankit Mehrotra. Mining frequent subgraphs to extract communication patterns in data-centres. In Proceedings of the 12th ICDCN'11, Springer-Verlag, Berlin, Heidelberg, 239-250, 2011.
[18]G. Valiente. Efficient Algorithms on Trees and Graphs with Unique Node Labels. In Studies in Computational Intelligence, Springer- Berlin, vol. 52, pp. 137-149, 2007.
[19]I. S. Jacobs and C. P. Bean, “Fine particles, thin films and exchange anisotropy,” in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
[20]G. Brown, J. L. Wyatt, P. Tiňo, Managing diversity in regression ensembles, JMLR, 6, pp. 1621-1650, 2005.
[21]W. Wang and Z.-H. Zhou. Analyzing co-training style algorithms. In Proceedings of the 18th ECML, pp. 454–465, Warsaw, Poland, 2007.
[22]M. Hein and M. Maier. Manifold denoising. In B. Sch¨olkopf, J. C. Platt, and T. Hoffman, editors, Advances in NIPS 19, pages 561–568. MIT Press, Cambridge, MA, 2007.