Clustering Undergraduate Computer Science Student Final Project Based on Frequent Itemset



Author(s)

Lusi Maulina Erman 1,*, Imas Sukaesih Sitanggang 1

1. Computer Science Department of Bogor Agricultural University, Bogor, 16680 Indonesia

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2016.11.01

Received: 12 Jan. 2016 / Revised: 11 May 2016 / Accepted: 20 Aug. 2016 / Published: 8 Nov. 2016

Index Terms

Abstract, association rule mining, frequent itemset, K-Means, purity

Abstract

The abstract is the part of a document that plays an important role in explaining the document as a whole. Words that appear frequently in abstracts can be used as a reference for grouping final project documents into categories, and text mining methods can be used to perform this grouping. The purpose of this study is to apply an association rule mining method, namely the ECLAT algorithm, to find the most common combinations of terms and to group a collection of abstracts. The data used in this study are English-language final project abstracts written by undergraduate computer science students of IPB from 2012 to 2014. This research used additional stopwords covering common computer science terminology, applied association rule mining with support values of 0.1, 0.15, 0.2, 0.25, 0.3, and 0.35, and used k-Means clustering with the number of clusters (k) set to 10 because this value gives the lowest SSE. The support value, the SSE, the number of cluster members, and the purity of each cluster were compared. The best clustering result was obtained from the data with additional stopwords, without applying association rule mining, and with k = 10. The resulting SSE is 23 485.03 and the purity is 0.512.
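The two less familiar steps named in the abstract, mining frequent term combinations with ECLAT and scoring clusters by purity, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `eclat` and `purity`, the toy documents, and the labels are assumptions made only for illustration, and purity is computed here with the common overall definition (majority class count per cluster, summed and divided by the number of documents). The k-Means step itself is omitted.

```python
from collections import Counter


def eclat(transactions, min_support):
    """Return {frozenset(itemset): relative support} for all frequent itemsets.

    Illustrative sketch: each transaction is the list of terms of one abstract.
    """
    n = len(transactions)
    # Vertical layout: map each term to the set of document ids (tidset) containing it.
    tidsets = {}
    for tid, terms in enumerate(transactions):
        for term in set(terms):
            tidsets.setdefault(term, set()).add(tid)

    min_count = min_support * n
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids) / n
            # Grow the itemset by intersecting tidsets with the remaining items.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


def purity(cluster_labels, true_labels):
    """Overall purity: sum of majority-class counts per cluster / number of documents."""
    by_cluster = {}
    for cluster, label in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(true_labels)


# Hypothetical toy data: four tiny "abstracts" already tokenized and stopword-filtered.
docs = [
    ["cluster", "kmeans", "text"],
    ["cluster", "kmeans"],
    ["cluster", "text"],
    ["network", "security"],
]
frequent_itemsets = eclat(docs, min_support=0.5)
score = purity([0, 0, 1, 1], ["a", "a", "a", "b"])
```

With a minimum support of 0.5, a term combination must occur in at least two of the four toy documents to be kept. In a pipeline like the one the abstract describes, the frequent itemsets would presumably be used to select the terms that feed the clustering step; how exactly the paper does this is not stated here.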

Cite This Paper

Lusi Maulina Erman, Imas Sukaesih Sitanggang, "Clustering Undergraduate Computer Science Student Final Project Based on Frequent Itemset", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.11, pp.1-7, 2016. DOI: 10.5815/ijitcs.2016.11.01
