IJISA Vol. 3, No. 3, 8 May 2011
Full Text (PDF, 588KB), PP.26-32
New trending events, incremental clustering, incremental priority, multi-representation index tree
Traditional clustering is a powerful technique for revealing hot topics in Web information. However, it fails to discover trending events that emerge gradually. In this paper, we propose a novel method to address this problem, which we model as detecting new clusters in time-streaming documents. Our approach comprises three parts: a cluster definition based on the Multi-Representation Index Tree (MI-Tree), the new-cluster detection process, and the metrics for measuring a new cluster. In contrast to the traditional method, we process the newly arriving data first and then merge the old clustering tree into the new one. This prevents highly similar documents from being assigned to different clusters. We designed and implemented a system for practical application. Experimental results on a variety of domains demonstrate that our algorithm recognizes valuable new clusters during the iteration process and produces quality clusters.
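The "new data first, then merge the old clusters" idea can be illustrated with a minimal sketch. It is not the MI-Tree algorithm from the paper: it swaps in flat clusters with bag-of-words centroids and cosine similarity, and the function names, the `is_new` flag, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_batch(docs, threshold):
    """Greedy single-pass clustering of one batch (assumed stand-in for MI-Tree).

    Each cluster holds a summed term vector, its documents, and an
    illustrative `is_new` flag that starts out True.
    """
    clusters = []
    for doc in docs:
        vec = Counter(doc.lower().split())
        best = max(clusters, key=lambda c: cosine(c["centroid"], vec), default=None)
        if best is not None and cosine(best["centroid"], vec) >= threshold:
            best["centroid"].update(vec)
            best["docs"].append(doc)
        else:
            clusters.append({"centroid": Counter(vec), "docs": [doc], "is_new": True})
    return clusters


def merge_old_into_new(old_clusters, new_clusters, threshold):
    """Fold old clusters into the clusters built from the newest batch.

    A new-batch cluster that absorbs no old cluster keeps is_new=True and is
    a candidate new trending event. Clusters in `new_clusters` are updated
    in place.
    """
    merged = list(new_clusters)
    for old in old_clusters:
        best = max(merged, key=lambda c: cosine(c["centroid"], old["centroid"]), default=None)
        if best is not None and cosine(best["centroid"], old["centroid"]) >= threshold:
            best["centroid"].update(old["centroid"])
            best["docs"].extend(old["docs"])
            best["is_new"] = False
        else:
            # Unmatched old cluster is carried over unchanged (copied, not shared).
            merged.append({"centroid": Counter(old["centroid"]),
                           "docs": list(old["docs"]),
                           "is_new": False})
    return merged


old = cluster_batch(["stock market falls", "stock market rally"], 0.3)
new = cluster_batch(["earthquake hits coast", "earthquake coast damage",
                     "stock market news"], 0.3)
merged = merge_old_into_new(old, new, 0.3)
fresh = [c for c in merged if c["is_new"]]
```

In this toy run the earthquake documents form a cluster that no old cluster merges into, so it is the only one still flagged as new, while the old stock-market cluster is absorbed by the new-batch cluster on the same topic.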
Hui Song, Lifeng Wang, Baiyan Li, Xiaoqiang Liu, "New Trending Events Detection based on the Multi-Representation Index Tree Clustering", International Journal of Intelligent Systems and Applications (IJISA), vol. 3, no. 3, pp. 26-32, 2011. DOI: 10.5815/ijisa.2011.03.04