Applying Clustering and Topic Modeling to Automatic Analysis of Citizens’ Comments in E-Government

Author(s)

Gunay Y. Iskandarli 1,2,*

1. Institute of Information Technology, Azerbaijan National Academy of Sciences

2. AZ1141, B. Vahabzade street, 9A, Baku, Azerbaijan

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2020.06.01

Received: 11 Mar. 2020 / Revised: 7 May 2020 / Accepted: 25 Jun. 2020 / Published: 8 Dec. 2020

Index Terms

E-government, text mining, topic modeling, K-Means

Abstract

The paper proposes an approach to analyzing citizens' comments in e-government using topic modeling and clustering algorithms. The main purpose of the proposed approach is to determine which topics citizens' comments written in the e-government environment address, and thereby to improve the quality of e-services. Topic modeling is one of the methods used for this purpose. In the proposed approach, citizens' comments are first clustered, and topics are then extracted from each cluster. This shows which topics citizens are discussing. However, applying clustering and topic modeling methods raises some problems, including the large size of the document vectors and semantically related documents being placed in different clusters. To address this, the approach uses the semantic similarity of words to reduce dimensionality: we keep only one word from each group of semantically similar words and discard the others, which reduces the size of the vectors. The documents are then clustered, and topics are extracted from each cluster. The proposed method can significantly reduce the size of the vectors representing a large set of documents, save time spent on analyzing this data, and improve the quality of clustering and of the LDA algorithm.
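The pipeline summarized in the abstract (merge semantically similar words, build smaller document vectors, then cluster) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The toy comments, the `SIMILARITY` scores, the 0.75 threshold, the greedy frequency-ordered grouping in the hypothetical `reduce_vocabulary` function, and the farthest-point initialization of K-Means are all choices made for this example. Similar words are folded into their representative's count, which is one reading of "keep one word and discard the others". The per-cluster LDA step is not reproduced.

```python
from collections import Counter

# Hypothetical toy corpus: tokenized citizen comments (not data from the paper).
COMMENTS = [
    ["passport", "delay", "application", "slow"],
    ["passport", "late", "application", "waiting"],
    ["tax", "payment", "portal", "error"],
    ["tax", "payment", "site", "bug"],
]

# Hypothetical word-similarity scores in [0, 1]; a real system would compute
# these with a semantic similarity measure. Unlisted pairs count as 0.
SIMILARITY = {
    frozenset(("delay", "late")): 0.9,
    frozenset(("slow", "waiting")): 0.8,
    frozenset(("portal", "site")): 0.85,
    frozenset(("error", "bug")): 0.9,
}


def similarity(a, b):
    """Look up the (assumed) semantic similarity of two words."""
    return 1.0 if a == b else SIMILARITY.get(frozenset((a, b)), 0.0)


def reduce_vocabulary(docs, threshold=0.75):
    """Greedily keep one representative per group of similar words.

    Returns a word -> representative mapping and the reduced, sorted vocabulary.
    The threshold value is an assumption.
    """
    freq = Counter(w for doc in docs for w in doc)
    # More frequent words are considered first, so they become representatives.
    ordered = sorted(freq, key=lambda w: (-freq[w], w))
    representatives, mapping = [], {}
    for word in ordered:
        match = next((r for r in representatives if similarity(word, r) >= threshold), None)
        if match is None:
            representatives.append(word)
            mapping[word] = word
        else:
            mapping[word] = match  # fold the similar word into its representative
    return mapping, sorted(representatives)


def vectorize(docs, mapping, vocab):
    """Build term-frequency vectors over the reduced vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for w in doc:
            vec[index[mapping[w]]] += 1.0
        vectors.append(vec)
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means (Lloyd's algorithm) with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    mapping, vocab = reduce_vocabulary(COMMENTS)
    original_size = len({w for doc in COMMENTS for w in doc})
    vectors = vectorize(COMMENTS, mapping, vocab)
    print(f"vector size: {original_size} -> {len(vocab)}")
    print("cluster labels:", kmeans(vectors, k=2))
```

Running the script prints `vector size: 12 -> 8` and `cluster labels: [0, 0, 1, 1]`. Before the merge, the two passport comments share only two of their four words; afterwards their vectors are identical, which is the effect the approach relies on to keep semantically related comments in the same cluster.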

Cite This Paper

Gunay Y. Iskandarli, "Applying Clustering and Topic Modeling to Automatic Analysis of Citizens’ Comments in E-Government", International Journal of Information Technology and Computer Science (IJITCS), Vol.12, No.6, pp.1-10, 2020. DOI:10.5815/ijitcs.2020.06.01