Analyzing the Performance of SVM for Polarity Detection with Different Datasets

PP. 29-36

Author(s)

Munir Ahmad 1,*, Shabib Aftab 1

1. Department of Computer Science, Virtual University of Pakistan

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2017.10.04

Received: 28 Jul. 2017 / Revised: 9 Aug. 2017 / Accepted: 18 Sep. 2017 / Published: 8 Oct. 2017

Index Terms

Sentiment Analysis, Polarity Detection, Data Classification, Machine Learning, Support Vector Machine, SVM.

Abstract

Social media and micro-blogging websites have become popular platforms where anyone can express thoughts about a particular news item, event, or product. Analyzing this massive amount of user-generated data is an active research topic. Sentiment analysis includes the classification of a given text as positive, negative, or neutral, a task known as polarity detection. Support Vector Machine (SVM) is one of the widely used machine learning algorithms for sentiment analysis. In this research, we propose a Sentiment Analysis Framework and use it to analyze the performance of SVM for textual polarity detection. We used three datasets for the experiments: two from Twitter and one from IMDB reviews. To evaluate SVM, we used three ratios of training data to test data: 70:30, 50:50, and 30:70. Performance is measured in terms of precision, recall, and F-measure for each dataset.
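
The evaluation setup described above can be illustrated with a minimal sketch. It is not the authors' framework and does not train an SVM; it only shows how a labeled dataset might be partitioned at the three stated ratios and how per-class precision, recall, and F-measure are conventionally computed from gold and predicted labels. The function names `split_by_ratio` and `precision_recall_f` are hypothetical, the split is a simple ordered cut (the abstract does not say how the data was partitioned), and F-measure is assumed to be the usual harmonic mean of precision and recall.

```python
from typing import List, Sequence, Tuple

# Train:test ratios named in the abstract.
RATIOS = [(70, 30), (50, 50), (30, 70)]


def split_by_ratio(items: Sequence, train_pct: int) -> Tuple[list, list]:
    """Cut items into a leading training part and a trailing test part.

    Assumption: an ordered cut; shuffle the data first if a random
    split is wanted.
    """
    cut = len(items) * train_pct // 100
    return list(items[:cut]), list(items[cut:])


def precision_recall_f(
    gold: List[str], pred: List[str], label: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for a single class label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure as the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    gold = ["positive", "negative", "neutral", "positive", "negative", "positive"]
    pred = ["positive", "positive", "neutral", "negative", "negative", "positive"]
    for train_pct, test_pct in RATIOS:
        train, test = split_by_ratio(gold, train_pct)
        print(f"{train_pct}:{test_pct} -> {len(train)} train, {len(test)} test")
    for label in ("positive", "negative", "neutral"):
        print(label, precision_recall_f(gold, pred, label))
```

In a full experiment, the predicted labels would come from an SVM trained on the training part and applied to the test part, and the three metrics would be reported for each dataset at each ratio.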

Cite This Paper

Munir Ahmad, Shabib Aftab, "Analyzing the Performance of SVM for Polarity Detection with Different Datasets", International Journal of Modern Education and Computer Science (IJMECS), Vol. 9, No. 10, pp. 29-36, 2017. DOI: 10.5815/ijmecs.2017.10.04
