Soumick Chatterjee

Work place: Otto von Guericke University, Magdeburg, Germany

E-mail: contact@soumick.com

Research Interests: Computational Learning Theory, Image Compression, Image Manipulation, Image Processing, Medical Image Computing

Biography

Soumick Chatterjee completed his Bachelor's degree in Computer Application at Punjab Technical University, India. During his studies, he launched his software startup, Supernova Techlink, where he worked first as a part-time professional and then as full-time Chief Software Architect. He completed his postgraduate degree in Computer Science at St. Xavier's College (Autonomous), Kolkata, India. He has a few publications in the fields of Steganography, Cryptography and Machine Learning. He is currently a Ph.D. Research Fellow at Otto-von-Guericke-Universität, Magdeburg, Germany, working on "Use of prior knowledge for interventional MRI" and applying various Machine Learning and Deep Learning techniques. His research interests include Machine Learning, Deep Learning, Image Processing, Magnetic Resonance Imaging (MRI), MR Image Reconstruction, Interventional MRI, Text Analysis and Classification, Cryptography and Steganography.

Author Articles
Text Classification Using SVM Enhanced by Multithreading and CUDA

By Soumick Chatterjee, Pramod George Jose, Debabrata Datta

DOI: https://doi.org/10.5815/ijmecs.2019.01.02, Pub. Date: 8 Jan. 2019

With the sudden growth of the internet and of digital documents available on the web, organizing text data has become a major problem. In recent times, text classification has become one of the main techniques for organizing text data. The idea behind text classification is to assign a given piece of text to a predefined class or category. In the present research work, an SVM with a linear kernel has been used with the one-vs-rest strategy. The SVM is trained on data sets collected from various sources. Some words that were uncommon 5-6 years ago may now be prevalent due to recent trends. Similarly, new discoveries may result in the coinage of new words. This process can also be applied to text blogs, which can be crawled and then analyzed. This technique should, in theory, be able to classify blogs, tweets or any other document with considerable accuracy. In any text classification process, the preprocessing phase (cleaning, stemming, lemmatization, etc.) takes the most time. Hence, the authors have used a multithreading approach to speed up this phase. The authors further tried to improve the processing time of the algorithm through GPU parallelism using CUDA.
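
The multithreaded preprocessing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`crude_stem`, `preprocess`, `preprocess_corpus`), the tiny stop-word list and the suffix-stripping rule are all hypothetical stand-ins for a real cleaning and stemming pipeline, and only the Python standard library is assumed.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately tiny stop-word list and suffix set (illustration only).
STOP_WORDS = frozenset({"a", "an", "the", "is", "are", "of", "and", "to", "in"})
SUFFIXES = ("ing", "ed", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters.

    A toy stand-in for a real stemmer or lemmatizer.
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(document):
    """Clean one document: lowercase, keep alphabetic tokens, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def preprocess_corpus(documents, workers=4):
    """Preprocess documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, documents))


if __name__ == "__main__":
    corpus = ["The cats are running", "Blogs and tweets"]
    print(preprocess_corpus(corpus))
```

Each document is processed independently, which is what makes this phase easy to spread across workers. In CPython, threads mainly help when the work releases the global interpreter lock; for CPU-bound pure-Python preprocessing, swapping in `ProcessPoolExecutor` from the same module is a common alternative.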
