International Journal of Information Technology and Computer Science (IJITCS)

IJITCS Vol. 13, No. 6, Dec. 2021


Table of Contents

REGULAR PAPERS

Linked Data: A Framework for Publishing Five-Star Open Government Data

By Bassel Al-khatib, Ali Ahmad Ali

DOI: https://doi.org/10.5815/ijitcs.2021.06.01, Pub. Date: 8 Dec. 2021

With the increasing adoption of open government initiatives around the world, a huge number of raw government datasets have been released. However, the data has been published in heterogeneous formats and vocabularies, and it is often of poor quality: inconsistent, messy, and sometimes incorrect, because it was collected according to the practicalities of the source organization. This makes it hard to reuse and integrate the data to serve citizens and third-party apps.
This research introduces the LDOG (Linked Data for Open Government) experimental framework, which provides a modular architecture that can be integrated into the open government hierarchy. It allows large amounts of data to be gathered at the source in a fine-grained manner and published directly as linked data according to Tim Berners-Lee's five-star deployment scheme, with a SHACL-based validation layer that results in high-quality data.
The general idea is to model the government hierarchy and classify government organizations into two types: modeling organizations at higher levels and data source organizations at lower levels. Linked data experts in the modeling organizations are responsible for designing data templates, ontologies, SHACL shapes, and linkage specifications, whereas non-experts in the data source organizations can apply their knowledge of the data to perform mapping, reconciliation, and data correction. This approach reduces the number of linked data experts required, whose scarcity is a barrier to linked data adoption.
To test the framework in action, we developed the LDOG platform, which uses the framework's modules to power a set of user interfaces for publishing government datasets. We used this platform to convert some of the UAE government's datasets into linked data. Finally, on top of the converted data, we built a proof-of-concept app to show the power of five-star linked data for integrating datasets from disparate organizations and to encourage governments to adopt it. Our work defines a clear path for integrating linked data into open government, with concrete steps for publishing and enhancing it in a fine-grained, practical manner that requires fewer linked data experts. It also extends SHACL to define data shapes and to convert CSV to RDF.
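The CSV-to-RDF step described above can be illustrated with a minimal Python sketch. It is not LDOG's implementation: the `csv_to_ntriples` helper, the `http://example.org/` namespaces, and the rule that each CSV header becomes a predicate are all hypothetical. The required-column check only imitates a SHACL `sh:minCount 1` constraint and is not a SHACL engine.

```python
import csv
import io

# Hypothetical namespaces for illustration only; not part of LDOG.
BASE = "http://example.org/resource/"
VOCAB = "http://example.org/vocab#"


def iri(value: str) -> str:
    """Wrap a string as an N-Triples IRI (assumes it contains no spaces)."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(text: str, id_column: str,
                    required: list[str]) -> tuple[list[str], list[str]]:
    """Convert CSV rows to N-Triples and report rows missing required values.

    The `required` check is a stand-in for a SHACL sh:minCount 1 constraint;
    it is NOT a SHACL validator.
    """
    triples: list[str] = []
    violations: list[str] = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = iri(BASE + row[id_column].strip())
        for column in required:
            if not (row.get(column) or "").strip():
                violations.append(f"{subject} missing {column}")
        for column, value in row.items():
            # The id column becomes the subject IRI; extra unnamed fields
            # (column None) and empty cells produce no triple.
            if column is None or column == id_column or not (value or "").strip():
                continue
            triples.append(f"{subject} {iri(VOCAB + column)} {literal(value.strip())} .")
    return triples, violations
```

For example, converting `id,name,city` rows where one school has no name yields triples for every non-empty cell plus one violation for the missing `name`, so bad rows can be sent back to the data source organization for correction instead of being published silently.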

Impact of Internet of Things (IoT) as Persuasive Technology

By Shagufta Faryad, Hira Batool, Muhammad Asif, Affan Yasin

DOI: https://doi.org/10.5815/ijitcs.2021.06.02, Pub. Date: 8 Dec. 2021

The Internet of Things (IoT) adds a new dimension to how people and things communicate and collaborate. Society and the Internet are now tightly and purposefully interconnected. This research analyzes how IoT, as a persuasive technology, can affect human behavior and increase users' awareness of IoT products and their effectiveness. How will IoT infrastructure help humans change their attitudes and behaviors towards specific routine tasks? Our objective is to analyze which factors influence the acceptance or rejection of particular behaviors and which core motivators persuade people to do or avoid something. We aim to determine whether or not IoT will help humans change targeted behaviors. Because of the rapid convergence of the digital and physical worlds and the advent of digital technology, the Internet and social media have opened up a new world of affordances, constraints, and information flows from a design perspective. This article discusses how digital architecture affects behavior and the implications for designers who want to influence behavior for social and environmental good. We give a brief introduction to persuasive technology, especially as it pertains to the human adoption of IoT technology, and discuss a number of current research opportunities in IoT gadgets and their adoption [1]. Our results indicate that persuasive IoT infrastructure can be expected to change the driving behavior of its adopters. Furthermore, attention should be paid to the appropriate selection and implementation of persuasive strategies.

Psychosocial Features for Hate Speech Detection in Code-switched Texts

By Edward Ombui, Lawrence Muchemi, Peter Wagacha

DOI: https://doi.org/10.5815/ijitcs.2021.06.03, Pub. Date: 8 Dec. 2021

This study examines the problem of identifying hate speech in code-switched social media text using a natural language processing approach. It explores different features for training nine models and empirically evaluates their predictiveness in identifying hate speech in a human-annotated dataset of ~50k tweets. The study proposes a novel hierarchical approach that employs Latent Dirichlet Allocation to generate topic models that help build a high-level psychosocial feature set, which we abbreviate as PDC. PDC groups words with similar meanings into word families, which is significant for capturing code-switching during the preprocessing stage for supervised learning models. The high-level PDC features are based on a hate speech annotation framework [1] that is largely informed by the duplex theory of hate [2]. Results from frequency-based models using the PDC features on a dataset of tweets generated during the 2012 and 2017 presidential elections in Kenya indicate an F-score of 83% (precision: 81%, recall: 85%) in identifying hate speech. The study is significant, first, because it publicly shares a unique code-switched hate speech dataset that is valuable for comparative studies. Second, it provides a methodology for building a novel PDC feature set to identify nuanced forms of hate speech camouflaged in code-switched data, which conventional methods could not adequately identify.
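Assuming the reported F-score is the balanced F1 score (the harmonic mean of precision P and recall R), the three figures above are mutually consistent:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.81 \times 0.85}{0.81 + 0.85}
    = \frac{1.377}{1.66}
    \approx 0.83
```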

Myers-Briggs Personality Prediction and Sentiment Analysis of Twitter using Machine Learning Classifiers and BERT

By Prajwal Kaushal, Nithin Bharadwaj B P, Pranav M S, Koushik S, Anjan K Koundinya

DOI: https://doi.org/10.5815/ijitcs.2021.06.04, Pub. Date: 8 Dec. 2021

Twitter is one of the most sophisticated social networking platforms, and with its user base growing exponentially, terabytes of data are generated every day. Technology giants invest billions of dollars in drawing insights from these tweets, yet much of this data remains underutilized. This paper addresses two tasks. First, it builds a sentiment analysis model using BERT (Bidirectional Encoder Representations from Transformers) that analyzes tweets and predicts users' sentiments. Second, it builds a personality prediction model using various machine learning classifiers based on the Myers-Briggs Type Indicator (MBTI), one of the most widely used psychological instruments in the world. Using this model, we aim to predict people's traits and qualities from their posts and interactions on Twitter. The model succeeds in predicting the personality traits and qualities of Twitter users. In the future, we intend to use the results in applications such as market research, recruitment, psychological tests, and consulting.
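An MBTI type is a four-letter code built from four dichotomies (E/I, S/N, T/F, J/P), giving 16 types. One common way to frame MBTI prediction for ordinary classifiers, shown here as a hedged sketch and not necessarily the authors' method, is to train four binary classifiers (one per dichotomy) instead of a single 16-way classifier. The helpers `type_to_labels` and `labels_to_type` are hypothetical names for the label encoding and decoding around such classifiers.

```python
# The four MBTI dichotomies, in the order their letters appear in a type code.
AXES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def type_to_labels(mbti: str) -> list[int]:
    """Map a type such as 'INTJ' to four binary labels (0 = first pole, 1 = second)."""
    code = mbti.strip().upper()
    if len(code) != 4:
        raise ValueError(f"expected a 4-letter MBTI type, got {mbti!r}")
    labels = []
    for letter, (first, second) in zip(code, AXES):
        if letter not in (first, second):
            raise ValueError(f"invalid letter {letter!r} in {mbti!r}")
        labels.append(0 if letter == first else 1)
    return labels


def labels_to_type(labels: list[int]) -> str:
    """Recombine four per-axis predictions (0 or 1) into a type code."""
    return "".join(pair[label] for pair, label in zip(AXES, labels))
```

Each column of the encoded labels becomes the target for one binary classifier; at prediction time the four outputs are recombined with `labels_to_type`. A design benefit is that every binary task has far more examples per class than the rarest of the 16 types.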

Performance of Machine Learning Algorithms with Different K Values in K-fold Cross-Validation

By Isaac Kofi Nti, Owusu Nyarko-Boateng, Justice Aning

DOI: https://doi.org/10.5815/ijitcs.2021.06.05, Pub. Date: 8 Dec. 2021

The value of k in k-fold cross-validation of machine learning predictive models is an essential element that affects model performance. A good choice of k results in better accuracy, while a poorly chosen value may degrade performance. In the literature, the most commonly used values of k are five (5) and ten (10), as these two values are believed to give test error rate estimates that suffer from neither very high bias nor very high variance. However, there is no formal rule. To the best of our knowledge, few experimental studies have investigated the effect of different k values on training different machine learning models. This paper empirically analyzes the prevalence and effect of distinct k values (3, 5, 7, 10, 15, and 20) on the validation performance of four well-known machine learning algorithms (MLAs): Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT), and K-Nearest Neighbours (KNN). We observed that the effect of k on validation performance differs from one algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve (AUC) at lower computational cost than k = 10 across most MLAs. We discuss the study outcomes in detail and outline guidelines to help beginners in machine learning select the best k value and machine learning algorithm for a given task.
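The mechanics of k-fold cross-validation can be sketched in plain Python: the data is shuffled, split into k folds whose sizes differ by at most one, and each fold serves as the test set exactly once while the model is trained on the other k - 1 folds. This is an illustrative sketch, not the paper's experimental code; `k_fold_indices`, `cross_val_accuracy`, and the majority-class `fit_majority` baseline are hypothetical stand-ins for GBM, LR, DT, or KNN. scikit-learn's `KFold` and `cross_val_score` provide production versions of the same procedure.

```python
import random
from statistics import mean
from typing import Callable


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; each index is tested once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


def cross_val_accuracy(X: list, y: list, k: int, fit: Callable, seed: int = 0) -> float:
    """Train on k-1 folds, score accuracy on the held-out fold, and average over k."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)


def fit_majority(X_train: list, y_train: list) -> Callable:
    """Trivial baseline model: always predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority
```

Comparing the k values studied in the paper amounts to looping `for k in (3, 5, 7, 10, 15, 20)` over `cross_val_accuracy` on the same data (which needs at least 20 samples). Because each k requires k model fits, k = 7 trains three fewer models than k = 10, which is one reason its computational cost is lower.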
