Sample of Groups: A New Strategy to Find a Representative Point for Each Undisclosed Cluster

By Wallace A. Pinheiro, Ana B. S. Pinheiro

DOI: https://doi.org/10.5815/ijitcs.2023.05.01, Pub. Date: 8 Oct. 2023

Selecting samples from undisclosed groups is a relevant problem in various areas such as health, statistics, economics, and computer science. For instance, well-known strategies for selecting a sample from a population include simple random and stratified random selection. A related problem is selecting the initial points, corresponding to samples, for the K-means clustering algorithm. Many studies propose different strategies for choosing these samples. However, there is no consensus on the best or most effective approaches, even when considering specific datasets or domains. In this work, we present a new strategy called the Sample of Groups (SOG) Algorithm, which combines concepts from grid, density, and maximum distance clustering algorithms to identify representative points or samples located near the center of mass of each cluster. To achieve this, we create boxes of a suitable size to partition the data and select the representatives of the most relevant boxes. The main goal of this work is therefore to find high-quality samples, or seeds, that represent different clusters. To compare our approach with other algorithms, we use not only indirect measures related to K-means but also two direct measures that allow a fairer comparison among these strategies. The results indicate that our proposal outperforms the most commonly used algorithms.
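
The abstract does not spell out the SOG procedure, so the following Python sketch only illustrates, under our own assumptions, how its three named ingredients (grid partitioning, density, and maximum distance) can be combined to pick one seed per cluster. The function `grid_density_seeds`, its `cell_size` and `k` parameters, and the scoring rule (box density multiplied by the distance to the nearest seed already chosen) are hypothetical and are not the authors' published method.

```python
from collections import defaultdict
from math import dist, floor


def grid_density_seeds(points, cell_size, k):
    """Pick up to k seed points from grid boxes (hypothetical sketch, not SOG itself)."""
    # 1. Partition the data into axis-aligned boxes of side `cell_size` (assumed fixed).
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(c / cell_size) for c in p)
        cells[key].append(p)

    # 2. Summarise each non-empty box by its density (point count) and its centre of mass.
    boxes = []
    for members in cells.values():
        dim = len(members[0])
        centroid = tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
        boxes.append((len(members), centroid))

    # 3. Start from the densest box, then repeatedly add the box whose
    #    density * (distance to nearest chosen seed) is largest, so seeds are
    #    both well populated and far apart (assumed scoring rule).
    boxes.sort(key=lambda b: b[0], reverse=True)
    seeds = [boxes[0][1]]
    candidates = boxes[1:]
    while len(seeds) < k and candidates:
        best = max(candidates, key=lambda b: b[0] * min(dist(b[1], s) for s in seeds))
        seeds.append(best[1])
        candidates.remove(best)
    return seeds
```

Weighting the distance by box density is a design choice meant to keep the max-distance step from favouring isolated outliers, which a pure farthest-first rule tends to pick.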

Implementing Enterprise Resource Planning Systems in Tanzanian Higher Education Institutions: The Influence of Task-technology Fit on Staff Performance

By Mhina J. R. A., Lashayo D. M.

DOI: https://doi.org/10.5815/ijitcs.2023.05.02, Pub. Date: 8 Oct. 2023

Individual user performance with ERP systems in Higher Education Institutions (HEIs) has received little research attention. Furthermore, little is known about the simultaneous impacts of Task-Technology Fit (TTF) on individual user (staff) performance with ERP systems in the context of Tanzania. This study investigated both the direct and indirect impacts of TTF on staff performance with ERP systems in HEIs in Tanzania. The study used a quantitative design with the snowball sampling technique and a modified version of the DeLone and McLean (D & M) IS success model. The modified framework was tested on a sample of 163 staff members who use the ERP system to accomplish business processes. The collected data was analyzed using Structural Equation Modelling (SEM). The results show that TTF has considerable direct and indirect impacts on the performance of staff who use ERP systems. This implies that whenever an enterprise implements an ERP system to improve its business process outcomes, it should carry out a thorough analysis of the three key elements of TTF, i.e. task, technology, and individual characteristics. The paper also discusses these impacts and their implications.

Streamlining Stock Price Analysis: Hadoop Ecosystem for Machine Learning Models and Big Data Analytics

By Jesslyn Noverlita, Herison Surbakti

DOI: https://doi.org/10.5815/ijitcs.2023.05.03, Pub. Date: 8 Oct. 2023

The rapid growth of data in various industries has led to the emergence of big data analytics as a vital component for extracting valuable insights and making informed decisions. However, analyzing such massive volumes of data poses significant challenges in terms of storage, processing, and analysis. In this context, the Hadoop ecosystem has gained substantial attention due to its ability to handle large-scale data processing and storage. Additionally, integrating machine learning models within this ecosystem allows for advanced analytics and predictive modeling. This article explores the potential of leveraging the Hadoop ecosystem to enhance big data analytics through the construction of machine learning models and the implementation of efficient data warehousing techniques. The proposed approach, which optimizes stock price analysis by constructing machine learning models and applying data warehousing, empowers organizations to derive meaningful insights, optimize data processing, and make data-driven decisions efficiently.

Applying Clustering to Predict Attackers Trace in Deceptive Ecosystem by Harmonizing Multiple Decoys Interactions Logs

By Jalaj Pateria, Laxmi Ahuja, Subhranil Som, Ashish Seth

DOI: https://doi.org/10.5815/ijitcs.2023.05.04, Pub. Date: 8 Oct. 2023

Bluff and truth are the main pillars of deception technology. Deception technology relies largely on decoy-generated data and looks for behavioral deviations to decide whether an interaction should be flagged as an attack. At times, however, a legitimate user may interact suspiciously with a decoy out of lack of knowledge and be placed in the “ATTACK” category, even though the interaction should not be flagged that way. Hence, there is a need for collaborative analysis across honeypots, which are set up to monitor and log the activities of sources that compromise or probe them. This goldmine of logs provides ample information about the attacker's intent and target and about how the attacker is progressing along the kill chain, and this information can be used to enhance threat intelligence and upgrade behavior analysis rules.
In this paper, decoys strategically placed in the network and pointing to various databases, services, and IPs are used to provide information about the interactions made with them. This data is analyzed to understand underlying facts that can help strengthen the defense strategy. It also increases confidence in the findings, because the analysis is not restricted to a single decoy interaction, which could be a false positive or unintentional, but considers the interactions holistically to determine the exact attack pattern and progression. In our experiment, we reconcile data from multiple honeypots, weigh IP visits and honeypot interaction counts against scores, and then use KNN and Weighted KNN (W-KNN) to derive the inclination of the target IP relative to the source IP, which can be summarized as the direction of attack, while the count or frequency of interactions highlights their criticality. KNN and W-KNN achieved approximately 94% accuracy, which we consider best in class, and the silhouette score indicated high cohesion of the data points in the experiment. The analysis also showed that increasing the number of decoys improves confidence in the estimated attack probability and direction.
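
The abstract does not give the features or weighting scheme used, so the following Python sketch only illustrates the general difference between plain KNN and distance-weighted KNN. The function `knn_predict`, the inverse-distance weighting, and the idea of using feature vectors such as (IP visit count, honeypot interaction count) are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter, defaultdict
from math import dist


def knn_predict(train, query, k=3, weighted=False):
    """Classify `query` from labelled samples `train` = [(features, label), ...].

    Features could be, hypothetically, (IP visit count, honeypot interaction count).
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbours = sorted(train, key=lambda s: dist(s[0], query))[:k]
    if not weighted:
        # Plain KNN: majority vote among the k nearest neighbours.
        return Counter(label for _, label in neighbours).most_common(1)[0][0]
    # Weighted KNN (assumed inverse-distance weights): closer neighbours count more.
    votes = defaultdict(float)
    for features, label in neighbours:
        d = dist(features, query)
        if d == 0:
            return label  # exact match: defer to that sample
        votes[label] += 1.0 / d
    return max(votes, key=votes.get)
```

With training points labelled A at (0, 0) and B at (3, 0) and (3.2, 0), a query at (0.5, 0) with k = 3 is labelled B by the plain majority vote but A by the weighted vote, because the single nearby A point outweighs the two distant B points.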

A Comparative Analysis of Algorithms for Heart Disease Prediction Using Data Mining

By Snigdho Dip Howlader, Tushar Biswas, Aishwarjyo Roy, Golam Mortuja, Dip Nandi

DOI: https://doi.org/10.5815/ijitcs.2023.05.05, Pub. Date: 8 Oct. 2023

Heart disease is very common today, with death rates climbing every year. Predicting heart disease cases has been a topic in data and medical science for many years. This paper compares the different algorithms that have been used for pattern analysis and prediction of heart disease. The algorithms used in the past combine machine learning and data mining concepts that are essentially derived from statistical analysis and related approaches. Many factors can be considered when attempting to predict instances of heart disease analytically, such as age, gender, and resting blood pressure. Eight such factors were taken into consideration for this comparison. Because this study draws its results from a particular dataset, the output may vary when the algorithms are applied to different datasets. The comparison covers Naive Bayes, Decision Tree, Random Forest, and Logistic Regression. After multiple runs, the training and testing accuracies were obtained and listed. The observations from applying these algorithms to the same dataset indicate that Random Forest and Decision Tree achieve the highest accuracy in predicting heart disease on the dataset we used, whereas Naive Bayes gives the least accurate results in this scenario.

International Journal of Information Technology and Computer Science (IJITCS)

MECS Press Journal