Work place: Department of Computer Science, Sunyani Technical University, Sunyani, Ghana
E-mail: aning421@gmail.com
Research Interests: Interaction Design, Computer systems and computational processes, Computational Learning Theory, Data Structures and Algorithms, Algorithm Design
Biography
Justice Aning received an MSc degree in Information Technology from the Kwame Nkrumah University of Science and Technology (KNUST), Ghana, in 2017. He is currently a lecturer in the Department of Computer Science, Sunyani Technical University, Sunyani, Ghana. He has authored more than five journal papers. His current research interests include web design, machine learning, and intelligent systems for modelling and optimisation.
By Isaac Kofi Nti, Owusu Nyarko-Boateng, Justice Aning
DOI: https://doi.org/10.5815/ijitcs.2021.06.05, Pub. Date: 8 Dec. 2021
The value of k in k-fold cross-validation is an essential element in training machine learning predictive models, because it affects the estimate of the model’s performance. A well-chosen k yields better accuracy, while a poorly chosen k can degrade it. In the literature, the most commonly used values of k are five (5) and ten (10), as these two values are believed to give test error rate estimates that suffer neither from extremely high bias nor from very high variance. However, there is no formal rule. To the best of our knowledge, few experimental studies have investigated the effect of diverse k values in training different machine learning models. This paper empirically analyses the prevalence and effect of distinct k values (3, 5, 7, 10, 15 and 20) on the validation performance of four well-known machine learning algorithms (MLAs): Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbours (KNN). It was observed that the best value of k and the resulting validation performance differ from one machine learning algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve, at lower computational cost than k = 10, across most of the MLAs. We discuss the study outcomes in detail and outline some guidelines to help beginners in the machine learning field select the best k value and machine learning algorithm for a given task.
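The kind of comparison described in the abstract above can be illustrated with a minimal sketch. It is not the authors' experimental code: it uses only the Python standard library, a hypothetical synthetic two-class dataset (`make_toy_data`), plain unstratified k-fold splitting, and a small KNN classifier standing in for the four algorithms studied. All function names and parameters are assumptions made for illustration, and the accuracies it prints say nothing about the paper's results.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def knn_predict(train_X, train_y, point, n_neighbours=3):
    """Majority vote among the nearest training points (squared Euclidean distance)."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:n_neighbours])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k, seed=0):
    """Mean validation accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        correct = sum(knn_predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

def make_toy_data(n_per_class=60, seed=1):
    """Hypothetical data (an assumption): two overlapping 2-D Gaussian blobs."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 1.5)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y

X, y = make_toy_data()
# Same k values as those listed in the abstract.
results = {k: cross_val_accuracy(X, y, k) for k in (3, 5, 7, 10, 15, 20)}
for k, acc in results.items():
    print(f"k={k:2d}  mean validation accuracy={acc:.3f}")
```

Note that `n_neighbours` (the K of KNN) is kept separate from the number of folds `k`. Larger `k` means more model fits, each on a larger share of the data, which is the computational trade-off the abstract refers to when comparing k = 7 with k = 10.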