Towards Finding a Minimal Set of Features for Predicting Students' Performance Using Educational Data Mining

Full Text (PDF, 454KB), PP.44-54


Author(s)

Souvik Sengupta 1,*

1. Aliah University, Kolkata, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2023.03.04

Received: 20 Oct. 2022 / Revised: 24 Nov. 2022 / Accepted: 12 Jan. 2023 / Published: 8 Jun. 2023

Index Terms

Educational Data Mining, Machine Learning, Students' performance prediction, Feature analysis, Feature selection, Decision Support System

Abstract

Early prediction of students' academic performance helps identify at-risk students and enables management to take corrective action to keep them on track. Most research in this field has applied supervised machine learning to custom-built datasets with numerous attributes, or features. Because these datasets are not publicly available, it is hard to understand and compare the significance of the chosen features and the efficacy of the machine learning models used for the classification task. In this work, we analyzed 27 research papers published in the last ten years (2011-2021) that used machine learning models to predict students' performance. We identified the most frequently used features in the private datasets, their interrelationships, and their abstraction levels. We also explored three popular public datasets and performed statistical analyses, such as the Chi-square test and Pearson's correlation, on their features. A minimal set of essential features was prepared by fusing the frequent features with the statistically significant ones. We propose an algorithm for selecting a minimal set of features from any dataset with a given set of features. We compared the performance of different machine learning models on the three public datasets in two experimental setups: one with the complete feature set and the other with the minimal feature set. Most supervised models performed nearly identically with the reduced feature set and, in some cases, even better than with the complete feature set. The proposed method can identify the most essential feature set from any new dataset for predicting students' performance.
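The statistical screening mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not spell out. It only shows how a Chi-square statistic (for categorical features) and Pearson's correlation (for numeric features) might be computed against a target. The feature names, the toy records, the pass mark of 10, and the 0.05 significance level are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def chi_square_statistic(feature, target):
    """Chi-square statistic of independence for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # contingency table cells
    row_totals = Counter(feature)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def pearson_r(x, y):
    """Pearson's correlation coefficient for two numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records; NOT taken from any dataset used in the paper.
internet = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
gender = ["F", "M", "F", "M", "F", "M", "M", "F"]
absences = [1, 2, 8, 9, 0, 7, 3, 10]
score = [15, 13, 8, 7, 16, 9, 12, 6]
passed = ["pass" if s >= 10 else "fail" for s in score]  # assumed pass mark

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL_DF1 = 3.841

# Keep only the categorical features whose statistic exceeds the critical value.
categorical = {"internet": internet, "gender": gender}
selected = [
    name
    for name, values in categorical.items()
    if chi_square_statistic(values, passed) > CHI2_CRITICAL_DF1
]

print("selected categorical features:", selected)
print("Pearson r (absences vs. score):", round(pearson_r(absences, score), 3))
```

With real data, the expected cell counts would need to be large enough for the Chi-square approximation to hold. The eight-row sample here is far too small for that and serves only to show the mechanics.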

Cite This Paper

Souvik Sengupta, "Towards Finding a Minimal Set of Features for Predicting Students' Performance Using Educational Data Mining", International Journal of Modern Education and Computer Science (IJMECS), Vol.15, No.3, pp. 44-54, 2023. DOI:10.5815/ijmecs.2023.03.04
