An Empirical Comparison of Missing Value Imputation Techniques on APS Failure Prediction

pp. 21-29

Author(s)

Siam Rafsunjani 1,*, Rifat Sultana Safa 1, Abdullah Al Imran 1, Md. Shamsur Rahim 1, Dip Nandi 1

1. Department of Computer Science, Faculty of Information Technology, American International University-Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2019.02.03

Received: 5 Sep. 2018 / Revised: 16 Sep. 2018 / Accepted: 24 Sep. 2018 / Published: 8 Feb. 2019

Index Terms

Air Pressure System Failure, Missing value imputation techniques, Classification

Abstract

The Air Pressure System (APS) is a function used in heavy vehicles to assist braking and gear changing. The APS failure dataset consists of daily operational sensor data from failed Scania trucks. The dataset is crucial to the manufacturer because it allows the components that caused a failure to be isolated. However, missing values and class imbalance are the two most challenging limitations of this dataset when predicting the cause of a failure, and the prediction results depend on how both problems are handled. In this paper, we examine the impact of five missing value imputation techniques, namely Expectation Maximization, Mean Imputation, Soft Impute, MICE, and Iterative SVD, and assess whether they produce significantly better results. We also compare their performance empirically by applying five classifiers, namely Naive Bayes, KNN, SVM, Random Forest, and Gradient Boosted Tree, to this highly imbalanced dataset. The primary aim of the study is to observe how far these imputation techniques improve the prediction results and, through the empirical comparison, to identify the best classification model and imputation technique. We found that MICE imputation and random under-sampling are the most influential techniques for improving prediction performance and reducing the false negative rate.
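As a rough illustration of the preprocessing ideas named in the abstract, the sketch below applies mean imputation (the simplest of the five techniques listed) followed by random under-sampling, and computes a false negative rate. It is not the authors' code: the function names, the toy data, and the choice of plain Python are assumptions made only for this example, and the more elaborate techniques such as MICE are not shown.

```python
import math
import random


def mean_impute(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Assumption: fall back to 0.0 when a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def false_negative_rate(y_true, y_pred, positive=1):
    """Share of actual positives (failures) that were predicted as negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if (fn + tp) else 0.0


# Toy data (invented): two sensor columns, label 1 marks the rare failure class.
nan = float("nan")
X = [[1.0, nan], [3.0, 4.0], [nan, 8.0], [5.0, 6.0], [2.0, 2.0], [4.0, nan]]
y = [0, 0, 0, 0, 1, 1]

X_imp = mean_impute(X)
X_bal, y_bal = random_under_sample(X_imp, y)
```

Under-sampling is applied after imputation here so that the column means are estimated from all available rows; whether that ordering matches the paper's experiments is not stated in the abstract.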

Cite This Paper

Siam Rafsunjani, Rifat Sultana Safa, Abdullah Al Imran, Shamsur Rahim, Dip Nandi, "An Empirical Comparison of Missing Value Imputation Techniques on APS Failure Prediction", International Journal of Information Technology and Computer Science (IJITCS), Vol. 11, No. 2, pp. 21-29, 2019. DOI: 10.5815/ijitcs.2019.02.03
