Detailed Study of Wine Dataset and its Optimization

Author(s)

Parneeta Dhaliwal 1,*, Suyash Sharma 1, Lakshay Chauhan 1

1. Department of Computer Science and Technology, Manav Rachna University, Faridabad Sector 43, Haryana 121001, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2022.05.04

Received: 7 Mar. 2022 / Revised: 16 Jun. 2022 / Accepted: 12 Aug. 2022 / Published: 8 Oct. 2022

Index Terms

Machine Learning, Optimisation, Data Analytics, Wine dataset

Abstract

Wine is increasingly common at social gatherings, so maintaining its quality matters for the health of those who drink it. Many methods have been proposed for assessing wine quality. We describe a technique for pre-processing the “Vinho Verde” wine dataset, which consists of red and white wine samples. The dataset is reduced from 13 attributes to 9 without any loss of performance. This is validated using several classification techniques: Random Forest, Decision Tree, K-Nearest Neighbor and Artificial Neural Network classifiers. The classifiers are compared on two performance metrics, accuracy and RMSE. Among the classifiers compared, Random Forest tends to outperform the others on various measures for predicting the quality of the wine.
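
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes wine quality is an integer score and that RMSE is computed directly on predicted versus true scores; the helper names and the sample values below are hypothetical and are not taken from the paper.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted quality equals the true quality."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted quality scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical quality scores for eight samples (illustrative only).
y_true = [5, 6, 7, 5, 6, 4, 8, 6]
pred_a = [5, 6, 6, 5, 6, 5, 7, 6]  # three misses, each off by one grade
pred_b = [5, 6, 7, 5, 6, 7, 8, 3]  # two misses, each off by three grades

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(y_true, pred), round(rmse(y_true, pred), 3))
```

In this toy case the second set of predictions has the higher accuracy but also the larger RMSE, because its fewer mistakes are further from the true score. This is one reason for reporting both metrics when comparing classifiers on an ordinal target such as wine quality.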

Cite This Paper

Parneeta Dhaliwal, Suyash Sharma, Lakshay Chauhan, "Detailed Study of Wine Dataset and its Optimization", International Journal of Intelligent Systems and Applications (IJISA), Vol. 14, No. 5, pp. 35-46, 2022. DOI: 10.5815/ijisa.2022.05.04
