Novel Feature Selection Algorithms Based on Crowding Distance and Pearson Correlation Coefficient

Author(s)

Abdesslem Layeb 1,*

1. Abdelhamid Mehri Constantine 2 University, NTIC Faculty, LISIA Laboratory

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2023.02.04

Received: 4 Apr. 2022 / Revised: 21 Aug. 2022 / Accepted: 1 Feb. 2023 / Published: 8 Apr. 2023

Index Terms

Feature Selection, Classification, Filter Methods, Crowding Distance, Pearson Correlation

Abstract

Feature selection is an important phase in building classification models. It is an effective way to reduce dimensionality and to eliminate redundant and irrelevant features. In this paper, three novel algorithms for the feature selection problem are proposed: a filter method, a wrapper method, and a hybrid filter method. All of the proposed algorithms use the crowding distance, borrowed from multiobjective optimization, as a new metric to assess the importance of features. The idea behind using the crowding distance is that less crowded features have a strong impact on the target (class) attribute, whereas crowded features generally have a similar impact on it. The crowding distance can be strengthened by combining it with other metrics; in this work, the hybrid method combines the crowding distance with the Pearson correlation coefficient to better rank features by importance. Experiments on well-known benchmark datasets, including large microarray datasets, show the effectiveness and robustness of the proposed algorithms.
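As a rough illustration of the ranking idea only (the full text, not the abstract, defines the exact construction), the Python sketch below computes an NSGA-II-style crowding distance over a per-feature statistic space and blends it with the absolute Pearson correlation between each feature and the class. The statistic space (mean, standard deviation, correlation), the hybrid_rank name, and the weight alpha are assumptions introduced here for illustration, not the paper's formulation.

import numpy as np

def crowding_distance(points):
    # NSGA-II-style crowding distance: for each point, sum over the
    # coordinates of the normalized gap between its two sorted neighbours;
    # boundary points along any coordinate get an infinite distance.
    n, m = points.shape
    dist = np.zeros(n)
    for j in range(m):
        order = np.argsort(points[:, j])
        col = points[order, j]
        span = col[-1] - col[0]
        if span == 0:
            continue  # a constant coordinate contributes nothing
        dist[order[0]] = dist[order[-1]] = np.inf
        dist[order[1:-1]] += (col[2:] - col[:-2]) / span
    return dist

def hybrid_rank(X, y, alpha=0.5):
    # Describe each feature by summary statistics over the samples
    # (mean, std, |Pearson r| with the class). This statistic space and
    # the blending weight alpha are illustrative assumptions.
    corr = np.nan_to_num(np.array(
        [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(X.shape[1])]))
    stats = np.column_stack([X.mean(axis=0), X.std(axis=0), corr])
    cd = crowding_distance(stats)
    finite = cd[np.isfinite(cd)]
    cap = finite.max() + 1.0 if finite.size else 1.0
    cd = np.where(np.isinf(cd), cap, cd)   # make boundary scores finite
    cd = cd / cd.max() if cd.max() > 0 else cd
    score = alpha * cd + (1 - alpha) * corr
    return np.argsort(score)[::-1]         # most important feature first

For example, hybrid_rank(X, y)[:20] would keep the 20 top-ranked features of a dataset (X, y); alpha shifts the weight between the crowding-distance and correlation criteria.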

Cite This Paper

Abdesslem Layeb, "Novel Feature Selection Algorithms Based on Crowding Distance and Pearson Correlation Coefficient", International Journal of Intelligent Systems and Applications (IJISA), Vol.15, No.2, pp.37-42, 2023. DOI: 10.5815/ijisa.2023.02.04
