Modifying one of the Machine Learning Algorithms kNN to Make it Independent of the Parameter k by Re-defining Neighbor

Full Text (PDF, 233KB), pp. 12-25


Author(s)

Pushpam Kumar Sinha 1,*

1. Department of Mechanical Engineering, Netaji Subhas Institute of Technology, Amhara, Bihta, Patna, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2020.04.02

Received: 29 May 2020 / Revised: 26 Jun. 2020 / Accepted: 15 Jul. 2020 / Published: 8 Aug. 2020

Index Terms

Euclidean distance, k-Nearest Neighbor, Training data, Linearly separable data, Non-linearly separable data

Abstract

When we are given a data set in which each data point is assigned a class based upon the values and/or characteristics of its attributes, the task is known as classification. In machine learning, a very simple and powerful tool for this task is the k-Nearest Neighbor (kNN) algorithm. It is based on the concept that the data points of a particular class are neighbors of each other. For a given test or unknown data point, kNN finds the class to which the point is a neighbor by measuring the Euclidean distances of the point from all the data points of all the classes in the training data. Then, out of the k nearest distances, where k is any number greater than or equal to 1, the class that appears the greatest number of times among these k nearest neighbors is the class assigned to the test or unknown data point. In this paper, I propose a variation of kNN, which I call the ANN (Alternative Nearest Neighbor) method to distinguish it from kNN. The striking feature of ANN, which makes it different from kNN, is its definition of neighbor: an unknown data point is a neighbor of a class if its maximum Euclidean distance from the data points of that class is less than or equal to the maximum Euclidean distance between all the training data points of that class. It follows naturally that ANN gives a unique solution for each unknown data point, whereas in kNN the solution may vary depending on the value of the number of nearest neighbors k. So in kNN, as k is varied, the performance may vary too; this is not the case in ANN, whose performance for a particular training data set is unique.

For the training data [1] considered in this paper, ANN gives 100% accurate results.
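To make the two decision rules concrete, the following is a minimal Python sketch of the classic kNN vote and of the ANN neighbor test as stated above. It is not from the paper: the function names knn_predict and ann_neighbors, the NumPy-based implementation, and the choice to return every class satisfying the ANN condition (the abstract does not spell out a tie-breaking rule) are all assumptions of this sketch.

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Classic kNN: majority vote among the k training points closest
    # (in Euclidean distance) to the query point x.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest_labels = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest_labels, return_counts=True)
    return labels[np.argmax(counts)]

def ann_neighbors(X_train, y_train, x):
    # ANN rule as stated in the abstract: x is a neighbor of a class
    # if its maximum Euclidean distance to the class's training points
    # does not exceed the maximum pairwise distance (the "diameter")
    # within that class. Returning every qualifying class is an
    # assumption; the abstract does not specify a tie-breaking rule.
    neighbors = []
    for label in np.unique(y_train):
        pts = X_train[y_train == label]
        diameter = max(np.linalg.norm(p - q) for p in pts for q in pts)
        d_max = np.linalg.norm(pts - x, axis=1).max()
        if d_max <= diameter:
            neighbors.append(int(label))
    return neighbors

# Illustrative data (not the training data of [1]): two well-separated
# classes in the plane.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.3, 4.7]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # 0
print(ann_neighbors(X, y, np.array([1.1, 0.9])))     # [0]

Note that, unlike knn_predict, ann_neighbors takes no parameter k: the query point either does or does not fall within each class's diameter, which is what makes the ANN answer unique for a given training set.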


Cite This Paper

Pushpam Kumar Sinha. "Modifying one of the Machine Learning Algorithms kNN to Make it Independent of the Parameter k by Re-defining Neighbor", International Journal of Mathematical Sciences and Computing (IJMSC), Vol.6, No.4, pp.12-25, 2020. DOI: 10.5815/ijmsc.2020.04.02

References

[1] Dutt S, Chandramouli S and Das AK. Machine Learning. Pearson India Education Services Pvt Ltd, Noida, 2020.

[2] Dhanabal S and Chandramathi S. A Review of Various k-Nearest Neighbor Query Processing Techniques. International Journal of Computer Applications 2011; Vol 31, No 7: 14-22.

[3] Uhlmann JK. Satisfying General Proximity/Similarity Queries with Metric Trees. Information Processing Letters 1991; 40: 175-179.

[4] Liu T, Moore AW and Gray A. New Algorithms for Efficient High-Dimensional Nonparametric Classification. Journal of Machine Learning Research 2006; 7: 1135-1158.

[5] Sproull RF. Refinements to Nearest Neighbor Searching in k-Dimensional Trees. Algorithmica 1991; 6: 579-589.

[6] Li SZ, Chan KL and Wang C. Performance Evaluation of the NFL Method in Image Classification and Retrieval. IEEE Trans on Pattern Analysis and Machine Intelligence 2000; Vol 22, Issue 11.

[7] Zhou Y and Zhang C. Tunable Nearest Neighbor Class. Pattern Recognition 2007: 346-349.

[8] Liaw YC, Wu CM and Leou ML. Fast Exact k Nearest Neighbors Search Using an Orthogonal Search Tree. Pattern Recognition 2010; Vol 43, Issue 6: 2351-2358.

[9] McNames J. Fast Nearest Neighbor Algorithm Based on Principal Axis Search Tree. IEEE Trans on Pattern Analysis and Machine Intelligence 2001; Vol 23, Issue 9: 964-976.

[10] Cover TM and Hart PE. Nearest Neighbor Pattern Classification. IEEE Trans Inform. Theory 1967; IT-13: 21-27.

[11] Bailey T and Jain AK. A Note on Distance-Weighted k-Nearest Neighbor Rules. IEEE Trans Systems, Man, Cybernetics 1978; 8: 311-313.

[12] Kollios G, Gunopulos D and Tsotras VJ. Nearest Neighbor Queries in a Mobile Environment. Proceedings of the International Workshop on Spatio-Temporal Database Management 1999: 119-134.

[13] Xia T and Zhang D. Continuous Reverse Nearest Neighbor Monitoring. Proceedings of the IEEE International Conference on Data Engineering 2006.

[14] Song Z and Roussopoulos N. K-Nearest Neighbor Search for Moving Query Point. Proceedings of the International Symposium on Spatial and Temporal Databases 2001: 79-96.