Jason Fang - Projects

This page describes how I implemented the K-Nearest Neighbors (KNN) algorithm from scratch. KNN is one of the most popular machine learning models, and I built my own version to understand how it really works. At the end, I compared my KNN model to sklearn’s KNN model. The accuracies are close.

Defining the Euclidean distance lets us find the nearest neighbors of a data point. However, a distance function written only for two coordinates can only handle problems with two features. I used NumPy's square root function so that the distance works for any number of features.
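A minimal sketch of a distance function that works for any number of features. The function name `euclidean_distance` and the exact NumPy calls are assumptions for illustration; the project's own code may be written differently.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two points with any number of features.

    Hypothetical helper for illustration, not the project's actual function.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("points must have the same number of features")
    # Sum the squared differences over every feature, then take the square root.
    return float(np.sqrt(np.sum((a - b) ** 2)))

print(euclidean_distance([0, 0], [3, 4]))              # two features
print(euclidean_distance([1, 2, 3, 4], [2, 2, 3, 4]))  # four features
```

Because the sum runs over whatever features the arrays contain, the same function covers two features or twenty.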

During the implementation, I also defined a confidence for each vote result, so that I can see how confident the model was in its wrong predictions.
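One way the vote and its confidence could be computed is sketched below. It assumes confidence means the share of the k neighbors that voted for the winning label; the text above does not give the exact definition, and the name `vote` is made up for this example.

```python
from collections import Counter

def vote(neighbor_labels):
    """Return (winning label, confidence) for the labels of the k nearest neighbors.

    Assumption: confidence = votes for the winner / total number of votes.
    """
    if not neighbor_labels:
        raise ValueError("need at least one neighbor label")
    # most_common(1) returns [(label, count)] for the most frequent label.
    label, count = Counter(neighbor_labels).most_common(1)[0]
    return label, count / len(neighbor_labels)

label, confidence = vote(["a", "a", "b", "a", "a"])
print(label, confidence)  # 4 of 5 neighbors voted "a"
```

Storing the confidence next to each prediction makes it easy to check afterwards whether the wrong predictions were close calls or confident mistakes.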

I also wanted to compare my KNN model with the sklearn KNN model, so I used a dataset from the UCI Machine Learning Repository to test my accuracy.

Running my model in a for loop 25 times, I got an accuracy of 0.9663.
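The repeated evaluation can be sketched roughly as follows. This is an illustration under several assumptions: the accuracy is taken as the average over random train/test splits, k is 5, the test share is 20%, and the data is a synthetic stand-in rather than the UCI dataset. The names `knn_predict` and `run_trials` are hypothetical.

```python
import random
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, point, k=5):
    """Predict the label of `point` by majority vote among its k nearest training points."""
    distances = np.sqrt(np.sum((train_X - point) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def run_trials(X, y, n_trials=25, test_fraction=0.2, k=5, seed=0):
    """Average accuracy over n_trials random train/test splits (assumed setup)."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    accuracies = []
    for _ in range(n_trials):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        correct = sum(
            knn_predict(X[train_idx], y[train_idx], X[i], k) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / n_test)
    return sum(accuracies) / len(accuracies)

# Synthetic stand-in data (NOT the UCI dataset): two well-separated clusters.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0, 1, (50, 3)), data_rng.normal(6, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

average_accuracy = run_trials(X, y)
print(average_accuracy)
```

Averaging over many random splits smooths out the luck of any single split, which is the usual reason to repeat the run.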

After that, I tried the KNN model from sklearn. Its accuracy, 0.9857, is higher than mine. I can’t say my version is as good as sklearn’s model, but in this case the two KNN models are close in terms of accuracy.
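For reference, the sklearn side of such a comparison might look like the sketch below. The synthetic data, `n_neighbors=5`, and the 20% test split are assumptions made for this example; they are not the UCI dataset or the settings behind the numbers above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (NOT the UCI dataset): two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(6, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit sklearn's KNN classifier and measure accuracy on the held-out split.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
sklearn_accuracy = clf.score(X_test, y_test)
print(sklearn_accuracy)
```

For a fair comparison, both models should see the same train/test split and use the same value of k.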

Software used: Python, PyCharm, Git

Packages used: pandas, numpy, random, matplotlib, sklearn, warnings

Click the "Code" button to see the code of this project
