This repository contains from-scratch implementations of hyperparameter tuning techniques (GridSearchCV and RandomSearchCV) for the K-Nearest Neighbours (KNN) algorithm.
A dataset of 10,000 samples is randomly generated with two classes, 0 and 1 — i.e. a randomly generated binary classification dataset.
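Such a dataset can be generated in a few lines. A minimal sketch, assuming NumPy; the feature count and seed below are illustrative choices, not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for reproducibility (assumption)

n_samples, n_features = 10_000, 2  # feature count is an illustrative assumption
X = rng.normal(size=(n_samples, n_features))  # random feature matrix
y = rng.integers(0, 2, size=n_samples)        # binary labels: 0 or 1

print(X.shape, y.shape)  # (10000, 2) (10000,)
```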
Hyperparameter tuning is the process of selecting the set of hyperparameters that yields the best ML model. Consider the KNN algorithm, which has a hyperparameter 'k' (the number of nearest neighbours to the query data point).
Different values of k give different scores on the train and cross-validation/test datasets, and we then select the best k according to some selection criterion. The candidate hyperparameter values can be sampled randomly or specified by hand.
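To make the role of k concrete, here is a minimal from-scratch KNN classifier evaluated for several candidate k values. The toy data, the candidate list, and the function name `knn_predict` are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Predict labels for X_query by majority vote among the k nearest training points."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)   # Euclidean distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

# toy data: two loosely separated clusters (illustrative)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=+2.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# sweep candidate k values; in practice each k would be scored on held-out data
for k in [1, 3, 5, 7]:
    acc = (knn_predict(X, y, X, k) == y).mean()
    print(f"k={k}: train accuracy={acc:.2f}")
```

Note that training accuracy alone is a poor selection criterion (k=1 trivially scores 1.0 on the training set); this is exactly why the cross-validation score described below is used instead.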
Cross validation is a technique in which an ML model is trained on one subset of the dataset (the training data) and then evaluated on another subset of the same dataset (the cross-validation/test data).
Without cross validation we cannot say much about how the trained model will perform on unseen data, so it plays a vital role in machine learning.
In k-fold cross validation, the dataset is split into k subsets (known as folds). The model is trained on (k-1) of the folds, and the remaining fold is held out to evaluate the trained model. We iterate k times, reserving a different fold for testing each time.
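The fold-splitting step above can be sketched as follows, assuming NumPy; the function name `k_fold_indices` and the seed are illustrative:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)   # shuffle indices before splitting
    folds = np.array_split(idx, k)     # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]                                   # the held-out fold
        train_idx = np.concatenate(folds[:i] + folds[i+1:])   # the remaining (k-1) folds
        yield train_idx, test_idx

# each of the 5 iterations trains on 8 samples and tests on 2
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))  # 8 2
```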
There are different techniques to perform hyperparameter tuning. The techniques implemented in this project are RandomSearchCV and GridSearchCV.
In the Grid Search Cross Validation technique, every possible combination of parameters is evaluated, which makes it computationally more expensive than Random Search CV. The benefit, however, is that if you run it over a broad parameter space you are guaranteed to find the best parameter setting within that space.
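The difference in how the two searches pick candidates can be sketched as follows. The parameter names and value lists are hypothetical examples, not the grid used by this project:

```python
from itertools import product
import random

# hypothetical KNN parameter grid (names and values are illustrative)
param_grid = {
    "k": [1, 3, 5, 7],
    "metric": ["euclidean", "manhattan"],
}

# Grid search: enumerate every combination — here 4 * 2 = 8 candidates,
# each of which would be scored with k-fold cross validation.
keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in product(*param_grid.values())]
print(len(combos))  # 8

# Random search: sample only a fixed budget of combinations instead.
random.seed(0)  # seed is an assumption, for reproducibility
sampled = random.sample(combos, 3)
print(len(sampled))  # 3
```

Random search trades the exhaustiveness of grid search for a fixed computational budget, which matters once the grid grows to thousands of combinations.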