This repository contains from-scratch implementations of hyperparameter tuning techniques (GridSearchCV and RandomSearchCV) for the K-Nearest Neighbours (KNN) algorithm.
A dataset of 10,000 samples is randomly generated with two classes, 0 and 1 — i.e. a randomly generated binary classification dataset.
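Such a dataset can be generated in a few lines. A minimal sketch, assuming NumPy; the feature count and seed below are illustrative choices, not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for reproducibility (assumption)

n_samples, n_features = 10_000, 2  # feature count is an illustrative assumption
X = rng.normal(size=(n_samples, n_features))  # random feature matrix
y = rng.integers(0, 2, size=n_samples)        # binary labels: 0 or 1

print(X.shape, y.shape)  # (10000, 2) (10000,)
```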
Hyperparameter tuning is the process of selecting the set of hyperparameters that yields the best ML model. Consider the KNN algorithm, which has a hyperparameter 'k' (the number of nearest neighbours to the query data point).
Different values of k give different scores on the train and cross-validation/test datasets, and we then select the best k according to some selection criterion. The candidate hyperparameter values can be sampled randomly or specified by hand.
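To make the role of k concrete, here is a minimal from-scratch KNN classifier evaluated for several candidate k values. The toy data, the candidate list, and the function name `knn_predict` are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Predict labels for X_query by majority vote among the k nearest training points."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)   # Euclidean distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)

# toy data: two loosely separated clusters (illustrative)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=+2.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# sweep candidate k values; in practice each k would be scored on held-out data
for k in [1, 3, 5, 7]:
    acc = (knn_predict(X, y, X, k) == y).mean()
    print(f"k={k}: train accuracy={acc:.2f}")
```

Note that training accuracy alone is a poor selection criterion (k=1 trivially scores 1.0 on the training set); this is exactly why the cross-validation score described below is used instead.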
Cross validation is a technique in which an ML model is trained on one subset of the dataset (the training data) and then evaluated on another subset of the same dataset (the cross-validation/test data).
Without cross validation we cannot say much about how the trained model will perform on unseen data, so it plays a vital role in machine learning.
In k-fold cross validation, the dataset is split into k subsets (known as folds). The model is trained on (k-1) of the folds, and the remaining fold is held out to evaluate the trained model. We iterate k times, reserving a different fold for testing each time.
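The fold-splitting step above can be sketched as follows, assuming NumPy; the function name `k_fold_indices` and the seed are illustrative:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)   # shuffle indices before splitting
    folds = np.array_split(idx, k)     # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]                                   # the held-out fold
        train_idx = np.concatenate(folds[:i] + folds[i+1:])   # the remaining (k-1) folds
        yield train_idx, test_idx

# each of the 5 iterations trains on 8 samples and tests on 2
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))  # 8 2
```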
There are different techniques to perform hyperparameter tuning. The techniques implemented in this project are RandomSearchCV and GridSearchCV.
In the Grid Search Cross Validation technique, every possible combination of parameters is evaluated, which makes it computationally more expensive than Random Search CV. The benefit, however, is that if you run it over a broad parameter space you are guaranteed to find the best parameter setting within that space.
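The difference in how the two searches pick candidates can be sketched as follows. The parameter names and value lists are hypothetical examples, not the grid used by this project:

```python
from itertools import product
import random

# hypothetical KNN parameter grid (names and values are illustrative)
param_grid = {
    "k": [1, 3, 5, 7],
    "metric": ["euclidean", "manhattan"],
}

# Grid search: enumerate every combination — here 4 * 2 = 8 candidates,
# each of which would be scored with k-fold cross validation.
keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in product(*param_grid.values())]
print(len(combos))  # 8

# Random search: sample only a fixed budget of combinations instead.
random.seed(0)  # seed is an assumption, for reproducibility
sampled = random.sample(combos, 3)
print(len(sampled))  # 3
```

Random search trades the exhaustiveness of grid search for a fixed computational budget, which matters once the grid grows to thousands of combinations.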