A hybrid optimization algorithm that combines gradient descent for network parameter optimization with an evolutionary algorithm for network topology and training hyperparameter optimization.
The following graphs show four test runs. All runs used the same set of initial hyperparameters for a fully connected neural network trained on the Fashion-MNIST dataset.
[Plots: test loss, test accuracy, learning rate, batch size, dropout rate, weight decay, and neurons in hidden layers 1–4]
Currently, only fully connected neural networks (multilayer perceptrons, MLPs) are supported.
This implementation comes with different types of mutation operators for hyperparameter and network topology optimization. Although genetic optimization already works reasonably well with these simple rules, many improvements are conceivable.
The Proportional Mutation Operator changes the hyperparameter's value in proportion to the value's current magnitude. The rate of change is defined by the local_mutation_rate parameter.
eta = rand_sign() * value * local_mutation_rate
value = value + eta
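A minimal, self-contained sketch of this rule (the function names and the default local_mutation_rate below are illustrative and not taken from the actual implementation):

import random

def rand_sign():
    # return +1 or -1 with equal probability
    return random.choice((-1, 1))

def proportional_mutation(value, local_mutation_rate=0.1):
    # perturb the value by a random fraction of its own magnitude
    eta = rand_sign() * value * local_mutation_rate
    return value + eta

# e.g. mutate a learning rate of 0.01 by +/-10%
mutated_lr = proportional_mutation(0.01, local_mutation_rate=0.1)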
The Discrete Mutation Operator changes the hyperparameter's value by a predefined step size.
eta = rand_sign() * step_size
value = value + eta
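A corresponding sketch for the discrete rule (again with illustrative names; a step size of 16 neurons is just an example for a layer-size hyperparameter):

import random

def discrete_mutation(value, step_size):
    # move the value up or down by one fixed step
    return value + random.choice((-1, 1)) * step_size

# e.g. mutate a hidden-layer size of 128 neurons in steps of 16
mutated_size = discrete_mutation(128, step_size=16)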
To speed up genetic optimization, a custom dataloader makes it possible to train each epoch on a new set of randomly selected samples drawn from the original dataset, using a random mapping from subset indices to original dataset indices:
subset_length = int(len(data) * subset_ratio)
rand_map = random.sample(list(range(len(data))), subset_length)
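A rough sketch of how such per-epoch subset training could look with a PyTorch-style dataset (Subset and DataLoader come from torch.utils.data; subset_ratio, num_epochs, and the batch size are illustrative and not taken from the custom dataloader itself):

import random
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

data = datasets.FashionMNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())
subset_ratio = 0.1
subset_length = int(len(data) * subset_ratio)
num_epochs = 10

for epoch in range(num_epochs):
    # draw a fresh random mapping from subset indices to dataset indices
    rand_map = random.sample(range(len(data)), subset_length)
    loader = DataLoader(Subset(data, rand_map), batch_size=64, shuffle=True)
    for inputs, targets in loader:
        pass  # regular gradient-descent training step goes here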
Random subsets are also used when evaluating the current population, which further accelerates the genetic optimization process. Setting the random subset ratio to small values such as 1% speeds up optimization significantly.
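The evaluation side can be sketched in the same way, assuming a PyTorch-style model and dataset (quick_eval and its defaults are hypothetical helpers for illustration):

import random
import torch
from torch.utils.data import DataLoader, Subset

def quick_eval(model, data, subset_ratio=0.01):
    # estimate accuracy on a small random subset instead of the full test set
    indices = random.sample(range(len(data)), int(len(data) * subset_ratio))
    loader = DataLoader(Subset(data, indices), batch_size=256)
    correct = 0
    model.eval()
    with torch.no_grad():
        for inputs, targets in loader:
            correct += (model(inputs).argmax(dim=1) == targets).sum().item()
    return correct / len(indices)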