Skoltech Machine Learning 2021 course project
Note: storing the gradient history requires a large amount of disk space. Training a network with 2-4 layers for 5-10 epochs and computing stochastic gradient (SG) noise statistics may require up to 150 GB of storage and several hours of GPU time.
The experiments in this repository are intended to reproduce some of the results reported in the papers https://arxiv.org/abs/1901.06053 and https://arxiv.org/abs/1912.00018. The main goals are:
- to calculate Stochastic Gradient noise for several deep neural networks
- to perform an extensive empirical analysis of the tail-index of the SG noise in these networks
- to bring an alternative perspective to the existing approaches for analyzing SGD
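To make the first goal concrete, here is a minimal NumPy sketch of how SG noise can be measured: the noise of a minibatch gradient is its deviation from the full-batch gradient at the same parameter values. The linear-regression model, variable names, and batch size below are illustrative assumptions, not taken from the notebooks in this repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression problem; the model and sizes are illustrative.
X = rng.normal(size=(512, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=512)
w = np.zeros(10)  # current parameter vector

def grad(Xb, yb, w):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb) ** 2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = grad(X, y, w)                        # full-batch gradient
idx = rng.choice(512, size=32, replace=False)    # random minibatch
noise = grad(X[idx], y[idx], w) - full_grad      # SG noise vector
```

Collecting many such `noise` vectors across iterations gives the empirical SG noise samples whose tail behavior the papers analyze.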
A series of experiments was conducted for different neural network architectures and datasets:
- MNIST notebook
- CIFAR10 notebook
- notebook
Due to issues with scipy's handling of alpha-stable distributions, part of that analysis was done in R.
- R notebook
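For reference, the tail index alpha of alpha-stable SG noise can be estimated from the stability property: a sum of m i.i.d. symmetric alpha-stable variables scales like m^(1/alpha). The sketch below follows this idea (the estimator of Mohammadi et al. used in the first paper above); the function name and block size are illustrative choices.

```python
import numpy as np

def estimate_alpha(x, m=100):
    """Estimate the tail index of symmetric alpha-stable samples.

    Uses the scaling property: sums of m i.i.d. alpha-stable variables
    scale like m**(1/alpha), so the gap between the mean log-magnitude
    of block sums and of individual samples, divided by log(m),
    estimates 1/alpha.
    """
    x = np.asarray(x)
    n = (len(x) // m) * m          # trim so the length divides evenly
    x = x[:n]
    blocks = x.reshape(-1, m).sum(axis=1)  # sums of m consecutive samples
    inv_alpha = (np.mean(np.log(np.abs(blocks)))
                 - np.mean(np.log(np.abs(x)))) / np.log(m)
    return 1.0 / inv_alpha

# Sanity check on Gaussian data, where alpha should be close to 2:
rng = np.random.default_rng(0)
alpha_hat = estimate_alpha(rng.normal(size=100_000), m=100)
```

On Gaussian input the estimate should be near 2, the Gaussian tail index; values noticeably below 2 on real SG noise would indicate heavy tails, which is the phenomenon the papers study.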