(UAI 2022) Sup-tickets

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training
Lu Yin, Vlado Menkovski, Meng Fang, Tianjin Huang, Yulong Pei, Mykola Pechenizkiy, Decebal Constantin Mocanu, Shiwei Liu
https://arxiv.org/abs/2205.15322

Abstract: Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch. Existing sparse training methods usually strive to find the best sparse subnetwork possible in one single run, without involving any expensive dense or pre-training steps. For instance, dynamic sparse training (DST), is capable of reaching a competitive performance of dense training by iteratively evolving the sparse topology during the course of training. In this paper, we argue that it is better to allocate the limited resources to create multiple low-loss sparse subnetworks and superpose them into a stronger one, instead of allocating all resources entirely to find an individual subnetwork. To achieve this, two desiderata are required: (1) efficiently producing many low-loss subnetworks, the so-called cheap tickets, within one training process limited to the standard training time used in dense training; (2) effectively superposing these cheap tickets into one stronger subnetwork. To corroborate our conjecture, we present a novel sparse training approach, termed Sup-tickets, which can satisfy the above two desiderata concurrently in a single sparse-to-sparse training process. Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with the existing sparse training methods and demonstrates consistent performance improvement.
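
Conceptually, the superposition step is weight averaging followed by re-sparsification, so the merged network keeps the target sparsity. Below is a minimal sketch of the idea with a hypothetical helper, assuming global magnitude pruning for the re-sparsification step; the repo's actual implementation (see main_suptickets.py) differs in detail.

import torch

def superpose(state_dicts, sparsity=0.95):
    # Average the weights of several cheap tickets element-wise.
    avg = {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
           for k in state_dicts[0]}
    # Re-sparsify by global magnitude over weight tensors (skip biases/BN).
    prunable = [k for k, v in avg.items() if v.dim() > 1]
    scores = torch.cat([avg[k].abs().flatten() for k in prunable])
    n_keep = max(1, int((1 - sparsity) * scores.numel()))
    threshold = torch.topk(scores, n_keep).values.min()
    for k in prunable:
        avg[k] = avg[k] * (avg[k].abs() >= threshold)
    return avg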

This code base was created by Lu Yin ([email protected]) during his Ph.D. at Eindhoven University of Technology.

This repository contains implementations of the following sparse training methods: GraNet, RigL, In-Time Over-Parameterization, and SET.

The implementation is heavily based on Shiwei Liu's implementation of GraNet.

Requirements

The code requires Python 3.7, PyTorch v1.10.0, and CUDA v11.3.1. Other versions of PyTorch should also work.

How to Run Experiments

Options

Options for sparse training
* --sparse - enable sparse mode (remove this flag to train a dense model)
* --method - type of sparse training method. Choose from: GraNet, GraNet_uniform, DST, GMP, GMP_uniform
* --sparse-init - type of sparse initialization. Choose from: ERK, uniform, GMP, prune_uniform, prune_global, prune_and_grow_uniform, prune_and_grow_global, prune_structured, prune_and_grow_structured
* --model (str) - type of network
* --growth (str) - growth mode. Choose from: random, gradient, momentum
* --prune (str) - pruning mode. Choose from: magnitude, SET, threshold
* --redistribution (str) - redistribution mode. Choose from: magnitude, nonzeros, or none. (default none)
* --init-density (float) - initial density of the sparse model. (default 0.50)
* --final-density (float) - target density of the sparse model. (default 0.05)
* --init-prune-epoch (int) - the starting epoch of gradual pruning.
* --final-prune-epoch (int) - the ending epoch of gradual pruning.
* --prune-rate (float) - The pruning rate for Zero-Cost Neuroregeneration.
* --update-frequency (int) - number of training iterations between two steps of zero-cost neuroregeneration (see the sketch after this list).
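
The density flags define a gradual pruning schedule, while --prune-rate and --update-frequency control the periodic prune-and-grow update. Below is a minimal sketch of both with hypothetical helpers, assuming the cubic density schedule of Zhu & Gupta (2017) and per-layer gradient-based regrowth as in RigL; the repo's actual implementation (e.g., global scheduling across layers) may differ.

import torch

def current_density(epoch, init_density=0.5, final_density=0.05,
                    init_prune_epoch=0, final_prune_epoch=30):
    # Cubic gradual-pruning schedule (Zhu & Gupta, 2017): density decays
    # from init_density to final_density between the two prune epochs.
    if epoch <= init_prune_epoch:
        return init_density
    if epoch >= final_prune_epoch:
        return final_density
    t = (epoch - init_prune_epoch) / (final_prune_epoch - init_prune_epoch)
    return final_density + (init_density - final_density) * (1 - t) ** 3

def prune_and_grow(weight, grad, mask, prune_rate=0.5):
    # One zero-cost neuroregeneration step for a single layer: drop the
    # smallest-magnitude active weights, then regrow the same number of
    # connections where |gradient| is largest among previously inactive
    # positions (--growth gradient).
    w, g = weight.flatten(), grad.flatten()
    m = mask.flatten().clone().bool()
    n = int(prune_rate * m.sum().item())
    # Inactive slots get +inf so they are never selected for dropping.
    drop_score = torch.where(m, w.abs(), torch.full_like(w, float('inf')))
    drop = torch.topk(drop_score, n, largest=False).indices
    # Active (and just-dropped) slots get -inf so only positions that were
    # inactive before this step can be regrown.
    grow_score = torch.where(m, torch.full_like(g, -float('inf')), g.abs())
    grow = torch.topk(grow_score, n, largest=True).indices
    m[drop] = False
    m[grow] = True
    return m.view_as(mask)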



Options for creating and superposing cheap tickets
* --indicate_method (str) - the sparse training method used to create the cheap tickets for superposing
* --CIMA_factor - CIMA factor for superposing
* --cyc_lr - whether to use the cyclic learning-rate schedule (see the sketch after this list)
* --cycle - number of epochs to train in each cycle (default: 4)
* --cycle_rate - the fraction of the whole training budget spent on cyclic training
* --lr_1 - maximum learning rate of the cyclic LR schedule (default: 0.05)
* --lr_2 - minimum learning rate of the cyclic LR schedule (default: 0.0001)
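
During the cyclic phase, the learning rate repeatedly decays from the larger to the smaller of the two rates over --cycle epochs, and a cheap ticket is snapshotted at the end of each cycle. Below is a minimal sketch of one plausible (linear, SWA-style) cycle shape with a hypothetical helper; the repo's exact schedule may differ.

def cyclic_lr(epoch_in_phase, lr_max=0.05, lr_min=0.0001, cycle=4):
    # Decay the LR linearly from lr_max to lr_min within each cycle, then
    # restart; one cheap ticket is saved at each restart.
    t = (epoch_in_phase % cycle) / cycle   # position within the current cycle
    return lr_max - (lr_max - lr_min) * t

For example, with --epochs 250, --cycle_rate 0.1, and --cycle 8 as in the commands below, roughly the last 25 epochs would run as about three cycles, yielding about three cheap tickets to superpose.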

CIFAR-10/100 Experiments

cd CIFAR

Superpose cheap tickets created by GraNet (s_i = 0.5) at sparsity 0.95:

python3 main_suptickets.py --indicate_method granet --update_bn --pre_train --cyclic_train --re_explore --no_maskupdates --cyc_lr --lr_2 0.005 --lr_1 0.001 --cycle 8 --cycle_rate 0.1 --sparse  --decay-schedule constant --seed 41 --sparse-init ERK --update-frequency 1000 --batch-size 128 --prune-rate 0.5 --large-death-rate 0.5 --method GraNet --growth gradient --prune magnitude --init-density 0.5 --final-density 0.05  --epochs 250  --model ResNet50 --data cifar100

Superpose cheap tickets created by RigL at sparsity 0.95:

python3 main_suptickets.py --indicate_method rigl --update_bn --pre_train --cyclic_train --re_explore --no_maskupdates --cyc_lr --lr_2 0.005 --lr_1 0.001 --cycle 8 --cycle_rate 0.1 --sparse  --decay-schedule constant --seed 41 --sparse-init ERK --update-frequency 1000 --batch-size 128 --prune-rate 0.5 --large-death-rate 0.5 --method DST --growth gradient --prune magnitude --init-density 0.05 --final-density 0.05  --epochs 250  --model ResNet50 --data cifar100 

Superpose cheap tickets created by SET at sparsity 0.95:

python3 main_suptickets.py --indicate_method set --update_bn --pre_train --cyclic_train --re_explore --no_maskupdates --cyc_lr --lr_2 0.005 --lr_1 0.001 --cycle 8 --cycle_rate 0.1 --sparse  --decay-schedule constant --seed 41 --sparse-init ERK --update-frequency 1000 --batch-size 128 --prune-rate 0.5 --large-death-rate 0.5 --method DST --growth random --prune magnitude --init-density 0.05 --final-density 0.05  --epochs 250  --model ResNet50 --data cifar100 
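
Averaging leaves the BatchNorm running statistics stale relative to the superposed weights, which is presumably what --update_bn addresses: the statistics are recomputed with one forward pass over the training data, the same recipe PyTorch ships for SWA. A minimal sketch with a hypothetical helper, where model and train_loader are assumed to be the superposed network and the CIFAR training loader:

import torch
from torch.optim.swa_utils import update_bn

def refresh_bn(model, train_loader, device="cuda"):
    # Recompute BatchNorm running statistics for the superposed model by a
    # single forward pass over the training data (the SWA recipe).
    update_bn(train_loader, model, device=torch.device(device))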

ImageNet Experiments

cd ImageNet

Superpose cheap tickets created by GraNet (s_i = 0.5) at sparsity 0.90:

python $1multiproc.py --nproc_per_node 2 $1main.py --sparse --sparse-init ERK --first_m 30 --second_m 60 --third_m 85 --method GraNet --init-prune-epoch 0 --final-prune-epoch 30 --init-density 0.5  --final-density 0.1  --multiplier 1 --growth gradient --seed 17 --master_port 7768 -j20 -p 500 --arch resnet50 -c fanin --update-frequency 4000 --label-smoothing 0.1 -b 64 --lr 0.1 --warmup 5 --density 0.1  $2 ../../imagenet2012/ --epochs 100 --indicate_method Granet --cyclic_epochs 8 --pretrain_epoch 92 --pre_train --cyclic_train --bn_update --lr_2 0.0005 --lr_1 0.0001 --cycle 2 --large-death-rate 0.5 --cyc_lr

Superpose cheap tickets created by RigL at sparsity 0.90:

python $1multiproc.py --nproc_per_node 2 $1main.py --sparse --sparse-init ERK --first_m 30 --second_m 60 --third_m 85 --method DST --init-prune-epoch 0 --final-prune-epoch 30 --init-density 0.1  --final-density 0.1  --multiplier 1 --growth gradient --seed 17 --master_port 7768 -j20 -p 500 --arch resnet50 -c fanin --update-frequency 4000 --label-smoothing 0.1 -b 64 --lr 0.1 --warmup 5 --density 0.1  $2 ../../imagenet2012/ --epochs 100 --indicate_method Rigl --cyclic_epochs 8 --pretrain_epoch 92 --pre_train --cyclic_train --bn_update --lr_2 0.0005 --lr_1 0.0001 --cycle 2 --large-death-rate 0.5 --cyc_lr

Citation

If you find this repo helpful, please cite:

@article{yin2022superposing,
  title={Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training},
  author={Yin, Lu and Menkovski, Vlado and Fang, Meng and Huang, Tianjin and Pei, Yulong and Pechenizkiy, Mykola and Mocanu, Decebal Constantin and Liu, Shiwei},
  journal={arXiv preprint arXiv:2205.15322},
  year={2022}
}
