The wsn from ihaeyong

Forget-free Continual Learning with Winning Subnetworks-ICML2022

This is the official PyTorch implementation of WSN (Winning SubNetworks), the method proposed in the paper.

Dependency

PyTorch > 1.5

Dataset

Permuted MNIST (available in the current version)
5 Datasets (available in the current version)
Omniglot Rotation (available in the current version)
CIFAR-100 Split (available in the current version)
CIFAR-100 Superclass (available in the current version)
TinyImageNet (available in the current version)

Installation

To install the dependencies needed to run the experiments, run the following:

pip install -r requirements.txt

Training

We provide several training examples with this repository:

To train WSN on Permuted MNIST on GPU [GPU_ID] with seed number [SEED] and sparsity [SPARSITY], simply run the following

>> ./scripts/wsn/wsn_pmnist.sh [GPU_ID] [SEED] [SPARSITY]

To train WSN on CIFAR100-100 on GPU [GPU_ID] with seed number [SEED] and sparsity [SPARSITY], simply run the following

>> ./scripts/wsn/wsn_cifar100_100.sh [GPU_ID] [SEED] [SPARSITY]

To train WSN + FSO on CIFAR100-100 on GPU [GPU_ID] with seed number [SEED] and sparsity [SPARSITY]: the command is not yet available and will be added in an upcoming update.
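
The two script invocations above take the same three positional arguments, so a sweep over seeds can be scripted. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `sweep` are hypothetical, and the GPU id, seeds, and sparsity value are placeholders (the README does not state whether sparsity is given as a fraction or a percentage). It only builds and prints the commands.

```python
import shlex

# Script paths are copied from the README; everything else here is illustrative.
SCRIPTS = {
    "pmnist": "./scripts/wsn/wsn_pmnist.sh",
    "cifar100_100": "./scripts/wsn/wsn_cifar100_100.sh",
}


def build_command(benchmark, gpu_id, seed, sparsity):
    """Return the argv list for one run: <script> GPU_ID SEED SPARSITY."""
    return [SCRIPTS[benchmark], str(gpu_id), str(seed), str(sparsity)]


def sweep(benchmark, gpu_id, seeds, sparsity):
    """Build one command per seed, keeping the GPU and sparsity fixed."""
    return [build_command(benchmark, gpu_id, seed, sparsity) for seed in seeds]


# Placeholder values: GPU 0, three seeds, sparsity 0.5 (format is an assumption).
commands = sweep("pmnist", gpu_id=0, seeds=[1, 2, 3], sparsity=0.5)
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch the runs, each `cmd` list could be passed to `subprocess.run(cmd, check=True)` from the repository root.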

References

Haeyong Kang, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, Chang D. Yoo. Forget-free Continual Learning with Winning Subnetworks. ICML 2022.

About the experimental setting for TinyImageNet

Hi,

Thanks for continuing to update the released code!

While reading the paper and the relevant code in this repo, I came across several questions about the setting of the TinyImageNet experiments:

1. Under which setting were the TinyImageNet experiments conducted?

I went through the paper and could not find a description of this point. I printed the model architecture used for the TinyImageNet experiments and found that each classification head has 200 outputs. Based on my understanding, shouldn't there be 40 classifiers with 5 outputs each?

SubNet(
  (drop1): Dropout(p=0.0, inplace=False)
  (drop2): Dropout(p=0.0, inplace=False)
  (conv1): SubnetConv2d(3, 160, kernel_size=(3, 3), stride=2, padding=(1, 1), bias=False)
  (conv2): SubnetConv2d(160, 160, kernel_size=(3, 3), stride=2, padding=(1, 1), bias=False)
  (conv3): SubnetConv2d(160, 160, kernel_size=(3, 3), stride=2, padding=(1, 1), bias=False)
  (conv4): SubnetConv2d(160, 160, kernel_size=(3, 3), stride=2, padding=(1, 1), bias=False)
  (linear1): SubnetLinear(in_features=2560, out_features=640, bias=False)
  (linear2): SubnetLinear(in_features=640, out_features=640, bias=False)
  (last): ModuleList(
    (0): Linear(in_features=640, out_features=200, bias=False)
    (1): Linear(in_features=640, out_features=200, bias=False)
    (2): Linear(in_features=640, out_features=200, bias=False)
    (3): Linear(in_features=640, out_features=200, bias=False)
    (4): Linear(in_features=640, out_features=200, bias=False)
    (5): Linear(in_features=640, out_features=200, bias=False)
    (6): Linear(in_features=640, out_features=200, bias=False)
    (7): Linear(in_features=640, out_features=200, bias=False)
    (8): Linear(in_features=640, out_features=200, bias=False)
    (9): Linear(in_features=640, out_features=200, bias=False)
    (10): Linear(in_features=640, out_features=200, bias=False)
    (11): Linear(in_features=640, out_features=200, bias=False)
    (12): Linear(in_features=640, out_features=200, bias=False)
    (13): Linear(in_features=640, out_features=200, bias=False)
    (14): Linear(in_features=640, out_features=200, bias=False)
    (15): Linear(in_features=640, out_features=200, bias=False)
    (16): Linear(in_features=640, out_features=200, bias=False)
    (17): Linear(in_features=640, out_features=200, bias=False)
    (18): Linear(in_features=640, out_features=200, bias=False)
    (19): Linear(in_features=640, out_features=200, bias=False)
    (20): Linear(in_features=640, out_features=200, bias=False)
    (21): Linear(in_features=640, out_features=200, bias=False)
    (22): Linear(in_features=640, out_features=200, bias=False)
    (23): Linear(in_features=640, out_features=200, bias=False)
    (24): Linear(in_features=640, out_features=200, bias=False)
    (25): Linear(in_features=640, out_features=200, bias=False)
    (26): Linear(in_features=640, out_features=200, bias=False)
    (27): Linear(in_features=640, out_features=200, bias=False)
    (28): Linear(in_features=640, out_features=200, bias=False)
    (29): Linear(in_features=640, out_features=200, bias=False)
    (30): Linear(in_features=640, out_features=200, bias=False)
    (31): Linear(in_features=640, out_features=200, bias=False)
    (32): Linear(in_features=640, out_features=200, bias=False)
    (33): Linear(in_features=640, out_features=200, bias=False)
    (34): Linear(in_features=640, out_features=200, bias=False)
    (35): Linear(in_features=640, out_features=200, bias=False)
    (36): Linear(in_features=640, out_features=200, bias=False)
    (37): Linear(in_features=640, out_features=200, bias=False)
    (38): Linear(in_features=640, out_features=200, bias=False)
    (39): Linear(in_features=640, out_features=200, bias=False)
  )
)
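
As a side check on the printout above, the `in_features=2560` of `linear1` is consistent with 64x64 inputs passing through the four stride-2 convolutions. This is a small sketch under assumptions: the 64x64 input size is the standard TinyImageNet resolution, and I assume no pooling between the convolutions (functional pooling would not show up in the printed modules). `conv_out` is a hypothetical helper using the usual convolution output-size formula.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


side = 64  # assumed TinyImageNet input resolution
for _ in range(4):  # conv1 .. conv4, all kernel 3, stride 2, padding 1
    side = conv_out(side)

channels = 160  # output channels of conv4 in the printout
flat_features = channels * side * side
print(side, flat_features)  # 4 2560
```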

2. Class-incremental loader for TinyImageNet
I saw that the TinyImageNet data loader is built with loader_type='class_incremental_loader'. Even under the class-incremental setting, shouldn't the 40 classifiers have outputs like this: (0) output=5; (1) output=10; (2) output=15; ...; (39) output=200?
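
To make the comparison concrete, here is a small sketch of the three head layouts discussed in these two questions: 5 outputs per head, cumulative outputs (5, 10, ..., 200), and the 200-wide heads that the printout shows. It is not taken from the repository; all names are hypothetical, and the slicing in `task_logits` assumes that task t owns the contiguous class range [5t, 5t+5), which I have not verified against the loader.

```python
NUM_CLASSES = 200
NUM_TASKS = 40
CLASSES_PER_TASK = NUM_CLASSES // NUM_TASKS  # 5
FEATURES = 640  # in_features of each head in the printout


def head_sizes(layout):
    """Output width of each of the 40 heads under a given layout."""
    if layout == "per_task":      # 5 outputs per head
        return [CLASSES_PER_TASK] * NUM_TASKS
    if layout == "cumulative":    # 5, 10, ..., 200
        return [CLASSES_PER_TASK * (t + 1) for t in range(NUM_TASKS)]
    if layout == "full_width":    # 200 outputs per head, as printed
        return [NUM_CLASSES] * NUM_TASKS
    raise ValueError(f"unknown layout: {layout}")


def head_params(layout):
    """Total number of head weights (bias=False in the printout)."""
    return sum(FEATURES * out for out in head_sizes(layout))


def task_logits(full_logits, task_id):
    """Restrict a 200-wide logit vector to the classes of one task (assumed contiguous)."""
    start = task_id * CLASSES_PER_TASK
    return full_logits[start:start + CLASSES_PER_TASK]


for layout in ("per_task", "cumulative", "full_width"):
    print(layout, head_sizes(layout)[:3], head_params(layout))
```

If the 200-wide heads are combined with a per-task restriction such as `task_logits`, the effective classifier for each task would still be 5-way, only with unused output units in each head.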

Sorry if these are basic questions; I am new to the field of continual learning. I look forward to your reply.

Best,
