
Efficient Lifelong Learning with A-GEM

This is the official TensorFlow implementation of Averaged Gradient Episodic Memory (A-GEM) and Experience Replay with Tiny Memories.

Requirements

TensorFlow >= v1.9.0.

Training

To replicate the results of the paper on a particular dataset, execute (see the Note below for downloading the CUB and AWA datasets):

$ ./replicate_results_iclr19.sh <DATASET> <THREAD-ID> <JE>

Example runs are:

$ ./replicate_results_iclr19.sh MNIST 3      /* Train PNN and A-GEM on MNIST */
$ ./replicate_results_iclr19.sh CUB 1 1      /* Train JE models of RWALK and A-GEM on CUB */
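
For reference, the update at the heart of A-GEM projects the current-task gradient whenever it conflicts with a reference gradient computed on a mini-batch drawn from the episodic memory. Below is a minimal NumPy sketch of that projection (illustrative only; the repository's actual implementation is in TensorFlow, and the function name agem_project is our own):

import numpy as np

def agem_project(grad, grad_ref):
    # Project the current-task gradient `grad` so that it does not increase
    # the loss on the episodic memory, whose gradient is `grad_ref`.
    # Both arguments are flat 1-D arrays of all model parameters concatenated.
    dot = np.dot(grad, grad_ref)
    if dot >= 0.0:
        # No conflict with the memory: use the gradient unchanged.
        return grad
    # Remove the component of `grad` that points against `grad_ref`.
    return grad - (dot / np.dot(grad_ref, grad_ref)) * grad_ref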

Note

For the CUB and AWA experiments, download the datasets prior to running the above script. Run the following to download them:

$ ./download_cub_awa.sh

The plotting code is provided under the folder plotting_code/. Update the paths in the plotting code accordingly.

Experience Replay

The code provides an implementation of experience replay (ER) with reservoir sampling on the MNIST and CIFAR datasets. To run the ER experiments, execute the following script:

$ ./replicate_results_er.sh
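
The ER buffer is maintained with reservoir sampling, which keeps every example seen so far in the memory with equal probability. A minimal sketch of that update rule (names such as reservoir_update are illustrative, not the repository's API):

import random

def reservoir_update(memory, example, mem_size, n_seen):
    # n_seen is the number of stream examples observed so far, including this one.
    # After the update, every observed example sits in the buffer with
    # probability mem_size / n_seen.
    if len(memory) < mem_size:
        memory.append(example)           # buffer not full yet: always insert
    else:
        j = random.randint(0, n_seen - 1)
        if j < mem_size:
            memory[j] = example          # overwrite a uniformly chosen slot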

When using this code, please cite our papers:

@inproceedings{AGEM,
  title={Efficient Lifelong Learning with A-GEM},
  author={Chaudhry, Arslan and Ranzato, Marc’Aurelio and Rohrbach, Marcus and Elhoseiny, Mohamed},
  booktitle={ICLR},
  year={2019}
}

@article{chaudhryER_2019,
  title={Continual Learning with Tiny Episodic Memories},
  author={Chaudhry, Arslan and Rohrbach, Marcus and Elhoseiny, Mohamed and Ajanthan, Thalaiyasingam and Dokania, Puneet K and Torr, Philip HS and Ranzato, Marc’Aurelio},
  journal={arXiv preprint arXiv:1902.10486},
  year={2019}
}

@inproceedings{chaudhry2018riemannian,
  title={Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence},
  author={Chaudhry, Arslan and Dokania, Puneet K and Ajanthan, Thalaiyasingam and Torr, Philip HS},
  booktitle={ECCV},
  year={2018}
}

Questions / Bugs

  • For questions, contact the author Arslan Chaudhry ([email protected]).
  • Feel free to open an issue if anything is broken.

License

This source code is released under The MIT License found in the LICENSE file in the root directory of this source tree.


Issues

Reproducing experiments "On Tiny Episodic Memories in Continual Learning"

Hi,

I tried to reproduce the results you describe in Section 5.5 of your paper "On Tiny Episodic Memories in Continual Learning", because I couldn't find an implementation in your codebase and the experiment seems relatively easy to reproduce. I'm mostly interested in the results for 20-degree rotation, where fine-tuning on the second task does not harm performance on the first one, so in fact I am only interested in reproducing this figure:

[screenshot of the figure from Section 5.5 showing the 20-degree rotation results]

I've skimmed the paper and listed the following hyperparameters (a rough sketch of this setup follows the list):

  • MLP with 2 hidden layers, 256 units each, followed by ReLU
  • SGD with lr=0.1
  • CrossEntropy loss
  • Minibatch size=10
  • A single pass through the whole dataset
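
A minimal TensorFlow 1.x sketch of the setup above (a reconstruction for illustration only, not code from this repository; it assumes standard 10-way MNIST classification):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])

# MLP with 2 hidden layers of 256 units each, ReLU activations
h = tf.layers.dense(x, 256, activation=tf.nn.relu)
h = tf.layers.dense(h, 256, activation=tf.nn.relu)
logits = tf.layers.dense(h, 10)

# Cross-entropy loss and plain SGD with lr = 0.1
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
# Training loop: a single pass over each task's data with mini-batches of 10.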

Unfortunately, after reproducing the experiments I found that, after finishing the first task, my network reaches 96% accuracy on the test set, in contrast to the 85% you reported, and fine-tuning only on the second task does indeed lead to catastrophic forgetting (not so catastrophic in this case, but it costs ~5% accuracy on the test set).

Could you please share more details about your experimental setup? Am I missing something?

LCA implementation

Hi,
I could not find the LCA implementation in this repo. Where can I find it? Thank you so much in advance.

Code for Imagenet

Hi,

I would like to reproduce your results for mini-ImageNet; is there a main file I should use?

Thanks for making your code public, it's greatly appreciated!

-Lucas

Is it R-walk code?

I'm trying to recreate the CIFAR-100 and MNIST results from the R-Walk paper, but it doesn't work.
To compare it with your code, I opened this issue.

This repo is based on the A-GEM paper (the 1-epoch setting suggested in GEM), not R-Walk.

How do I get the code for the R-Walk setting (single/multi-head incremental setting, simple 4-layer network)?

Also, there is no reference result (multi-task learning) in the R-Walk paper, so I can't compute the F and I scores.
I've already trained a multi-task learning reference myself. Can I use this?
Or would that be unstable?

Number of Runs for Cross-Validate

Dear authors,

I am reading your code; however, I found something confusing about the cross-validate mode. Could you please clarify it for me?

When cross_validate_mode = True, the results are ftask. However, ftask is not the average across all runs, but only the result of the last run. Did you tune the hyper-parameters with a single run?

return np.mean(ftask)
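
To make the distinction concrete, a small made-up illustration (the shapes are only my assumption about what ftask holds):

import numpy as np

ftask_last_run = np.array([0.62, 0.58, 0.55])        # per-task accuracy, last run only
ftask_all_runs = np.array([[0.62, 0.58, 0.55],
                           [0.65, 0.60, 0.57],
                           [0.60, 0.56, 0.54]])       # one row per run

print(np.mean(ftask_last_run))   # mean over the last run's tasks only
print(np.mean(ftask_all_runs))   # mean over all runs and tasks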

eval_single_head flag not used

I am interested in seeing the performance of A-GEM and RWALK in the single-head evaluation setting. I noticed you have an eval_single_head flag, but you don't actually use it in train_task_sequence() or test_task_sequence().
Any plans or suggestions to fix this?
Thanks!

file is too short to be an sstable

Dear author, I made slight changes to your code to adapt it to Python 3 and TensorFlow 2, but I encountered this issue when running the CUB experiment:
./replicate_results_iclr19.sh CUB 1 1
I have attached the error log for your reference. Could you please help me with that?

Traceback (most recent call last):
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1367, in _do_call
    return fn(*args)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
    target_list, run_metadata)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: file is too short to be an sstable
	 [[{{node save_1/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "conv_split_cub_hybrid.py", line 904, in <module>
    main()
  File "conv_split_cub_hybrid.py", line 880, in main
    args.batch_size, args.num_runs, args.init_checkpoint, args.online_cross_val, args.random_seed)
  File "conv_split_cub_hybrid.py", line 220, in train_task_sequence
    load(loader, sess, init_checkpoint)
  File "conv_split_cub_hybrid.py", line 109, in load
    saver.restore(sess, ckpt_path)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 960, in run
    run_metadata_ptr)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1183, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1361, in _do_run
    run_metadata)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1386, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: file is too short to be an sstable
	 [[node save_1/RestoreV2 (defined at conv_split_cub_hybrid.py:219) ]]

Original stack trace for 'save_1/RestoreV2':
  File "conv_split_cub_hybrid.py", line 904, in <module>
    main()
  File "conv_split_cub_hybrid.py", line 880, in main
    args.batch_size, args.num_runs, args.init_checkpoint, args.online_cross_val, args.random_seed)
  File "conv_split_cub_hybrid.py", line 219, in train_task_sequence
    loader = tf.train.Saver(restore_vars)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1506, in restore_v2
    name=name)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
    op_def=op_def)
  File "/home/yilu/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1756, in __init__
    self._traceback = tf_stack.extract_stack()

Implementation of MER

Hi,

In the "On Tiny Episodic Memories in Continual Learning", MER is one of the baselines but I can't find it in the repo. I am wondering did you implement MER in this repo or use the MER from the original repo.

Thank you so much

missing attribute file CUB_attr_in_order.pickle

Hi, thanks for your effort in making A-GEM public. I have some trouble running replicate_results_iclr19.sh with the CUB dataset: I've downloaded the dataset, but the CUB_attr_in_order.pickle file is missing. Could you tell me where this file comes from, or describe its format so I can create a new one?

Reproduce your results in your ECCV 2018 paper.

Hi Arslan,

Can you please let me know how I can reproduce the results in Table 1 of your paper below?

"Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence" ECCV 2018

There are some input parameters in the code for which I don't know what values to set. For instance, how can I set the model to be multi-headed?

I appreciate your help in advance.

Best wishes,
Mahsa

Question About Reproduction Experiment A-GEM

Dear author and everyone,

I got the A-GEM software from this GitHub repository and tried the reproduction experiment described in the paper, but I could not reproduce the results.

I have tried the following steps.

environment:
  • OS: Ubuntu 18.04
  • GPU: RTX 2080 Ti
  • Anaconda 4.8.3 (from "conda -V")

steps:
mkdir anaconda3/work/zwk_agem
cd anaconda3/work/zwk_agem
git clone https://github.com/facebookresearch/agem
cd agem
conda create -n agem27 python=2.7
conda activate agem27
conda install numpy -y
conda install tensorflow-gpu==1.9.0 -y
conda install opencv -y
conda install matplotlib -y
conda install ipython -y

./replicate_results_iclr19.sh MNIST 3

I ran the above and got the pickle file (PERMUTE_MNIST_HERDING_FC-S_True_A-GEM_...._.pickle).
Then I fed the pickle file into agem_plots.ipynb.
The resulting accuracy was different from the value stated in the paper
(paper: 89.1%, my result: 65.3%).

Could you please tell me how I can get the same results as yours?
Please let me know if any information is missing.
Thank you very much.

Sincerely,
Yoichi Goda
e-mail: [email protected]
