Comments (9)
Hi, thanks for your interest in the code! To reproduce the results for BI-R + SI as reported in the papers "Brain-inspired replay for continual learning with artificial neural networks" and "Class-incremental learning with generative classifiers", it is best to use not this repository but the repository accompanying the brain-inspired replay paper (https://github.com/GMvandeVen/brain-inspired-replay). In that repository, the command

`python main_cl.py --experiment=CIFAR100 --scenario=class --brain-inspired --replay=generative --si --dg-prop=0.6 --c=100000000`

can be used to run the same BI-R + SI experiment on the class-incremental version of Split-CIFAR100 as reported in the above two papers.
In this repository (https://github.com/GMvandeVen/continual-learning), a few things are implemented slightly differently compared to the brain-inspired replay paper. For example, the default in this repository is to run class-incremental learning experiments with an output layer in which the output units of all classes are always active, while the brain-inspired replay paper used an "expanding head". (See for example the explanation under the header "BI-R" in the methods section, top of p.14, of that paper.) In the paper accompanying this repository I only tested BI-R by itself, not combined with SI. I expect the second experiment you describe fails because using SI with a very high regularization strength can be problematic in a class-incremental learning experiment in which all units are always active (while it is fine with an expanding head).
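For intuition, the difference between keeping all output units active and using an "expanding head" can be sketched as follows. This is an illustrative toy example, not code from either repository: the function name and list-based "logits" are made up for the sketch.

```python
import math

def masked_logits(logits, classes_seen, expanding_head=True):
    """Return logits with the output units of unseen classes disabled.

    logits: list of per-class scores from the output layer
    classes_seen: number of classes encountered so far
    """
    if not expanding_head:
        return list(logits)  # all output units always active
    # expanding head: units of not-yet-seen classes can never be predicted
    return [score if i < classes_seen else -math.inf
            for i, score in enumerate(logits)]

scores = [0.1, 0.4, 0.9]     # unit 2 belongs to a not-yet-seen class
full = masked_logits(scores, 2, expanding_head=False)
head = masked_logits(scores, 2, expanding_head=True)
print(full.index(max(full)))  # all units active: predicts class 2
print(head.index(max(head)))  # expanding head: predicts class 1
```

With all units active, the not-yet-seen class can still attract probability mass and gradient, which is one way a strong regularizer can interact badly with the output layer.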
Hope this helps!
from continual-learning.
Thank you very much for such a detailed answer :)
Can I ask why you excluded BI-R + SI from the results table of the new paper? If I am not mistaken, it gave one of the best results, especially among the generative methods.
Also, I tried to reproduce BI-R + SI with the new repository too, using the feature extractor provided in the repository and the argument --active-classes="all-so-far", but the results differ a lot. What could be the reason? Was something else changed that is not mentioned in your paper?
Thank you very much for your time :)
Regarding the first part of your comment, that's a good question. In this new paper (although the preprint of this new paper is older than the paper on brain-inspired replay) I didn't include BI-R + SI in the comparison because the goal of the experiments is not to verify or champion a method as achieving state-of-the-art performance, but to compare the performance of different computational strategies for continual learning, and to do that on each of the three continual learning scenarios. To this end I tried to select a few representative example methods for each strategy. As the approach BI-R + SI combines two of those strategies, it wasn't suitable for this comparison. (But if you are interested in doing as well as possible on some continual learning problem, it might indeed often be best to combine multiple strategies.)
Regarding the second part of your comment, it seems you are right. Thank you for pointing this out. It indeed seems to be the case that, also when using the argument --active-classes="all-so-far", the performance of BI-R + SI with the code in this repository is somewhat lower than with the code in the repository of the brain-inspired replay paper. I will try to figure out what is causing this difference!
Thank you very much for the answer! Can I leave this issue open as a means of communication? If you find out what causes the difference, I would be happy to hear from you :)
Yes, please leave the issue open. I'm intending to get back on this when I figure it out!
Hi, I found one difference in the implementation of BI-R + SI between this repository and the repository of the BI-R paper, which seems to explain at least most of the difference in results you got. In the repository of the BI-R paper, the method SI is only applied to the layers of the classifier (so not to the layers of the decoder network), while in this repository SI is by default applied to all layers of the network (so also to the layers of the decoder network).
The lines in the repository of the BI-R paper where this is specified are here: https://github.com/GMvandeVen/brain-inspired-replay/blob/1a030f75666c656416e1ca02466758ca32cf2fe4/train.py#L296-L318
To mimic this behavior in this repository, you could replace the line defining `self.param_list` in `continual-learning/models/cl/continual_learner.py` (lines 19 to 20, at commit b4bd69a) by:

```python
self.param_list = [self.convE.named_parameters, self.fcE.named_parameters, self.classifier.named_parameters]
```
Note that the approach BI-R + SI can also work quite well when SI is applied to all layers of the network, but this setting has different optimal hyperparameter values (in particular, the hyperparameter --dg-prop should be lower). Hope this helps!
Hi! Thank you very much for your help. I was now able to obtain 30% accuracy, which is much better than before. If it is not too much trouble, can you tell me what could be the reason for the remaining 2-5% difference between the brain-inspired repository and this repository, even with the all-so-far option? Are there any other implementation differences? Thank you for your help :)
Hi, there are quite a few other differences between the code in this repository and the other repository, but I haven't been able to figure out which of them could cause a difference in performance when combining BI-R and SI. For example, one quite large difference is that, when a fixed feature extractor is used (i.e., with the option --freeze-convE), in this repository all data are put through the feature extractor once at the beginning (which speeds things up considerably), while in the other repository the data are put through the feature extractor every time they are presented to the network. In principle I don't think this difference should lead to a difference in performance, but perhaps for some reason it does.
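The precomputation idea can be sketched as follows. This is an illustrative toy example with made-up helper names, not the repository's actual code: because the extractor is frozen, computing features once up front yields exactly the same inputs to the rest of the network as recomputing them on every presentation.

```python
def frozen_extractor(x):
    # stands in for a fixed conv feature extractor that is never updated
    return [v * 2.0 for v in x]

dataset = [[1.0, 2.0], [3.0, 4.0]]

# Option A (this repository): extract features once, before training.
feature_cache = [frozen_extractor(x) for x in dataset]

# Option B (BI-R repository): extract features at every presentation.
def features_on_the_fly(i):
    return frozen_extractor(dataset[i])

# Because the extractor is frozen, both options feed the model the same
# features; option A just avoids repeating the forward pass each epoch.
assert all(feature_cache[i] == features_on_the_fly(i)
           for i in range(len(dataset)))
```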
If it is important to replicate the performance reported in the brain-inspired replay paper, my suggestion would be to use the original repository accompanying that paper (https://github.com/GMvandeVen/brain-inspired-replay). Otherwise it should be fine to use this repository.
Thank you very much for your explanations. I want to replicate the results here, as I like that this repository includes other methods in addition to the generative and regularization ones. Thank you very much for your help. I will close this issue; if you remember anything else, please reopen it or write to my email: [email protected]. Have a nice day!