Comments (3)
Thanks for your comment! You’re right that the order in which the squaring and averaging operations are applied matters, exactly for the reason you point out. But the code in my repository does use the correct order, because the Fisher Information matrix is calculated with batches of size 1 (see here).
I admit this might not be the most efficient implementation (alternative suggestions are very welcome!), but the reason I chose it for now is that, as far as I’m aware, it is currently not possible in PyTorch to access the gradients of the individual elements of a sum.
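For concreteness, the correct order can be sketched as follows. This is a minimal illustration, not the repository’s actual code: the model, the data, and the use of the true labels (rather than labels sampled from the model’s output distribution) are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model and data; in practice these come from the experiment.
model = nn.Linear(4, 3)
data = [(torch.randn(1, 4), torch.randint(0, 3, (1,))) for _ in range(8)]

# Diagonal Fisher estimate: one backward pass per sample, so each
# per-sample gradient is squared *before* the average is taken.
fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
for x, y in data:
    model.zero_grad()
    log_probs = F.log_softmax(model(x), dim=1)
    loss = F.nll_loss(log_probs, y)  # NLL of this single sample
    loss.backward()
    for n, p in model.named_parameters():
        fisher[n] += p.grad.detach() ** 2   # square first ...
fisher = {n: f / len(data) for n, f in fisher.items()}  # ... then average
```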
from continual-learning.
I see. Estimating the Fisher Information matrix with a single example seems to result in high variance (see this notebook for an example). Have you seen any difference when a larger batch size is used? As you said, I don't think it's possible to access individual per-sample gradients with just one backward call.
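The reason a single backward pass over a batch is not enough can be seen in a toy example (an assumed setup, not taken from the repository or the notebook): for per-sample losses `loss_i = w * x_i`, the gradient of each with respect to `w` is `x_i`, so squaring the batch-averaged gradient yields `(mean x)^2`, while the Fisher needs the average of the squared per-sample gradients, `mean(x^2)`; the two differ by exactly the variance of the per-sample gradients.

```python
import torch

# Toy per-sample gradients: for loss_i = w * x_i, d(loss_i)/dw = x_i.
x = torch.tensor([1.0, -1.0, 3.0, -3.0])
w = torch.tensor(1.0, requires_grad=True)

# One backward pass over the whole batch gives the gradient of the
# *mean* loss, i.e. the average of the per-sample gradients.
loss = (w * x).mean()
loss.backward()
sq_of_mean = w.grad ** 2          # (E[g])^2 -- what batching would give

mean_of_sq = (x ** 2).mean()      # E[g^2]   -- what the Fisher needs
print(sq_of_mean.item(), mean_of_sq.item())  # 0.0 vs 5.0
```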
from continual-learning.
Sorry, I should have been clearer; I realise my use of ‘batches’ is a bit confusing here. What I meant is that the backward passes(*) are done one by one for each sample used to calculate the Fisher Information matrix. The number of samples used to calculate the Fisher Information matrix is typically not 1 (it is set by the option --fisher-n; the default is to use the full training set).
(*) Actually the forward passes too; I now realise this could be made more efficient by at least performing the forward passes with larger ‘batches’. I’ll look into that when I get some time.
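A hedged sketch of that optimisation, with hypothetical model and data: the forward pass runs once over the whole batch, while the backward passes still run per sample (so gradients are still squared before averaging). `retain_graph=True` keeps the shared computation graph alive across the repeated backward calls.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model and batch; in practice these come from the experiment.
model = nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

log_probs = F.log_softmax(model(x), dim=1)              # one forward pass
per_sample_nll = F.nll_loss(log_probs, y, reduction='none')

fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
for i in range(len(x)):
    model.zero_grad()
    # Backward pass per sample; retain the graph for the next iteration.
    per_sample_nll[i].backward(retain_graph=True)
    for n, p in model.named_parameters():
        fisher[n] += p.grad.detach() ** 2
fisher = {n: f / len(x) for n, f in fisher.items()}
```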
from continual-learning.