Hi,
I'll start with a long explanation :-), and then I'll take your questions.
Regularization can be a means to achieve sparsity - but there is an important distinction between sparsity and pruning which relates to the rest of my answer. Sparsity is a measure of how many exact zeros a tensor contains; pruning algorithms are one approach to achieving sparsity. But the distinction goes even deeper.
Consider what happens when we prune connections: we remove those connections entirely from the network, which means that no information flows through them: neither forward data, nor backward gradients. In practice, we mask the weights during the forward pass and the gradients during the backward pass. But you know this 😉
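To make that two-way masking concrete, here is a toy sketch in plain PyTorch (illustrative only - this is not Distiller's actual implementation):

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, requires_grad=True)
mask = torch.tensor([1.0, 0.0, 1.0, 0.0])   # 0 = pruned connection

x = torch.ones(4)
with torch.no_grad():
    w.mul_(mask)             # forward pass: hard-zero the pruned weights in place
y = (w * x).sum()
y.backward()                 # gradients flow to ALL positions here...
w.grad.mul_(mask)            # ...so the backward pass masks the gradients too
```

After this, the pruned positions carry neither activations nor gradient signal, so they are effectively removed from the network.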
What happens when we regularize? At first glance, there is no relation between pruning and regularization, because in regularization we just use an added loss term to put “downward pressure” on the weights (individually, or in grouped structures); we don't remove connections. So no masking should be involved, right?
Well, not quite: we use a “soft-thresholding operator” (i.e. thresholding + masking) to prevent the weights from oscillating around zero (I tried to show this in this notebook using L1 regularization on a toy example).
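A minimal sketch of such a thresholding-plus-masking operator (the threshold value is made up for illustration; this is not Distiller's implementation):

```python
import torch

def soft_threshold(w: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out weights whose magnitude is below `threshold` (thresholding + masking)."""
    mask = (w.abs() > threshold).to(w.dtype)
    return w * mask

w = torch.tensor([0.5, -0.01, 0.003, -0.7])
w_masked = soft_threshold(w, threshold=0.05)   # small weights become exactly zero
```

Without this step, L1-regularized weights tend to hover at tiny non-zero values instead of settling at exactly zero.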
OK, so when we regularize, we also mask the weights – but what about the gradients? No: we leave the gradients alone, because we don't want to completely remove the regularized connections from the network; i.e. we want the regularized connections to continue passing information in the backward direction. Another way to look at this difference between pruning and regularization: pruned connections are removed forever, whereas regularized connections that are masked out (because they fall below some threshold) can sometimes grow back in size.
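Here is a toy demonstration of that difference (plain PyTorch with made-up numbers, not Distiller code): because the gradient of a masked-but-not-pruned weight is left intact, a single SGD step can pull it back off zero.

```python
import torch

threshold, lr = 0.01, 0.5
w = torch.tensor([0.005, 0.9], requires_grad=True)   # first weight is below threshold

with torch.no_grad():
    w.mul_((w.abs() > threshold).float())            # regularizer masks the small weight to zero

loss = ((w - torch.tensor([1.0, 1.0])) ** 2).sum()   # toy loss pulling both weights toward 1
loss.backward()                                      # note: the gradient is NOT masked
with torch.no_grad():
    w -= lr * w.grad                                 # the masked weight grows back
```

Had this been a pruned connection, `w.grad` would have been masked as well, and the first weight would have stayed at exactly zero.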
- "I think it's natural that this happens on the end of one epoch or end of whole training when the regularization terms have been decreased enough for pruning."
This is an interesting idea. If we implemented it, we wouldn't be able to easily see the sparsity of the weights in the logs during part of the training (because we most likely wouldn't have exact zeros). But this is not the reason I chose to threshold regularization at the end of each mini-batch. You see, pruning is iterative and therefore not "continuous": we prune, then we fine-tune for a "long" time, then we prune some more, fine-tune some more, and so on. Regularization is "continuous" by definition: every time we compute the data loss, we also compute the regularization loss. And as far as I understand, the “soft-thresholding operator” is part of every regularization calculation (see on_minibatch_end).
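As a hypothetical sketch of that per-mini-batch flow (the names `lambda_l1` and `threshold` are made up, and this is not Distiller's scheduler API - just the idea in plain PyTorch):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lambda_l1, threshold = 1e-3, 1e-2

for _ in range(3):                                   # a few mini-batches
    x, target = torch.randn(16, 4), torch.randn(16, 1)
    data_loss = torch.nn.functional.mse_loss(model(x), target)
    reg_loss = lambda_l1 * model.weight.abs().sum()  # computed alongside every data loss
    opt.zero_grad()
    (data_loss + reg_loss).backward()                # gradients still flow everywhere
    opt.step()
    with torch.no_grad():                            # end of mini-batch: threshold + mask
        model.weight.mul_((model.weight.abs() > threshold).float())
```

The thresholding happens every mini-batch because the regularization loss itself is computed every mini-batch - there is no separate "pruning event" to wait for.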
BTW, you can also configure the regularizer not to threshold. BTW 2: today we can only prune at the beginning of epochs, but in the future I want to allow scheduling pruning at mini-batch granularity.

- "The regularization and pruning both use the same zeros_mask_dict, it may brings some messes."

This is a good comment, and it tells me that I didn't document the interaction between pruning and regularization. I think that when you choose to mix the two, you want to smoothly push the solution towards sparsity (using the regularization loss term), but prune using a more "clumsy" pruning schedule. Now, the only reason to use a pruner when you're already using a regularizer is if the pruner is more aggressive than the regularizer (otherwise the pruner does nothing: its mask is below the regularization mask). To sum up: if you're both pruning and regularizing, don't enable the regularizer's mask.
- "What is the purpose of keeping the regularization mask of the last epoch. I guess it may be used by some remover in thinning.py, right?"

Correct: we keep the mask to get sparsity, which we can then exploit to remove structures (thinning.py).
Thanks for the interesting comments,
Neta
from distiller.
@hunterkun I'm closing this because it has been idle for 19 days. If you have remaining questions we can reopen it, or you can open another issue.
Cheers,
Neta