intellabs / distiller Goto Github PK

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

License: Apache License 2.0

Python 23.98% CSS 0.01% Jupyter Notebook 76.01%

pytorch pruning quantization pruning-structures jupyter-notebook network-compression deep-neural-networks regularization group-lasso distillation

distiller's Introduction

⚠️ DISCONTINUATION OF PROJECT - This project will no longer be maintained by Intel. This project has been identified as having known security escapes. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project.

Wiki and tutorials | Documentation | Getting Started | Algorithms | Design | FAQ

Distiller is an open-source Python package for neural network compression research.

Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.

Table of Contents
Highlighted features
Installation
Getting Started
- Basic Usage Examples
- Explore the sample Jupyter notebooks
Running the tests
Generating the HTML documentation site
Versioning
License
Community
Acknowledgments
Disclaimer

Highlighted features

Automatic Compression
- Automated Model Compression (AMC)
Weight pruning
- Element-wise pruning using magnitude thresholding, sensitivity thresholding, target sparsity level, and activation statistics
Structured pruning
- Convolution: 2D (kernel-wise), 3D (filter-wise), 4D (layer-wise), and channel-wise structured pruning.
- Fully-connected: column-wise and row-wise structured pruning.
- Structure groups (e.g. structures of 4 filters).
- Structure-ranking with using weights or activations criteria (Lp-norm, APoZ, gradients, random, etc.).
- Support for new structures (e.g. block pruning)
Control
- Soft (mask on forward-pass only) and hard pruning (permanently disconnect neurons)
- Dual weight copies (compute loss on masked weights, but update unmasked weights)
- Model thinning (AKA "network garbage removal") to permanently remove pruned neurons and connections.
Schedule
- Flexible scheduling of pruning, regularization, and learning rate decay (compression scheduling)
- One-shot and iterative pruning (and fine-tuning) are supported.
- Easily control what is performed each training step (e.g. greedy layer by layer pruning to full model pruning).
- Automatic gradual schedule (AGP) for pruning individual connections and complete structures.
- The compression schedule is expressed in a YAML file so that a single file captures the details of experiments. This dependency injection design decouples the Distiller scheduler and library from future extensions of algorithms.
Element-wise and filter-wise pruning sensitivity analysis (using L1-norm thresholding). Examine the data from some of the networks we analyzed, using this notebook.
Regularization
- L1-norm element-wise regularization
- Group Lasso an group variance regularization
Quantization
- Automatic mechanism to transform existing models to quantized versions, with customizable bit-width configuration for different layers. No need to re-write the model for different quantization methods.
- Post-training quantization of trained full-precision models, dynamic and static (statistics-based)
- Support for quantization-aware training in the loop
Knowledge distillation
- Training with knowledge distillation, in conjunction with the other available pruning / regularization / quantization methods.
Conditional computation
- Sample implementation of Early Exit
Low rank decomposition
- Sample implementation of truncated SVD
Lottery Ticket Hypothesis training
Export statistics summaries using Pandas dataframes, which makes it easy to slice, query, display and graph the data.
A set of Jupyter notebooks to plan experiments and analyze compression results. The graphs and visualizations you see on this page originate from the included Jupyter notebooks.
- Take a look at this notebook, which compares visual aspects of dense and sparse Alexnet models.
- This notebook creates performance indicator graphs from model data.
Sample implementations of published research papers, using library-provided building blocks. See the research papers discussions in our model-zoo.
Logging to the console, text file and TensorBoard-formatted file.
Export to ONNX (export of quantized models pending ONNX standardization)

Installation

These instructions will help get Distiller up and running on your local machine.

1. Clone Distiller

Clone the Distiller code repository from github:

$ git clone https://github.com/IntelLabs/distiller.git

The rest of the documentation that follows, assumes that you have cloned your repository to a directory called distiller.

2. Create a Python virtual environment

We recommend using a Python virtual environment, but that of course, is up to you. There's nothing special about using Distiller in a virtual environment, but we provide some instructions, for completeness.
Before creating the virtual environment, make sure you are located in directory distiller. After creating the environment, you should see a directory called distiller/env.

Using virtualenv

If you don't have virtualenv installed, you can find the installation instructions here.

To create the environment, execute:

$ python3 -m virtualenv env

This creates a subdirectory named env where the python virtual environment is stored, and configures the current shell to use it as the default python environment.

Using venv

If you prefer to use venv, then begin by installing it:

$ sudo apt-get install python3-venv

Then create the environment:

$ python3 -m venv env

As with virtualenv, this creates a directory called distiller/env.

Activate the environment

The environment activation and deactivation commands for venv and virtualenv are the same.
!NOTE: Make sure to activate the environment, before proceeding with the installation of the dependency packages:

$ source env/bin/activate

3. Install the Distiller package

Finally, install the Distiller package and its dependencies using pip3:

$ cd distiller
$ pip3 install -e .

This installs Distiller in "development mode", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository).

Notes:

Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.
If you are not using a GPU, you might need to make small adjustments to the code.

Required PyTorch Version

Distiller is tested using the default installation of PyTorch 1.3.1, which uses CUDA 10.1. We use TorchVision version 0.4.2. These are included in Distiller's requirements.txt and will be automatically installed when installing the Distiller package as listed above.

If you do not use CUDA 10.1 in your environment, please refer to PyTorch website to install the compatible build of PyTorch 1.3.1 and torchvision 0.4.2.

Getting Started

Distiller comes with sample applications and tutorials covering a range of model types:

Model Type	Sparsity	Post-training quantization	Quantization-aware training	Auto Compression (AMC)	Knowledge Distillation
Image classification	✅	✅	✅	✅	✅
Word-level language model	✅	✅
Translation (GNMT)		✅
Recommendation System (NCF)		✅
Object Detection	✅

Head to the examples directory for more details.

Other resources to refer to, beyond the examples:

Basic Usage Examples

The following are simple examples using Distiller's image classifcation sample, showing some of Distiller's capabilities.

Training-only
Getting parameter statistics of a sparsified model
Post-training quantization

Example: Simple training-only session (no compression)

The following will invoke training-only (no compression) of a network named 'simplenet' on the CIFAR10 dataset. This is roughly based on TorchVision's sample Imagenet training application, so it should look familiar if you've used that application. In this example we don't invoke any compression mechanisms: we just train because for fine-tuning after pruning, training is an essential part.

Note that the first time you execute this command, the CIFAR10 code will be downloaded to your machine, which may take a bit of time - please let the download process proceed to completion.

The path to the CIFAR10 dataset is arbitrary, but in our examples we place the datasets in the same directory level as distiller (i.e. ../../../data.cifar10).

First, change to the sample directory, then invoke the application:

$ cd distiller/examples/classifier_compression
$ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01

You can use a TensorBoard backend to view the training progress (in the diagram below we show a couple of training sessions with different LR values). For compression sessions, we've added tracing of activation and parameter sparsity levels, and regularization loss.

Example: Getting parameter statistics of a sparsified model

We've included in the git repository a few checkpoints of a ResNet20 model that we've trained with 32-bit floats. Let's load the checkpoint of a model that we've trained with channel-wise Group Lasso regularization.
With the following command-line arguments, the sample application loads the model (--resume) and prints statistics about the model weights (--summary=sparsity). This is useful if you want to load a previously pruned model, to examine the weights sparsity statistics, for example. Note that when you resume a stored checkpoint, you still need to tell the application which network architecture the checkpoint uses (-a=resnet20_cifar):

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=sparsity

You should see a text table detailing the various sparsities of the parameter tensors. The first column is the parameter name, followed by its shape, the number of non-zero elements (NNZ) in the dense model, and in the sparse model. The next set of columns show the column-wise, row-wise, channel-wise, kernel-wise, filter-wise and element-wise sparsities.
Wrapping it up are the standard-deviation, mean, and mean of absolute values of the elements.

In the Compression Insights notebook we use matplotlib to plot a bar chart of this summary, that indeed show non-impressive footprint compression.

Although the memory footprint compression is very low, this model actually saves 26.6% of the MACs compute.

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute

Example: Post-training quantization

This example performs 8-bit quantization of ResNet20 for CIFAR10. We've included in the git repository the checkpoint of a ResNet20 model that we've trained with 32-bit floats, so we'll take this model and quantize it:

$ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10 --resume ../ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize-eval --evaluate

The command-line above will save a checkpoint named quantized_checkpoint.pth.tar containing the quantized model parameters. See more examples here.

Explore the sample Jupyter notebooks

The set of notebooks that come with Distiller is described here, which also explains the steps to install the Jupyter notebook server.
After installing and running the server, take a look at the notebook covering pruning sensitivity analysis.

Sensitivity analysis is a long process and this notebook loads CSV files that are the output of several sessions of sensitivity analysis.

Running the tests

We are currently light-weight on test and this is an area where contributions will be much appreciated.
There are two types of tests: system tests and unit-tests. To invoke the unit tests:

$ cd distiller/tests
$ pytest

We use CIFAR10 for the system tests, because its size makes for quicker tests. To invoke the system tests, you need to provide a path to the CIFAR10 dataset which you've already downloaded. Alternatively, you may invoke full_flow_tests.py without specifying the location of the CIFAR10 dataset and let the test download the dataset (for the first invocation only). Note that --cifar1o-path defaults to the current directory.
The system tests are not short, and are even longer if the test needs to download the dataset.

$ cd distiller/tests
$ python full_flow_tests.py --cifar10-path=<some_path>

The script exits with status 0 if all tests are successful, or status 1 otherwise.

Generating the HTML documentation site

Install mkdocs and the required packages by executing:

$ pip3 install -r doc-requirements.txt

To build the project documentation run:

$ cd distiller/docs-src
$ mkdocs build --clean

This will create a folder named 'site' which contains the documentation website. Open distiller/docs/site/index.html to view the documentation home page.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details

Community

Github projects using Distiller

DeGirum Pruned Models - a repository containing pruned models and related information.
TorchFI - TorchFI is a fault injection framework build on top of PyTorch for research purposes.
hsi-toolbox - Hyperspectral CNN compression and band selection

Research papers citing Distiller

Brunno F. Goldstein, Sudarshan Srinivasan, Dipankar Das, Kunal Banerjee, Leandro Santiago, Victor C. Ferreira, Alexandre S. Nery, Sandip Kundu, Felipe M. G. Franca.
Reliability Evaluation of Compressed Deep Learning Models,
In IEEE 11th Latin American Symposium on Circuits & Systems (LASCAS), San Jose, Costa Rica, 2020, pp. 1-5.
Pascal Bacchus, Robert Stewart, Ekaterina Komendantskaya.
Accuracy, Training Time and Hardware Efficiency Trade-Offs for Quantized Neural Networks on FPGAs,
In Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2020. Lecture Notes in Computer Science, vol 12083. Springer, Cham
Indranil Chakraborty, Mustafa Fayez Ali, Dong Eun Kim, Aayush Ankit, Kaushik Roy.
GENIEx: A Generalized Approach to Emulating Non-Ideality in Memristive Xbars using Neural Networks,
arXiv:2003.06902, 2020.
Ahmed T. Elthakeb, Prannoy Pilligundla, Fatemehsadat Mireshghallah, Tarek Elgindi, Charles-Alban Deledalle, Hadi Esmaeilzadeh.
Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization,
arXiv:2003.00146, 2020.
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu.
TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing,
arXiv:2002.12620, 2020.
Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev.
Neural Network Compression Framework for fast model inference,
arXiv:2002.08679, 2020.
Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yuri Nahshan, Alex Bronstein, Uri Weiser.
Robust Quantization: One Model to Rule Them All,
arXiv:2002.07686, 2020.
Muhammad Abdullah Hanif, Muhammad Shafique.
SalvageDNN: salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping,
In Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering SciencesVolume 378, Issue 2164, 2019.
https://doi.org/10.1098/rsta.2019.0164
Meiqi Wang, Jianqiao Mo, Jun Lin, Zhongfeng Wang, Li Du.
DynExit: A Dynamic Early-Exit Strategy for Deep Residual Networks,
In IEEE International Workshop on Signal Processing Systems (SiPS), 2019.
Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan.
A Programmable Approach to Model Compression,
arXiv:1911.02497, 2019
code
Hui Guan, Lin Ning, Zhen Lin, Xipeng Shen, Huiyang Zhou, Seung-Hwan Lim.
In-Place Zero-Space Memory Protection for CNN,
In Conference on Neural Information Processing Systems (NeurIPS), 2019.
arXiv:1910.14479, 2019
code
Hossein Baktash, Emanuele Natale, Laurent Viennot.
A Comparative Study of Neural Network Compression,
arXiv:1910.11144, 2019.
Maxim Zemlyanikin, Alexander Smorkalov, Tatiana Khanova, Anna Petrovicheva, Grigory Serebryakov.
512KiB RAM Is Enough! Live Camera Face Recognition DNN on MCU,
In IEEE International Conference on Computer Vision (ICCV), 2019.
Ziheng Wang, Jeremy Wohlwend, Tao Lei.
Structured Pruning of Large Language Models,
arXiv:1910.04732, 2019.
Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh.
Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic,
arXiv:1906.11915, 2019.
Gil Shomron, Tal Horowitz, Uri Weiser.
SMT-SA: Simultaneous Multithreading in Systolic Arrays,
In IEEE Computer Architecture Letters (CAL), 2019.
Shangqian Gao , Cheng Deng , and Heng Huang.
Cross Domain Model Compression by Structurally Weight Sharing,
In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8973-8982.
Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, James Glass.
FAKTA: An Automatic End-to-End Fact Checking System,
In North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
Ahmed T. Elthakeb, Prannoy Pilligundla, Hadi Esmaeilzadeh.
SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training,
arXiv:1905.01416, 2019. code
Goncharenko A., Denisov A., Alyamkin S., Terentev E.
Trainable Thresholds for Neural Network Quantization,
In: Rojas I., Joya G., Catala A. (eds) Advances in Computational Intelligence Lecture Notes in Computer Science, vol 11507. Springer, Cham. International Work-Conference on Artificial Neural Networks (IWANN 2019).
Ahmed T. Elthakeb, Prannoy Pilligundla, Hadi Esmaeilzadeh.
Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks,
arXiv:1906.06033, 2019
Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang.
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting,
arXiv:1901.09504, 2019
Code
Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, C. Thomas Gray.
Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference,
Nvidia Research, 2019.
Norio Nakata.
Recent Technical Development of Artificial Intelligence for Diagnostic Medical Imaging,
In Japanese Journal of Radiology, February 2019, Volume 37, Issue 2, pp 103–108.
Alexander Goncharenko, Andrey Denisov, Sergey Alyamkin, Evgeny Terentev.
Fast Adjustable Threshold For Uniform Neural Network Quantization,
arXiv:1812.07872, 2018

If you used Distiller for your work, please use the following citation:

@article{nzmora2019distiller,
  author       = {Neta Zmora and
                  Guy Jacob and
                  Lev Zlotnik and
                  Bar Elharar and
                  Gal Novik},
  title        = {Neural Network Distiller: A Python Package For DNN Compression Research},
  month        = {October},
  year         = {2019},
  url          = {https://arxiv.org/abs/1910.12232}
}

Acknowledgments

Any published work is built on top of the work of many other people, and the credit belongs to too many people to list here.

The Python and PyTorch developer communities have shared many invaluable insights, examples and ideas on the Web.
The authors of the research papers implemented in the Distiller model-zoo have shared their research ideas, theoretical background and results.

Built With

PyTorch - The tensor and neural network framework used by Distiller.
Jupyter - Notebook serving.
TensorBoard - Used to view training graphs.
Cadene - Pretrained PyTorch models.

Disclaimer

Distiller is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and features are planned to be added to the library. Feedback and contributions from the open source and research communities are more than welcome.

distiller's People

Contributors

Stargazers

Watchers

Forkers

keyky shubhampachori12110095 lamhocn kuroro-tian vuiseng9 wanjinchang hurmean daydreamcoding baiyancheng20 kingofoz dreadlord1984 bhuwendongchao yuechengli pinglmlcv hhgxx123 rkshuai weitaoatvison birdylinch cwlacewe tae618 ruotianluo nyyznyyz1991 levelsethu abc3436645 zhaomeng1028 wzheng1983 gjylt msnqqer nzmora houhlin hyzcn soledad89 shiyongde chenxingqiang yinxx jiaqun123 yanghaha11514 templeblock 10imaging starstylesky ml-lab hsyi lebyni aolansili thomasjpfan laoyingbu xyuan hzhang57 yanwang2014 qq332982511 mmmonkeytao shlpu dennistang742 zhulei1109 hanson13 tianyunkeml hephaex hiyoung-asr zmsunnyday huipengzhang chybhao666 arunkumarramanan zxshinxz anneshachowdhury jimmysue jerrybonjour ist-daslab lengmm tonyle9 yc2017fd lsheiba chanmi168 jackcc goodluckcwl yangyangl gmayday1997 binbinmeng xiangliu886 auserj jacke121 hulalazz allenanswerzq liuguoyou jamesdeng88 k9sret dongfangduoshou123 wait1988 codeaudit dl-85 velikodniy zhaozhichao4515 13476279840 liujingcs cwlseu ricardozzf amjad-twalo nsl2014fm cf2220160244 g-wang shivamsaboo17

distiller's Issues

Quantization with 1D convolutions

Hi, I have a question, does Distiller support 1D convolutions? I'm trying to compress a CNN with 1D convolutions for binary classification of strings using quantization.

Symmetric Linear Quantization

Hi,

The math for the derivation of y_q under Symmetric Linear Quantization on the page
https://nervanasystems.github.io/distiller/algo_quantization/index.html seems incorrect.
I am not able to reason out to myself, the scaling of the bias term.

Thanks

lr_scheduler doesn't work when start training from checkpoint

Hello:
I want to use MultiStepMultiGammaLR scheduler in my pruning lr_scheduler.
When I using the compress_classifier.py to pruning the res_net20_cifar from the begining and define the lr_scheduler in the yaml file, it works well.
But when I using the checkpoint to train and prune, the lr_scheduler defined in the yaml file doesn't work. The lr doesn't decay when the epoch achieve defined milestone.

I use the script below:
python3 compress_classifier.py --arch resnet20_cifar dataset/ -p=50 --lr=0.3 --epochs=150 -b 128 --compress=resnet20_cifar_ele_pruning.yaml -j=1 --vs 0 --deterministic --resume=logs/resnet20_cifar_baseline/checkpoint.pth.tar

Below is my yaml setting

version: 1
pruners:
  low_pruner:
    class: AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.60
    weights: [module.layer1.2.conv1.weight,  module.layer1.2.conv1.weight,
              module.layer1.0.conv1.weight,  module.layer1.0.conv2.weight,
              module.layer1.1.conv1.weight,  module.layer1.1.conv2.weight]

  mid_pruner:
    class:  AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.67
    weights: [module.layer2.2.conv1.weight,  module.layer2.2.conv2.weight,
              module.layer2.0.conv2.weight,  module.layer2.0.downsample.1.weight,
              module.layer2.0.conv1.weight,  module.layer2.0.downsample.0.weight,
              module.layer2.1.conv1.weight,  module.layer2.1.conv2.weight]

  high_pruner:
    class:  AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.76
    weights: [module.layer3.0.conv1.weight,  module.layer3.1.conv1.weight,
              module.layer3.1.conv2.weight,  module.layer3.0.conv2.weight,
              module.layer3.0.downsample.0.weight, module.layer3.0.downsample.1.weight,
              module.fc.weight]
lr_schedulers:
  training_lr:
    class: MultiStepMultiGammaLR
    milestones: [300, 302, 400]
    gammas: [0.1, 0.1, 0.5]

policies:
    - pruner:
        instance_name: low_pruner
      starting_epoch: 300
      ending_epoch: 400
      frequency: 2
    - pruner:
        instance_name: mid_pruner
      starting_epoch: 300
      ending_epoch: 400
      frequency: 2
    - pruner:
        instance_name: high_pruner
      starting_epoch: 300
      ending_epoch: 400
      frequency: 2
    - lr_scheduler:
        instance_name: training_lr
      starting_epoch: 0
      ending_epoch: 400
      frequency: 1

Is there any problem in my script and yaml setting?

sensitivity analysis fail

Hi Neta,

I tried to run the sensitivity analysis for filter with the following command 'python3 compress_classifier.py -a resnet20_cifar --data ../../../data.cifar10/ -j 12 --resume=../ssl/checkpoints/checkpoint_trained_dense.pth.tar --sense=filter', but got an error, detailed log:

Logging to TensorBoard - remember to execute the server:

tensorboard --logdir='./logs'

=> loading checkpoint ../ssl/checkpoints/checkpoint_trained_dense.pth.tar
Checkpoint keys:
arch
optimizer
compression_sched
state_dict
best_top1
epoch
best top@1: 92.540
Loaded compression schedule from checkpoint (epoch 179)
=> loaded checkpoint '../ssl/checkpoints/checkpoint_trained_dense.pth.tar' (epoch 179)
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Running sensitivity tests
Testing sensitivity of module.conv1.weight [0.0% sparsity]
Traceback (most recent call last):
File "compress_classifier.py", line 782, in
main()
File "compress_classifier.py", line 339, in main
return sensitivity_analysis(model, criterion, test_loader, pylogger, args)
File "compress_classifier.py", line 750, in sensitivity_analysis
group=args.sensitivity)
File "/home/chongyu/application/distiller/distiller/sensitivity.py", line 108, in perform_sensitivity_analysis
scheduler.on_epoch_begin(0)
File "/home/chongyu/application/distiller/distiller/scheduler.py", line 112, in on_epoch_begin
policy.on_epoch_begin(self.model, self.zeros_mask_dict, meta)
File "/home/chongyu/application/distiller/distiller/policy.py", line 123, in on_epoch_begin
self.is_last_epoch = meta['current_epoch'] == (meta['ending_epoch'] - 1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

It looks like there is no valid value for meta['ending_epoch'].
Can you kindly suggest how to solve it? Thanks.

About resume quantized model to summary

I tried to resume quantized model to get MACs summary like this

python3 compress_classifier.py --resume=./resnet20_quantized.pth.tar -a=resnet20_cifar ../../../data..cifar10 --summary=compute

But it failed and came with this error

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Another problem, the first time I ran compress_classifier.py to train simplenet_cifar, msglogger worked well. When I did it again to train resnet20 (or other models), however, the terminal only show these printed message

Logging to TesnsorBoard - remember to execute the server:
tensorboard --logdir='./logs'
Files already downloaded and verified
Files already downloaded and verified

No more message was printed, and log file which should be saved in './logs/time_stamp/log' was no longer generated either while the training process was still working. How to fix these two problems? Thx a lot

Can'r run prune and quantization together

Hi,

When I try to run prune and quantization together on resnet20_cifar in one yaml, it failed. It said Key Error on xxx_float_weight. So what's the correct procedure to mix both of them together?

Thomas

Some confusions about pruning procedure

You set the mask for parameters in model on the beginning of epoch. Considering you may set two or more different pruners for parameters in different layers, it is reasonable that the set_param_mask is called in on_epoch_begin of Class PruningPolicy.

But I think masker's method apply_mask should not be called in on_minibatch_begin because you would call the apply_mask method two or more times when you have two or more pruners in your policies. I think calling it one time is enough due to zeros_mask_dict have included all parameters you want to prune although the results are same no matter how many times you call it.

Interestingly, I found you implement this idea in on_minibatch_end in scheduler.py, which calls apply_mask only one time by usingweights_are_masked flags.

The second question is that you call apply_mask on the end of minibatch due to the weights are updated during the backward pass. However, I think it is no need to do that. Because the weights you mask cannot be updated due to it's grad attribute has been masked(or set to zero) in backward by using register_hook function.

invalid choice: 'resnet20-cifar'?

I use “python3 compress_classifier.py -a resnet20-cifar ../../../data.cifar10 --resume ../examples/ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize --evaluate”

but error occurs:

compress_classifier.py: error: argument --arch/-a: invalid choice: 'resnet20-cifar' (choose from 'alexnet', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3', 'mobilenet', 'mobilenet_025', 'mobilenet_050', 'mobilenet_075', 'resnet101', 'resnet152', 'resnet18', 'resnet20_cifar', 'resnet32_cifar', 'resnet34', 'resnet44_cifar', 'resnet50', 'resnet56_cifar', 'simplenet_cifar', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn')

GroupThresholdMixin for Column Pruning

In https://github.com/NervanaSystems/distiller/blob/c2a429374f424ab357f55fd89d9d0d9289a570fe/distiller/thresholding.py#L124 there is an unconditional reduction in case threshold_criteria == Mean_Abs and unconditional Max finding in case threshold_criteria == max along dim =1 . Now if the group_type == Cols then dim=0 will be required ?

Structure pruning is broken for models with non-serial connections

Structure pruning is broken for models with non-serial connections.
Models such as Alexnet and VGG are have serial data-dependencies (connections) and are fine.
More complex models, with parallel-data dependencies (paths), such as ResNets (skip connections) and GoogLeNet (Inception layers) might fail when pruning filters or channels.

This is because a module, such as torch.nn.modules.batchnorm.BatchNorm2d layers, might depend on multiple inputs. This is not always a problem. For example, if the dependent module has type torch.nn.Conv2d and we are pruning weight filters.
But if the dependent module has type torch.nn.modules.batchnorm.BatchNorm2d, and we are pruning weight filters, then it is possible that each of the inputs selects different activation channels to prune. In such a case, how should we prune the BatchNorm's scale and shift tensors (.weight and .bias)?

To solve this we need to define one of the modules as the leader which determines what activation channels to prune; and define the rest of the modules in the dependency sub-graph as followers. Followers do not choose which activation channels to prune, so their sparsity masks is determined by the choice of the leader.
Because the sparsity maps of different follower modules may have different shapes, the leader defines a binary map which is a binary vector of active (1) and pruned (0) channels. Each "follower" expands this single binary map to create its own private pruning mask.

This requires changing the way we express filter/channel pruning in YAML, and how we create pruning masks.

I'm trying to make this fix available soon.
This is related to issues #79 and #73.

loss.backward() --> RuntimeError

Is there anyone who can help me?

Ubuntu 16.04

command:
time python3 compress_classifier.py --arch resnet20_cifar ../../../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --compress=../quantization/preact_resnet20_cifar_pact.yaml -j=1 --deterministic

Error message:

--- validate (epoch=199)-----------
5000 samples (256 per mini-batch)
==> Top1: 90.300 Top5: 99.700 Loss: 0.297

==> Best Top1: 90.860 on Epoch: 187
Saving checkpoint to: logs/2018.11.29-140224/checkpoint.pth.tar

Training epoch: 45000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.11.29-140224/2018.11.29-140224.log
Traceback (most recent call last):
File "compress_classifier.py", line 789, in
main()
File "compress_classifier.py", line 391, in main
msglogger.info(distiller.masks_sparsity_tbl_summary(model, compression_scheduler))
File "/usr/lib/python3.5/contextlib.py", line 77, in exit
self.gen.throw(type, value, traceback)
File "/media/walker/DATA/work/new_quant/distiller/distiller/data_loggers/collector.py", line 301, in collectors_context
yield collectors_dict
File "compress_classifier.py", line 386, in main
loggers=[tflogger, pylogger], args=args)
File "compress_classifier.py", line 495, in train
loss.backward()
File "/home/walker/.local/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/walker/.local/lib/python3.5/site-packages/torch/autograd/init.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

real 301m20.430s
user 204m21.640s
sys 99m42.978s

Query about adjusting the forward() method of the model post Thinning

In tests/test_pruning.py we have def test_conv_fc_interface

    # Remove filters
    fc = common.find_module_by_name(model, fc_name)
    assert fc is not None

    # Test thinning
    fm_size = fc.in_features // conv.out_channels
    num_nnz_filters = num_filters - expected_cnt_removed_filters
    distiller.remove_filters(model, zeros_mask_dict, arch, dataset, optimizer)
    assert conv.out_channels == num_nnz_filters
    assert fc.in_features == fm_size * num_nnz_filters

    # Run again, to make sure the optimizer and gradients shapes were updated correctly
    run_forward_backward(model, optimizer, dummy_input)
    run_forward_backward(model, optimizer, dummy_input)

and run_forward_backward does this:
https://github.com/NervanaSystems/distiller/blob/11490f6fe71ce7ccf5ef74511834d43b658630d2/tests/test_pruning.py#L230

How does this work without overloading the forward method of the model class ? Because now we are removing filters from Conv2d lets say this has Linear layer that follows it, dont we need to change the forward method of the model in-order for the forward pass to go through ?

Compressing seq2seq

Hey,

We've recently written a tutorial on compressing a PyTorch language model using the element-wise AGP pruner.
We're seeking community help to add an example of pruning a PyTorch seq2seq model (for example).
Thanks,
Neta

Thresholding 4D Tensors

In https://github.com/NervanaSystems/distiller/blob/c2a429374f424ab357f55fd89d9d0d9289a570fe/distiller/thresholding.py#L90 Is the comparison operation Reversed?

My understanding is that, If the mean or the max is greater than the threshold, We would expect to have zeros in the mask.

Does quantization support custom model,like object detection model？

I have a object detection model, like mobilenet+ssd, and I use compress_classifier.py, I just want to quantize to 8 bits. I use command
python compress_classifier.py -a mobilenet_v1_ssd_lite_voc ../data.cifar10 --resume ../../data/models/mobilenet_v1_ssd_lite_voc_72.7.pth --quantize-eval
But, there is a error,

compress_classifier.py: error: argument --arch/-a: invalid choice: 'mobilenet_v1_ssd_lite_voc' (choose from 'alexnet', 'alexnet_bn', 'densenet121', 'densenet161', 'desenet169', 'densenet201', 'inception_v3', 'mobilenet', 'mobilenet_025', 'mobilenet_050', 'mobilenet_075', 'preact_resnet101', 'preact_resnet110_cifar', 'preact_resnet10_cifar_conv_ds', 'preact_resnet152', 'preact_resnet18', 'preact_resnet20_cifar', 'preact_resnet20_cifar_conv_ds', 'preact_resnet32_cifar', 'preact_resnet32_cifar_cov_ds', 'preact_resnet34', 'preact_resnet44_cifar', 'preact_resnet44_cifar_conv_ds', 'preact_resnet50', 'preact_resnet56_cifar', 'preact_resnet56_cifar_conv_ds', 'resnt101', 'resnet101_earlyexit', 'resnet110_cifar_earlyexit', 'resnet1202_cifar_earlyexit', 'resnet152', 'resnet152_earlyexit', 'resnet18', 'resnet18_earlyexit', 'resnet0_cifar', 'resnet20_cifar_earlyexit', 'resnet32_cifar', 'resnet32_cifar_earlyexit', 'resnet34', 'resnet34_earlyexit', 'resnet44_cifar', 'resnet44_cifar_earlyexit', 'rsnet50', 'resnet50_earlyexit', 'resnet56_cifar', 'resnet56_cifar_earlyexit', 'simplenet_cifar', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg11_bn_cifar, 'vgg11_cifar', 'vgg13', 'vgg13_bn', 'vgg13_bn_cifar', 'vgg13_cifar', 'vgg16', 'vgg16_bn', 'vgg16_bn_cifar', 'vgg16_cifar', 'vgg19', 'vgg19_bn', 'vgg19_bn_cifar', 'vg19_cifar')
Could you tell me how to do it? Thank you~

initialization for GradientRankedFilterPruner

hello:

In the initialization function of GradientRankedFilterPruner, it is using RandomRankedFilterPruner. Is it typo? IMHO, we should use super(GradientRankedFilterPruner, self) instead.

thanks!

Unable to reproduce 6.96% test error for resnet-56 on cifar10

I run the following command to run a baseline model for resnet56 on cifar10:

python3 compress_classifier.py
--arch resnet56_cifar ../data.cifar10 -p=50
--lr=0.4 --epochs=180
--compress=../pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml
-j=1 --deterministic

I am unable to reproduce to accuracy claimed in file resnet56_cifar_baseline_training.yaml which says that they achieve top1 accuracy of 92.97%.
However, when i run the code, the reported accuracy is only 90.38%.

Further, I notice that the learning rate schedule used in this config file is different from the original resnet paper and also the original paper the code wants to reproduce. So i change the learning rate schedule to decrease by 0.1 in epoch 80 and 120. In total I train for 160 epochs. I achieve this by modifying the file resnet56_cifar_baseline_training.yaml.
Even with this learning rate schedule, the final accuracy is still 92.20%.

Query on Quantization of Input batches.

    if args.evaluate:
        if args.quantize:
            model.cpu()
            quantizer = quantization.SymmetricLinearQuantizer(model, 8, 8)
            quantizer.prepare_model()
            model.cuda()
        top1, _, _ = test(test_loader, model, criterion, [pylogger], args.print_freq)

I wanted some help understanding the flow for evaluating a quantized model.
From this code, I see that the model parameters and activations are quantized after quantizer.prepare_model() but then I was expecting the image batches will be quantized too, before performing inference using test(). But I could not find the place where you are attempting to quantize the inputs during forward pass. Once you feed it into the model, post_quantized_forward() will take care of quantizing the activations.

I am guessing Its being taken care of in one of the quantizer methods. Not exactly sure where.
Could you please elaborate on the flow for quantization of inputs.

Query on quantization of Bias

I am expecting the final model after quantization to have all integers in the range -128 to 127 , for 8 -bit symmetric linear quantization, but when I print out the model parameters I noticed that the bias are still as floats.
so I am currently setting inplace = True in this line https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/range_linear.py#L121 .

At one point we need need to quantize the bias, before writing into the model. Currently I do not see that happening.

About LASSO based channel pruning

Hi there,

Thanks for open-source the code.

Is there any plan for implementation of the LASSO based channel-pruning algorithm (i.e. the paper: Channel pruning for accelerating very deep neural networks)?

cannot resume model for training

I use the test ：
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10/ -j 1 --resume ../../../data.cifar10/models/best.pth.tar --epochs 200 --compress=../quantization/preact_resnet20_cifar_pact.yaml --out-dir="logs/" --wd=0.0002 --vs=0

some error：

=> loading checkpoint ../../../data.cifar10/models/best.pth.tar
Checkpoint keys:
arch
        compression_sched
        epoch
        optimizer
        state_dict
        quantizer_metadata
        best_top1
   best top@1: 39.310
Loaded compression schedule from checkpoint (epoch 0)
Loaded quantizer metadata from the checkpoint
{'params': {'bits_weights': 3, 'bits_activations': 4, 'quantize_bias': False, 'bits_overrides': OrderedDict([('conv1', OrderedDict([('wts', None), ('acts', None)])), ('layer1.0.pre_relu', OrderedDict([('wts', None), ('acts', None)])), ('final_relu', OrderedDict([('wts', None), ('acts', None)])), ('fc', OrderedDict([('wts', None), ('acts', None)]))])}, 'type': <class 'distiller.quantization.clipped_linear.PACTQuantizer'>}
Traceback (most recent call last):
  File "compress_classifier.py", line 686, in <module>
    main()
  File "compress_classifier.py", line 244, in main
    model, chkpt_file=args.resume)
  File "D:\pytorchProject\distiller\apputils\checkpoint.py", line 117, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'

how to fix it？

Do you have a plan to implement INQ in distiller?

Hi,
I read the distiller's documentation carefully and find it mentioned the INQ method proposed by Zhou.A. And now the Distiller does not support this method, so I want to know there is any plan to implement INQ in distiller? Thanks

RuntimeError: cuda runtime error (30)

An error occurred:
RuntimeError: cuda runtime error (30) : unknown error at /pytorch/aten/src/THC/THCTensorRandom.cu:25
My env is ubuntu 18.04, cuda 8.0, torch-0.4.0, python 3.6. Which one is wrong？Or what's the reason?
In the document "readme" shows that:
PyTorch is included in the requirements.txt file, and will currently download PyTorch version 3.1 for CUDA 8.0. This is the setup we've used for testing Distiller.
But the requirements.txt shows that the torch version is 0.4.0.
What are the final versions? Which cuda and which torch and the others.

Not available link in docs pruning AGP section

Hello, I'm reading the docs on this link:
https://nervanasystems.github.io/distiller/algo_pruning/index.html#automated-gradual-pruner-agp
I find that one link on the text is not available:

...
You can play with the scheduling parameters in the agp_schedule.ipynb notebook.

Maybe you can fix the link to some jupyter notebook on the github?
Thanks.

Early Exit Inference

How to train an early exit model? Here is the command I used:

python3 compress_classifier.py --arch resnet20_cifar_earlyexit ../../../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/resnet20_cifar_baseline_training.yaml -j=1 --deterministic --earlyexit_thresholds 0.9 1.2 --earlyexit_lossweights 0.2 0.3

But Distiller shows me the following error message:

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
==> using cifar10 dataset
=> creating resnet20_cifar_earlyexit model for CIFAR10

Logging to TensorBoard - remember to execute the server:

tensorboard --logdir='./logs'

=> using early-exit threshold values of [0.9, 1.2]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'dampening': 0, 'weight_decay': 0.0001, 'momentum': 0.9, 'nesterov': False, 'lr': 0.3}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml

Training epoch: 45000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
Traceback (most recent call last):
File "compress_classifier.py", line 789, in
main()
File "compress_classifier.py", line 386, in main
loggers=[tflogger, pylogger], args=args)
File "compress_classifier.py", line 477, in train
loss = earlyexit_loss(output, target, criterion, args)
File "compress_classifier.py", line 645, in earlyexit_loss
loss += (1.0 - sum_lossweights) * criterion(output[args.num_exits-1], target)
IndexError: list index out of range

resnet20_cifar_baseline_training.yaml ==>
lr_schedulers:
training_lr:
class: StepLR
step_size: 45
gamma: 0.10

policies:
- lr_scheduler:
instance_name: training_lr
starting_epoch: 45
ending_epoch: 200
frequency: 1

why the "direct" quantization operate can't make model smaller？

I have tried to use Symmetric Linear Quantization to quantize my model. I'm wondering why the parameters of model is still the float (11.) rather than int (11). The Quantization seems only help me to change the parameters from float (11.12345) into integer of float (11.)

prune conv filters will not process successor bn layers

I meet a problem when i run the pruning_filters_for_efficient_convnets example which uses resnet56_cifar_filter_rank.yaml. One mistake i find is outdated document described this yaml. On the document website, it writes:

extensions:
  net_thinner:
      class: 'ResnetCifarFilterRemover'
      thinning_func_str: resnet_cifar_remove_filters

but in the yaml file, it use:

extensions:
  net_thinner:
      class: 'FilterRemover'
      thinning_func_str: remove_filters
      arch: 'resnet56_cifar'
      dataset: 'cifar10'

The net_thinner is different. There are no ResnetCifarFilterRemover class and resnet_cifar_remove_filters in the source code.

The biggest problem is the example cant work. I find when a conv layer remove some filters, it will not change the following bn layer. The error is below:

RuntimeError: running_mean should contain 7 elements not 16

I debug the code, it seems create_thinning_recipe_filters function in the thinning.py exists some bugs, it won't handle bn layers. the line to handle bn layers

The problem of example command

Hi, nzmora:
When I ran the command "python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01", I got the following error:

2018-10-22 17:03:03,745 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log
2018-10-22 17:03:03,745 - Number of CPUs: 24
2018-10-22 17:03:03,850 - Number of GPUs: 8
2018-10-22 17:03:03,850 - CUDA version: 8.0.61
2018-10-22 17:03:03,850 - CUDNN version: 7102
2018-10-22 17:03:03,851 - Kernel: 4.4.0-98-generic
2018-10-22 17:03:03,851 - Python: 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
2018-10-22 17:03:03,851 - PyTorch: 0.4.0
2018-10-22 17:03:03,851 - Numpy: 1.14.3
2018-10-22 17:03:03,852 - Traceback (most recent call last):
File "compress_classifier.py", line 686, in
main()
File "compress_classifier.py", line 179, in main
apputils.log_execution_env_state(sys.argv, gitroot=module_path)
File "/home/project/compress/distiller-master/apputils/execution_env.py", line 78, in log_execution_env_state
log_git_state()
File "/home/project/compress/distiller-master/apputils/execution_env.py", line 56, in log_git_state
repo = Repo(gitroot, search_parent_directories=True)
File "/home/project/compress/distiller-master/env/lib/python3.5/site-packages/git/repo/base.py", line 168, in init
raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /home/project/compress/distiller-master

2018-10-22 17:03:03,852 -
2018-10-22 17:03:03,852 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log

How can I solve the problem？

filter pruning error

Hi Neta,
I met a an error when doing filter pruning, after debugging, I found it might because Distiller does not support concatenate operation.

The related layers of my network:
(aspp): ASPP_module(
(aspp0): Sequential(
(0): Conv2d(116, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(aspp1): Sequential(
(0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(aspp2): Sequential(
(0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(aspp3): Sequential(
(0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(global_avg_pool): Sequential(
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Conv2d(116, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(conv): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)

where input of layer 'conv' is concatenating outputs of aspp0. aspp1. aspp2, aspp3 and global_avg_pool (at dim = 1)
my configuration for pruning:

        module.aspp.aspp0.0.weight: [0.5, '3D']
        module.aspp.aspp1.0.weight: [0.5, '3D']
        module.aspp.aspp2.0.weight: [0.5, '3D']
        module.aspp.aspp3.0.weight: [0.5, '3D']
        module.global_avg_pool.1.weight: [0.5, '3D']

Then it is supposed that Distiller should prune the following layer 'conv' to be Conv2d(640, 256, kernel_size=(1, 1), stride=(1, 1), bias=False), but I got error ' Given groups=1, weight of size [256, 128, 1, 1], expected input[8, 640, 60, 80] to have 128 channels, but got 640 channels instead', which means Distiller does not recognise the concatenated inputs.
Please advise, thanks.

quant_aware_train_linear_quant doesn't work on resnet20_cifar

I tried the quant_aware_train_linear_quant.yaml on the resnet20_cifar model, the model seems to be messed up, and cannot get any reasonable prediction and also cannot train.

Is the quant_aware_train_linear_quant.yaml is only suitable for the resnet18 ? It seems not, could anyone help ? Thanks very much.

ONNX export for quantization?

I am experimenting with exporting a quantized network to ONNX. This ultimately does not succeed because there is no round operator in ONNX, and PyTorch does not define an ATen for round either.

I'm not sure what the best strategy would be (perhaps using floor, which exists in ONNX but is still missing in the PyTorch exporter), and some guidance would be appreciated.

In case anyone would like to duplicate the experiment, the first step was to modify the forward() method in ClippedLinearQuantization. Instead of the call to LinearQuantizeSTE.apply (the PyTorch ONNX exporter doesn't know what to do with that), inline the contents of LinearQuantizeSTE.forward, like so:

    def forward(self, input_):
        input_ = clamp(input_, 0, self.clip_val, self.inplace)
        if self.inplace:
            input_.mark_dirty(input_)
        input_ = linear_quantize(input_, self.scale_factor, self.inplace)
        if self.dequantize:
            input_ = linear_dequantize(self.input_, self.scale_factor, self.inplace)
        return input_

This should be functionally equivalent and the export trace will now complain about round. In q_utils.py, modify linear_quantize (for example, remove the calls to round_() and round() and replace them with... something else).

two things to confirm about weights and activations quantization

When setting the train_with_fp_copy true, you change the attribute of conv/fc layer. You substitute conv.weights with conv.float_weights and conv.weights become the buffer instead parameter. The forward pass of conv/fc layer still use conv.weights, quantized weights, which is determined by Pytorch default conv implementation. But in backward pass, the gradients calculated with respect to q_weights(quantized weights) is stored in float_weights.gard rather than weightsdue to it has no grad attribute. So you implicitly back-prop the grad with respect to quantized weights to the grad with full-precision weights using straight-through estimator namely both are equal.
You implement activation quantization by replacing relu layer with new own-defined layer. Also you again directly make the gradient with respect of activations before and after quantisation equal using STE.
So I want to confirm to you whether there are some misunderstanding in above points.

Resume from checkpoint with quantization

https://github.com/NervanaSystems/distiller/blob/e749ea6288431a53f839b621cc3e38facbf824de/distiller/quantization/range_linear.py#L165
I got an error message after resume symmetric linear quantized model

Traceback (most recent call last):
  File "compress_classifier.py", line 684, in <module>
    main()
  File "compress_classifier.py", line 244, in main
    model, chkpt_file=args.resume)
  File "/chenys/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() got an unexpected keyword argument 'bits_weights'

I modify the __init__ argument and fix the problem

def __init__(self, model, bits_activations=8, bits_weights=8, **kw):
     super(SymmetricLinearQuantizer, self).__init__(model, bits_activations=bits_activations,
                                                    bits_weights=bits_weights, 
                                                    train_with_fp_copy=False,
                                                    **kw)

I think the quantizer may be consistent in naming the parameters, or the quantizer_metadata will cause initialize error

tensorboard backend

The link in the getting started section on the readme is not available

As the title say, the link

Usage
Tutorial: Using Distiller to prune a PyTorch language model
in the section "getting started" give me 404 message. Can you move or update these links?
Thanks.

Resume from checkpoint with quantization

I am using a workaround to allow resuming from checkpoint with active quantization. The requires_grad flags aren't set in the restored biases and weights (they seem to be present at checkpoint save time). So as a quick fix I use:

    def set_grad(m):
        """
        Force the `requires_grad` flag on all weights and biases
        """
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            m.weight.requires_grad_()
            if hasattr(m, 'bias') and m.bias is not None:
                m.bias.requires_grad_()

    model.apply(set_grad)

Without this, I get the PyTorch error message element 0 of tensors does not require grad and does not have a grad_fn.

Looking for a proper way to fix this.

Knowledge Distillation

I had read "Knowledge Distillation" https://nervanasystems.github.io/distiller/schedule/index.html#knowledge-distillation

Would you please help to give me a simple example about knowledge distillation?

Thinning FC layers

The thinning methods support only removing channels or filters of a CONV layer

        # We are only interested in 4D weights (of Convolution layers)
        if param.dim() != 4:
            continue

[1]
How about thinning FC layers, even if you are not going to support it, can you provide, what all one should take care of if one wants to implement say remove_rows( ) or remove_columns( ) corresponding to neuron pruning ?

[2]
Its seems hard to simply extend the thinning_recipe approach as it seems to be too tied to removing CONV structures. Any suggestions ?

[3]
Also If we are thinning, pruned pytorch models, what could be the reason for its accuracy drop ?
Because we are strictly removing only zero structures, the math should be about the same and cause the same classificaiton ?
You seem to be taking into consideration a possible perofrmace drop by preparing to thin even the gradient tensors.

There is probelm in usage of Documentation

When I run the sample code: $time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../imagenet/alexnet/pruning/alexnet.schedule_sensitivity.yaml,

I find it cannot work , it says cannot find the alexnet.schedule_sensitivity.yaml file. I think the compress should be "../sensitivity-pruning/alexnet.schedule_sensitivity.yaml". Thanks.

Questions about regularization and pruning

I found you take regularization as another means of pruning. But the procedure is different between them. Pruning is taking effect on the beginning of batchon_minibatch_begin while regularization is on the end of batchon_minibatch_end. It means that you set the regularization term zero below the threshold every batch iteration during training.
What is the reason for this? I think it's natural that this happens on the end of one epoch or end of whole training when the regularization terms have been decreased enough for pruning.
The regularization and pruning both use the same zeros_mask_dict, it may brings some messes. for example apply_mask in on_minibatch_end of class RegularizationPolicy would be called by regularization mask, but also pruning mask if there are both regularizer and pruner.
What is the purpose of keeping the regulatization mask of the last epoch. I guess it may be used by some remover in thinning.py, right?

About Knowledge Distillation

I've read the Q&A in #90 .And I want to train a student model(preact_resnet20_cifar)from a preact_resnet44_cifar.Here is the command line I used to train the teacher model:
python compress_classifier.py -a preact_resnet44_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 .
The KD command line:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 --kd-teacher preact_resnet44_cifar --kd-resume logs/2018.12.11-130318/checkpoint.pth.tar --kd-temp 5.0 --kd-dw 0.7 --kd-sw 0.3
I got the wrong message:
`==> using cifar10 dataset
=> creating preact_resnet44_cifar model for CIFAR10
=> loading checkpoint logs/2018.12.11-130318/checkpoint.pth.tar
Checkpoint keys:
epoch
arch
state_dict
best_top1
optimizer
compression_sched
quantizer_metadata
best top@1: 48.000
Loaded compression schedule from checkpoint (epoch 2)
Loaded quantizer metadata from the checkpoint

Traceback (most recent call last):
File "compress_classifier.py", line 784, in
main()
File "compress_classifier.py", line 359, in main
teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)
File "/home/share/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
quantizer = qmd['type'](model, **qmd['params'])
TypeError: init() missing 1 required positional argument: 'optimizer'

Traceback (most recent call last):
File "compress_classifier.py", line 784, in
main()
File "compress_classifier.py", line 359, in main
teacher, , _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)
File "/home/share/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
quantizer = qmd['type'](model, **qmd['params'])
TypeError: init() missing 1 required positional argument: 'optimizer'
`
I don't know how could it happen.The other question is:Must the teacher model be deeper than the student model?_

A version for tensorflow framework?

Would it be possible that there will be a distiller version for tensorflow in the future?

Automated Deep Compression status

Hello there,
I am wondering about the state of the ADC implementation, and what remains to bring it to a functional state.
In the ADC merge commit message, you mentioned that it is still WiP and that it is using an unreleased version of Coach. Is that still the case?
Also, is there any documentation for how to use ADC in Distiller?

Thanks

yolov2 and darknet19

Hello, I wanna prune yolov2's pretrained model, just wanna it to have fewer filters for each layer. But, it is not in the Torchvision'model set. Does a model have to be in Torchvision'model set if I wanna prune it? I studied your documentation for a week, and i did not find a clear way to do that. Yolov2 is first trained on ImageNet then we got Darknet19 model. And then make a little change about darknet19 network, and train it again on object detection dataset and we got yolov2. And I wanna to prune this model. I am new in Pytorch. Can I do this with Distiller? Can you give me some detailed instructions? If yes, I would like to contribute my work to the nice Distiller.

Query on optimizer

This line of code self.optimizer.setstate({'param_groups': new_optimizer.param_groups}) in quantizer.py cannot change the parameter optimizer used to initialize the quantizer. So the optimizer in compress_classifier.py still can not changed by the function _get_updated_optimizer_params_groups in PACT class.

Does quantization support zero-point?

Seems there isn't zero-point handling in the code.

So does distill support zero-point in quantization?

Thanks.

Export quantized model

Hi Neta,

I looked into the doc and find out that in https://nervanasystems.github.io/distiller/design/index.html#quantization, it is mentioned 'We also provide a mechanism which takes an existing model and automatically replaces required operations with quantized versions.' Does this mean we can export this model with such 'quantized versions' operation?
If it is not supported in current Distiller, can you kindly suggest how I can export it?
Thanks so much.

Minor Clean Up of Quantizer.py

Warnings / Weak Warnings

In this line : https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/quantizer.py#L114 , Local variable keys_list might be referenced before assignment, due to the fact that keys_list is declared only when we request for bits_overrides.
In this line : https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/quantizer.py#L175 Variable module is used, but this Variable module is already declared in for loop above .

No such file or directory: '../../../data.imagenet/train'

The example in ""Direct" Quantization Without Training" contains the following code
"python3 compress_classifier.py -a resnet18 ../../../data.imagenet --pretrained --quantize --evaluate"
When I run it I get: " No such file or directory: '../../../data.imagenet/train'"

Regression Issues with Resnet 'pretrainedmodels'

In a recent patch, titles 'Activation statistics collection: add a patched version of ResNet', Distiller "overloads" torch_models' Resnet pretrained models. Following the introduction of this patch, Resnet models from pretrainedmodels fail to load. Is that by intention?

save_intermediate_feature_maps (2).txt

In my opinion, this change has two cons:

It breaks access to pretrainedmodels resnet models with a misleading error. It's also incoherent from user perspective, because non-resnet models load perfectly well.
I cannot compare resnet and resnext models that originate from the same repository.