intellabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

License: Apache License 2.0



distiller's Issues

filter pruning error

Hi Neta,
I ran into an error when doing filter pruning. After debugging, I think it happens because Distiller does not support the concatenate operation.

The related layers of my network:
(aspp): ASPP_module(
  (aspp0): Sequential(
    (0): Conv2d(116, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (aspp1): Sequential(
    (0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (aspp2): Sequential(
    (0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (aspp3): Sequential(
    (0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
(global_avg_pool): Sequential(
  (0): AdaptiveAvgPool2d(output_size=(1, 1))
  (1): Conv2d(116, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(conv): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)

where the input of layer 'conv' is the concatenation of the outputs of aspp0, aspp1, aspp2, aspp3 and global_avg_pool (at dim = 1).
My configuration for pruning:

        module.aspp.aspp0.0.weight: [0.5, '3D']
        module.aspp.aspp1.0.weight: [0.5, '3D']
        module.aspp.aspp2.0.weight: [0.5, '3D']
        module.aspp.aspp3.0.weight: [0.5, '3D']
        module.global_avg_pool.1.weight: [0.5, '3D']

With this configuration, Distiller should prune the following layer 'conv' to Conv2d(640, 256, kernel_size=(1, 1), stride=(1, 1), bias=False). Instead I got the error 'Given groups=1, weight of size [256, 128, 1, 1], expected input[8, 640, 60, 80] to have 128 channels, but got 640 channels instead', which suggests that Distiller does not recognise the concatenated inputs.
Please advise, thanks.
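
For reference, here is a minimal sketch of the bookkeeping a concatenation requires. It is not Distiller code: `concat_kept_channels` is a hypothetical helper, and keeping the first 128 filters of each branch is an arbitrary choice made for illustration. It maps the filters kept in each branch to the input-channel indices of the layer that consumes the concatenated tensor.

```python
def concat_kept_channels(branch_widths, kept_per_branch):
    """Map per-branch kept filter indices to input-channel indices of the
    layer that consumes the concatenation (concatenated along dim=1)."""
    kept = []
    offset = 0
    for width, kept_idx in zip(branch_widths, kept_per_branch):
        for i in kept_idx:
            if not 0 <= i < width:
                raise ValueError(f"index {i} out of range for width {width}")
            kept.append(offset + i)
        offset += width  # the next branch starts after this one
    return kept


# Five branches of 256 filters each, as in the ASPP module above;
# keep the first 128 filters of every branch (an arbitrary choice).
widths = [256] * 5
kept = [list(range(128)) for _ in widths]
in_channels = concat_kept_channels(widths, kept)
print(len(in_channels))   # 640
print(in_channels[128])   # 256: first kept channel of the second branch
```

The consuming layer then needs 640 input channels, taken from five separate index ranges rather than from one contiguous block of 128.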

Resume from checkpoint with quantization

I am using a workaround to allow resuming from checkpoint with active quantization. The requires_grad flags aren't set in the restored biases and weights (they seem to be present at checkpoint save time). So as a quick fix I use:

    import torch.nn as nn

    def set_grad(m):
        """
        Force the `requires_grad` flag on all weights and biases
        """
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            m.weight.requires_grad_()
            if hasattr(m, 'bias') and m.bias is not None:
                m.bias.requires_grad_()

    # Apply to every submodule of the restored model
    model.apply(set_grad)

Without this, I get the PyTorch error message "element 0 of tensors does not require grad and does not have a grad_fn".

Looking for a proper way to fix this.

Knowledge Distillation

I have read the "Knowledge Distillation" section: https://nervanasystems.github.io/distiller/schedule/index.html#knowledge-distillation

Could you please give me a simple example of knowledge distillation?
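
As a starting point, here is a minimal sketch of the commonly used distillation loss (soft targets with a temperature, combined with the ordinary cross-entropy on the true label). It is plain Python and not Distiller's implementation; the function names and the default values for `temperature`, `distill_weight` and `student_weight` are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the (temperature 1) prediction against a hard label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits, teacher_logits, label,
            temperature=5.0, distill_weight=0.7, student_weight=0.3):
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # T^2 keeps the soft-target term on a comparable scale as T changes
    distill = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard = cross_entropy(student_logits, label)
    return distill_weight * distill + student_weight * hard


loss = kd_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], label=0)
print(loss)
```

When student and teacher produce identical logits, the distillation term vanishes and only the weighted cross-entropy remains.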

Not available link in docs pruning AGP section

Hello, I'm reading the docs at this link:
https://nervanasystems.github.io/distiller/algo_pruning/index.html#automated-gradual-pruner-agp
One link in the text does not work:

...
You can play with the scheduling parameters in the agp_schedule.ipynb notebook.

Could you fix the link so that it points to the Jupyter notebook on GitHub?
Thanks.

Query on Quantization of Input batches.

    if args.evaluate:
        if args.quantize:
            model.cpu()
            quantizer = quantization.SymmetricLinearQuantizer(model, 8, 8)
            quantizer.prepare_model()
            model.cuda()
        top1, _, _ = test(test_loader, model, criterion, [pylogger], args.print_freq)

I would like some help understanding the flow for evaluating a quantized model.
From this code, I see that the model parameters and activations are quantized after quantizer.prepare_model(), but I expected the image batches to be quantized too before running inference with test(). I could not find where the inputs are quantized during the forward pass. Once a batch is fed into the model, post_quantized_forward() takes care of quantizing the activations.

I am guessing it is handled in one of the quantizer methods, but I am not sure where.
Could you please elaborate on the flow for quantization of inputs?

Quantization with 1D convolutions

Hi, I have a question, does Distiller support 1D convolutions? I'm trying to compress a CNN with 1D convolutions for binary classification of strings using quantization.

There is problem in usage of Documentation

When I run the sample command: $time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../imagenet/alexnet/pruning/alexnet.schedule_sensitivity.yaml,

it does not work: it reports that it cannot find the alexnet.schedule_sensitivity.yaml file. I think the --compress argument should be "../sensitivity-pruning/alexnet.schedule_sensitivity.yaml". Thanks.

Minor Clean Up of Quantizer.py

Warnings / Weak Warnings

In this line: https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/quantizer.py#L114 the local variable keys_list might be referenced before assignment, because keys_list is only assigned when bits_overrides is requested.
In this line: https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/quantizer.py#L175 the variable module is used, but a variable named module is already declared in the for loop above.

Does quantization support custom model,like object detection model？

I have an object detection model (MobileNet + SSD), and I want to use compress_classifier.py just to quantize it to 8 bits. I use the command
python compress_classifier.py -a mobilenet_v1_ssd_lite_voc ../data.cifar10 --resume ../../data/models/mobilenet_v1_ssd_lite_voc_72.7.pth --quantize-eval
But there is an error:

compress_classifier.py: error: argument --arch/-a: invalid choice: 'mobilenet_v1_ssd_lite_voc' (choose from 'alexnet', 'alexnet_bn', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3', 'mobilenet', 'mobilenet_025', 'mobilenet_050', 'mobilenet_075', 'preact_resnet101', 'preact_resnet110_cifar', 'preact_resnet110_cifar_conv_ds', 'preact_resnet152', 'preact_resnet18', 'preact_resnet20_cifar', 'preact_resnet20_cifar_conv_ds', 'preact_resnet32_cifar', 'preact_resnet32_cifar_conv_ds', 'preact_resnet34', 'preact_resnet44_cifar', 'preact_resnet44_cifar_conv_ds', 'preact_resnet50', 'preact_resnet56_cifar', 'preact_resnet56_cifar_conv_ds', 'resnet101', 'resnet101_earlyexit', 'resnet110_cifar_earlyexit', 'resnet1202_cifar_earlyexit', 'resnet152', 'resnet152_earlyexit', 'resnet18', 'resnet18_earlyexit', 'resnet20_cifar', 'resnet20_cifar_earlyexit', 'resnet32_cifar', 'resnet32_cifar_earlyexit', 'resnet34', 'resnet34_earlyexit', 'resnet44_cifar', 'resnet44_cifar_earlyexit', 'resnet50', 'resnet50_earlyexit', 'resnet56_cifar', 'resnet56_cifar_earlyexit', 'simplenet_cifar', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg11_bn_cifar', 'vgg11_cifar', 'vgg13', 'vgg13_bn', 'vgg13_bn_cifar', 'vgg13_cifar', 'vgg16', 'vgg16_bn', 'vgg16_bn_cifar', 'vgg16_cifar', 'vgg19', 'vgg19_bn', 'vgg19_bn_cifar', 'vgg19_cifar')
Could you tell me how to do it? Thank you.

Query about adjusting the forward() method of the model post Thinning

In tests/test_pruning.py we have def test_conv_fc_interface

    # Remove filters
    fc = common.find_module_by_name(model, fc_name)
    assert fc is not None

    # Test thinning
    fm_size = fc.in_features // conv.out_channels
    num_nnz_filters = num_filters - expected_cnt_removed_filters
    distiller.remove_filters(model, zeros_mask_dict, arch, dataset, optimizer)
    assert conv.out_channels == num_nnz_filters
    assert fc.in_features == fm_size * num_nnz_filters

    # Run again, to make sure the optimizer and gradients shapes were updated correctly
    run_forward_backward(model, optimizer, dummy_input)
    run_forward_backward(model, optimizer, dummy_input)

and run_forward_backward does this:
https://github.com/NervanaSystems/distiller/blob/11490f6fe71ce7ccf5ef74511834d43b658630d2/tests/test_pruning.py#L230

How does this work without overriding the forward method of the model class? We are removing filters from a Conv2d layer that is followed by a Linear layer, so don't we need to change the model's forward method for the forward pass to go through?

Regression Issues with Resnet 'pretrainedmodels'

In a recent patch, titled 'Activation statistics collection: add a patched version of ResNet', Distiller "overloads" torch_models' pretrained ResNet models. Since this patch was introduced, ResNet models from pretrainedmodels fail to load. Is that intentional?

save_intermediate_feature_maps (2).txt

In my opinion, this change has two cons:

- It breaks access to the pretrainedmodels ResNet models with a misleading error. It is also incoherent from the user's perspective, because non-ResNet models load perfectly well.
- I cannot compare ResNet and ResNeXt models that originate from the same repository.

No such file or directory: '../../../data.imagenet/train'

The example in '"Direct" Quantization Without Training' contains the following command:
"python3 compress_classifier.py -a resnet18 ../../../data.imagenet --pretrained --quantize --evaluate"
When I run it I get: "No such file or directory: '../../../data.imagenet/train'"

Thresholding 4D Tensors

In https://github.com/NervanaSystems/distiller/blob/c2a429374f424ab357f55fd89d9d0d9289a570fe/distiller/thresholding.py#L90 is the comparison operation reversed?

My understanding is that if the mean or the max is greater than the threshold, we would expect zeros in the mask.
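
To make the question concrete, here is a small sketch of one possible convention, assuming the mask is multiplied into the weights so that 1 keeps a filter and 0 zeroes it. This convention is an assumption for illustration, not a quote from thresholding.py, and `filter_keep_mask` is a hypothetical helper. Under it, a filter whose mean magnitude exceeds the threshold gets a 1, not a 0.

```python
def filter_keep_mask(filters, threshold):
    """Return one 0/1 entry per filter: 1 keeps the filter, 0 prunes it."""
    mask = []
    for weights in filters:
        mean_abs = sum(abs(w) for w in weights) / len(weights)
        mask.append(1 if mean_abs > threshold else 0)
    return mask


filters = [
    [0.50, -0.40, 0.30],   # large magnitudes
    [0.01, -0.02, 0.00],   # small magnitudes
    [0.20, 0.10, -0.30],   # medium magnitudes
]
mask = filter_keep_mask(filters, threshold=0.1)
print(mask)  # [1, 0, 1]
```

If the mask instead marked the positions to remove, the comparison would indeed be the other way round, so the answer depends on how the mask is applied.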

prune conv filters will not process successor bn layers

I ran into a problem when running the pruning_filters_for_efficient_convnets example, which uses resnet56_cifar_filter_rank.yaml. First, the documentation describing this YAML file is outdated. The documentation website shows:

extensions:
  net_thinner:
      class: 'ResnetCifarFilterRemover'
      thinning_func_str: resnet_cifar_remove_filters

but the YAML file uses:

extensions:
  net_thinner:
      class: 'FilterRemover'
      thinning_func_str: remove_filters
      arch: 'resnet56_cifar'
      dataset: 'cifar10'

The net_thinner entries differ. There is no ResnetCifarFilterRemover class and no resnet_cifar_remove_filters function in the source code.

The bigger problem is that the example does not work. When filters are removed from a conv layer, the following BN layer is not changed. The error is below:

RuntimeError: running_mean should contain 7 elements not 16

I debugged the code, and it seems the create_thinning_recipe_filters function in thinning.py has a bug: it does not handle BN layers. the line to handle bn layers

Symmetric Linear Quantization

Hi,

The math for the derivation of y_q under Symmetric Linear Quantization on the page
https://nervanasystems.github.io/distiller/algo_quantization/index.html seems incorrect.
I cannot work out the reasoning behind the scaling of the bias term.

Thanks
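
One way to check the bias scaling numerically is sketched below, assuming each tensor has its own symmetric scale factor (called q_x, q_w and q_b here; these names and the 8-bit bias are assumptions for illustration, not taken from the page). The sum of products x_q * w_q carries the scale q_x * q_w, so the quantized bias, which carries the scale q_b, has to be multiplied by q_x * q_w / q_b before it can be added to that sum.

```python
def scale_factor(values, num_bits=8):
    """Symmetric scale: map the largest magnitude to the largest integer."""
    max_abs = max(abs(v) for v in values)
    return (2 ** (num_bits - 1) - 1) / max_abs


x = [0.5, -0.25, 0.75]
w = [0.2, -0.4, 0.1]
b = 0.3

q_x, q_w, q_b = scale_factor(x), scale_factor(w), scale_factor([b])
x_q = [round(q_x * v) for v in x]
w_q = [round(q_w * v) for v in w]
b_q = round(q_b * b)

# Accumulator: the bias is rescaled from q_b to q_x * q_w so that it can be
# added to the sum of products, which carries the scale q_x * q_w.
acc = sum(xi * wi for xi, wi in zip(x_q, w_q)) + (q_x * q_w / q_b) * b_q

y_float = sum(xi * wi for xi, wi in zip(x, w)) + b
y_dequant = acc / (q_x * q_w)
print(y_float, y_dequant)
```

The two printed values agree up to rounding error, which is what the rescaled bias term is meant to guarantee.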

Some confusions about pruning procedure

You set the masks for the model parameters at the beginning of each epoch. Since there may be two or more pruners for parameters in different layers, it is reasonable that set_param_mask is called in on_epoch_begin of class PruningPolicy.

But I think the masker's apply_mask method should not be called in on_minibatch_begin, because it is then called two or more times when there are two or more pruners in the policies. Calling it once is enough, because zeros_mask_dict already includes all the parameters to be pruned, although the result is the same no matter how many times it is called.

Interestingly, I found that this idea is implemented in on_minibatch_end in scheduler.py, which calls apply_mask only once by using the weights_are_masked flag.

My second question: you call apply_mask at the end of each minibatch because the weights are updated during the backward pass. However, I think this is unnecessary, because the masked weights cannot be updated: their grad attribute has been masked (set to zero) in the backward pass by a hook registered with register_hook.
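
One case where masking the gradient alone may not be enough is an optimizer with momentum. The sketch below is plain Python, not Distiller or PyTorch code, and the numbers are arbitrary. It shows a pruned weight moving away from zero even though its gradient is zero, because the velocity accumulated before pruning is still applied.

```python
def sgd_momentum_step(weight, grad, velocity, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


# The weight has just been pruned (set to 0) and its gradient is masked to 0,
# but the optimizer still holds velocity accumulated before pruning.
weight, velocity = 0.0, 0.5
mask = 0.0

weight, velocity = sgd_momentum_step(weight, grad=0.0, velocity=velocity)
moved = weight
print(moved)            # nonzero: the pruned weight moved away from zero

weight = weight * mask  # re-applying the mask restores the zero
print(weight == 0)
```

Whether this applies to a given run depends on the optimizer settings; with plain SGD and no momentum, the zero gradient would indeed leave the weight untouched.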

Does quantization support zero-point?

It seems there is no zero-point handling in the code.

Does Distiller support a zero-point in quantization?

Thanks.
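
For readers unfamiliar with the term, here is a minimal sketch of asymmetric quantization with a zero-point. It is generic arithmetic, not Distiller's API; all function names are hypothetical. The zero-point is the integer that represents the real value 0.0 exactly.

```python
def asymmetric_params(min_val, max_val, num_bits=8):
    """Scale and zero-point that map [min_val, max_val] onto [0, 2^n - 1]."""
    # The range must contain 0 so that 0.0 is exactly representable.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    q_max = 2 ** num_bits - 1
    scale = (max_val - min_val) / q_max
    zero_point = round(-min_val / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))  # clamp to the integer range


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


scale, zp = asymmetric_params(-1.0, 3.0)
print(zp)                                          # 64
print(quantize(-1.0, scale, zp), quantize(3.0, scale, zp))  # 0 255
print(dequantize(quantize(0.0, scale, zp), scale, zp))      # 0.0
```

With a symmetric scheme the zero-point is fixed at 0, which wastes part of the integer range when the values are mostly one-sided, as in this example.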

sensitivity analysis fail

Hi Neta,

I tried to run the filter sensitivity analysis with the following command: 'python3 compress_classifier.py -a resnet20_cifar --data ../../../data.cifar10/ -j 12 --resume=../ssl/checkpoints/checkpoint_trained_dense.pth.tar --sense=filter', but got an error. The detailed log:

Logging to TensorBoard - remember to execute the server:

tensorboard --logdir='./logs'

=> loading checkpoint ../ssl/checkpoints/checkpoint_trained_dense.pth.tar
Checkpoint keys:
arch
optimizer
compression_sched
state_dict
best_top1
epoch
best top@1: 92.540
Loaded compression schedule from checkpoint (epoch 179)
=> loaded checkpoint '../ssl/checkpoints/checkpoint_trained_dense.pth.tar' (epoch 179)
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Running sensitivity tests
Testing sensitivity of module.conv1.weight [0.0% sparsity]
Traceback (most recent call last):
  File "compress_classifier.py", line 782, in <module>
    main()
  File "compress_classifier.py", line 339, in main
    return sensitivity_analysis(model, criterion, test_loader, pylogger, args)
  File "compress_classifier.py", line 750, in sensitivity_analysis
    group=args.sensitivity)
  File "/home/chongyu/application/distiller/distiller/sensitivity.py", line 108, in perform_sensitivity_analysis
    scheduler.on_epoch_begin(0)
  File "/home/chongyu/application/distiller/distiller/scheduler.py", line 112, in on_epoch_begin
    policy.on_epoch_begin(self.model, self.zeros_mask_dict, meta)
  File "/home/chongyu/application/distiller/distiller/policy.py", line 123, in on_epoch_begin
    self.is_last_epoch = meta['current_epoch'] == (meta['ending_epoch'] - 1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

It looks like there is no valid value for meta['ending_epoch'].
Can you kindly suggest how to solve it? Thanks.

GroupThresholdMixin for Column Pruning

In https://github.com/NervanaSystems/distiller/blob/c2a429374f424ab357f55fd89d9d0d9289a570fe/distiller/thresholding.py#L124 there is an unconditional mean reduction when threshold_criteria == Mean_Abs and an unconditional max when threshold_criteria == max, both along dim=1. If group_type == Cols, shouldn't the reduction be along dim=0?
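
To illustrate the difference between the two reduction dimensions on a 2D weight matrix, here is a small plain-Python sketch. `mean_abs_along` is a hypothetical helper that follows the usual convention that `dim` names the dimension being collapsed; it is not the code in thresholding.py.

```python
def mean_abs_along(matrix, dim):
    """Mean of absolute values, reducing away dimension `dim` of a 2D list."""
    rows, cols = len(matrix), len(matrix[0])
    if dim == 0:   # collapse rows: one value per column
        return [sum(abs(matrix[r][c]) for r in range(rows)) / rows
                for c in range(cols)]
    if dim == 1:   # collapse columns: one value per row
        return [sum(abs(v) for v in row) / cols for row in matrix]
    raise ValueError("dim must be 0 or 1")


weights = [[1.0, 0.0, -2.0],
           [3.0, 0.0, 3.0]]
print(mean_abs_along(weights, dim=0))  # [2.0, 0.0, 2.5] -> one per column
print(mean_abs_along(weights, dim=1))  # [1.0, 2.0]      -> one per row
```

A per-column threshold test needs the three values from `dim=0`; the two values from `dim=1` describe rows.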

Do you have a plan to implement INQ in distiller?

Hi,
I read Distiller's documentation carefully and found that it mentions the INQ method proposed by A. Zhou. Distiller does not currently support this method, so I would like to know whether there is any plan to implement INQ in Distiller. Thanks

why the "direct" quantization operate can't make model smaller？

I have tried to use Symmetric Linear Quantization to quantize my model. I'm wondering why the parameters of the model are still floats (11.) rather than ints (11). The quantization only seems to change the parameters from arbitrary floats (11.12345) into integer-valued floats (11.).
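
The storage cost depends on the data type of the container, not on whether the values happen to be whole numbers. A minimal sketch with Python's standard `array` module (the values are arbitrary):

```python
from array import array

quantized = [11.0, -3.0, 127.0, 0.0]     # integer-valued, but still floats

as_float32 = array("f", quantized)                  # 32-bit floats
as_int8 = array("b", [int(v) for v in quantized])   # signed 8-bit integers

print(len(as_float32.tobytes()))  # bytes used by the float storage
print(len(as_int8.tobytes()))     # bytes used by the int8 storage
```

The values are identical in both containers, but only the second one is smaller. A model whose weights are quantized in value yet kept in float tensors therefore takes the same space as before.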

ONNX export for quantization?

I am experimenting with exporting a quantized network to ONNX. This ultimately does not succeed because there is no round operator in ONNX, and PyTorch does not define an ATen for round either.

I'm not sure what the best strategy would be (perhaps using floor, which exists in ONNX but is still missing in the PyTorch exporter), and some guidance would be appreciated.

In case anyone would like to duplicate the experiment, the first step was to modify the forward() method in ClippedLinearQuantization. Instead of the call to LinearQuantizeSTE.apply (the PyTorch ONNX exporter doesn't know what to do with that), inline the contents of LinearQuantizeSTE.forward, like so:

    def forward(self, input_):
        input_ = clamp(input_, 0, self.clip_val, self.inplace)
        if self.inplace:
            input_.mark_dirty(input_)
        input_ = linear_quantize(input_, self.scale_factor, self.inplace)
        if self.dequantize:
            # Dequantize the local tensor (input_), not self.input_
            input_ = linear_dequantize(input_, self.scale_factor, self.inplace)
        return input_

This should be functionally equivalent and the export trace will now complain about round. In q_utils.py, modify linear_quantize (for example, remove the calls to round_() and round() and replace them with... something else).
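
As one candidate for the "something else", rounding can be expressed with floor alone as floor(x + 0.5). The sketch below only shows the arithmetic in plain Python; whether the exporter accepts floor is a separate question, as noted above. The result differs from Python's built-in `round` on exact ties, because `round` resolves ties to the nearest even integer.

```python
import math


def round_half_up(x):
    """Rounding built from floor only: floor(x + 0.5)."""
    return math.floor(x + 0.5)


# Columns: value, built-in round, floor-based rounding
for value in (0.4, 0.6, -0.4, -0.6, 2.5, -2.5):
    print(value, round(value), round_half_up(value))
```

Only the tie at 2.5 differs in this list (2 versus 3), so the substitution changes quantized values only for inputs that land exactly halfway between two integers.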

The link in the getting started section on the readme is not available

As the title says, the links

Usage
Tutorial: Using Distiller to prune a PyTorch language model
in the "getting started" section give me a 404 message. Can you move or update these links?
Thanks.

cannot resume model for training

I ran this test:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10/ -j 1 --resume ../../../data.cifar10/models/best.pth.tar --epochs 200 --compress=../quantization/preact_resnet20_cifar_pact.yaml --out-dir="logs/" --wd=0.0002 --vs=0

and got this error:

=> loading checkpoint ../../../data.cifar10/models/best.pth.tar
Checkpoint keys:
arch
        compression_sched
        epoch
        optimizer
        state_dict
        quantizer_metadata
        best_top1
   best top@1: 39.310
Loaded compression schedule from checkpoint (epoch 0)
Loaded quantizer metadata from the checkpoint
{'params': {'bits_weights': 3, 'bits_activations': 4, 'quantize_bias': False, 'bits_overrides': OrderedDict([('conv1', OrderedDict([('wts', None), ('acts', None)])), ('layer1.0.pre_relu', OrderedDict([('wts', None), ('acts', None)])), ('final_relu', OrderedDict([('wts', None), ('acts', None)])), ('fc', OrderedDict([('wts', None), ('acts', None)]))])}, 'type': <class 'distiller.quantization.clipped_linear.PACTQuantizer'>}
Traceback (most recent call last):
  File "compress_classifier.py", line 686, in <module>
    main()
  File "compress_classifier.py", line 244, in main
    model, chkpt_file=args.resume)
  File "D:\pytorchProject\distiller\apputils\checkpoint.py", line 117, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'

How can I fix it?
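
One possible direction, sketched under assumptions: when re-creating a quantizer from saved metadata of the form {'type': cls, 'params': {...}} (as in the traceback above), inspect the constructor and pass an optimizer only if it declares one. `rebuild_from_metadata` and the two classes below are hypothetical stand-ins, not Distiller code.

```python
import inspect


def rebuild_from_metadata(qmd, model, optimizer=None):
    """Re-create a quantizer from saved metadata {'type': cls, 'params': {...}}.

    Passes `optimizer` only when the constructor declares such a parameter.
    """
    cls, params = qmd["type"], dict(qmd["params"])
    accepted = inspect.signature(cls.__init__).parameters
    if "optimizer" in accepted:
        if optimizer is None:
            raise ValueError(f"{cls.__name__} needs an optimizer to be rebuilt")
        params["optimizer"] = optimizer
    return cls(model, **params)


# Hypothetical stand-ins, not Distiller classes.
class PostTrainingQuantizer:
    def __init__(self, model, bits_weights=8):
        self.model, self.bits_weights = model, bits_weights


class TrainingQuantizer:
    def __init__(self, model, optimizer, bits_weights=8):
        self.model, self.optimizer = model, optimizer
        self.bits_weights = bits_weights


q1 = rebuild_from_metadata({"type": PostTrainingQuantizer,
                            "params": {"bits_weights": 8}}, model="net")
q2 = rebuild_from_metadata({"type": TrainingQuantizer,
                            "params": {"bits_weights": 3}},
                           model="net", optimizer="sgd")
print(q1.bits_weights, q2.bits_weights, q2.optimizer)
```

This requires the optimizer to exist before the checkpoint is loaded, so the order of operations in the calling script may need to change as well.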

Resume from checkpoint with quantization

https://github.com/NervanaSystems/distiller/blob/e749ea6288431a53f839b621cc3e38facbf824de/distiller/quantization/range_linear.py#L165
I got an error message after resuming a symmetric linear quantized model:

Traceback (most recent call last):
  File "compress_classifier.py", line 684, in <module>
    main()
  File "compress_classifier.py", line 244, in main
    model, chkpt_file=args.resume)
  File "/chenys/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() got an unexpected keyword argument 'bits_weights'

I modified the __init__ arguments, which fixed the problem:

def __init__(self, model, bits_activations=8, bits_weights=8, **kw):
     super(SymmetricLinearQuantizer, self).__init__(model, bits_activations=bits_activations,
                                                    bits_weights=bits_weights, 
                                                    train_with_fp_copy=False,
                                                    **kw)

I think the quantizers should name their parameters consistently; otherwise the quantizer_metadata causes an initialization error.

Query on optimizer

The line self.optimizer.__setstate__({'param_groups': new_optimizer.param_groups}) in quantizer.py does not change the optimizer that was passed in to initialize the quantizer. So the optimizer in compress_classifier.py is still not changed by the function _get_updated_optimizer_params_groups in the PACT class.

About resume quantized model to summary

I tried to resume a quantized model to get a MACs summary like this:

python3 compress_classifier.py --resume=./resnet20_quantized.pth.tar -a=resnet20_cifar ../../../data..cifar10 --summary=compute

But it failed with this error:

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Another problem: the first time I ran compress_classifier.py to train simplenet_cifar, msglogger worked well. When I ran it again to train resnet20 (or other models), however, the terminal only showed these messages:

Logging to TesnsorBoard - remember to execute the server:
tensorboard --logdir='./logs'
Files already downloaded and verified
Files already downloaded and verified

No more messages were printed, and the log file that should be saved in './logs/time_stamp/log' was no longer generated either, although the training process was still running. How can I fix these two problems? Thanks a lot.

About LASSO based channel pruning

Hi there,

Thanks for open-sourcing the code.

Is there any plan for implementation of the LASSO based channel-pruning algorithm (i.e. the paper: Channel pruning for accelerating very deep neural networks)?

lr_scheduler doesn't work when start training from checkpoint

Hello,
I want to use the MultiStepMultiGammaLR scheduler as the lr_scheduler in my pruning schedule.
When I use compress_classifier.py to prune resnet20_cifar from the beginning and define the lr_scheduler in the YAML file, it works well.
But when I start from a checkpoint to train and prune, the lr_scheduler defined in the YAML file doesn't work: the learning rate doesn't decay when the epoch reaches a defined milestone.

I use the command below:
python3 compress_classifier.py --arch resnet20_cifar dataset/ -p=50 --lr=0.3 --epochs=150 -b 128 --compress=resnet20_cifar_ele_pruning.yaml -j=1 --vs 0 --deterministic --resume=logs/resnet20_cifar_baseline/checkpoint.pth.tar

Below is my yaml setting

version: 1
pruners:
  low_pruner:
    class: AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.60
    weights: [module.layer1.2.conv1.weight,  module.layer1.2.conv1.weight,
              module.layer1.0.conv1.weight,  module.layer1.0.conv2.weight,
              module.layer1.1.conv1.weight,  module.layer1.1.conv2.weight]

  mid_pruner:
    class:  AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.67
    weights: [module.layer2.2.conv1.weight,  module.layer2.2.conv2.weight,
              module.layer2.0.conv2.weight,  module.layer2.0.downsample.1.weight,
              module.layer2.0.conv1.weight,  module.layer2.0.downsample.0.weight,
              module.layer2.1.conv1.weight,  module.layer2.1.conv2.weight]

  high_pruner:
    class:  AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.76
    weights: [module.layer3.0.conv1.weight,  module.layer3.1.conv1.weight,
              module.layer3.1.conv2.weight,  module.layer3.0.conv2.weight,
              module.layer3.0.downsample.0.weight, module.layer3.0.downsample.1.weight,
              module.fc.weight]
lr_schedulers:
  training_lr:
    class: MultiStepMultiGammaLR
    milestones: [300, 302, 400]
    gammas: [0.1, 0.1, 0.5]

policies:
    - pruner:
        instance_name: low_pruner
      starting_epoch: 300
      ending_epoch: 400
      frequency: 2
    - pruner:
        instance_name: mid_pruner
      starting_epoch: 300
      ending_epoch: 400
      frequency: 2
    - pruner:
        instance_name: high_pruner
      starting_epoch: 300
      ending_epoch: 400
      frequency: 2
    - lr_scheduler:
        instance_name: training_lr
      starting_epoch: 0
      ending_epoch: 400
      frequency: 1

Is there any problem in my script and yaml setting?
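
For checking the expected values by hand, here is a sketch of how a multi-step, multi-gamma schedule is commonly computed: every gamma whose milestone has been reached is multiplied into the base learning rate. This is an assumption about the scheduler's behaviour written in plain Python, not a copy of Distiller's MultiStepMultiGammaLR.

```python
from bisect import bisect_right


def multi_step_multi_gamma_lr(base_lr, milestones, gammas, epoch):
    """Learning rate after applying every gamma whose milestone has passed."""
    if len(milestones) != len(gammas):
        raise ValueError("milestones and gammas must have the same length")
    passed = bisect_right(milestones, epoch)  # milestones <= epoch
    lr = base_lr
    for gamma in gammas[:passed]:
        lr *= gamma
    return lr


milestones, gammas = [300, 302, 400], [0.1, 0.1, 0.5]
for epoch in (0, 149, 299, 300, 302, 400):
    print(epoch, multi_step_multi_gamma_lr(0.3, milestones, gammas, epoch))
```

Under this reading, the learning rate stays at 0.3 for every epoch below 300, so whether a decay is observed depends on the epoch counter the scheduler sees after resuming and on the value of --epochs.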

About Knowledge Distillation

I've read the Q&A in #90, and I want to train a student model (preact_resnet20_cifar) from a preact_resnet44_cifar teacher. Here is the command line I used to train the teacher model:
python compress_classifier.py -a preact_resnet44_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 .
The KD command line:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 --kd-teacher preact_resnet44_cifar --kd-resume logs/2018.12.11-130318/checkpoint.pth.tar --kd-temp 5.0 --kd-dw 0.7 --kd-sw 0.3
I got this error message:
==> using cifar10 dataset
=> creating preact_resnet44_cifar model for CIFAR10
=> loading checkpoint logs/2018.12.11-130318/checkpoint.pth.tar
Checkpoint keys:
epoch
arch
state_dict
best_top1
optimizer
compression_sched
quantizer_metadata
best top@1: 48.000
Loaded compression schedule from checkpoint (epoch 2)
Loaded quantizer metadata from the checkpoint

Traceback (most recent call last):
  File "compress_classifier.py", line 784, in <module>
    main()
  File "compress_classifier.py", line 359, in main
    teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)
  File "/home/share/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'
I don't know how this could happen. The other question is: must the teacher model be deeper than the student model?

Unable to reproduce 6.96% test error for resnet-56 on cifar10

I run the following command to run a baseline model for resnet56 on cifar10:

python3 compress_classifier.py
--arch resnet56_cifar ../data.cifar10 -p=50
--lr=0.4 --epochs=180
--compress=../pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml
-j=1 --deterministic

I am unable to reproduce the accuracy claimed in the file resnet56_cifar_baseline_training.yaml, which reports a top1 accuracy of 92.97%.
When I run the code, the reported accuracy is only 90.38%.

Further, I notice that the learning rate schedule used in this config file differs from both the original ResNet paper and the paper this example aims to reproduce. So I changed the learning rate schedule to decrease by a factor of 0.1 at epochs 80 and 120, training for 160 epochs in total. I did this by modifying the file resnet56_cifar_baseline_training.yaml.
Even with this learning rate schedule, the final accuracy is only 92.20%.

yolov2 and darknet19

Hello, I want to prune YOLOv2's pretrained model so that each layer has fewer filters. However, it is not in the torchvision model set. Does a model have to be in the torchvision model set if I want to prune it? I studied your documentation for a week and did not find a clear way to do that. YOLOv2 is first trained on ImageNet, which gives the Darknet19 model. The Darknet19 network is then changed slightly and trained again on an object detection dataset, which gives YOLOv2. This is the model I want to prune. I am new to PyTorch. Can I do this with Distiller? Can you give me some detailed instructions? If yes, I would like to contribute my work to Distiller.

Thinning FC layers

The thinning methods only support removing channels or filters of a CONV layer:

        # We are only interested in 4D weights (of Convolution layers)
        if param.dim() != 4:
            continue

[1]
How about thinning FC layers? Even if you are not going to support it, can you describe what one should take care of when implementing, say, remove_rows() or remove_columns() for neuron pruning?

[2]
It seems hard to simply extend the thinning_recipe approach, as it appears to be closely tied to removing CONV structures. Any suggestions?

[3]
Also, when thinning pruned PyTorch models, what could be the reason for an accuracy drop?
Because we are strictly removing only zero structures, the math should be the same and give the same classification.
You seem to be accounting for a possible performance drop by preparing to thin even the gradient tensors.
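
As a starting point for [1], here is a minimal sketch of neuron removal between two fully connected layers, in plain Python with weights stored as lists of rows. `remove_neuron` is a hypothetical helper, not part of Distiller. Removing output neuron i of one layer means dropping row i of its weight matrix, entry i of its bias, and column i of the next layer's weight matrix.

```python
def linear(weights, bias, x):
    """y = W x + b with W stored as a list of rows (one row per output)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def remove_neuron(w1, b1, w2, index):
    """Remove output neuron `index` of layer 1 and the matching input
    column of layer 2. Returns new (w1, b1, w2) without mutating inputs."""
    new_w1 = [row for i, row in enumerate(w1) if i != index]
    new_b1 = [b for i, b in enumerate(b1) if i != index]
    new_w2 = [[v for j, v in enumerate(row) if j != index] for row in w2]
    return new_w1, new_b1, new_w2


w1 = [[1.0, 2.0], [0.0, 0.0], [3.0, -1.0]]   # neuron 1 is fully pruned
b1 = [0.5, 0.0, -0.5]                         # ... including its bias
w2 = [[1.0, 7.0, 2.0]]
b2 = [0.1]
x = [1.0, 1.0]

before = linear(w2, b2, linear(w1, b1, x))
t_w1, t_b1, t_w2 = remove_neuron(w1, b1, w2, index=1)
after = linear(t_w2, b2, linear(t_w1, t_b1, x))
print(before, after)
```

The outputs match here only because both the weight row and the bias of the removed neuron are zero. This relates to [3]: a zero weight row with a nonzero bias, or a normalization layer in between, would make the removed neuron contribute a nonzero value, and thinning would then change the result.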

Early Exit Inference

How do I train an early exit model? Here is the command I used:

python3 compress_classifier.py --arch resnet20_cifar_earlyexit ../../../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/resnet20_cifar_baseline_training.yaml -j=1 --deterministic --earlyexit_thresholds 0.9 1.2 --earlyexit_lossweights 0.2 0.3

But Distiller shows me the following error message:

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
==> using cifar10 dataset
=> creating resnet20_cifar_earlyexit model for CIFAR10

Logging to TensorBoard - remember to execute the server:

tensorboard --logdir='./logs'

=> using early-exit threshold values of [0.9, 1.2]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'dampening': 0, 'weight_decay': 0.0001, 'momentum': 0.9, 'nesterov': False, 'lr': 0.3}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml

Training epoch: 45000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
Traceback (most recent call last):
  File "compress_classifier.py", line 789, in <module>
    main()
  File "compress_classifier.py", line 386, in main
    loggers=[tflogger, pylogger], args=args)
  File "compress_classifier.py", line 477, in train
    loss = earlyexit_loss(output, target, criterion, args)
  File "compress_classifier.py", line 645, in earlyexit_loss
    loss += (1.0 - sum_lossweights) * criterion(output[args.num_exits-1], target)
IndexError: list index out of range

resnet20_cifar_baseline_training.yaml ==>
lr_schedulers:
  training_lr:
    class: StepLR
    step_size: 45
    gamma: 0.10

policies:
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 45
    ending_epoch: 200
    frequency: 1
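
The failing line in the traceback indexes the model outputs with the number of exits. The sketch below shows the same weighting scheme with an up-front consistency check. It is a simplified stand-in that takes already computed per-exit losses, and it assumes one weight per early exit plus one final exit; it is not the function in compress_classifier.py.

```python
def earlyexit_loss(exit_losses, loss_weights):
    """Weighted sum of per-exit losses.

    exit_losses:  one loss per exit, the final exit last.
    loss_weights: one weight per early exit (all exits except the last);
                  the final exit receives the remaining 1 - sum(weights).
    """
    if len(exit_losses) != len(loss_weights) + 1:
        raise ValueError(
            f"got {len(exit_losses)} exit outputs but {len(loss_weights)} "
            "early-exit weights; expected exactly one more output than weights")
    total_weight = sum(loss_weights)
    if not 0.0 <= total_weight <= 1.0:
        raise ValueError("early-exit weights must sum to a value in [0, 1]")
    loss = sum(w * l for w, l in zip(loss_weights, exit_losses))
    return loss + (1.0 - total_weight) * exit_losses[-1]


# Two early exits plus the final exit, with the weights 0.2 and 0.3
print(earlyexit_loss([2.0, 1.0, 0.5], [0.2, 0.3]))
```

With a check like this, a mismatch between the number of exits the model actually produces and the number of weights passed on the command line shows up as a descriptive error instead of an IndexError.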

Structure pruning is broken for models with non-serial connections

Structure pruning is broken for models with non-serial connections.
Models such as AlexNet and VGG have serial data-dependencies (connections) and are fine.
More complex models, with parallel-data dependencies (paths), such as ResNets (skip connections) and GoogLeNet (Inception layers) might fail when pruning filters or channels.

This is because a module, such as a torch.nn.modules.batchnorm.BatchNorm2d layer, might depend on multiple inputs. This is not always a problem: for example, it is fine if the dependent module has type torch.nn.Conv2d and we are pruning weight filters.
But if the dependent module has type torch.nn.modules.batchnorm.BatchNorm2d, and we are pruning weight filters, then it is possible that each of the inputs selects different activation channels to prune. In such a case, how should we prune the BatchNorm's scale and shift tensors (.weight and .bias)?

To solve this we need to define one of the modules as the leader which determines what activation channels to prune; and define the rest of the modules in the dependency sub-graph as followers. Followers do not choose which activation channels to prune, so their sparsity masks is determined by the choice of the leader.
Because the sparsity maps of different follower modules may have different shapes, the leader defines a binary map which is a binary vector of active (1) and pruned (0) channels. Each "follower" expands this single binary map to create its own private pruning mask.

This requires changing the way we express filter/channel pruning in YAML, and how we create pruning masks.

I'm trying to make this fix available soon.
This is related to issues #79 and #73.
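
The leader/follower scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Distiller's implementation: the function names (leader_channel_map, expand_for_batchnorm, expand_for_conv_inputs), the L1-norm ranking criterion, and the use of nested Python lists in place of tensors are all hypothetical choices made for the example.

```python
def leader_channel_map(filters, fraction_to_prune):
    """Leader: rank filters by L1 norm and return a binary map (1 = keep, 0 = prune).

    `filters` is a list of flattened filters, one per output channel.
    The L1-norm criterion is an assumption made for this sketch.
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    num_prune = int(len(filters) * fraction_to_prune)
    # Indices of the filters with the smallest L1 norms.
    pruned = set(sorted(range(len(filters)), key=lambda i: norms[i])[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(filters))]


def expand_for_batchnorm(channel_map):
    """Follower (BatchNorm): .weight and .bias are 1-D, one entry per channel."""
    return {"weight": list(channel_map), "bias": list(channel_map)}


def expand_for_conv_inputs(channel_map, out_channels, kernel_elems):
    """Follower (downstream conv): mask shaped [out_channels][in_channels][kernel_elems]."""
    return [[[bit] * kernel_elems for bit in channel_map] for _ in range(out_channels)]


# Four flattened filters of a hypothetical leader convolution.
leader_filters = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
channel_map = leader_channel_map(leader_filters, fraction_to_prune=0.5)
bn_masks = expand_for_batchnorm(channel_map)
next_conv_mask = expand_for_conv_inputs(channel_map, out_channels=2, kernel_elems=3)
```

Every follower derives its mask from the same channel_map, so the BatchNorm parameters and the input channels of the downstream convolution are pruned consistently even though their mask shapes differ.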

Export quantized model

Hi Neta,

I looked into the docs and found that https://nervanasystems.github.io/distiller/design/index.html#quantization mentions: 'We also provide a mechanism which takes an existing model and automatically replaces required operations with quantized versions.' Does this mean we can export a model that uses these 'quantized versions' of the operations?
If it is not supported in current Distiller, can you kindly suggest how I can export it?
Thanks so much.

Can't run pruning and quantization together

Hi,

When I try to run pruning and quantization together on resnet20_cifar in one YAML file, it fails with a KeyError on xxx_float_weight. What is the correct procedure for combining the two?

Thomas

Query on quantization of Bias

I expect the final model after quantization to contain only integers in the range -128 to 127 for 8-bit symmetric linear quantization, but when I print the model parameters I see that the biases are still floats.
So I am currently setting inplace = True on this line: https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/range_linear.py#L121

At some point we need to quantize the bias before writing it into the model. Currently I do not see that happening.
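
For reference, the expectation stated above (all values in -128 to 127) corresponds to the following kind of mapping. This is a minimal sketch of generic 8-bit symmetric linear quantization with a single scale factor; it is not taken from range_linear.py, and the names symmetric_quantize and dequantize, as well as the sample bias values, are hypothetical.

```python
def symmetric_quantize(values, num_bits=8):
    """Map floats to signed integers in [-(2**(b-1)), 2**(b-1) - 1] using one scale."""
    q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # Symmetric: zero maps to zero, and the largest magnitude maps to q_max.
    scale = q_max / max_abs if max_abs > 0 else 1.0
    quantized = [min(q_max, max(q_min, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q / scale for q in quantized]


# Hypothetical bias values, used only for illustration.
bias = [0.731, -0.052, 0.0, -1.27]
q_bias, scale = symmetric_quantize(bias)
recovered = dequantize(q_bias, scale)
```

Whether a bias should share the weights' scale or use its own is a design decision that this sketch does not address.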

Compressing seq2seq

Hey,

We've recently written a tutorial on compressing a PyTorch language model using the element-wise AGP pruner.
We're seeking community help to add further examples, such as pruning a PyTorch seq2seq model.
Thanks,
Neta

Automated Deep Compression status

Hello there,
I am wondering about the state of the ADC implementation, and what remains to bring it to a functional state.
In the ADC merge commit message, you mentioned that it is still WiP and that it is using an unreleased version of Coach. Is that still the case?
Also, is there any documentation for how to use ADC in Distiller?

Thanks

initialization for GradientRankedFilterPruner

hello:

In the initialization function of GradientRankedFilterPruner, the super() call names RandomRankedFilterPruner. Is it a typo? IMHO, we should use super(GradientRankedFilterPruner, self) instead.

thanks!

two things to confirm about weights and activations quantization

When train_with_fp_copy is set to true, you change the attributes of the conv/fc layer: you replace conv.weights with conv.float_weights, and conv.weights becomes a buffer instead of a parameter. The forward pass of the conv/fc layer still uses conv.weights (the quantized weights), as determined by PyTorch's default conv implementation. In the backward pass, however, the gradients calculated with respect to q_weights (the quantized weights) are stored in float_weights.grad rather than in weights, because weights has no grad attribute. So you implicitly back-propagate the gradient with respect to the quantized weights to the full-precision weights using a straight-through estimator (STE), that is, the two gradients are treated as equal.
You implement activation quantization by replacing the ReLU layer with a new custom layer. Here too, you make the gradients with respect to the activations before and after quantization equal using the STE.
So I want to confirm with you whether I have misunderstood any of the points above.
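
The straight-through estimator described in this question can be illustrated with a single scalar weight. This is a hand-written sketch under simplifying assumptions (one weight, a squared-error loss, and a hypothetical uniform quantizer with step 0.25); it does not reproduce Distiller's or PyTorch's code, and the names quantize and ste_training_step are invented for the example.

```python
def quantize(w, step=0.25):
    """Round a weight to the nearest multiple of `step` (a stand-in quantizer)."""
    return round(w / step) * step


def ste_training_step(float_weight, x, target, lr=0.1):
    # The forward pass uses the quantized weight.
    q_weight = quantize(float_weight)
    y = q_weight * x
    loss = (y - target) ** 2
    # Gradient of the loss with respect to the *quantized* weight.
    grad_q = 2.0 * (y - target) * x
    # STE: treat d(q_weight)/d(float_weight) as 1, so the same gradient
    # is applied to the full-precision copy of the weight.
    new_float_weight = float_weight - lr * grad_q
    return new_float_weight, q_weight, loss


w = 0.6
w_next, q_w, loss = ste_training_step(w, x=1.0, target=1.0)
```

The full-precision copy accumulates small updates that would be lost if the quantized value were updated directly, because rounding is flat almost everywhere and its true gradient is zero.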

RuntimeError: cuda runtime error (30)

An error occurred:
RuntimeError: cuda runtime error (30) : unknown error at /pytorch/aten/src/THC/THCTensorRandom.cu:25
My environment is Ubuntu 18.04, CUDA 8.0, torch-0.4.0, Python 3.6. Which of these is wrong? Or what is the reason?
The README says:
PyTorch is included in the requirements.txt file, and will currently download PyTorch version 3.1 for CUDA 8.0. This is the setup we've used for testing Distiller.
But the requirements.txt shows that the torch version is 0.4.0.
What are the correct versions? Which CUDA, which torch, and which versions of the other dependencies?

Questions about regularization and pruning

I see that you treat regularization as another means of pruning, but the procedure differs between the two. Pruning takes effect at the beginning of each batch (on_minibatch_begin), while regularization takes effect at the end of each batch (on_minibatch_end). This means that during training you zero the regularized terms that fall below the threshold on every batch iteration.
What is the reason for this? I think it would be more natural to do this at the end of an epoch, or at the end of the whole training, when the regularized terms have decreased enough for pruning.
Regularization and pruning both use the same zeros_mask_dict, which may cause confusion. For example, if both a regularizer and a pruner are present, apply_mask in on_minibatch_end of class RegularizationPolicy would apply not only the regularization mask but also the pruning mask.
What is the purpose of keeping the regularization mask of the last epoch? I guess it may be used by some remover in thinning.py, right?

invalid choice: 'resnet20-cifar'?

I use “python3 compress_classifier.py -a resnet20-cifar ../../../data.cifar10 --resume ../examples/ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize --evaluate”

but error occurs:

compress_classifier.py: error: argument --arch/-a: invalid choice: 'resnet20-cifar' (choose from 'alexnet', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3', 'mobilenet', 'mobilenet_025', 'mobilenet_050', 'mobilenet_075', 'resnet101', 'resnet152', 'resnet18', 'resnet20_cifar', 'resnet32_cifar', 'resnet34', 'resnet44_cifar', 'resnet50', 'resnet56_cifar', 'simplenet_cifar', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn')
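
The list of valid choices in the error message contains 'resnet20_cifar' (with an underscore), so the hyphenated name appears to be the cause. A small helper can point to the closest valid name; this is a sketch using only the Python standard library, and suggest_arch is a hypothetical name, not part of compress_classifier.py.

```python
import difflib

# A subset of the valid choices listed in the error message above.
valid_archs = ["resnet18", "resnet20_cifar", "resnet32_cifar", "resnet44_cifar",
               "resnet56_cifar", "simplenet_cifar"]


def suggest_arch(name, choices):
    """Return the closest valid architecture name, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, choices, n=1, cutoff=0.6)
    return matches[0] if matches else None


suggestion = suggest_arch("resnet20-cifar", valid_archs)
```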

tensorboard backend

loss.backward() --> RuntimeError

Is there anyone who can help me?

Ubuntu 16.04

command:
time python3 compress_classifier.py --arch resnet20_cifar ../../../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --compress=../quantization/preact_resnet20_cifar_pact.yaml -j=1 --deterministic

Error message:

--- validate (epoch=199)-----------
5000 samples (256 per mini-batch)
==> Top1: 90.300 Top5: 99.700 Loss: 0.297

==> Best Top1: 90.860 on Epoch: 187
Saving checkpoint to: logs/2018.11.29-140224/checkpoint.pth.tar

Training epoch: 45000 samples (256 per mini-batch)

Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.11.29-140224/2018.11.29-140224.log
Traceback (most recent call last):
File "compress_classifier.py", line 789, in <module>
main()
File "compress_classifier.py", line 391, in main
msglogger.info(distiller.masks_sparsity_tbl_summary(model, compression_scheduler))
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/media/walker/DATA/work/new_quant/distiller/distiller/data_loggers/collector.py", line 301, in collectors_context
yield collectors_dict
File "compress_classifier.py", line 386, in main
loggers=[tflogger, pylogger], args=args)
File "compress_classifier.py", line 495, in train
loss.backward()
File "/home/walker/.local/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/walker/.local/lib/python3.5/site-packages/torch/autograd/__init__.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

real 301m20.430s
user 204m21.640s
sys 99m42.978s

The problem of example command

Hi, nzmora:
When I ran the command "python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01", I got the following error:

2018-10-22 17:03:03,745 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log
2018-10-22 17:03:03,745 - Number of CPUs: 24
2018-10-22 17:03:03,850 - Number of GPUs: 8
2018-10-22 17:03:03,850 - CUDA version: 8.0.61
2018-10-22 17:03:03,850 - CUDNN version: 7102
2018-10-22 17:03:03,851 - Kernel: 4.4.0-98-generic
2018-10-22 17:03:03,851 - Python: 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
2018-10-22 17:03:03,851 - PyTorch: 0.4.0
2018-10-22 17:03:03,851 - Numpy: 1.14.3
2018-10-22 17:03:03,852 - Traceback (most recent call last):
File "compress_classifier.py", line 686, in <module>
main()
File "compress_classifier.py", line 179, in main
apputils.log_execution_env_state(sys.argv, gitroot=module_path)
File "/home/project/compress/distiller-master/apputils/execution_env.py", line 78, in log_execution_env_state
log_git_state()
File "/home/project/compress/distiller-master/apputils/execution_env.py", line 56, in log_git_state
repo = Repo(gitroot, search_parent_directories=True)
File "/home/project/compress/distiller-master/env/lib/python3.5/site-packages/git/repo/base.py", line 168, in __init__
raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /home/project/compress/distiller-master

2018-10-22 17:03:03,852 -
2018-10-22 17:03:03,852 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log

How can I solve the problem?

quant_aware_train_linear_quant doesn't work on resnet20_cifar

I tried quant_aware_train_linear_quant.yaml on the resnet20_cifar model. The model seems to be broken: it cannot produce any reasonable predictions and cannot be trained.

Is quant_aware_train_linear_quant.yaml only suitable for resnet18? It seems not. Could anyone help? Thanks very much.

A version for tensorflow framework?

Would it be possible that there will be a distiller version for tensorflow in the future?


I tried to run the sensitivity analysis for filter with the following command 'python3 compress_classifier.py -a resnet20_cifar --data ../../../data.cifar10/ -j 12 --resume=../ssl/checkpoints/checkpoint_trained_dense.pth.tar --sense=filter', but got an error, detailed log:
