intellabs / distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
License: Apache License 2.0
Hi Neta,
I encountered an error when doing filter pruning. After debugging, I found it might be because Distiller does not support the concatenate operation.
The related layers of my network:
```
(aspp): ASPP_module(
  (aspp0): Sequential(
    (0): Conv2d(116, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (aspp1): Sequential(
    (0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (aspp2): Sequential(
    (0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (aspp3): Sequential(
    (0): Conv2d(116, 256, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
(global_avg_pool): Sequential(
  (0): AdaptiveAvgPool2d(output_size=(1, 1))
  (1): Conv2d(116, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(conv): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
```
Here the input of layer 'conv' is the concatenation of the outputs of aspp0, aspp1, aspp2, aspp3 and global_avg_pool (along dim=1), so it has 5 × 256 = 1280 channels.
My pruning configuration:
module.aspp.aspp0.0.weight: [0.5, '3D']
module.aspp.aspp1.0.weight: [0.5, '3D']
module.aspp.aspp2.0.weight: [0.5, '3D']
module.aspp.aspp3.0.weight: [0.5, '3D']
module.global_avg_pool.1.weight: [0.5, '3D']
Distiller should then prune the following layer 'conv' to Conv2d(640, 256, kernel_size=(1, 1), stride=(1, 1), bias=False), but I get the error 'Given groups=1, weight of size [256, 128, 1, 1], expected input[8, 640, 60, 80] to have 128 channels, but got 640 channels instead', which means Distiller does not recognise the concatenated inputs.
Please advise, thanks.
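For reference, thinning the consumer of a concatenation needs one extra bookkeeping step: each branch's surviving output channels must be shifted by that branch's offset in the concatenated tensor before slicing the consumer's input channels. The sketch below shows this in plain PyTorch. `concat_input_channels` and `thin_conv_inputs` are hypothetical helpers, not Distiller APIs, and it assumes each of the five 256-channel branches keeps every other filter (128 each, 640 in total).

```python
import torch
import torch.nn as nn

def concat_input_channels(kept_per_branch, branch_channels):
    """Hypothetical helper: translate each branch's kept output channels into
    the channel indices they occupy after torch.cat(branches, dim=1)."""
    indices, offset = [], 0
    for kept, n in zip(kept_per_branch, branch_channels):
        indices.append(torch.as_tensor(kept, dtype=torch.long) + offset)
        offset += n
    return torch.cat(indices)

def thin_conv_inputs(conv, keep_in):
    """Hypothetical helper: drop the input channels of `conv` not listed in keep_in."""
    conv.weight = nn.Parameter(conv.weight.data.index_select(1, keep_in).clone())
    conv.in_channels = keep_in.numel()

branch_channels = [256] * 5                     # aspp0..aspp3 + global_avg_pool
kept = [torch.arange(0, 256, 2)] * 5            # assumption: each branch keeps 128 filters
keep_in = concat_input_channels(kept, branch_channels)

conv = nn.Conv2d(1280, 256, kernel_size=1, bias=False)
x = torch.randn(1, 1280, 4, 4)
mask = torch.zeros(1280)
mask[keep_in] = 1.0
x = x * mask.view(1, -1, 1, 1)                  # pruned branch channels produce zeros
y_full = conv(x)

thin_conv_inputs(conv, keep_in)
y_thin = conv(x[:, keep_in])                    # feed only the surviving channels
```

Because the removed channels were all-zero, the thinned 640-input convolution produces the same output as the original 1280-input one.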
I am using a workaround to allow resuming from a checkpoint with active quantization. The requires_grad flags aren't set on the restored biases and weights (they seem to be present at checkpoint save time), so as a quick fix I use:
```python
import torch.nn as nn

def set_grad(m):
    """
    Force the `requires_grad` flag on all weights and biases
    """
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        m.weight.requires_grad_()
        if hasattr(m, 'bias') and m.bias is not None:
            m.bias.requires_grad_()

model.apply(set_grad)
```
Without this, I get the PyTorch error message "element 0 of tensors does not require grad and does not have a grad_fn".
I am looking for a proper way to fix this.
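The error can be reproduced without any checkpoint: if every parameter a loss depends on has requires_grad=False, backward() raises exactly this RuntimeError. The toy sketch below (the small model is an assumption, and Distiller's restore path is not reproduced) shows the failure, then re-enables the flags on all parameters. Note that this variant also re-enables BatchNorm and any other parameters, not only Linear/Conv2d ones, which may or may not be desired if some parameters are intentionally frozen.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.Flatten(), nn.Linear(4 * 6 * 6, 2))
for p in model.parameters():          # simulate parameters restored without their flags
    p.requires_grad_(False)

x = torch.randn(1, 3, 8, 8)            # 8x8 input -> 6x6 feature map after the 3x3 conv
try:
    model(x).sum().backward()
    backward_failed = False
except RuntimeError:                   # "element 0 of tensors does not require grad ..."
    backward_failed = True

for p in model.parameters():           # re-enable gradients on every parameter
    p.requires_grad_(True)
model(x).sum().backward()
```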
I have read the "Knowledge Distillation" section: https://nervanasystems.github.io/distiller/schedule/index.html#knowledge-distillation
Could you please give me a simple example of knowledge distillation?
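As a point of reference, the classic (Hinton-style) distillation loss combines the usual cross-entropy on the true labels with a KL-divergence term between temperature-softened teacher and student outputs. This is a generic sketch, not Distiller's implementation. The default values 5.0 / 0.7 / 0.3 simply mirror the `--kd-temp 5.0 --kd-dw 0.7 --kd-sw 0.3` flags in a KD command further down this page, and reading `dw` as the distillation weight and `sw` as the student (hard-label) weight is an assumption.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target,
            temperature=5.0, distill_weight=0.7, student_weight=0.3):
    """Hinton-style distillation loss (generic sketch)."""
    # Soften both distributions with the same temperature
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # KL(teacher || student); the T^2 factor keeps gradient scale comparable across temperatures
    soft = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, target)   # ordinary loss on the true labels
    return student_weight * hard + distill_weight * soft

torch.manual_seed(0)
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)                  # the teacher runs without gradients
target = torch.randint(0, 10, (8,))
loss = kd_loss(student_logits, teacher_logits, target)
loss.backward()
```

In a training loop, the teacher would be evaluated under `torch.no_grad()` on the same batch, and only the student's parameters would be updated.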
Hello, I'm reading the docs on this link:
https://nervanasystems.github.io/distiller/algo_pruning/index.html#automated-gradual-pruner-agp
I found that one link in the text is broken:
...
You can play with the scheduling parameters in the agp_schedule.ipynb notebook.
Could you point the link to the Jupyter notebook on GitHub instead?
Thanks.
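Until the notebook link is fixed, the shape of the schedule can be explored directly. AGP is based on the cubic sparsity schedule of Zhu & Gupta ("To prune, or not to prune"), sketched below with epochs as the time unit. The function name and the use of epochs rather than pruning steps are assumptions, and Distiller's own implementation may differ in details such as how `frequency` is handled.

```python
def agp_target_sparsity(epoch, starting_epoch, ending_epoch,
                        initial_sparsity, final_sparsity):
    """Cubic sparsity schedule (Zhu & Gupta, 2017); fast at first, then tapering off."""
    span = ending_epoch - starting_epoch
    progress = min(max((epoch - starting_epoch) / span, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

for epoch in range(0, 11, 2):
    print(epoch, round(agp_target_sparsity(epoch, 0, 10, 0.05, 0.60), 4))
```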
```python
if args.evaluate:
    if args.quantize:
        model.cpu()
        quantizer = quantization.SymmetricLinearQuantizer(model, 8, 8)
        quantizer.prepare_model()
        model.cuda()
    top1, _, _ = test(test_loader, model, criterion, [pylogger], args.print_freq)
```
I would like some help understanding the flow for evaluating a quantized model.
From this code, I see that the model parameters and activations are quantized after quantizer.prepare_model(), but I was expecting the image batches to be quantized too, before running inference with test(). However, I could not find where the inputs are quantized during the forward pass. Once an input is fed into the model, post_quantized_forward() takes care of quantizing the activations.
I am guessing it is handled in one of the quantizer methods, but I am not exactly sure where.
Could you please elaborate on the flow for quantizing inputs?
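For reference, this is what explicit symmetric linear quantization of an input batch looks like. It is a generic sketch: the function name and the per-tensor max-abs scale are assumptions, and it makes no claim about where (or whether) Distiller performs this step.

```python
import torch

def symmetric_quantize(x, num_bits=8):
    """Per-tensor symmetric linear quantization (generic sketch)."""
    q_max = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    scale = q_max / x.abs().max().clamp(min=1e-8)
    x_q = torch.clamp(torch.round(x * scale), -q_max, q_max)
    return x_q, scale

images = torch.randn(4, 3, 32, 32)               # a stand-in image batch
images_q, scale = symmetric_quantize(images)
images_dq = images_q / scale                     # what a float model would actually consume
```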
Hi, I have a question: does Distiller support 1D convolutions? I'm trying to use quantization to compress a CNN with 1D convolutions for binary classification of strings.
When I run the sample command: $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../imagenet/alexnet/pruning/alexnet.schedule_sensitivity.yaml
it doesn't work: it says it cannot find the alexnet.schedule_sensitivity.yaml file. I think the --compress path should be "../sensitivity-pruning/alexnet.schedule_sensitivity.yaml". Thanks.
Warnings / Weak Warnings
In this line: https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/quantizer.py#L114, the local variable keys_list might be referenced before assignment, because keys_list is only assigned when bits_overrides is requested.
In this line: https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/quantizer.py#L175, the variable module is used, but module is already declared as the loop variable of the for loop above, so the name is shadowed.
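The first warning describes a general Python pattern: a name assigned only inside an `if` branch is unbound when that branch is skipped. The functions below are hypothetical stand-ins (not the quantizer code) showing the failure and the usual fix of initializing the variable unconditionally.

```python
def keys_buggy(bits_overrides=None):
    if bits_overrides:
        keys_list = list(bits_overrides.keys())
    return keys_list              # UnboundLocalError when bits_overrides is empty or None

def keys_fixed(bits_overrides=None):
    keys_list = []                # always bound, whatever the caller passes
    if bits_overrides:
        keys_list = list(bits_overrides.keys())
    return keys_list

try:
    keys_buggy()
    unbound = False
except UnboundLocalError:
    unbound = True
```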
I have an object detection model (MobileNet+SSD), and I want to use compress_classifier.py just to quantize it to 8 bits. I used this command:
python compress_classifier.py -a mobilenet_v1_ssd_lite_voc ../data.cifar10 --resume ../../data/models/mobilenet_v1_ssd_lite_voc_72.7.pth --quantize-eval
But I get an error:
compress_classifier.py: error: argument --arch/-a: invalid choice: 'mobilenet_v1_ssd_lite_voc' (choose from 'alexnet', 'alexnet_bn', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3', 'mobilenet', 'mobilenet_025', 'mobilenet_050', 'mobilenet_075', 'preact_resnet101', 'preact_resnet110_cifar', 'preact_resnet110_cifar_conv_ds', 'preact_resnet152', 'preact_resnet18', 'preact_resnet20_cifar', 'preact_resnet20_cifar_conv_ds', 'preact_resnet32_cifar', 'preact_resnet32_cifar_conv_ds', 'preact_resnet34', 'preact_resnet44_cifar', 'preact_resnet44_cifar_conv_ds', 'preact_resnet50', 'preact_resnet56_cifar', 'preact_resnet56_cifar_conv_ds', 'resnet101', 'resnet101_earlyexit', 'resnet110_cifar_earlyexit', 'resnet1202_cifar_earlyexit', 'resnet152', 'resnet152_earlyexit', 'resnet18', 'resnet18_earlyexit', 'resnet20_cifar', 'resnet20_cifar_earlyexit', 'resnet32_cifar', 'resnet32_cifar_earlyexit', 'resnet34', 'resnet34_earlyexit', 'resnet44_cifar', 'resnet44_cifar_earlyexit', 'resnet50', 'resnet50_earlyexit', 'resnet56_cifar', 'resnet56_cifar_earlyexit', 'simplenet_cifar', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg11_bn_cifar', 'vgg11_cifar', 'vgg13', 'vgg13_bn', 'vgg13_bn_cifar', 'vgg13_cifar', 'vgg16', 'vgg16_bn', 'vgg16_bn_cifar', 'vgg16_cifar', 'vgg19', 'vgg19_bn', 'vgg19_bn_cifar', 'vgg19_cifar')
Could you tell me how to do it? Thank you~
In tests/test_pruning.py, test_conv_fc_interface contains:
```python
# Remove filters
fc = common.find_module_by_name(model, fc_name)
assert fc is not None
# Test thinning
fm_size = fc.in_features // conv.out_channels
num_nnz_filters = num_filters - expected_cnt_removed_filters
distiller.remove_filters(model, zeros_mask_dict, arch, dataset, optimizer)
assert conv.out_channels == num_nnz_filters
assert fc.in_features == fm_size * num_nnz_filters
# Run again, to make sure the optimizer and gradients shapes were updated correctly
run_forward_backward(model, optimizer, dummy_input)
run_forward_backward(model, optimizer, dummy_input)
```
and run_forward_backward does this:
https://github.com/NervanaSystems/distiller/blob/11490f6fe71ce7ccf5ef74511834d43b658630d2/tests/test_pruning.py#L230
How does this work without overriding the model class's forward method? We are removing filters from a Conv2d layer; say a Linear layer follows it. Don't we need to change the model's forward method for the forward pass to go through?
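A toy illustration of why forward() usually needs no change: nn.Conv2d and nn.Linear read `self.weight` (and `self.bias`) at call time, so swapping in smaller parameter tensors is enough, as long as forward() itself does not hard-code sizes (a `view(-1, 288)` would break). `remove_filters_sketch` is a hypothetical helper, not Distiller's thinning code. It also shows that removing all-zero filters leaves the output unchanged. A real implementation must also update optimizer state, which the test above checks.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)   # 8x8 input -> 8 maps of 6x6
        self.fc = nn.Linear(8 * 6 * 6, 10)
    def forward(self, x):
        # No channel counts are hard-coded here: shapes come from the current parameters
        return self.fc(self.conv(x).flatten(1))

def remove_filters_sketch(model, keep):
    """Hypothetical helper: keep conv filters `keep` and the matching FC input columns."""
    conv, fc = model.conv, model.fc
    fm_size = fc.in_features // conv.out_channels    # FC inputs per conv channel (36)
    conv.weight = nn.Parameter(conv.weight.data[keep].clone())
    conv.bias = nn.Parameter(conv.bias.data[keep].clone())
    conv.out_channels = len(keep)
    # After flatten(1), channel k occupies columns [k*fm_size, (k+1)*fm_size)
    cols = torch.cat([torch.arange(k * fm_size, (k + 1) * fm_size) for k in keep])
    fc.weight = nn.Parameter(fc.weight.data[:, cols].clone())
    fc.in_features = len(keep) * fm_size

torch.manual_seed(0)
model = Net()
keep = [0, 2, 5]
with torch.no_grad():                                # zero the filters that will be removed
    for k in range(8):
        if k not in keep:
            model.conv.weight[k] = 0.0
            model.conv.bias[k] = 0.0
x = torch.randn(2, 3, 8, 8)
before = model(x)
remove_filters_sketch(model, keep)
after = model(x)                                     # same forward(), smaller tensors
```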
In a recent patch, titled 'Activation statistics collection: add a patched version of ResNet', Distiller "overloads" torch_models' pretrained ResNet models. Since this patch was introduced, ResNet models from pretrainedmodels fail to load. Is that intentional?
save_intermediate_feature_maps (2).txt
In my opinion, this change has two cons:
The example in "'Direct' Quantization Without Training" contains the following command:
"python3 compress_classifier.py -a resnet18 ../../../data.imagenet --pretrained --quantize --evaluate"
When I run it I get: " No such file or directory: '../../../data.imagenet/train'"
In https://github.com/NervanaSystems/distiller/blob/c2a429374f424ab357f55fd89d9d0d9289a570fe/distiller/thresholding.py#L90, is the comparison operation reversed?
My understanding is that if the mean or the max is greater than the threshold, we would expect to have zeros in the mask.
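For comparison, in the usual magnitude-pruning convention a mask value of 1 means "keep", so a group whose metric is above the threshold gets a 1 and survives, while groups below it are zeroed. The sketch encodes that convention for row groups of a 2-D weight. The function name and the assumption that Distiller follows this convention are illustrative, not taken from thresholding.py.

```python
import torch

def row_keep_mask(weight, threshold):
    """Binary map over rows by mean absolute value: 1 = keep, 0 = prune (assumed convention)."""
    metric = weight.abs().mean(dim=1)
    return (metric > threshold).float()

w = torch.tensor([[0.01, -0.02],     # small row: below the threshold, pruned
                  [1.00, -2.00]])    # large row: above the threshold, kept
print(row_keep_mask(w, 0.1))
```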
I ran into a problem when running the pruning_filters_for_efficient_convnets example, which uses resnet56_cifar_filter_rank.yaml. One issue is that the documentation describing this YAML file is outdated. The documentation website says:
```yaml
extensions:
  net_thinner:
    class: 'ResnetCifarFilterRemover'
    thinning_func_str: resnet_cifar_remove_filters
```
but the YAML file uses:
```yaml
extensions:
  net_thinner:
    class: 'FilterRemover'
    thinning_func_str: remove_filters
    arch: 'resnet56_cifar'
    dataset: 'cifar10'
```
The net_thinner is different, and there is no ResnetCifarFilterRemover class or resnet_cifar_remove_filters function in the source code.
The bigger problem is that the example doesn't work: when filters are removed from a conv layer, the following BN layer is not changed. The error is:
RuntimeError: running_mean should contain 7 elements not 16
I debugged the code, and it seems the create_thinning_recipe_filters function in thinning.py has a bug: it does not handle BN layers. See the line that handles BN layers.
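The "running_mean should contain 7 elements not 16" error is what happens when a BatchNorm2d still expects the old channel count. A minimal plain-PyTorch sketch of the missing step: after slicing a conv's filters, slice the following BN's `weight`, `bias`, `running_mean` and `running_var` with the same indices and update `num_features`. `thin_conv_bn` is a hypothetical helper, not Distiller's thinning recipe, and the next layer's input channels would need the same slicing.

```python
import torch
import torch.nn as nn

def thin_conv_bn(conv, bn, keep):
    """Hypothetical helper: keep only filters `keep` of `conv` and the same BN channels."""
    keep = torch.as_tensor(keep, dtype=torch.long)
    conv.weight = nn.Parameter(conv.weight.data[keep].clone())
    if conv.bias is not None:
        conv.bias = nn.Parameter(conv.bias.data[keep].clone())
    conv.out_channels = keep.numel()
    bn.weight = nn.Parameter(bn.weight.data[keep].clone())
    bn.bias = nn.Parameter(bn.bias.data[keep].clone())
    bn.running_mean = bn.running_mean[keep].clone()   # buffers must shrink too
    bn.running_var = bn.running_var[keep].clone()
    bn.num_features = keep.numel()

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(16)
thin_conv_bn(conv, bn, list(range(7)))               # 16 -> 7 channels, as in the error
out = nn.Sequential(conv, bn)(torch.randn(2, 3, 8, 8))
```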
Hi,
The derivation of y_q under Symmetric Linear Quantization on the page
https://nervanasystems.github.io/distiller/algo_quantization/index.html seems incorrect.
I am not able to reason out for myself the scaling of the bias term.
Thanks
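One way to see the bias scaling: with a_f ≈ a_q / q_a and w_f ≈ w_q / q_w, the integer accumulator Σ a_q·w_q is in units of 1 / (q_a·q_w), so the bias must be quantized with that same product, b_q = round(q_a·q_w·b_f), before it is added; the result is then rescaled by q_y / (q_a·q_w). The NumPy sketch below checks this numerically. It is a generic derivation, not a quote of the documentation, and computing q_y from the float output (instead of calibration statistics) is an assumption made for brevity.

```python
import numpy as np

def sym_scale(x, bits=8):
    # Symmetric linear quantization maps [-max|x|, max|x|] onto [-(2^(n-1)-1), 2^(n-1)-1]
    return (2 ** (bits - 1) - 1) / np.max(np.abs(x))

rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, size=16)          # input activations
W = rng.uniform(-1, 1, size=(8, 16))     # weights of an 8-output linear layer
b = rng.uniform(-1, 1, size=8)           # bias
y = W @ a + b                            # float reference

q_a, q_w, q_y = sym_scale(a), sym_scale(W), sym_scale(y)
a_q = np.round(q_a * a).astype(np.int64)
W_q = np.round(q_w * W).astype(np.int64)
# The accumulator sum(a_q * w_q) carries the scale q_a * q_w,
# so the bias is quantized with that same scale before being added.
b_q = np.round(q_a * q_w * b).astype(np.int64)
acc = W_q @ a_q + b_q                    # integer-only accumulation
y_q = np.round(q_y / (q_a * q_w) * acc)  # requantize to the output's scale
y_hat = y_q / q_y                        # dequantize only for comparison
```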
You set the masks for the model's parameters at the beginning of each epoch. Since you may configure two or more pruners for parameters in different layers, it is reasonable that set_param_mask is called in on_epoch_begin of class PruningPolicy.
But I think the masker's apply_mask method should not be called in on_minibatch_begin, because apply_mask is then called two or more times when there are two or more pruners in your policies. Calling it once should be enough, since zeros_mask_dict includes all the parameters you want to prune, even though the result is the same no matter how many times you call it.
Interestingly, I found that you implement this idea in on_minibatch_end in scheduler.py, which calls apply_mask only once by using the weights_are_masked flag.
My second question: you call apply_mask at the end of each minibatch because the weights are updated during the backward pass. However, I think this is unnecessary, because the masked weights cannot be updated: their grad has already been masked (set to zero) during the backward pass by a hook registered with register_hook.
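One situation where a gradient hook alone is not sufficient: with momentum (or other optimizer state), a weight that was trained before being pruned still has a nonzero momentum buffer, so the optimizer keeps moving it even though its gradient is masked. The toy sketch below uses plain torch.optim.SGD to show this. Whether this is Distiller's actual rationale for the end-of-minibatch apply_mask is not confirmed here.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, 2.0, 3.0]))
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)

# Step 1: ordinary training fills the momentum buffer for every element
(w ** 2).sum().backward()
opt.step()
opt.zero_grad()

# Prune element 0: zero the weight and mask its gradient with a hook
mask = torch.tensor([0.0, 1.0, 1.0])
with torch.no_grad():
    w.mul_(mask)
w.register_hook(lambda grad: grad * mask)

# Step 2: the masked gradient is 0, but the stale momentum still moves w[0]
(w ** 2).sum().backward()
opt.step()
drifted = w[0].item()

# Re-applying the mask after the optimizer step restores the zero
with torch.no_grad():
    w.mul_(mask)
```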
It seems there isn't any zero-point handling in the code.
Does Distiller support zero-points in quantization?
Thanks.
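For context, "zero-point" refers to asymmetric (affine) quantization, where an integer offset lets an unsigned range cover an arbitrary [min, max] interval while representing 0.0 exactly. A generic sketch follows. The function names are illustrative, and it says nothing about Distiller's support.

```python
import torch

def asym_quantize(x, bits=8):
    """Asymmetric (affine) quantization with a zero-point (generic sketch)."""
    qmin, qmax = 0, 2 ** bits - 1
    x_min = min(x.min().item(), 0.0)      # keep 0 inside the range so it is exactly representable
    x_max = max(x.max().item(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q.to(torch.uint8), scale, zero_point

def asym_dequantize(q, scale, zero_point):
    return (q.to(torch.float32) - zero_point) * scale

x = torch.tensor([-0.5, 0.0, 0.25, 1.5])
q, scale, zero_point = asym_quantize(x)
x_hat = asym_dequantize(q, scale, zero_point)
```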
Hi Neta,
Logging to TensorBoard - remember to execute the server:
tensorboard --logdir='./logs'
=> loading checkpoint ../ssl/checkpoints/checkpoint_trained_dense.pth.tar
Checkpoint keys:
arch
optimizer
compression_sched
state_dict
best_top1
epoch
best top@1: 92.540
Loaded compression schedule from checkpoint (epoch 179)
=> loaded checkpoint '../ssl/checkpoints/checkpoint_trained_dense.pth.tar' (epoch 179)
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0.0001, 'nesterov': False}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Running sensitivity tests
Testing sensitivity of module.conv1.weight [0.0% sparsity]
Traceback (most recent call last):
File "compress_classifier.py", line 782, in <module>
main()
File "compress_classifier.py", line 339, in main
return sensitivity_analysis(model, criterion, test_loader, pylogger, args)
File "compress_classifier.py", line 750, in sensitivity_analysis
group=args.sensitivity)
File "/home/chongyu/application/distiller/distiller/sensitivity.py", line 108, in perform_sensitivity_analysis
scheduler.on_epoch_begin(0)
File "/home/chongyu/application/distiller/distiller/scheduler.py", line 112, in on_epoch_begin
policy.on_epoch_begin(self.model, self.zeros_mask_dict, meta)
File "/home/chongyu/application/distiller/distiller/policy.py", line 123, in on_epoch_begin
self.is_last_epoch = meta['current_epoch'] == (meta['ending_epoch'] - 1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
It looks like meta['ending_epoch'] has no valid value (it is None).
Could you kindly suggest how to solve this? Thanks.
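The TypeError comes from evaluating `None - 1`. One defensive pattern is to check for a missing ending epoch before doing arithmetic on it. The sketch is hypothetical: treating "no ending epoch" as "never the last epoch" is an assumption about the desired semantics during sensitivity analysis, not Distiller's behaviour.

```python
def is_last_epoch(meta):
    """Hypothetical guard: with no ending_epoch (None), no epoch counts as the last one."""
    ending_epoch = meta.get('ending_epoch')
    if ending_epoch is None:
        return False
    return meta['current_epoch'] == ending_epoch - 1

# The sensitivity run in the traceback calls on_epoch_begin(0) with no ending epoch
print(is_last_epoch({'current_epoch': 0, 'ending_epoch': None}))
```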
In https://github.com/NervanaSystems/distiller/blob/c2a429374f424ab357f55fd89d9d0d9289a570fe/distiller/thresholding.py#L124 there is an unconditional reduction along dim=1 when threshold_criteria == Mean_Abs, and an unconditional max along dim=1 when threshold_criteria == Max. If group_type == Cols, wouldn't dim=0 be required?
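To make the dimension question concrete: for a 2-D weight of shape [out_features, in_features], a per-row metric reduces over dim=1 (one value per row), while a per-column metric reduces over dim=0 (one value per column). The sketch below is illustrative only; the function name is hypothetical and it is not the thresholding.py code.

```python
import torch

def group_metric(weight, group_type, criterion="Mean_Abs"):
    """Per-group metric for a 2-D weight (sketch).
    Rows -> reduce over dim=1 (one value per row); Cols -> reduce over dim=0 (one per column)."""
    dim = {"Rows": 1, "Cols": 0}[group_type]
    if criterion == "Mean_Abs":
        return weight.abs().mean(dim=dim)
    return weight.abs().max(dim=dim).values

w = torch.tensor([[1.0, -2.0, 3.0],
                  [4.0, 5.0, -6.0]])
print(group_metric(w, "Rows"), group_metric(w, "Cols"))
```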
Hi,
I read Distiller's documentation carefully and found that it mentions the INQ method proposed by Zhou et al. Distiller does not currently support this method, so I would like to know whether there is any plan to implement INQ in Distiller. Thanks
I have tried to use Symmetric Linear Quantization to quantize my model. I'm wondering why the model's parameters are still floats (11.) rather than ints (11). The quantization seems only to change the parameters from floats (11.12345) into integer-valued floats (11.).
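This matches how simulated ("fake") quantization is commonly done: the quantized values are kept in floating-point tensors so the regular float kernels can run them, and getting an actual integer dtype takes an explicit cast. Whether that is exactly the intent of Distiller's SymmetricLinearQuantizer is not stated here. The sketch just shows the difference.

```python
import torch

w = torch.tensor([0.087, -0.5, 0.25])
scale = 127 / w.abs().max()          # symmetric 8-bit scale: 127 / max|w|
w_q = torch.round(w * scale)         # integer *values*, still stored as float32
w_int8 = w_q.to(torch.int8)          # an explicit cast is needed for an integer dtype
print(w_q.dtype, w_int8.dtype)
```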
I am experimenting with exporting a quantized network to ONNX. This ultimately does not succeed because there is no round operator in ONNX, and PyTorch does not define an ATen operator for round either.
I'm not sure what the best strategy would be (perhaps using floor, which exists in ONNX but is still missing in the PyTorch exporter), and some guidance would be appreciated.
In case anyone would like to duplicate the experiment, the first step is to modify the forward() method in ClippedLinearQuantization. Instead of calling LinearQuantizeSTE.apply (the PyTorch ONNX exporter doesn't know what to do with it), inline the contents of LinearQuantizeSTE.forward, like so:
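```python
# clamp, linear_quantize and linear_dequantize live in distiller/quantization/q_utils.py
from distiller.quantization.q_utils import clamp, linear_quantize, linear_dequantize

# Replacement forward() for ClippedLinearQuantization (an nn.Module)
def forward(self, input_):
    input_ = clamp(input_, 0, self.clip_val, self.inplace)
    # ctx.mark_dirty() only exists inside an autograd.Function, so it has no equivalent here
    input_ = linear_quantize(input_, self.scale_factor, self.inplace)
    if self.dequantize:
        input_ = linear_dequantize(input_, self.scale_factor, self.inplace)
    return input_
```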
This should be functionally equivalent, and the export trace will now complain about round. In q_utils.py, modify linear_quantize (for example, remove the calls to round_() and round() and replace them with... something else).
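One "something else" that is sometimes used when an exporter lacks a rounding op is to build rounding from a cast: float-to-int conversion truncates toward zero, so trunc(x + 0.5·sign(x)) rounds half away from zero. This is a hedged sketch. Whether Sign and Cast export cleanly depends on the PyTorch/ONNX versions involved, the result differs from torch.round on exact .5 ties (torch.round rounds half to even), and newer ONNX opsets (11 and later) define a Round operator, which may make the workaround unnecessary.

```python
import torch

def round_via_cast(x):
    """Round half away from zero using only add, sign and dtype casts."""
    return (x + 0.5 * torch.sign(x)).to(torch.int32).to(x.dtype)

x = torch.tensor([1.4, 1.6, -1.4, -1.6, 2.5, -2.5, 0.0])
print(round_via_cast(x))
print(torch.round(x))      # differs only on the .5 ties (half-to-even)
```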
As the title says, the link
I ran this test:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10/ -j 1 --resume ../../../data.cifar10/models/best.pth.tar --epochs 200 --compress=../quantization/preact_resnet20_cifar_pact.yaml --out-dir="logs/" --wd=0.0002 --vs=0
and got this error:
=> loading checkpoint ../../../data.cifar10/models/best.pth.tar
Checkpoint keys:
arch
compression_sched
epoch
optimizer
state_dict
quantizer_metadata
best_top1
best top@1: 39.310
Loaded compression schedule from checkpoint (epoch 0)
Loaded quantizer metadata from the checkpoint
{'params': {'bits_weights': 3, 'bits_activations': 4, 'quantize_bias': False, 'bits_overrides': OrderedDict([('conv1', OrderedDict([('wts', None), ('acts', None)])), ('layer1.0.pre_relu', OrderedDict([('wts', None), ('acts', None)])), ('final_relu', OrderedDict([('wts', None), ('acts', None)])), ('fc', OrderedDict([('wts', None), ('acts', None)]))])}, 'type': <class 'distiller.quantization.clipped_linear.PACTQuantizer'>}
Traceback (most recent call last):
File "compress_classifier.py", line 686, in <module>
main()
File "compress_classifier.py", line 244, in main
model, chkpt_file=args.resume)
File "D:\pytorchProject\distiller\apputils\checkpoint.py", line 117, in load_checkpoint
quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'
How can I fix it?
https://github.com/NervanaSystems/distiller/blob/e749ea6288431a53f839b621cc3e38facbf824de/distiller/quantization/range_linear.py#L165
I got an error message after resuming a symmetric-linear-quantized model:
Traceback (most recent call last):
File "compress_classifier.py", line 684, in <module>
main()
File "compress_classifier.py", line 244, in main
model, chkpt_file=args.resume)
File "/chenys/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() got an unexpected keyword argument 'bits_weights'
I modified the __init__ arguments, which fixed the problem:
```python
def __init__(self, model, bits_activations=8, bits_weights=8, **kw):
    super(SymmetricLinearQuantizer, self).__init__(model, bits_activations=bits_activations,
                                                   bits_weights=bits_weights,
                                                   train_with_fp_copy=False,
                                                   **kw)
```
I think the quantizers should name their parameters consistently; otherwise the saved quantizer_metadata causes initialization errors.
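Both tracebacks above fail on the same line of load_checkpoint, `qmd['type'](model, **qmd['params'])`. One fails because a required argument (the optimizer) is not in the metadata, and the other because a saved keyword is not accepted by the constructor. The sketch below shows one way a loader could handle both: supply runtime-only objects explicitly, and fail with a clear message, rather than silently dropping settings, when names are out of sync. `rebuild_from_metadata` and the dummy classes are hypothetical and not part of Distiller.

```python
import inspect

def rebuild_from_metadata(qmd, model, **runtime_args):
    """Hypothetical helper: re-create a quantizer from {'type': cls, 'params': {...}}.

    runtime_args supplies objects not stored in the metadata (e.g. an optimizer);
    only names the constructor lists explicitly are passed through.
    """
    cls = qmd['type']
    params = dict(qmd['params'])
    accepted = inspect.signature(cls.__init__).parameters
    takes_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in accepted.values())
    unknown = [name for name in params if name not in accepted]
    if unknown and not takes_kwargs:
        # Fail loudly instead of silently dropping settings such as bit widths
        raise TypeError(f"{cls.__name__} does not accept saved parameter(s) {unknown}")
    for name, value in runtime_args.items():
        if name in accepted:
            params[name] = value
    return cls(model, **params)

class DummyPACT:
    """Stand-in with a required optimizer argument."""
    def __init__(self, model, optimizer, bits_activations=8, bits_weights=8):
        self.model, self.optimizer, self.bits_weights = model, optimizer, bits_weights

class RenamedQuantizer:
    """Stand-in whose constructor uses a different name for the weight bit width."""
    def __init__(self, model, bits_activations=8, bits_parameters=8):
        self.bits_parameters = bits_parameters

qmd = {'type': DummyPACT, 'params': {'bits_weights': 3, 'bits_activations': 4}}
quantizer = rebuild_from_metadata(qmd, model='model', optimizer='optimizer')

try:
    rebuild_from_metadata({'type': RenamedQuantizer, 'params': {'bits_weights': 3}}, 'model')
    name_mismatch_detected = False
except TypeError:
    name_mismatch_detected = True
```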
This line of code in quantizer.py, self.optimizer.__setstate__({'param_groups': new_optimizer.param_groups}), cannot change the optimizer that was passed in to initialize the quantizer. So the optimizer in compress_classifier.py still cannot be changed by the _get_updated_optimizer_params_groups function of the PACT class.
I tried to resume a quantized model to get a MACs summary, like this:
python3 compress_classifier.py --resume=./resnet20_quantized.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute
But it failed with this error:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
Another problem: the first time I ran compress_classifier.py to train simplenet_cifar, msglogger worked well. When I ran it again to train resnet20 (or other models), however, the terminal only showed these messages:
Logging to TensorBoard - remember to execute the server:
tensorboard --logdir='./logs'
Files already downloaded and verified
Files already downloaded and verified
No more messages were printed, and the log file that should be saved in './logs/time_stamp/log' was no longer generated, even though the training process was still running. How can I fix these two problems? Thanks a lot.
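On the first problem: that RuntimeError is raised when `copy.deepcopy` reaches a tensor that is not a graph leaf (it has a grad_fn). A typical example is a quantized copy of a weight computed from a parameter and stored as a module attribute. Whether that is what happens with this particular checkpoint is an assumption. The toy sketch below reproduces the same error and shows that storing a detached tensor avoids it.

```python
import copy
import torch
import torch.nn as nn

lin = nn.Linear(4, 2)
# Hypothetical quantizer-style cache: computed from a parameter, so it has a grad_fn
lin.cached_q_weight = (lin.weight * 127).round()
try:
    copy.deepcopy(lin)
    deepcopy_failed = False
except RuntimeError:          # "Only Tensors created explicitly by the user (graph leaves) ..."
    deepcopy_failed = True

lin.cached_q_weight = lin.cached_q_weight.detach()   # a detached tensor is a leaf
clone = copy.deepcopy(lin)
```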
Hi there,
Thanks for open-sourcing the code.
Is there any plan for implementation of the LASSO based channel-pruning algorithm (i.e. the paper: Channel pruning for accelerating very deep neural networks)?
Hello,
I want to use the MultiStepMultiGammaLR scheduler as the lr_scheduler in my pruning schedule.
When I use compress_classifier.py to prune resnet20_cifar from the beginning, with the lr_scheduler defined in the YAML file, it works well.
But when I resume from a checkpoint to train and prune, the lr_scheduler defined in the YAML file doesn't work: the LR doesn't decay when the epoch reaches the defined milestones.
I use the command below:
python3 compress_classifier.py --arch resnet20_cifar dataset/ -p=50 --lr=0.3 --epochs=150 -b 128 --compress=resnet20_cifar_ele_pruning.yaml -j=1 --vs 0 --deterministic --resume=logs/resnet20_cifar_baseline/checkpoint.pth.tar
Below is my YAML setting:
```yaml
version: 1
pruners:
  low_pruner:
    class: AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.60
    weights: [module.layer1.2.conv1.weight, module.layer1.2.conv1.weight,
              module.layer1.0.conv1.weight, module.layer1.0.conv2.weight,
              module.layer1.1.conv1.weight, module.layer1.1.conv2.weight]
  mid_pruner:
    class: AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.67
    weights: [module.layer2.2.conv1.weight, module.layer2.2.conv2.weight,
              module.layer2.0.conv2.weight, module.layer2.0.downsample.1.weight,
              module.layer2.0.conv1.weight, module.layer2.0.downsample.0.weight,
              module.layer2.1.conv1.weight, module.layer2.1.conv2.weight]
  high_pruner:
    class: AutomatedGradualPruner
    initial_sparsity : 0.05
    final_sparsity: 0.76
    weights: [module.layer3.0.conv1.weight, module.layer3.1.conv1.weight,
              module.layer3.1.conv2.weight, module.layer3.0.conv2.weight,
              module.layer3.0.downsample.0.weight, module.layer3.0.downsample.1.weight,
              module.fc.weight]
lr_schedulers:
  training_lr:
    class: MultiStepMultiGammaLR
    milestones: [300, 302, 400]
    gammas: [0.1, 0.1, 0.5]
policies:
  - pruner:
      instance_name: low_pruner
    starting_epoch: 300
    ending_epoch: 400
    frequency: 2
  - pruner:
      instance_name: mid_pruner
    starting_epoch: 300
    ending_epoch: 400
    frequency: 2
  - pruner:
      instance_name: high_pruner
    starting_epoch: 300
    ending_epoch: 400
    frequency: 2
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 0
    ending_epoch: 400
    frequency: 1
```
Is there any problem in my command or YAML setting?
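The schedule this YAML describes can be computed directly, which helps when checking a resumed run: a multi-step, multi-gamma schedule multiplies the base LR by gammas[i] once the epoch reaches milestones[i]. This is a generic sketch, not Distiller's MultiStepMultiGammaLR implementation. Note that with milestones starting at 300, no decay is expected until the epoch counter the scheduler sees reaches 300, so it may be worth checking which epoch number the resumed run starts counting from relative to `--epochs=150`.

```python
def multistep_multigamma_lr(base_lr, epoch, milestones, gammas):
    """LR after applying gammas[i] at every milestones[i] <= epoch (generic sketch)."""
    lr = base_lr
    for milestone, gamma in zip(milestones, gammas):
        if epoch >= milestone:
            lr *= gamma
    return lr

milestones, gammas = [300, 302, 400], [0.1, 0.1, 0.5]
for epoch in (0, 299, 300, 302, 400):
    print(epoch, multistep_multigamma_lr(0.3, epoch, milestones, gammas))
```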
I've read the Q&A in #90, and I want to train a student model (preact_resnet20_cifar) from a preact_resnet44_cifar teacher. Here is the command line I used to train the teacher model:
python compress_classifier.py -a preact_resnet44_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0
.
The KD command line:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 --kd-teacher preact_resnet44_cifar --kd-resume logs/2018.12.11-130318/checkpoint.pth.tar --kd-temp 5.0 --kd-dw 0.7 --kd-sw 0.3
I got this error message:
==> using cifar10 dataset
=> creating preact_resnet44_cifar model for CIFAR10
=> loading checkpoint logs/2018.12.11-130318/checkpoint.pth.tar
Checkpoint keys:
epoch
arch
state_dict
best_top1
optimizer
compression_sched
quantizer_metadata
best top@1: 48.000
Loaded compression schedule from checkpoint (epoch 2)
Loaded quantizer metadata from the checkpoint
Traceback (most recent call last):
File "compress_classifier.py", line 784, in <module>
main()
File "compress_classifier.py", line 359, in main
teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)
File "/home/share/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'
I don't know how this could happen. My other question is: must the teacher model be deeper than the student model?
I ran the following command to train a baseline resnet56 model on CIFAR-10:
python3 compress_classifier.py \
    --arch resnet56_cifar ../data.cifar10 -p=50 \
    --lr=0.4 --epochs=180 \
    --compress=../pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml \
    -j=1 --deterministic
I am unable to reproduce the accuracy claimed in resnet56_cifar_baseline_training.yaml, which reports a top-1 accuracy of 92.97%.
However, when I run the code, the reported accuracy is only 90.38%.
Further, I noticed that the learning-rate schedule in this config file differs from both the original ResNet paper and the paper the code aims to reproduce. So I changed the schedule to decay the learning rate by 0.1 at epochs 80 and 120, training for 160 epochs in total, by modifying resnet56_cifar_baseline_training.yaml.
Even with this learning-rate schedule, the final accuracy is still 92.20%.
Hello, I want to prune YOLOv2's pretrained model so that each layer has fewer filters. But it is not in the torchvision model set. Does a model have to be in the torchvision model set for me to prune it? I studied your documentation for a week and did not find a clear way to do that. YOLOv2 is first trained on ImageNet, which gives the Darknet19 model; the Darknet19 network is then modified slightly and trained again on an object detection dataset, which gives YOLOv2. I want to prune this model. I am new to PyTorch. Can I do this with Distiller? Could you give me some detailed instructions? If so, I would like to contribute my work to Distiller.
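As a starting point that does not depend on torchvision, filter-ranking masks can be computed for any nn.Module by walking its Conv2d layers. The sketch ranks filters by L1 norm, as in the pruning_filters_for_efficient_convnets example. `l1_filter_masks` is a hypothetical helper that only computes masks. Physically removing filters (thinning) is a separate step with the BatchNorm and next-layer caveats discussed elsewhere on this page.

```python
import torch
import torch.nn as nn

def l1_filter_masks(model, fraction):
    """Per-layer boolean filter masks: in each Conv2d, the `fraction` of filters with the
    smallest L1 norm is marked False (to be pruned). Works on any nn.Module."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            norms = module.weight.detach().abs().sum(dim=(1, 2, 3))
            n_prune = int(fraction * module.out_channels)
            mask = torch.ones(module.out_channels, dtype=torch.bool)
            mask[torch.argsort(norms)[:n_prune]] = False
            masks[name] = mask
    return masks

# A tiny Darknet-style stack (conv + LeakyReLU), standing in for a custom detector
torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.LeakyReLU(0.1),
                      nn.Conv2d(8, 16, 3, padding=1), nn.LeakyReLU(0.1))
masks = l1_filter_masks(model, fraction=0.5)
```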
The thinning methods only support removing channels or filters of a CONV layer:
```python
# (excerpt from inside a loop over the model's parameters)
# We are only interested in 4D weights (of Convolution layers)
if param.dim() != 4:
    continue
```
[1]
What about thinning FC layers? Even if you are not going to support it, could you describe what one should take care of when implementing, say, remove_rows() or remove_columns(), corresponding to neuron pruning?
[2]
It seems hard to simply extend the thinning_recipe approach, as it seems too tied to removing CONV structures. Any suggestions?
[3]
Also, if we are thinning pruned PyTorch models, what could cause an accuracy drop?
Since we strictly remove only all-zero structures, the math should be the same and produce the same classification.
You seem to anticipate a possible performance drop, since you also thin the gradient tensors.
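For FC neuron pruning, removing output neuron i of one Linear layer means dropping row i of its weight and entry i of its bias, plus column i of the next Linear layer's weight. The sketch below (hypothetical `remove_fc_neurons`, not a Distiller API) shows that this is exact when the removed neurons output zero. That requires the bias to be zero too. If a "pruned" neuron keeps a nonzero bias (or a following BatchNorm shift), its activation is not zero, so removing it changes the output, which is one plausible source of an accuracy drop after thinning.

```python
import torch
import torch.nn as nn

def remove_fc_neurons(fc1, fc2, keep):
    """Hypothetical helper: keep only the output neurons of fc1 listed in `keep`
    (rows of fc1.weight, entries of fc1.bias) and the matching input columns of fc2."""
    keep = torch.as_tensor(keep, dtype=torch.long)
    fc1.weight = nn.Parameter(fc1.weight.data[keep].clone())
    fc1.bias = nn.Parameter(fc1.bias.data[keep].clone())
    fc1.out_features = keep.numel()
    fc2.weight = nn.Parameter(fc2.weight.data[:, keep].clone())
    fc2.in_features = keep.numel()

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(6, 5), nn.ReLU(), nn.Linear(5, 3))
fc1, fc2 = model[0], model[2]
pruned = [1, 3]
with torch.no_grad():
    fc1.weight[pruned] = 0.0
    fc1.bias[pruned] = 0.0    # with a nonzero bias, ReLU(bias) could be nonzero
x = torch.randn(4, 6)
before = model(x)
remove_fc_neurons(fc1, fc2, keep=[0, 2, 4])
after = model(x)
```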
How do I train an early-exit model? Here is the command I used:
python3 compress_classifier.py --arch resnet20_cifar_earlyexit ../../../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/resnet20_cifar_baseline_training.yaml -j=1 --deterministic --earlyexit_thresholds 0.9 1.2 --earlyexit_lossweights 0.2 0.3
But Distiller shows me the following error message:
Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
==> using cifar10 dataset
=> creating resnet20_cifar_earlyexit model for CIFAR10
Logging to TensorBoard - remember to execute the server:
tensorboard --logdir='./logs'
=> using early-exit threshold values of [0.9, 1.2]
Optimizer Type: <class 'torch.optim.sgd.SGD'>
Optimizer Args: {'dampening': 0, 'weight_decay': 0.0001, 'momentum': 0.9, 'nesterov': False, 'lr': 0.3}
Files already downloaded and verified
Files already downloaded and verified
Dataset sizes:
training=45000
validation=5000
test=10000
Reading compression schedule from: ../cifar10/resnet20/resnet20_cifar_baseline_training.yaml
Training epoch: 45000 samples (256 per mini-batch)
Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.12.11-162919/2018.12.11-162919.log
Traceback (most recent call last):
File "compress_classifier.py", line 789, in <module>
main()
File "compress_classifier.py", line 386, in main
loggers=[tflogger, pylogger], args=args)
File "compress_classifier.py", line 477, in train
loss = earlyexit_loss(output, target, criterion, args)
File "compress_classifier.py", line 645, in earlyexit_loss
loss += (1.0 - sum_lossweights) * criterion(output[args.num_exits-1], target)
IndexError: list index out of range
resnet20_cifar_baseline_training.yaml ==>
```yaml
lr_schedulers:
  training_lr:
    class: StepLR
    step_size: 45
    gamma: 0.10
policies:
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 45
    ending_epoch: 200
    frequency: 1
```
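The IndexError is raised by `output[args.num_exits-1]`: the list of outputs is shorter than the number of exits implied by the command-line arguments. Two thresholds and two loss weights imply three exits, so if resnet20_cifar_earlyexit has only one early exit (an assumption, not verified here), the model returns two outputs and the index is out of range. The sketch below is modeled on the loss formula in the traceback. It makes the length requirement explicit, and its names are hypothetical.

```python
import torch
import torch.nn as nn

def earlyexit_loss_sketch(outputs, target, criterion, lossweights):
    """Weighted sum over exits: one weight per early exit, the remainder to the final exit."""
    if len(outputs) != len(lossweights) + 1:
        raise ValueError(f"model produced {len(outputs)} outputs, but {len(lossweights)} "
                         f"loss weights imply {len(lossweights) + 1} exits")
    loss = sum(w * criterion(out, target) for w, out in zip(lossweights, outputs[:-1]))
    return loss + (1.0 - sum(lossweights)) * criterion(outputs[-1], target)

criterion = nn.CrossEntropyLoss()
target = torch.tensor([1, 0])
outputs = [torch.randn(2, 10), torch.randn(2, 10)]   # one early exit + the final exit
loss = earlyexit_loss_sketch(outputs, target, criterion, [0.2])

try:
    earlyexit_loss_sketch(outputs, target, criterion, [0.2, 0.3])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```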
Structure pruning is broken for models with non-serial connections.
Models such as AlexNet and VGG have serial data dependencies (connections) and are fine.
More complex models with parallel data dependencies (paths), such as ResNets (skip connections) and GoogLeNet (Inception layers), might fail when pruning filters or channels.
This is because a module, such as a torch.nn.modules.batchnorm.BatchNorm2d layer, might depend on multiple inputs. This is not always a problem, for example when the dependent module has type torch.nn.Conv2d and we are pruning weight filters.
But if the dependent module has type torch.nn.modules.batchnorm.BatchNorm2d and we are pruning weight filters, then each of the inputs may select different activation channels to prune. In that case, how should we prune the BatchNorm's scale and shift tensors (.weight and .bias)?
To solve this, we need to define one of the modules as the leader, which determines which activation channels to prune, and define the rest of the modules in the dependency sub-graph as followers. Followers do not choose which activation channels to prune, so their sparsity masks are determined by the leader's choice.
Because the sparsity maps of different follower modules may have different shapes, the leader defines a binary map which is a binary vector of active (1) and pruned (0) channels. Each "follower" expands this single binary map to create its own private pruning mask.
This requires changing the way we express filter/channel pruning in YAML, and how we create pruning masks.
I'm trying to make this fix available soon.
This is related to issues #79 and #73.
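The leader/follower idea can be sketched in a few lines. The leader ranks its filters (here by L1 norm, an assumption) and emits one binary map over activation channels, and each follower broadcasts that map over its own parameter shape, whether it is a 4-D conv weight or a 1-D BatchNorm scale. The function names are hypothetical. This is not the planned Distiller implementation.

```python
import torch

def leader_channel_map(leader_weight, fraction):
    """Leader: binary map over output channels, 1 = active, 0 = pruned (L1-norm ranking)."""
    norms = leader_weight.abs().sum(dim=(1, 2, 3))
    binary_map = torch.ones_like(norms)
    binary_map[torch.argsort(norms)[:int(fraction * norms.numel())]] = 0
    return binary_map

def expand_to_mask(binary_map, param):
    """Follower: broadcast the shared per-channel map along dim 0 of its own parameter."""
    shape = [-1] + [1] * (param.dim() - 1)
    return binary_map.view(shape).expand_as(param)

torch.manual_seed(0)
leader = torch.randn(8, 3, 3, 3)          # conv that decides which channels to prune
follower_conv = torch.randn(8, 5, 3, 3)   # another conv feeding the same channels
bn_weight = torch.randn(8)                # BatchNorm scale over those channels
bmap = leader_channel_map(leader, 0.5)
conv_mask = expand_to_mask(bmap, follower_conv)
bn_mask = expand_to_mask(bmap, bn_weight)
```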
Hi Neta,
I looked into the docs and found that https://nervanasystems.github.io/distiller/design/index.html#quantization mentions 'We also provide a mechanism which takes an existing model and automatically replaces required operations with quantized versions.' Does this mean we can export the model with these 'quantized versions' of the operations?
If this is not supported in the current Distiller, could you kindly suggest how I can export it?
Thanks so much.
Hi,
When I try to run pruning and quantization together on resnet20_cifar in one YAML file, it fails with a KeyError on xxx_float_weight. What is the correct procedure for combining the two?
Thomas
I expect the final model after 8-bit symmetric linear quantization to contain only integers in the range -128 to 127, but when I print the model parameters I notice that the biases are still floats.
So I am currently setting inplace=True in this line: https://github.com/NervanaSystems/distiller/blob/2bb9689fe58d196ccbccd3f2f44ac27192eb64e1/distiller/quantization/range_linear.py#L121
At some point we need to quantize the bias before writing it into the model. Currently I do not see that happening.
Hey,
We've recently written a tutorial on compressing a PyTorch language model using the element-wise AGP pruner.
We're seeking community help to add an example of pruning a PyTorch seq2seq model (for example).
Thanks,
Neta
Hello there,
I am wondering about the state of the ADC implementation, and what remains to bring it to a functional state.
In the ADC merge commit message, you mentioned that it is still WiP and that it is using an unreleased version of Coach. Is that still the case?
Also, is there any documentation for how to use ADC in Distiller?
Thanks
When train_with_fp_copy is True, you change the attributes of the conv/FC layer: you substitute conv.weights with conv.float_weights, and conv.weights becomes a buffer instead of a parameter. The forward pass of the conv/FC layer still uses conv.weights, the quantized weights, as determined by PyTorch's default conv implementation. But in the backward pass, the gradient computed with respect to q_weights (the quantized weights) is stored in float_weights.grad rather than in weights, since weights has no grad attribute. So you implicitly back-propagate the gradient with respect to the quantized weights to the full-precision weights using the straight-through estimator; that is, both gradients are equal.
An error occurred:
RuntimeError: cuda runtime error (30) : unknown error at /pytorch/aten/src/THC/THCTensorRandom.cu:25
My environment is Ubuntu 18.04, CUDA 8.0, torch 0.4.0, Python 3.6. Is one of these wrong, or what else could be the cause?
The README states:
PyTorch is included in the requirements.txt file, and will currently download PyTorch version 3.1 for CUDA 8.0. This is the setup we've used for testing Distiller.
But requirements.txt specifies torch version 0.4.0.
Which versions are the correct ones: which CUDA, which PyTorch, and which versions of the other packages?
on_minibatch_begin, while regularization is applied at the end of the batch, in on_minibatch_end. This means that you set the regularization term to zero below the threshold at every batch iteration during training.
zeros_mask_dict, it may cause some problems. For example, apply_mask in on_minibatch_end of class RegularizationPolicy would be called with the regularization mask, but also with the pruning mask if there are both a regularizer and a pruner.
thinning.py, right?
I use “python3 compress_classifier.py -a resnet20-cifar ../../../data.cifar10 --resume ../examples/ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize --evaluate”
but an error occurs:
compress_classifier.py: error: argument --arch/-a: invalid choice: 'resnet20-cifar' (choose from 'alexnet', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3', 'mobilenet', 'mobilenet_025', 'mobilenet_050', 'mobilenet_075', 'resnet101', 'resnet152', 'resnet18', 'resnet20_cifar', 'resnet32_cifar', 'resnet34', 'resnet44_cifar', 'resnet50', 'resnet56_cifar', 'simplenet_cifar', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn')
Is there anyone who can help me?
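The error comes from argparse's choices check. The accepted name is resnet20_cifar, with an underscore, while the command passes resnet20-cifar, with a hyphen. Passing -a resnet20_cifar should get past this error. The sketch below reproduces the check with a parser that only mimics the --arch/-a option shown in the message. The parser setup, the subset of choices and try_parse are illustrative assumptions, not Distiller's actual argument definitions.

```python
import argparse

# Illustrative subset of the architecture names listed in the error message.
ARCH_CHOICES = ["resnet20_cifar", "resnet32_cifar", "resnet44_cifar", "resnet56_cifar"]

parser = argparse.ArgumentParser(prog="compress_classifier.py")
parser.add_argument("--arch", "-a", choices=ARCH_CHOICES)


def try_parse(argv):
    """Return the parsed architecture name, or None if argparse rejects it."""
    try:
        return parser.parse_args(argv).arch
    except SystemExit:
        # On an invalid choice, argparse prints the error to stderr and exits (status 2).
        return None
```

try_parse(["-a", "resnet20_cifar"]) returns "resnet20_cifar", while try_parse(["-a", "resnet20-cifar"]) prints the same "invalid choice" error and returns None.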
Ubuntu 16.04
command:
time python3 compress_classifier.py --arch resnet20_cifar ../../../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --compress=../quantization/preact_resnet20_cifar_pact.yaml -j=1 --deterministic
Error message:
--- validate (epoch=199)-----------
5000 samples (256 per mini-batch)
==> Top1: 90.300 Top5: 99.700 Loss: 0.297
==> Best Top1: 90.860 on Epoch: 187
Saving checkpoint to: logs/2018.11.29-140224/checkpoint.pth.tar
Training epoch: 45000 samples (256 per mini-batch)
Log file for this run: /media/walker/DATA/work/new_quant/distiller/examples/classifier_compression/logs/2018.11.29-140224/2018.11.29-140224.log
Traceback (most recent call last):
File "compress_classifier.py", line 789, in <module>
main()
File "compress_classifier.py", line 391, in main
msglogger.info(distiller.masks_sparsity_tbl_summary(model, compression_scheduler))
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/media/walker/DATA/work/new_quant/distiller/distiller/data_loggers/collector.py", line 301, in collectors_context
yield collectors_dict
File "compress_classifier.py", line 386, in main
loggers=[tflogger, pylogger], args=args)
File "compress_classifier.py", line 495, in train
loss.backward()
File "/home/walker/.local/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/walker/.local/lib/python3.5/site-packages/torch/autograd/__init__.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
real 301m20.430s
user 204m21.640s
sys 99m42.978s
Hi, nzmora:
When I ran the command "python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01", I got the following error:
2018-10-22 17:03:03,745 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log
2018-10-22 17:03:03,745 - Number of CPUs: 24
2018-10-22 17:03:03,850 - Number of GPUs: 8
2018-10-22 17:03:03,850 - CUDA version: 8.0.61
2018-10-22 17:03:03,850 - CUDNN version: 7102
2018-10-22 17:03:03,851 - Kernel: 4.4.0-98-generic
2018-10-22 17:03:03,851 - Python: 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
2018-10-22 17:03:03,851 - PyTorch: 0.4.0
2018-10-22 17:03:03,851 - Numpy: 1.14.3
2018-10-22 17:03:03,852 - Traceback (most recent call last):
File "compress_classifier.py", line 686, in <module>
main()
File "compress_classifier.py", line 179, in main
apputils.log_execution_env_state(sys.argv, gitroot=module_path)
File "/home/project/compress/distiller-master/apputils/execution_env.py", line 78, in log_execution_env_state
log_git_state()
File "/home/project/compress/distiller-master/apputils/execution_env.py", line 56, in log_git_state
repo = Repo(gitroot, search_parent_directories=True)
File "/home/project/compress/distiller-master/env/lib/python3.5/site-packages/git/repo/base.py", line 168, in __init__
raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /home/project/compress/distiller-master
2018-10-22 17:03:03,852 -
2018-10-22 17:03:03,852 - Log file for this run: /home/project/compress/distiller-master/examples/classifier_compression/logs/2018.10.22-170303/2018.10.22-170303.log
How can I solve the problem?
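The traceback shows GitPython's Repo(gitroot, search_parent_directories=True) raising InvalidGitRepositoryError. This happens when neither the given directory nor any of its parents is a Git work tree. The directory name distiller-master suggests the code may have been downloaded as a ZIP archive rather than cloned, in which case there is no .git directory. Cloning the repository with git clone is one likely way around it. The stdlib sketch below performs an equivalent check that a caller could use to skip logging the Git state when no repository is found. find_git_root is a hypothetical helper and is not part of Distiller or GitPython.

```python
import os


def find_git_root(path):
    """Walk up from `path`; return the first directory containing .git, else None."""
    current = os.path.abspath(path)
    while True:
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding .git
            return None
        current = parent
```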
I tried quant_aware_train_linear_quant.yaml on the resnet20_cifar model, and the model seems to be broken: it produces no reasonable predictions and does not train.
Is quant_aware_train_linear_quant.yaml only suitable for ResNet-18? It seems not. Could anyone help? Thanks very much.
Are there plans for a TensorFlow version of Distiller in the future?