
dmcp's Issues

Error when sampling model at the last epoch

@Zx55
ubuntu:~/work_code/dmcp$ CUDA_VISIBLE_DEVICES=3 python main.py --mode train --data /data2/ImageNet --config config/mbv2/dmcp.yaml --flops 87
/home/wangzhaoming/work_code/dmcp/utils/tools.py:61: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2020-05-20 09:47:21,712][ main.py][line: 60][ INFO] {'training': {'epoch': 40, 'sandwich': {'sample_type': 'offset', 'max_width': 1.5, 'min_width': 0.1, 'width_offset': 0.1, 'num_sample': 4}, 'label_smooth': 0.1, 'distillation': {'enable': True, 'temperature': 1, 'loss_weight': 1, 'hard_label': False}}, 'arch': {'target_flops': '87', 'train_freq': 1, 'sample_type': ['max', 'min', 'scheduled_random', 'scheduled_random'], 'floss_type': 'log_l1', 'flop_loss_weight': 0.1, 'num_flops_stats_sample': 3000, 'num_model_sample': 5, 'start_train': 400380}, 'validation': {'width': [1.5], 'calibration': {'enable': True, 'num_batch': 5}}, 'evaluation': {'width': [1.5], 'calibration': {'enable': True, 'num_batch': 5}}, 'model': {'type': 'DMCPMobileNetV2', 'kwargs': {'num_classes': 1000, 'input_size': 224, 'width': [0.1, 1.5, 0.1], 'prob_type': 'sigmoid'}, 'runner': {'type': 'DMCPRunner'}}, 'recover': {'enable': True, 'checkpoint': '/home/wangzhaoming/work_code/dmcp/results/DMCPMobileNetV2_87_051610/checkpoints/0520_0925.pth'}, 'distributed': {'enable': False}, 'optimizer': {'momentum': 0.9, 'weight_decay': 4e-05, 'nesterov': True, 'no_wd': True}, 'lr_scheduler': {'base_lr': 0.2, 'warmup_lr': 0.5, 'warmup_steps': 1000, 'min_lr': 0.08, 'max_iter': 800760}, 'arch_lr_scheduler': {'base_lr': 0.5, 'warmup_lr': 0.5, 'min_lr': 0.1, 'max_iter': 800760, 'warmup_steps': 400380}, 'dataset': {'type': 'ImageNet', 'augmentation': {'test_resize': 256, 'color_jitter': [0.2, 0.2, 0.2, 0.1]}, 'workers': 4, 'batch_size': 64, 'num_classes': 1000, 'input_size': 224, 'path': '/data2/ImageNet'}, 'logging': {'print_freq': 50}, 'random_seed': 0, 'save_path': './results/DMCPMobileNetV2_87_052009'}
[2020-05-20 09:47:21,748][normal_runner.py][line: 157][ INFO] using label_smooth: 0.1
[2020-05-20 09:47:21,748][normal_runner.py][line: 157][ INFO] sampling model...
Traceback (most recent call last):
  File "main.py", line 80, in <module>
    main()
  File "main.py", line 63, in main
    train(config, runner, loaders, checkpoint, tb_logger)
  File "main.py", line 39, in train
    runner.train(train_loader, val_loader, optimizer, lr_scheduler, tb_logger)
  File "/home/wangzhaoming/work_code/dmcp/runner/dmcp_runner.py", line 63, in train
    dmcp_utils.sample_model(self.config, self.model)
  File "/home/wangzhaoming/work_code/dmcp/models/dmcp/utils.py", line 157, in sample_model
    dist.barrier()
  File "/home/wangzhaoming/work_code/dmcp/utils/distributed.py", line 117, in barrier
    dist.barrier()
  File "/home/wangzhaoming/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1488, in barrier
    _check_default_pg()
  File "/home/wangzhaoming/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 193, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
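The failing call is the unconditional dist.barrier() reached via models/dmcp/utils.py: with distributed.enable: False, no default process group is ever created, so the barrier asserts. A minimal sketch of a guard that lets single-process runs skip the synchronization (a workaround under that assumption, not the repository's official fix):

import torch.distributed as dist

def barrier():
    # Only synchronize when a process group actually exists; in a
    # single-process run dist.is_initialized() is False and the call
    # becomes a no-op instead of an assertion failure.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()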

Retrain with Distillation

Hi,

First of all, thank you so much for releasing this repository.

As stated in Table 6 of the original paper, the DMCP results marked with a "*" superscript are obtained by retraining the pruned models with the slimmable method. Section 4.3 also explains that AutoSlim uses in-place distillation during retraining, so the comparison between AutoSlim and DMCP (without the superscript) is not fair.

In that case, could you please provide some details about the "retraining pruned models with the slimmable method" process? It would be even better if the relevant code and configurations could be released.

Thanks!
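For reference, the "slimmable with in-place distillation" retraining scheme the question refers to typically looks like the sketch below (following the universally slimmable training recipe in general, not dmcp's released code; model.set_width() is a hypothetical width switch): the full-width network trains on hard labels, and the narrower widths train against its detached soft predictions.

import torch.nn.functional as F

def train_step(model, x, y, widths, optimizer):
    optimizer.zero_grad()
    model.set_width(max(widths))                # hypothetical width switch
    max_out = model(x)
    F.cross_entropy(max_out, y).backward()      # hard labels at full width
    soft = max_out.detach().softmax(dim=-1)     # in-place distillation target
    for w in sorted(widths)[:-1]:               # narrower sub-networks
        model.set_width(w)
        out = model(x)
        F.kl_div(out.log_softmax(dim=-1), soft,
                 reduction='batchmean').backward()
    optimizer.step()                            # one step over accumulated grads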

os.environ['RANK'] raises KeyError: 'RANK'

I run:
python main.py --mode train --data data1/ImageNetOrigin --config config/mbv2/retrain.yaml
--flops 43 --chcfg ./results/DMCPMobileNetV2_43_MMDDHH/model_sample/expected_ch
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    main()
  File "main.py", line 42, in main
    tools.init(config)
  File "/data1/task/tools/dmcp/utils/tools.py", line 28, in init
    dist.init_dist(config.distributed.enable)
  File "/data1/task/tools/dmcp/utils/distributed.py", line 29, in init_dist
    rank = int(os.environ['RANK'])
  File "/usr/local/miniconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'
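utils/distributed.py reads os.environ['RANK'], which only exists when the process is started through torch.distributed.launch (or the variable is exported manually). One hedged workaround, assuming a single-process run is intended, is to fall back to rank 0:

import os

# RANK/WORLD_SIZE are injected by torch.distributed.launch; a plain
# `python main.py` run has neither, so default to one process, rank 0.
rank = int(os.environ.get('RANK', 0))
world_size = int(os.environ.get('WORLD_SIZE', 1))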

Issue with training on multiple GPUs

When I set distributed.enable=False and train on multiple GPUs, the error below happens; training on a single GPU works.

Traceback (most recent call last):
  File "main.py", line 71, in <module>
    main()
  File "main.py", line 54, in main
    train(config, runner, loaders, checkpoint, tb_logger)
  File "main.py", line 30, in train
    runner.train(train_loader, val_loader, optimizer, lr_scheduler, tb_logger)
  File "/home/Brin1/dmcp-master/runner/dmcp_runner.py", line 46, in train
    self._train_one_batch(x, y, optimizer, lr_scheduler, meters, criterions, end)
  File "/home/Brin1/dmcp-master/runner/dmcp_runner.py", line 145, in _train_one_batch
    criterions, end)
  File "/home/Brin1/dmcp-master/runner/us_runner.py", line 201, in _train_one_batch
    out = self.model(x)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 142, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 147, in replicate
    return replicate(module, device_ids)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 53, in replicate
    param_idx = param_indices[param]
KeyError: Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       device='cuda:0', requires_grad=True)

Thanks!
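A minimal repro of the likely failure mode (an assumption about the cause, not a statement about dmcp's internals): nn.DataParallel.replicate() builds an index over everything returned by module.parameters(), so a trainable parameter that a model filters out of parameters() — a common pattern for architecture parameters in NAS code — triggers exactly this KeyError when two or more GPUs are visible.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.alpha = nn.Parameter(torch.zeros(14))   # arch parameter

    def parameters(self, recurse=True):
        # Filter alpha out, e.g. so the weight optimizer never sees it;
        # replicate() then cannot find it in its parameter index.
        return (p for n, p in self.named_parameters(recurse=recurse)
                if n != 'alpha')

    def forward(self, x):
        return self.fc(x) * self.alpha.sum()

model = nn.DataParallel(Net().cuda())      # needs >= 2 visible GPUs
model(torch.randn(8, 4).cuda())            # KeyError: Parameter containing: ...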

Why do the pruned MobileNet-V2 models DMCP 300M and DMCP 211M have higher parameter counts?

Hello,
I understand that DMCP prunes channels layer by layer, and hence the MAC (or FLOP) count decreases. However, I found that DMCP 211M and DMCP 300M, pruned from MobileNet-V2, have higher parameter counts than MobileNet-V2. Why does the parameter count increase?

Model          FLOPs (G)   Params (M)   Top-1 Acc   Top-5 Acc
MobileNet-V2   0.858       3.48         71.87       90.294
DMCP 300M      0.600       5.3          73.48       91.10
DMCP 211M      0.420       4.2          71.60       89.95

Best Regards,
Atul
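A back-of-the-envelope illustration of why this can happen (not DMCP's own FLOPs counter): for a convolution, the parameter count scales with channel counts only, while FLOPs also scale with the output resolution. Pruning early, high-resolution layers removes many FLOPs per parameter, so a FLOPs-constrained search can afford to keep, or even widen up to the 1.5x max_width, the late low-resolution layers that hold most of the parameters.

def conv_cost(c_in, c_out, k, out_hw):
    # params depend only on channel counts and kernel size; FLOPs
    # (multiply-adds) also depend on the output feature-map size.
    params = c_in * c_out * k * k
    flops = params * out_hw * out_hw
    return params, flops

print(conv_cost(32, 64, 3, 112))   # early layer: (18432, 231211008) -> ~12544 FLOPs/param
print(conv_cost(320, 1280, 1, 7))  # late layer:  (409600, 20070400) -> ~49 FLOPs/param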

KeyError: Parameter containing: during training

First of all, thank you for sharing your source code.
I get an error when I try to train MobileNet V2 with a single GPU (i.e., distributed.enable = False in config/mbv2/dmcp.yaml).
My config modifications are listed at the end (I used CIFAR10 and modified the loss class in the source code). The command:
$ python main.py --mode train --data ./dataset/ --config config/mbv2/dmcp.yaml --flops 43
Error message:

(base) root@452fa72bec2d:/workspace/hdd/06_model_compression/dmcp# python main.py --mode train --data ./dataset/ --config config/mbv2/dmcp.yaml --flops 43
/workspace/hdd/06_model_compression/dmcp/utils/tools.py:61: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2020-07-30 02:19:50,612][ main.py][line: 51][ INFO] {'training': {'epoch': 40, 'sandwich': {'sample_type': 'offset', 'max_width': 1.5, 'min_width': 0.1, 'width_offset': 0.1, 'num_sample': 4}, 'label_smooth': 0.1, 'distillation': {'enable': True, 'temperature': 1, 'loss_weight': 1, 'hard_label': False}}, 'arch': {'target_flops': '43', 'train_freq': 1, 'sample_type': ['max', 'min', 'scheduled_random', 'scheduled_random'], 'floss_type': 'log_l1', 'flop_loss_weight': 0.1, 'num_flops_stats_sample': 3000, 'num_model_sample': 5, 'start_train': 15640}, 'validation': {'width': [1.5], 'calibration': {'enable': True, 'num_batch': 5}}, 'evaluation': {'width': [1.5], 'calibration': {'enable': True, 'num_batch': 5}}, 'model': {'type': 'DMCPMobileNetV2', 'kwargs': {'num_classes': 10, 'input_size': 32, 'width': [0.1, 1.5, 0.1], 'prob_type': 'sigmoid'}, 'runner': {'type': 'DMCPRunner'}}, 'recover': {'enable': False, 'checkpoint': 'None'}, 'distributed': {'enable': False}, 'optimizer': {'momentum': 0.9, 'weight_decay': 4e-05, 'nesterov': True, 'no_wd': True}, 'lr_scheduler': {'base_lr': 0.2, 'warmup_lr': 0.5, 'warmup_steps': 1000, 'min_lr': 0.08, 'max_iter': 31280}, 'arch_lr_scheduler': {'base_lr': 0.5, 'warmup_lr': 0.5, 'min_lr': 0.1, 'max_iter': 31280, 'warmup_steps': 15640}, 'dataset': {'type': 'CIFAR10', 'augmentation': {'test_resize': 32, 'color_jitter': [0.2, 0.2, 0.2, 0.1]}, 'workers': 4, 'batch_size': 64, 'num_classes': 10, 'input_size': 32, 'path': './dataset/'}, 'logging': {'print_freq': 50}, 'random_seed': 0, 'save_path': './results/DMCPMobileNetV2_43_073002'}
[2020-07-30 02:19:50,613][normal_runner.py][line: 159][ INFO] using label_smooth: 0.1
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    main()
  File "main.py", line 54, in main
    train(config, runner, loaders, checkpoint, tb_logger)
  File "main.py", line 30, in train
    runner.train(train_loader, val_loader, optimizer, lr_scheduler, tb_logger)
  File "/workspace/hdd/06_model_compression/dmcp/runner/dmcp_runner.py", line 46, in train
    self._train_one_batch(x, y, optimizer, lr_scheduler, meters, criterions, end)
  File "/workspace/hdd/06_model_compression/dmcp/runner/dmcp_runner.py", line 145, in _train_one_batch
    criterions, end)
  File "/workspace/hdd/06_model_compression/dmcp/runner/us_runner.py", line 201, in _train_one_batch
    out = self.model(x)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 156, in replicate
    return replicate(module, device_ids, not torch.is_grad_enabled())
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 162, in replicate
    param_idx = param_indices[param]
KeyError: Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       device='cuda:0', requires_grad=True)

dmcp.yaml - mbv2
training:
  epoch: 40
  sandwich:
    sample_type: offset
    max_width: &max_width 1.5
    min_width: &min_width 0.1
    width_offset: &width_offset 0.1
    num_sample: 4
  label_smooth: 0.1
  distillation:
    enable: true
    temperature: 1
    loss_weight: 1
    hard_label: False

arch:
  target_flops: None
  train_freq: 1
  sample_type: [max, min, scheduled_random, scheduled_random]
  floss_type: log_l1
  flop_loss_weight: 0.1
  num_flops_stats_sample: 3000
  num_model_sample: 5

validation:
  width: [*max_width]
  calibration:
    enable: True
    num_batch: 5

evaluation:
  width: [*max_width]
  calibration:
    enable: True
    num_batch: 5

model:
  type: DMCPMobileNetV2
  kwargs:
    num_classes: &num_classes 10
    input_size: &input_size 32
    width: [*min_width, *max_width, *width_offset]
    prob_type: sigmoid
  runner:
    type: DMCPRunner

recover:
  enable: False
  checkpoint: None

distributed:
  enable: False

optimizer:
  momentum: 0.9
  weight_decay: 0.00004
  nesterov: True
  no_wd: True

lr_scheduler:
  base_lr: 0.2
  warmup_lr: 0.5
  warmup_steps: 1000
  min_lr: 0.08

arch_lr_scheduler:
  base_lr: 0.5
  warmup_lr: 0.5
  min_lr: 0.1

dataset:
  type: CIFAR10
  augmentation:
    test_resize: 32
    color_jitter: [0.2, 0.2, 0.2, 0.1]
  workers: 4
  batch_size: 64
  num_classes: *num_classes
  input_size: *input_size

logging:
  print_freq: 50

random_seed: 0
save_path: ./results
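Assuming the crash again comes from nn.DataParallel replicating the model across every visible GPU (the data_parallel.py frames in the traceback suggest so), pinning the process to a single device before CUDA is initialized avoids the replicate() path entirely, since DataParallel with one device calls the wrapped module directly:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'   # must be set before torch touches CUDA

import torch
assert torch.cuda.device_count() == 1      # DataParallel now skips replicate()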

Unexpected loss values during training

Why is the loss at max_width so much greater than at min_width and the random widths during training? For example, the max_width loss is 59, while the other two are only about 0.001.
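A plausible explanation, offered as an assumption rather than a diagnosis of this run: under the sandwich rule with in-place distillation, min_width and the random widths are trained against the max-width model's soft outputs, and since those sub-networks share weights with their teacher, their KL loss is tiny, while the max-width loss is cross-entropy against hard labels (and, during the search phase, may also carry the FLOPs regularization term). The two scales are therefore not comparable:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 1000)
teacher = logits + 0.01 * torch.randn(4, 1000)   # student nearly matches teacher
labels = torch.randint(0, 1000, (4,))

kl = F.kl_div(F.log_softmax(logits, dim=-1), F.softmax(teacher, dim=-1),
              reduction='batchmean')
ce = F.cross_entropy(logits, labels)
print(kl.item(), ce.item())   # kl is near zero; ce is around log(1000) ~= 6.9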

KeyError: 'RANK' when running main.py

(torch1.4) wangzhaoming@ubuntu:~/work_code/dmcp$ python main.py --mode train --data /data2/ImageNet --config config/mbv2/dmcp.yaml --flops 87
/home/wangzhaoming/work_code/dmcp/utils/tools.py:62: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Traceback (most recent call last):
  File "main.py", line 71, in <module>
    main()
  File "main.py", line 42, in main
    tools.init(config)
  File "/home/wangzhaoming/work_code/dmcp/utils/tools.py", line 27, in init
    dist.init_dist(config.distributed.enable)
  File "/home/wangzhaoming/work_code/dmcp/utils/distributed.py", line 30, in init_dist
    rank = int(os.environ['RANK'])
  File "/home/wangzhaoming/anaconda3/envs/torch1.4/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'
@Zx55
