zx55 / dmcp
License: Creative Commons Attribution 4.0 International
@Zx55
ubuntu:~/work_code/dmcp$ CUDA_VISIBLE_DEVICES=3 python main.py --mode train --data /data2/ImageNet --config config/mbv2/dmcp.yaml --flops 87
/home/wangzhaoming/work_code/dmcp/utils/tools.py:61: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2020-05-20 09:47:21,712][ main.py][line: 60][ INFO] {'training': {'epoch': 40, 'sandwich': {'sample_type': 'offset', 'max_width': 1.5, 'min_width': 0.1, 'width_offset': 0.1, 'num_sample': 4}, 'label_smooth': 0.1, 'distillation': {'enable': True, 'temperature': 1, 'loss_weight': 1, 'hard_label': False}}, 'arch': {'target_flops': '87', 'train_freq': 1, 'sample_type': ['max', 'min', 'scheduled_random', 'scheduled_random'], 'floss_type': 'log_l1', 'flop_loss_weight': 0.1, 'num_flops_stats_sample': 3000, 'num_model_sample': 5, 'start_train': 400380}, 'validation': {'width': [1.5], 'calibration': {'enable': True, 'num_batch': 5}}, 'evaluation': {'width': [1.5], 'calibration': {'enable': True, 'num_batch': 5}}, 'model': {'type': 'DMCPMobileNetV2', 'kwargs': {'num_classes': 1000, 'input_size': 224, 'width': [0.1, 1.5, 0.1], 'prob_type': 'sigmoid'}, 'runner': {'type': 'DMCPRunner'}}, 'recover': {'enable': True, 'checkpoint': '/home/wangzhaoming/work_code/dmcp/results/DMCPMobileNetV2_87_051610/checkpoints/0520_0925.pth'}, 'distributed': {'enable': False}, 'optimizer': {'momentum': 0.9, 'weight_decay': 4e-05, 'nesterov': True, 'no_wd': True}, 'lr_scheduler': {'base_lr': 0.2, 'warmup_lr': 0.5, 'warmup_steps': 1000, 'min_lr': 0.08, 'max_iter': 800760}, 'arch_lr_scheduler': {'base_lr': 0.5, 'warmup_lr': 0.5, 'min_lr': 0.1, 'max_iter': 800760, 'warmup_steps': 400380}, 'dataset': {'type': 'ImageNet', 'augmentation': {'test_resize': 256, 'color_jitter': [0.2, 0.2, 0.2, 0.1]}, 'workers': 4, 'batch_size': 64, 'num_classes': 1000, 'input_size': 224, 'path': '/data2/ImageNet'}, 'logging': {'print_freq': 50}, 'random_seed': 0, 'save_path': './results/DMCPMobileNetV2_87_052009'}
[2020-05-20 09:47:21,748][normal_runner.py][line: 157][ INFO] using label_smooth: 0.1
[2020-05-20 09:47:21,748][normal_runner.py][line: 157][ INFO] sampling model...
Traceback (most recent call last):
  File "main.py", line 80, in <module>
    main()
  File "main.py", line 63, in main
    train(config, runner, loaders, checkpoint, tb_logger)
  File "main.py", line 39, in train
    runner.train(train_loader, val_loader, optimizer, lr_scheduler, tb_logger)
  File "/home/wangzhaoming/work_code/dmcp/runner/dmcp_runner.py", line 63, in train
    dmcp_utils.sample_model(self.config, self.model)
  File "/home/wangzhaoming/work_code/dmcp/models/dmcp/utils.py", line 157, in sample_model
    dist.barrier()
  File "/home/wangzhaoming/work_code/dmcp/utils/distributed.py", line 117, in barrier
    dist.barrier()
  File "/home/wangzhaoming/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1488, in barrier
    _check_default_pg()
  File "/home/wangzhaoming/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 193, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
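For what it's worth, the assertion fires because `utils/distributed.py` calls `dist.barrier()` unconditionally even when `distributed.enable` is false, so no process group was ever initialized. A minimal sketch of a guard for the wrapper (my assumption about the intended behavior, not an official fix):

```python
import torch.distributed as dist

def barrier():
    # No process group exists in single-GPU, non-distributed runs,
    # so synchronization is unnecessary; make the wrapper a no-op.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()
```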
Hi,
First of all, thank you so much for releasing this repository.
As stated in Table 6 of the original paper, the DMCP results marked with a "*" superscript are obtained by retraining the pruned models with the slimmable method. Section 4.3 also explains that AutoSlim uses in-place distillation during retraining, so the comparison between AutoSlim and DMCP (without the superscript) is not a fair one.
In that case, could you please provide some details on the "retraining pruned models with the slimmable method" process? It would be great if the relevant code and configurations could be released.
Thanks!
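For context while the authors respond: "slimmable retraining" usually refers to the sandwich rule from Universally Slimmable Networks (Yu & Huang), where each step trains the largest width on ground truth and distills the smallest plus a few random widths from it in place. A generic sketch of one such step, not the authors' released code (`set_width` is a hypothetical width-switching API):

```python
import random
import torch.nn.functional as F

def sandwich_step(model, x, y, optimizer, min_w=0.1, max_w=1.5, n_random=2):
    optimizer.zero_grad()
    # Largest width: trained against the ground-truth labels.
    model.set_width(max_w)                      # hypothetical API
    max_out = model(x)
    F.cross_entropy(max_out, y).backward()
    soft = F.softmax(max_out.detach(), dim=1)   # in-place distillation target
    # Smallest width plus random widths: distilled from the max-width output.
    for w in [min_w] + [random.uniform(min_w, max_w) for _ in range(n_random)]:
        model.set_width(w)
        out = model(x)
        F.kl_div(F.log_softmax(out, dim=1), soft,
                 reduction='batchmean').backward()
    optimizer.step()
```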
As titled.
Just a warm reminder.
Hi! Great work!
Could you show a comparison of parameter counts as well, e.g. for ResNet-50?
I run
python main.py --mode train --data data1/ImageNetOrigin --config config/mbv2/retrain.yaml
--flops 43 --chcfg ./results/DMCPMobileNetV2_43_MMDDHH/model_sample/expected_ch
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    main()
  File "main.py", line 42, in main
    tools.init(config)
  File "/data1/task/tools/dmcp/utils/tools.py", line 28, in init
    dist.init_dist(config.distributed.enable)
  File "/data1/task/tools/dmcp/utils/distributed.py", line 29, in init_dist
    rank = int(os.environ['RANK'])
  File "/usr/local/miniconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'
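The `KeyError: 'RANK'` comes from `init_dist` reading `os.environ['RANK']` even though `distributed.enable` is false. Common workarounds are exporting `RANK=0 WORLD_SIZE=1` before launching, or guarding the lookup; a hedged sketch of the latter (the return values are my assumption, the real `init_dist` also sets up the process group):

```python
import os

def init_dist(enable):
    if not enable:
        # Single-process run: no launcher exported RANK/WORLD_SIZE.
        return 0, 1
    rank = int(os.environ.get('RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    return rank, world_size
```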
Traceback (most recent call last):
  File "main.py", line 71, in <module>
    main()
  File "main.py", line 54, in main
    train(config, runner, loaders, checkpoint, tb_logger)
  File "main.py", line 30, in train
    runner.train(train_loader, val_loader, optimizer, lr_scheduler, tb_logger)
  File "/home/Brin1/dmcp-master/runner/dmcp_runner.py", line 46, in train
    self._train_one_batch(x, y, optimizer, lr_scheduler, meters, criterions, end)
  File "/home/Brin1/dmcp-master/runner/dmcp_runner.py", line 145, in _train_one_batch
    criterions, end)
  File "/home/Brin1/dmcp-master/runner/us_runner.py", line 201, in _train_one_batch
    out = self.model(x)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 142, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 147, in replicate
    return replicate(module, device_ids)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 53, in replicate
    param_idx = param_indices[param]
KeyError: Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       device='cuda:0', requires_grad=True)
thanks!
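This `replicate()` KeyError is a known quirk of `nn.DataParallel` on older torch (around 1.0/1.1): parameters created or reassigned after the model is wrapped (the 14-element tensor looks like one of DMCP's architecture parameters) are missing from `param_indices`. Upgrading PyTorch usually resolves it; otherwise, restricting the run to one visible GPU skips the replicate path entirely. A sketch of the single-GPU workaround (my suggestion, not the repo's fix):

```python
import os

# Must be set before torch initializes CUDA, so only one device is visible
# and DataParallel never calls replicate().
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch  # imported after the env var so device enumeration honors it
```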
How can I prune 4 or 8 channels at a time, so that the number of retained channels is always a multiple of 4? How do I set this up? A sketch of what I mean follows.
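I don't think the released config exposes such an option directly, but the usual approach is to round every sampled channel count to a chosen divisor when exporting the pruned architecture. A hypothetical helper sketch (`round_channels` is my name, not the repo's):

```python
def round_channels(channels: int, divisor: int = 4) -> int:
    # Round to the nearest multiple of `divisor`, never below one group.
    return max(divisor, (channels + divisor // 2) // divisor * divisor)

# round_channels(13) -> 12, round_channels(14) -> 16, round_channels(1) -> 4
```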
augmentation:
  test_resize: 256
Thanks for your great work!
Could you please provide the training cost of the DMCP training, pruned-model sampling, and fine-tuning stages?
How many hours are needed for each of these steps?
Thanks!
Hello,
I understand that DMCP prunes channels layer by layer, and hence the MAC (or FLOP) count decreases. However, I found that DMCP 211M and DMCP 300M, both pruned from MobileNet-V2, have higher parameter counts than MobileNet-V2. Why does the parameter count increase?
| Classification Models | FLOPs (G) | Params (M) | Top-1 acc (%) | Top-5 acc (%) |
| --- | --- | --- | --- | --- |
| MobileNet-V2 | 0.858 | 3.48 | 71.87 | 90.294 |
| DMCP 300M | 0.600 | 5.3 | 73.48 | 91.10 |
| DMCP 211M | 0.420 | 4.2 | 71.60 | 89.95 |
Best Regards,
Atul
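For anyone double-checking the table, parameter counts are easy to verify directly on the sampled models; a small sketch (model construction left out):

```python
def count_params_m(model) -> float:
    # Trainable parameters in millions, comparable to the Params (M) column.
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```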
dmcp.yaml - mbv2
training:
  epoch: 40
  sandwich:
    sample_type: offset
    max_width: &max_width 1.5
    min_width: &min_width 0.1
    width_offset: &width_offset 0.1
    num_sample: 4
  label_smooth: 0.1
  distillation:
    enable: true
    temperature: 1
    loss_weight: 1
    hard_label: False
arch:
  target_flops: None
  train_freq: 1
  sample_type: [max, min, scheduled_random, scheduled_random]
  floss_type: log_l1
  flop_loss_weight: 0.1
  num_flops_stats_sample: 3000
  num_model_sample: 5
validation:
  width: [*max_width]
  calibration:
    enable: True
    num_batch: 5
evaluation:
  width: [*max_width]
  calibration:
    enable: True
    num_batch: 5
model:
  type: DMCPMobileNetV2
  kwargs:
    num_classes: &num_classes 10
    input_size: &input_size 32
    width: [*min_width, *max_width, *width_offset]
    prob_type: sigmoid
  runner:
    type: DMCPRunner
recover:
  enable: False
  checkpoint: None
distributed:
  enable: False
optimizer:
  momentum: 0.9
  weight_decay: 0.00004
  nesterov: True
  no_wd: True
lr_scheduler:
  base_lr: 0.2
  warmup_lr: 0.5
  warmup_steps: 1000
  min_lr: 0.08
arch_lr_scheduler:
  base_lr: 0.5
  warmup_lr: 0.5
  min_lr: 0.1
dataset:
  type: CIFAR10
  augmentation:
    test_resize: 32
    color_jitter: [0.2, 0.2, 0.2, 0.1]
  workers: 4
  batch_size: 64
  num_classes: *num_classes
  input_size: *input_size
logging:
  print_freq: 50
Why is the loss at max_width much greater than at min_width and random_width during training? For example, the max_width loss is 59 while the other two are only about 0.001.
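One possible reading of the config above (my interpretation, not a confirmed answer): with `distillation.enable: true` and `hard_label: False`, only the max-width model is trained against ground-truth labels, while the min/random widths minimize a soft KL loss against the max-width outputs, and those two objectives live on very different numeric scales. A sketch of the two loss forms under that assumption:

```python
import torch.nn.functional as F

def max_width_loss(logits, target):
    # Cross-entropy (label-smoothed in the repo) against ground truth.
    return F.cross_entropy(logits, target)

def subnet_loss(student_logits, teacher_logits, T=1.0):
    # KL against the max-width teacher's softened outputs; near zero once
    # the student matches the teacher's distribution.
    p = F.softmax(teacher_logits.detach() / T, dim=1)
    log_q = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_q, p, reduction='batchmean') * T * T
```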
(torch1.4) wangzhaoming@ubuntu:~/work_code/dmcp$ python main.py --mode train --data /data2/ImageNet --config config/mbv2/dmcp.yaml --flops 87
/home/wangzhaoming/work_code/dmcp/utils/tools.py:62: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Traceback (most recent call last):
  File "main.py", line 71, in <module>
    main()
  File "main.py", line 42, in main
    tools.init(config)
  File "/home/wangzhaoming/work_code/dmcp/utils/tools.py", line 27, in init
    dist.init_dist(config.distributed.enable)
  File "/home/wangzhaoming/work_code/dmcp/utils/distributed.py", line 30, in init_dist
    rank = int(os.environ['RANK'])
  File "/home/wangzhaoming/anaconda3/envs/torch1.4/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'
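Incidentally, the `YAMLLoadWarning` printed before the traceback is easy to silence by passing an explicit loader in `utils/tools.py`; a one-line sketch of the standard fix:

```python
import yaml

with open(config_path) as f:  # config_path as read in utils/tools.py
    config = yaml.load(f, Loader=yaml.FullLoader)  # or: yaml.safe_load(f)
```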