mit-han-lab / once-for-all

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

Home Page: https://ofa.mit.edu/

License: MIT License

Python 78.01% Shell 0.02% Jupyter Notebook 21.97%
tinyml edge-ai efficient-model acceleration nas automl

once-for-all's Introduction

Once-for-All: Train One Network and Specialize it for Efficient Deployment [arXiv] [Slides] [Video]

@inproceedings{
  cai2020once,
  title={Once for All: Train One Network and Specialize it for Efficient Deployment},
  author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/pdf/1908.09791.pdf}
}

[News] Once-for-All is available at PyTorch Hub now!

[News] Once-for-All (OFA) Network is adopted by SONY Neural Architecture Search Library.

[News] Once-for-All (OFA) Network is adopted by ADI MAX78000/MAX78002 Model Training and Synthesis Tool.

[News] Once-for-All (OFA) Network is adopted by Alibaba and ranked 1st in the open division of the MLPerf Inference Benchmark (Datacenter and Edge).

[News] First place in the CVPR 2020 Low-Power Computer Vision Challenge, CPU detection and FPGA tracks.

[News] OFA-ResNet50 is released.

[News] The hands-on tutorial of OFA is released!

[News] OFA is available via pip! Run pip install ofa to install the whole OFA codebase.

[News] First place in the 4th Low-Power Computer Vision Challenge, both classification and detection tracks.

[News] First place in the 3rd Low-Power Computer Vision Challenge, DSP track at ICCV’19 using the Once-for-all Network.

Train once, specialize for many deployment scenarios

80% top1 ImageNet accuracy under mobile setting

Consistently outperforms MobileNetV3 on diverse hardware platforms

OFA-ResNet50 [How to use]

How to use / evaluate OFA Networks

Use

""" OFA Networks.
    Example: ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
""" 
from ofa.model_zoo import ofa_net

net_id = 'ofa_mbv3_d234_e346_k357_w1.0'  # any network ID from the table below
ofa_network = ofa_net(net_id, pretrained=True)

# Randomly sample sub-networks from OFA network
ofa_network.sample_active_subnet()
random_subnet = ofa_network.get_active_subnet(preserve_weight=True)

# Manually set the sub-network
ofa_network.set_active_subnet(ks=7, e=6, d=4)
manual_subnet = ofa_network.get_active_subnet(preserve_weight=True)

Evaluate

python eval_ofa_net.py --path 'Your path to imagenet' --net ofa_mbv3_d234_e346_k357_w1.0

OFA Network | Design Space | Resolution | Width Multiplier | Depth | Expand Ratio | Kernel Size
ofa_resnet50 | ResNet50D | 128 - 224 | 0.65, 0.8, 1.0 | 0, 1, 2 | 0.2, 0.25, 0.35 | 3
ofa_mbv3_d234_e346_k357_w1.0 | MobileNetV3 | 128 - 224 | 1.0 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7
ofa_mbv3_d234_e346_k357_w1.2 | MobileNetV3 | 160 - 224 | 1.2 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7
ofa_proxyless_d234_e346_k357_w1.3 | ProxylessNAS | 128 - 224 | 1.3 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7
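To get a feel for how large one of these design spaces is, here is a rough count for the ofa_mbv3_d234_e346_k357_w1.0 row. It is a sketch under stated assumptions, not taken from the OFA code: five elastic units, each keeping d in {2, 3, 4} layers, and every active layer choosing its kernel size and expand ratio independently. Input resolution is ignored.

```python
from itertools import product

KERNEL_SIZES = (3, 5, 7)    # "Kernel Size" column
EXPAND_RATIOS = (3, 4, 6)   # "Expand Ratio" column
DEPTHS = (2, 3, 4)          # "Depth" column: layers kept per unit
NUM_UNITS = 5               # assumption: five elastic units


def count_subnets(num_units=NUM_UNITS, depths=DEPTHS,
                  kernel_sizes=KERNEL_SIZES, expand_ratios=EXPAND_RATIOS):
    """Number of architectures, ignoring input resolution (rough sketch)."""
    per_layer = len(list(product(kernel_sizes, expand_ratios)))  # 3 x 3 = 9 choices per layer
    per_unit = sum(per_layer ** d for d in depths)               # 9**2 + 9**3 + 9**4 = 7371
    return per_unit ** num_units                                  # independent units multiply


print(f"{count_subnets():.2e}")  # roughly 2 x 10^19
```

Under these assumptions the space holds roughly 2 x 10^19 architectures before resolution is even varied.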

How to use / evaluate OFA Specialized Networks

Use

""" OFA Specialized Networks.
Example: net, image_size = ofa_specialized('flops@[email protected]_finetune@75', pretrained=True)
""" 
from ofa.model_zoo import ofa_specialized
# net_id: one of the specialized-network IDs listed in the table below
net, image_size = ofa_specialized(net_id, pretrained=True)

Evaluate

python eval_specialized_net.py --path 'Your path to imagenet' --net flops@[email protected]_finetune@75

Model Name Details Top-1 (%) Top-5 (%) #Params #MACs
ResNet50 Design Space
ofa-resnet50D-41 [email protected][email protected] 79.8 94.7 30.9M 4.1B
ofa-resnet50D-37 [email protected][email protected] 79.7 94.7 26.5M 3.7B
ofa-resnet50D-30 [email protected][email protected] 79.3 94.5 28.7M 3.0B
ofa-resnet50D-24 [email protected][email protected] 79.0 94.2 29.0M 2.4B
ofa-resnet50D-18 [email protected][email protected] 78.3 94.0 20.7M 1.8B
ofa-resnet50D-12 [email protected][email protected]_finetune@25 77.1 93.3 19.3M 1.2B
ofa-resnet50D-09 [email protected][email protected]_finetune@25 76.3 92.9 14.5M 0.9B
ofa-resnet50D-06 [email protected][email protected]_finetune@25 75.0 92.1 9.6M 0.6B
FLOPs
ofa-595M flops@[email protected]_finetune@75 80.0 94.9 9.1M 595M
ofa-482M flops@[email protected]_finetune@75 79.6 94.8 9.1M 482M
ofa-389M flops@[email protected]_finetune@75 79.1 94.5 8.4M 389M
LG G8
ofa-lg-24 LG-G8_lat@[email protected]_finetune@25 76.4 93.0 5.8M 230M
ofa-lg-16 LG-G8_lat@[email protected]_finetune@25 74.7 92.0 5.8M 151M
ofa-lg-11 LG-G8_lat@[email protected]_finetune@25 73.0 91.1 5.0M 103M
ofa-lg-8 LG-G8_lat@[email protected]_finetune@25 71.1 89.7 4.1M 74M
Samsung S7 Edge
ofa-s7edge-88 s7edge_lat@[email protected]_finetune@25 76.3 92.9 6.4M 219M
ofa-s7edge-58 s7edge_lat@[email protected]_finetune@25 74.7 92.0 4.6M 145M
ofa-s7edge-41 s7edge_lat@[email protected]_finetune@25 73.1 91.0 4.7M 96M
ofa-s7edge-29 s7edge_lat@[email protected]_finetune@25 70.5 89.5 3.8M 66M
Samsung Note8
ofa-note8-65 note8_lat@[email protected]_finetune@25 76.1 92.7 5.3M 220M
ofa-note8-49 note8_lat@[email protected]_finetune@25 74.9 92.1 6.0M 164M
ofa-note8-31 note8_lat@[email protected]_finetune@25 72.8 90.8 4.6M 101M
ofa-note8-22 note8_lat@[email protected]_finetune@25 70.4 89.3 4.3M 67M
Samsung Note10
ofa-note10-64 note10_lat@[email protected]_finetune@75 80.2 95.1 9.1M 743M
ofa-note10-50 note10_lat@[email protected]_finetune@75 79.7 94.9 9.1M 554M
ofa-note10-41 note10_lat@[email protected]_finetune@75 79.3 94.5 9.0M 457M
ofa-note10-30 note10_lat@[email protected]_finetune@75 78.4 94.2 7.5M 339M
ofa-note10-22 note10_lat@[email protected]_finetune@25 76.6 93.1 5.9M 237M
ofa-note10-16 note10_lat@[email protected]_finetune@25 75.5 92.3 4.9M 163M
ofa-note10-11 note10_lat@[email protected]_finetune@25 73.6 91.2 4.3M 110M
ofa-note10-08 note10_lat@[email protected]_finetune@25 71.4 89.8 3.8M 79M
Google Pixel1
ofa-pixel1-143 pixel1_lat@[email protected]_finetune@75 80.1 95.0 9.2M 642M
ofa-pixel1-132 pixel1_lat@[email protected]_finetune@75 79.8 94.9 9.2M 593M
ofa-pixel1-79 pixel1_lat@[email protected]_finetune@75 78.7 94.2 8.2M 356M
ofa-pixel1-58 pixel1_lat@[email protected]_finetune@75 76.9 93.3 5.8M 230M
ofa-pixel1-40 pixel1_lat@[email protected]_finetune@25 74.9 92.1 6.0M 162M
ofa-pixel1-28 pixel1_lat@[email protected]_finetune@25 73.3 91.0 5.2M 109M
ofa-pixel1-20 pixel1_lat@[email protected]_finetune@25 71.4 89.8 4.3M 77M
Google Pixel2
ofa-pixel2-62 pixel2_lat@[email protected]_finetune@25 75.8 92.7 5.8M 208M
ofa-pixel2-50 pixel2_lat@[email protected]_finetune@25 74.7 91.9 4.7M 166M
ofa-pixel2-35 pixel2_lat@[email protected]_finetune@25 73.4 91.1 5.1M 113M
ofa-pixel2-25 pixel2_lat@[email protected]_finetune@25 71.5 90.1 4.1M 79M
1080ti GPU (Batch Size 64)
ofa-1080ti-27 1080ti_gpu64@[email protected]_finetune@25 76.4 93.0 6.5M 397M
ofa-1080ti-22 1080ti_gpu64@[email protected]_finetune@25 75.3 92.4 5.2M 313M
ofa-1080ti-15 1080ti_gpu64@[email protected]_finetune@25 73.8 91.3 6.0M 226M
ofa-1080ti-12 1080ti_gpu64@[email protected]_finetune@25 72.6 90.9 5.9M 165M
V100 GPU (Batch Size 64)
ofa-v100-11 v100_gpu64@[email protected]_finetune@25 76.1 92.7 6.2M 352M
ofa-v100-09 v100_gpu64@[email protected]_finetune@25 75.3 92.4 5.2M 313M
ofa-v100-06 v100_gpu64@[email protected]_finetune@25 73.0 91.1 4.9M 179M
ofa-v100-05 v100_gpu64@[email protected]_finetune@25 71.6 90.3 5.2M 141M
Jetson TX2 GPU (Batch Size 16)
ofa-tx2-96 tx2_gpu16@[email protected]_finetune@25 75.8 92.7 6.2M 349M
ofa-tx2-80 tx2_gpu16@[email protected]_finetune@25 75.4 92.4 5.2M 313M
ofa-tx2-47 tx2_gpu16@[email protected]_finetune@25 72.9 91.1 4.9M 179M
ofa-tx2-35 tx2_gpu16@[email protected]_finetune@25 70.3 89.4 4.3M 121M
Intel Xeon CPU with MKL-DNN (Batch Size 1)
ofa-cpu-17 cpu_lat@[email protected]_finetune@25 75.7 92.6 4.9M 365M
ofa-cpu-15 cpu_lat@[email protected]_finetune@25 74.6 92.0 4.9M 301M
ofa-cpu-11 cpu_lat@[email protected]_finetune@25 72.0 90.4 4.4M 160M
ofa-cpu-10 cpu_lat@[email protected]_finetune@25 71.1 89.9 4.2M 143M

How to train OFA Networks

mpirun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    python train_ofa_net.py

or

horovodrun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    python train_ofa_net.py

Introduction Video

Watch the video

Hands-on Tutorial Video

Watch the video

Requirements

  • Python 3.6+
  • PyTorch 1.4.0+
  • ImageNet Dataset
  • Horovod

Related work on automated and efficient deep learning:

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)

once-for-all's People

Contributors

han-cai, jpablomch, kentang-mit, lmxyy, lyken17, mzahran001, songhan, synxlin, usedtobe97

once-for-all's Issues

In set_running_statistics, CPU is used by default to forward images

forward_model is created by deep-copying the incoming model.
However, it is never moved to a GPU device.
Computing the mean and variance by forwarding batches of images on the CPU is time-consuming.
I think it would be better to pick a default device and move the copied model onto it.
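A minimal sketch of the suggestion, assuming plain PyTorch BatchNorm layers. recalibrate_bn_on_device is a hypothetical helper, not OFA's set_running_statistics; it relies on PyTorch's cumulative-average mode (momentum=None), which may differ from how OFA averages the statistics.

```python
import copy

import torch
import torch.nn as nn

_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def recalibrate_bn_on_device(model, batches, device=None):
    """Recompute BatchNorm running statistics of `model` on `device` (hypothetical helper)."""
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Deep-copy the model as described above, but move the copy to the chosen device.
    forward_model = copy.deepcopy(model).to(device)
    copy_bns = [m for m in forward_model.modules() if isinstance(m, _BN_TYPES)]
    for bn in copy_bns:
        bn.reset_running_stats()
        bn.momentum = None  # PyTorch: cumulative average over all batches seen

    forward_model.train()  # BN normalizes with batch stats and updates running stats
    with torch.no_grad():
        for images in batches:
            forward_model(images.to(device))

    # Write the recomputed statistics back into the original model's BN layers.
    orig_bns = [m for m in model.modules() if isinstance(m, _BN_TYPES)]
    for src, dst in zip(copy_bns, orig_bns):
        dst.running_mean.copy_(src.running_mean.to(dst.running_mean.device))
        dst.running_var.copy_(src.running_var.to(dst.running_var.device))
    return model
```

Here batches would be an iterable of image tensors drawn from a small subset of the training data. Only the copy runs on the GPU, so the original model can stay wherever it already lives.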

Incorrect accuracy while testing the pretrained ofa network

Thanks for sharing your code.

I have some problems when testing the pretrained OFA network.

I built an OFA network using the code provided in the README.

from model_zoo import ofa_net
ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)

ofa_network.set_active_subnet(ks=7, e=6, d=4)
subnet = ofa_network.get_active_subnet(preserve_weight=True)

# test the subnet on the ImageNet validation set

When I set ks, e and d to different values, the accuracy of the OFA network drops to about 0 in some cases. The test results are shown below:
[Image: test results]

Did I make a mistake while testing the pretrained OFA network?

Question regarding implementation detail - re_organize_middle_weights

This concerns channel selection for width control in the function re_organize_middle_weights in dynamic_layers. At line 144, the following operation is applied: importance[target_width:] = torch.arange(0, target_width - importance.size(0), -1).
I don't really understand this line. If importance is assumed to be sorted, then it does nothing to the order of importance. If it is not, then important channels can effectively be discarded.
What am I missing?
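A pure-Python sketch of what the line does to the ordering. It assumes the next step sorts importance in descending order; reorganized_order is a hypothetical name. The tail is overwritten with 0, -1, -2, ..., so channels beyond target_width are pinned behind all active channels in their original order, and only the active channels are re-sorted among themselves.

```python
def reorganized_order(importance, target_width):
    """Channel order given by a descending sort after
    importance[target_width:] = arange(0, target_width - len(importance), -1)."""
    scores = list(importance)
    n = len(scores)
    # Inactive channels (index >= target_width) get scores 0, -1, -2, ...
    # L1 norms are >= 0, so they rank after every active channel and keep
    # their original relative order.
    scores[target_width:] = list(range(0, target_width - n, -1))
    # sorted() is stable, so a tie at 0 keeps the active channel first.
    # (torch.argsort is not stable by default; this sketch is.)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Channel 3 has the largest norm but lies outside the active width of 3.
print(reorganized_order([0.5, 2.0, 1.0, 3.0, 0.1], target_width=3))
# [1, 2, 0, 3, 4]
```

In the example, channel 3 stays in slot 3 despite having the largest norm, while the three active channels are reordered by importance.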

What should I do after train_ofa_net

I ran train_ofa_net.py and there are three folders under 'exp/': 'kernel2kernel_depth', 'kernel_depth2kernel_depth_width' and 'normal2kernel'. What should I do next? After training, each exp subfolder contains 'checkpoint logs net.config net_info.txt run.config'. Does anybody know how I should deal with them?

I can't find any connection between the training exp results and 'eval_ofa_net.py'. Please help this poor kid.
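The three folder names follow a source2target pattern, which suggests that the stages form a chain: each stage starts from the network the previous stage produced. This reading is an assumption. The pure-Python sketch below (order_stages is a hypothetical helper) recovers that order from the folder names.

```python
def order_stages(folder_names):
    """Order 'source2target' experiment folders into a chain (hypothetical helper)."""
    edges = {}
    for name in folder_names:
        # 'kernel_depth2kernel_depth_width' -> ('kernel_depth', 'kernel_depth_width')
        source, target = name.split("2", 1)
        edges[source] = (target, name)
    targets = {target for target, _ in edges.values()}
    starts = [s for s in edges if s not in targets]
    if len(starts) != 1:
        raise ValueError("folder names do not form a single chain")
    order, current = [], starts[0]
    while current in edges:
        current, name = edges[current]
        order.append(name)
    return order


print(order_stages(["kernel2kernel_depth", "kernel_depth2kernel_depth_width", "normal2kernel"]))
# ['normal2kernel', 'kernel2kernel_depth', 'kernel_depth2kernel_depth_width']
```

Under that reading, the checkpoint of the last stage (kernel_depth2kernel_depth_width) would hold the fully elastic network. Loading it in place of the pretrained weights that eval_ofa_net.py uses is not shown here.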

Question about training for model `ofa_D4_E6_K7`

Hello,
I want to train an OFA model (ofa_mbv3) on CIFAR-100 or custom datasets.

So I would like some training details about the first supernet.

When I checked the model in the progressive-shrinking phase,
I saw that the weights of the F.Linear (kernel transformation) layers were also trained.

When I train the first supernet (ofa_D4_E6_K7), should I train these kernel transformation matrices?

I would also like to know any information you have about training OFA networks on other datasets (like CIFAR-10/100).

Thank you as always.

Questions about training supernet

Hi,

Thanks for your time on this issue.

I have some questions about the OFA supernet training phase.

  1. Will the performance of the supernet always surpass that of the original model?

  2. How should we modify the hyperparameter settings (LR, optimizer type) from the original model's task?

  3. Is the performance of the supernet an upper bound on the performance of its subnets?

Thanks for your help and happy Chinese New Year!

Question about GPU Memory for OFA Progressive Shrinking.

Hi Han Cai, thank you so much for responding to my previous question so quickly.

I'm trying to train an OFA model (ofa_mbv3) using 4 NVIDIA Titan V and 2 Titan RTX GPUs.

But there is a problem when validating subnet models.

I checked the following progressive-shrinking validation code:

for setting, name in subnet_settings:
    run_manager.write_log('-' * 30 + ' Validate %s ' % name + '-' * 30, 'train', should_print=False)
    run_manager.run_config.data_provider.assign_active_img_size(setting.pop('image_size'))
    dynamic_net.set_active_subnet(**setting)
    run_manager.write_log(dynamic_net.module_str, 'train', should_print=False)

    run_manager.reset_running_statistics(dynamic_net)
    loss, top1, top5 = run_manager.validate(epoch=epoch, is_test=is_test, run_str=name, net=dynamic_net)
    losses_of_subnets.append(loss)
    top1_of_subnets.append(top1)
    top5_of_subnets.append(top5)
    valid_log += '%s (%.3f), ' % (name, top1)

Validating the first subnet (the first loop iteration) works fine.
But when I try to validate the second subnet, a "CUDA out of memory" error occurs.

My GPUs have 12 GB (Titan V) and 24 GB (Titan RTX) of memory each.

How much GPU memory do you have?
Also, please let me know if you have any guesses or recommendations for solving this error.

Thank you so much.

Details about finetuning (25 / 75 epochs)

Thanks for sharing your code for this excellent work!

Could you share more details about how you fine-tune your specialized sub-networks? I didn't find the code in the repo, but hyperparameters such as batch size, optimizer, learning rate, LR decay and weight decay would also be very helpful.

Thanks again.

Question about progressive shrinking

Greetings,
There is a function re_organize_middle_weights that re-sorts the convolution weights. However, the order of the input x stays the same after this operation.
Thus, the weights are misordered relative to the input x. A mismatch between weights and input would change the output. Is this a big problem?
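A pure-Python check of why a consistent reordering leaves the output unchanged. It assumes the re-sorting is applied to both sides of the middle channels: the producing layer's output channels (together with their BN parameters) and the consuming layer's input channels. A 1x1 convolution acts like a matrix on each pixel, so two small matrices with a ReLU in between stand in for the real layers. matvec, two_layer and permute_middle are illustrative names.

```python
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def two_layer(W1, W2, x):
    # ReLU is elementwise, so it commutes with any reordering of the hidden channels.
    h = [max(0.0, v) for v in matvec(W1, x)]
    return matvec(W2, h)


def permute_middle(W1, W2, perm):
    """Reorder hidden channels: rows of W1 (its outputs) and columns of W2 (its inputs)."""
    W1p = [W1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, W2p


rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 middle channels
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 middle channels -> 2 outputs
x = [0.3, -1.2, 0.7]

W1p, W2p = permute_middle(W1, W2, perm=[2, 0, 3, 1])
y, y_perm = two_layer(W1, W2, x), two_layer(W1p, W2p, x)
print(max(abs(a - b) for a, b in zip(y, y_perm)))  # only floating-point rounding differs
```

The output would change only if one side were permuted without the other.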

How to measure the latency correctly?

Hi, thanks for your great work!
When I was testing the latency on a V100, the results confused me.
I used the following code to measure latency for the latency table:
torch.cuda.empty_cache()
img_L = img_L.cuda()
start.record()
out = ofa_network(img_L)
end.record()
torch.cuda.synchronize()
run_time.update(start.elapsed_time(end))
The input img_L is a single image.
Is this correct?
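A common measurement pattern, sketched as a hypothetical helper (measure_latency_ms is not part of OFA): run a few warm-up iterations, synchronize around each timed call so that asynchronous GPU work is included, and report the median over many runs.

```python
import statistics
import time


def measure_latency_ms(run, warmup=10, iters=50, sync=None):
    """Median latency of run() in milliseconds (hypothetical helper).

    sync should block until queued device work is done, e.g. torch.cuda.synchronize
    on a GPU; without it, only the asynchronous kernel launches would be timed.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # first runs include one-off costs (memory allocation, algorithm selection)
        run()
    times = []
    for _ in range(iters):
        sync()
        start = time.perf_counter()
        run()
        sync()
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times)
```

On a V100 this might be called as measure_latency_ms(lambda: ofa_network(img_L), sync=torch.cuda.synchronize) inside torch.no_grad(), with img_L already on the GPU. CUDA events, as in the snippet above, are an equally valid timer. Warm-up, disabling autograd, and aggregating many runs matter more than the choice of timer.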

Some questions about accuracy predictor

Hi, I'm very interested in your work.

I want to use the accuracy predictor for some other configurations (like ResNet-based OFA ... and some others).

I saw the acc_predictor tutorial code you uploaded, so I understand what it looks like.
I also read the appendix of your paper that describes the accuracy predictor in detail.

I have a question about how much training data is needed to train the accuracy predictor.

When you trained acc_predictor, the ground-truth accuracies were measured on the whole ImageNet validation set.

How many ground-truth samples are needed?

I would also like to know the hyperparameters used to train the accuracy predictor.

I hope you can answer my questions. Thank you.

channel sorting for elastic width

Hi, thanks for your work.
In the paper, a channel sorting algorithm based on the norm of each channel was introduced to support elastic width. However, I can't find this part in the code. Could anyone tell me where it is?

Use for our Custom Dataset

Hi,

Thanks for the amazing work.

I want to train the OFA network on our custom dataset. How can I do this?
Looking forward to your reply.

Thanks,
Darshan

When I used ofa_resnet50 for efficient deployment in tutorial/ofa.ipynb, I ran into some errors.

  1. First, I searched for a network:

Searching with note10 constraint (25): 100%|██████████| 500/500 [00:09<00:00, 51.03it/s]
Found best architecture on note10 with latency <= 25.00 ms in 9.84 seconds! It achieves 81.71% predicted accuracy with 24.73 ms latency on note10.
Architecture of the searched sub-net:
DyConv(O32, K3, S2)
(DyConv(O32, K3, S1), Identity)
DyConv(O64, K3, S1)
max_pooling(ks=3, stride=2)
(3x3_BottleneckConv_in->768->256_S1, avgpool_conv)
(3x3_BottleneckConv_in->768->256_S1, Identity)
(3x3_BottleneckConv_in->1536->256_S1, Identity)
(3x3_BottleneckConv_in->768->256_S1, Identity)
(3x3_BottleneckConv_in->2048->512_S2, avgpool_conv)
(3x3_BottleneckConv_in->2048->512_S1, Identity)
(3x3_BottleneckConv_in->3072->512_S1, Identity)
(3x3_BottleneckConv_in->2048->512_S1, Identity)
(3x3_BottleneckConv_in->6144->1024_S2, avgpool_conv)
(3x3_BottleneckConv_in->3072->1024_S1, Identity)
(3x3_BottleneckConv_in->4096->1024_S1, Identity)
(3x3_BottleneckConv_in->6144->1024_S1, Identity)
(3x3_BottleneckConv_in->4096->1024_S1, Identity)
(3x3_BottleneckConv_in->4096->1024_S1, Identity)
(3x3_BottleneckConv_in->8192->2048_S2, avgpool_conv)
(3x3_BottleneckConv_in->6144->2048_S1, Identity)
(3x3_BottleneckConv_in->12288->2048_S1, Identity)
(3x3_BottleneckConv_in->12288->2048_S1, Identity)
MyGlobalAvgPool2d(keep_dim=False)
DyLinear(2048, 1000)

However, I think the middle dimensions of the searched network look a bit untrustworthy.

  2. When I tried to evaluate this sub-network, I got this error:

Evaluating the sub-network with latency = 24.7 ms on note10
RuntimeError Traceback (most recent call last)
in
6 , net_config, latency = result
7 print('Evaluating the sub-network with latency = %.1f ms on %s' % (latency, target_hardware))
----> 8 top1 = evaluate_ofa_subnet(
9 ofa_network,
10 imagenet_data_path,
~/桌面/once-for-all-master/ofa/tutorial/imagenet_eval_helper.py in evaluate_ofa_subnet(ofa_net, path, net_config, data_loader, batch_size, device)
18 assert len(net_config['ks']) == 20 and len(net_config['e']) == 20 and len(net_config['d']) == 5
19 ofa_net.set_active_subnet(ks=net_config['ks'], d=net_config['d'], e=net_config['e'])
---> 20 subnet = ofa_net.get_active_subnet().to(device)
21 calib_bn(subnet, path, net_config['r'][0], batch_size)
22 top1 = validate(subnet, path, net_config['r'][0], data_loader, batch_size, device)
~/桌面/once-for-all-master/ofa/imagenet_classification/elastic_nn/networks/ofa_resnets.py in get_active_subnet(self, preserve_weight)
226 active_idx = block_idx[:len(block_idx) - depth_param]
227 for idx in active_idx:
--> 228 blocks.append(self.blocks[idx].get_active_subnet(input_channel, preserve_weight))
229 input_channel = self.blocks[idx].active_out_channel
230 classifier = self.classifier.get_active_subnet(input_channel, preserve_weight)
~/桌面/once-for-all-master/ofa/imagenet_classification/elastic_nn/modules/dynamic_layers.py in get_active_subnet(self, in_channel, preserve_weight)
540
541 # copy weight from current layer
--> 542 sub_layer.conv1.conv.weight.data.copy_(
543 self.conv1.conv.get_active_filter(self.active_middle_channels, in_channel).data)
544 copy_bn(sub_layer.conv1.bn, self.conv1.bn.bn)

RuntimeError: The size of tensor a (768) must match the size of tensor b (88) at non-singleton dimension 0

I guess I need to modify the code for the ResNet50 network. Could you tell me how to modify it? Thanks a lot.

How many subnets does knowledge distillation optimize?

I have a question that the paper does not clarify. During knowledge distillation, do you optimize all 10^19 networks? The elastic_nn portion of the code seems to suggest so:

	subnet_settings = []
	for d in depth_list:
		for e in expand_ratio_list:
			for k in ks_list:
				for w in width_mult_list:
					for img_size in image_size_list:
						subnet_settings.append([{
							'image_size': img_size,
							'd': d,
							'e': e,
							'ks': k,
							'w': w,
						}, 'R%s-D%s-E%s-K%s-W%s' % (img_size, d, e, k, w)])
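Judging from the snippet, each setting assigns a single value of d, e, ks, w and image size to the whole network. The loop therefore produces a small grid of uniform sub-networks whose size is the product of the list lengths, nowhere near 10^19. The sketch below reproduces the grid with itertools.product. The example lists are assumptions, and it assumes set_active_subnet applies a scalar setting to every layer.

```python
from itertools import product


def uniform_settings(depth_list, expand_ratio_list, ks_list, width_mult_list, image_size_list):
    """Same grid as the nested loops above; the last list varies fastest."""
    return [
        ({"image_size": r, "d": d, "e": e, "ks": k, "w": w},
         "R%s-D%s-E%s-K%s-W%s" % (r, d, e, k, w))
        for d, e, k, w, r in product(depth_list, expand_ratio_list, ks_list,
                                     width_mult_list, image_size_list)
    ]


settings = uniform_settings([2, 3, 4], [3, 4, 6], [3, 5, 7], [1.0], [224])
print(len(settings))  # 27
```

Random, non-uniform sub-networks would come from something like sample_active_subnet() instead. This snippet does not show how many of those are visited per training step.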

Tutorial for deploying with FPGA

Hi,

Congratulations on this great work. I was amazed by your solution in the CVPR 2020 competition.

Is there a tutorial for using this work on an FPGA such as the Zynq UltraScale+ ZU3EG or ZU9EG?

Best regards,

Jorge

What is the role of 'reset_running_statistics' ?

On line 67 of progressive_shrinking.py, why do we need the 'reset_running_statistics' function to reset both the 'mean' and 'var' values of the batch normalization layers to the mean and variance computed from 2,000 randomly sampled images?
run_manager.reset_running_statistics(dynamic_net)
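A toy illustration of why the statistics might be recomputed, assuming the BN layers are shared by all sub-networks during progressive shrinking. The running mean becomes a blend over whichever sub-networks were sampled, which matches none of them. The numbers are made up; ema mirrors PyTorch's running-average update.

```python
def ema(values, momentum=0.1, init=0.0):
    """Running average with PyTorch's BatchNorm update rule:
    running = (1 - momentum) * running + momentum * batch_value."""
    running = init
    for v in values:
        running = (1 - momentum) * running + momentum * v
    return running


# Made-up trace: two sub-networks share one BN layer; the activations it sees
# have batch mean 1.0 under sub-network A and 3.0 under sub-network B.
training_trace = [1.0, 3.0] * 500
shared_running_mean = ema(training_trace)  # hovers around 2.0, right for neither

# Recalibration for sub-network A only: average its own batch means.
subnet_a_batch_means = [1.0] * 20
recalibrated_mean = sum(subnet_a_batch_means) / len(subnet_a_batch_means)

print(round(shared_running_mean, 2), recalibrated_mean)
```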

Evolution details

Hi, thanks for your excellent work.

How is the network architecture encoded and decoded during evolution?

After reading the description of the accuracy predictor in the paper, it seems that the kernel size and expand ratio of each layer are encoded first. If one architecture is [3,4, ....., 0,0 ..... 3,6] and another is [3,4, ....., 7,4 ..... 3,6], two questions arise during evolution:

  1. What if crossing over [0,0] and [7,4] produces [0,4]? That is not a valid gene.

  2. If one stage is [1,1,0,0], its last two layers are skipped. If mutation during evolution produces [1,1,0,1], the last layer is not skipped but the third layer is, which violates the rule that only the trailing layers of a stage can be skipped.
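One way to keep genes valid is sketched below; it is not necessarily what OFA's evolution search does. Crossover and mutation act on the raw (ks, e, d) configuration, where depth is one integer per stage, and the zero-padded one-hot vector for the accuracy predictor is derived only afterwards. The 5 x 4 layout and all helper names are assumptions.

```python
import random

KS, EXPANDS, DEPTHS = (3, 5, 7), (3, 4, 6), (2, 3, 4)
NUM_STAGES, MAX_DEPTH = 5, 4  # assumed layout: 5 stages x up to 4 layers = 20 layers
SPACE = {"ks": KS, "e": EXPANDS, "d": DEPTHS}


def random_config(rng):
    n_layers = NUM_STAGES * MAX_DEPTH
    return {"ks": [rng.choice(KS) for _ in range(n_layers)],
            "e": [rng.choice(EXPANDS) for _ in range(n_layers)],
            "d": [rng.choice(DEPTHS) for _ in range(NUM_STAGES)]}


def crossover(a, b, rng):
    # Each gene is taken whole from one parent, so it is always a legal value.
    return {key: [rng.choice(pair) for pair in zip(a[key], b[key])] for key in a}


def mutate(cfg, rng, prob=0.1):
    # Resample individual genes from their own legal choices.
    return {key: [rng.choice(SPACE[key]) if rng.random() < prob else v for v in values]
            for key, values in cfg.items()}


def encode(cfg):
    """One-hot ks and e per layer; layers beyond the stage depth encode as all zeros."""
    vec = []
    for stage in range(NUM_STAGES):
        for j in range(MAX_DEPTH):
            i = stage * MAX_DEPTH + j
            active = j < cfg["d"][stage]  # only a stage's first d layers are kept
            vec += [int(active and cfg["ks"][i] == k) for k in KS]
            vec += [int(active and cfg["e"][i] == e) for e in EXPANDS]
    return vec


rng = random.Random(0)
child = mutate(crossover(random_config(rng), random_config(rng), rng), rng, prob=0.5)
features = encode(child)  # 20 layers x 6 bits = 120 inputs for the predictor
```

The encoding is recomputed from the configuration after every crossover or mutation. Patterns like [0,4], or a skipped layer followed by an active one, therefore cannot appear. Inactive layers keep their ks/e genes, so a later mutation of d can reactivate them.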

About training of once-for-all network

Hi, thanks for your great work!
I am interested in training the once-for-all network, but I ran into some problems when digging into your training code.
Line 198 in train_ofa_net.py loads teacher model weights. Is this an already-trained teacher model, so that the training code only performs progressive shrinking?
Also, the arguments arg.task and arg.phase never seem to change during training. Am I right? If so, do I need to run training multiple times with different arguments?
Thanks.

Question about the calculation of importance(L1 Norm)

Thank you for your great work.

I have a question about how the importance is calculated.
In Once-for-All, the importance is calculated along the input dimension.

importance = torch.sum(torch.abs(self.point_linear.conv.conv.weight.data), dim=(0, 2, 3))

But in Pruning_filters_for_efficient_convnets, the importance is calculated along the output dimension.

https://github.com/tyui592/Pruning_filters_for_efficient_convnets/blob/00ec7b7ae9e8f9bd3973888590728477e73537d9/prune.py#L69

sum_of_kernel = torch.sum(torch.abs(kernel.view(kernel.size(0), -1)), dim=1)

Is there an intrinsic reason to calculate it along the input dimension?

Thanks!
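A pure-Python illustration of the two reductions, assuming a weight layout of [out][in][kh][kw]. For point_linear, the channels being ranked (the middle, expanded channels) are presumably its input channels, so reducing over dims (0, 2, 3) yields one score per middle channel. The filter-pruning code instead scores the output filters of the layer it prunes. Both helper names are hypothetical.

```python
def l1_importance_per_input_channel(weight):
    """Same reduction as torch.sum(torch.abs(w), dim=(0, 2, 3)): one score per input channel."""
    n_in = len(weight[0])
    scores = [0.0] * n_in
    for filt in weight:                # loop over output filters
        for c in range(n_in):
            scores[c] += sum(abs(v) for row in filt[c] for v in row)
    return scores


def l1_importance_per_output_channel(weight):
    """Filter-pruning criterion: sum |w| over in, kh, kw for each output filter."""
    return [sum(abs(v) for ch in filt for row in ch for v in row) for filt in weight]


# Toy 1x1 conv: 2 output channels, 3 input channels.
w = [[[[1.0]], [[-2.0]], [[0.0]]],
     [[[3.0]], [[0.5]], [[-1.0]]]]
print(l1_importance_per_input_channel(w))   # [4.0, 2.5, 1.0]
print(l1_importance_per_output_channel(w))  # [3.0, 4.5]
```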

Code for retraining subnets

The project seems to contain only the supernet training code. Could you provide the code for retraining the subnets?

What does MACs mean?

Sorry to ask such a simple question, but I cannot find the answer anywhere. Could anyone help me?
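MACs are multiply-accumulate operations. One MAC computes a * b + c, which is a multiply plus an add, so some papers count 1 MAC as 2 FLOPs. For a convolution, the count follows directly from the output size and the kernel shape. The sketch below ignores bias and activation costs; the helper names are hypothetical and the example layers are illustrative.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, kernel_size, groups=1):
    """Multiply-accumulates of a Conv2d layer: one k x k x (c_in / groups) dot product per output value."""
    return h_out * w_out * c_out * (c_in // groups) * kernel_size * kernel_size


def linear_macs(in_features, out_features):
    return in_features * out_features


# A 3x3 stride-2 conv from a 224x224 RGB image to a 112x112x32 feature map.
print(conv2d_macs(112, 112, 3, 32, 3))  # 10838016
```

For a depthwise convolution, groups equals the number of input channels, so each output value costs only k x k MACs.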

top5 performance

Hi and thanks for the amazing work,

What's the top-5 ImageNet accuracy of the model reported in the paper with top-1 = 80%?
This would help my literature review, where I only have top-5 numbers for some models.

Thanks,
Boris

Error when run train_ofa_net.py

Hi, this project is excellent work on NAS. I am very interested in it and tried it on my machine. However, I get the following error when running 'horovodrun -np 4 -H localhost:4 python train_ofa_net.py':


[1,1]:Traceback (most recent call last):
[1,1]: File "train_ofa_net.py", line 194, in <module>
[1,1]: distributed_run_manager.broadcast()
[1,1]: File "/home/xiaobingt/xueshengke/code/once-for-all/ofa/imagenet_codebase/run_manager/distributed_run_manager.py", line 183, in broadcast
[1,1]: hvd.broadcast_parameters(self.net.state_dict(), 0)
[1,1]: File "/home/xiaobingt/horovod/env/lib/python3.7/site-packages/horovod/torch/__init__.py", line 476, in broadcast_parameters
[1,1]: handle = broadcast_async_(p, root_rank, name)
[1,1]: File "/home/xiaobingt/horovod/env/lib/python3.7/site-packages/horovod/torch/mpi_ops.py", line 449, in broadcast_async_
[1,1]: return _broadcast_async(tensor, tensor, root_rank, name)
[1,1]: File "/home/xiaobingt/horovod/env/lib/python3.7/site-packages/horovod/torch/mpi_ops.py", line 359, in _broadcast_async
[1,1]: tensor, output, root_rank, name.encode() if name is not None else _NULL)
[1,1]:RuntimeError: Internal error. Requested ReadyEvent with GPU device but not compiled with CUDA.

It seems this issue comes from my Horovod installation. But I have installed 'horovod' successfully and can run its examples without errors. I also searched online but have not found a solution yet. Can you help me?

Here is my environment:

  • Cudnn 7.6.5
  • Cudatoolkit 10.1.243
  • Openmpi 4.0.5
  • Python 3.7.8
  • Pytorch 1.5.1
  • Tensorflow-gpu 2.1.1

Is args.valid_size wrong?

It looks like args.valid_size in train_ofa_net.py is set to 10000. Is that right? It seems to me that the target size is much smaller than that (~200), so this assertion fails:

once-for-all/ofa/imagenet_classification/data_providers/base_provider.py", line 42, in random_sample_valid_set
assert train_size > valid_size

Bug for the implementation of knowledge distillation?

Thanks for sharing your code!
I'm wondering if this is a bug in the implementation of knowledge distillation.
Since cross_entropy_loss_with_soft_target already uses nn.LogSoftmax,

logsoftmax = nn.LogSoftmax()

Does softmax need to be applied to soft_logits again here? Thanks!

soft_label = F.softmax(soft_logits, dim=1)
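The two normalizations appear to act on different tensors, assuming the loss has the form sum(-soft_target * logsoftmax(pred)). softmax turns the teacher's soft_logits into a target probability distribution, while the LogSoftmax inside the loss normalizes the student's pred. Without the softmax, the target would be raw logits, which can be negative and need not sum to 1. A pure-Python sketch with hypothetical helper names:

```python
import math


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def softmax(logits):
    return [math.exp(v) for v in log_softmax(logits)]


def ce_with_soft_target(student_logits, teacher_logits):
    soft_label = softmax(teacher_logits)      # target distribution from the teacher
    log_probs = log_softmax(student_logits)   # LogSoftmax applied to the student's output
    return -sum(p * lp for p, lp in zip(soft_label, log_probs))


teacher = [2.0, 0.5, -1.0]
print(ce_with_soft_target(teacher, teacher))        # equals the teacher's entropy (the minimum)
print(ce_with_soft_target([0.0, 0.0, 0.0], teacher))  # larger: the student disagrees with the teacher
```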

How to deploy to mobile?

Thanks for the great work! This code uses a PyTorch model, but you mention that the models are deployed on mobile with TF Lite. Do you convert the PyTorch model via ONNX, or implement it separately in TensorFlow?

two questions about ofa

Thanks for sharing your excellent work. I have two questions about OFA.

  1. Different hardware platforms optimize ops differently, and we often choose efficient ops according to the target platform. Can OFA handle the situation where different hardware platforms prefer different ops?
  2. On mobile platforms, different camera sensors produce different data, so each hardware platform has different training data. When we use OFA for a generative network like SRGAN, which platform's training data should be used?
