Comments (28)
After switching to Clipped PPO I'm getting very encouraging results. See: https://github.com/NervanaSystems/distiller/wiki/AutoML-for-Model-Compression-(AMC):-Trials-and-Tribulations
from distiller.
Hi @huxianer , @RizhaoCai ,
I merged the revised AMC implementation to 'master'. You can now try our auto-compression code.
I'll add more information on the setup soon.
It currently doesn't support object detection. @levzlotnik is working on adding an example of object detection, after which we will consider automating. If you happen to integrate object-detection with AMC, we'd be interested in considering it for integration into the Distiller code-base.
Cheers,
Neta
Hi,
Currently the status of ADC (now AMC: https://arxiv.org/abs/1802.03494) is unchanged. I'll update when we have something that can be shared.
Cheers
Neta
Thanks for the response :)
As far as I can tell, the implementation seems almost done. If the remaining work is clear and you're open to contributions, I can set aside some time to finish it up.
I have been using Distiller for a while now, and it has saved me a lot of time. It would be awesome to have AMC up and running on it.
Cheers,
Amjad
Hi Amjad,
I'm happy to hear that you're using Distiller and find it useful!
I'll be returning from Beijing in a couple of weeks and then I'll spend some time to synchronize Distiller with the public Coach APIs, and we can then see how to work together to get AMC working ASAP.
I appreciate the help!
Cheers,
Neta
Hey Neta,
Any update regarding this? I think I will have some time to work on it in the next couple of weeks.
Cheers,
Amjad
Sorry Amjad, I still haven't completed the move to the public v0.11.0 Coach. I'm currently pushing code that's still integrated with an older, private branch of Coach.
I'll let you know as soon as I commit a version that can work with public Coach.
Thanks,
Neta
Hi Amjad,
I pushed a commit that integrates Distiller with the Coach master branch (requires one PR I pushed to Coach - see details in the Distiller commit).
Currently only R_flops (AccuracyGuaranteed Compression) is enabled.
It converges to a solution quickly after finishing the first 100 exploration episodes, but the converged solution is unsatisfactory. I tried it on Plain-20 and VGG16 - both for CIFAR.
There are several open issues, which I won't enumerate right now - first, I need to try to better understand what's going on.
Cheers,
Neta
@nzmora
> NOTE: you may need to update TensorFlow to the expected version:
> `$ pip3 install tensorflow==1.9.0`
Does that mean I have to install CUDA 9.0 if I want to try AMC?
Hi @HKLee2040
No, installing TF 1.9.0 does not require upgrading CUDA.
Cheers
Neta
Work on AMC currently takes place in branch 'amc'. Your help is more than welcome.
Cheers
Neta
@nzmora Could you share plain20.checkpoint.pth.tar? Thanks!
@huxianer the schedule file for training Plain20 is here. Training took me about 33 minutes on 4 GPUs.
However, since you've asked :-), I've also uploaded the image here:
https://drive.google.com/file/d/1bBhjjxkXjFHmqfTWKnxop3n6QCN8QfZJ/view?usp=sharing
Cheers,
Neta
@nzmora Thank you very much! I have another question: I found that the top1 performance is essentially unchanged when I don't use a pretrained model. So if I don't have a pretrained model, what can I do?
Hi @huxianer,
I am not sure I understood your question, so I will answer according to what I understood.
I think you are asking how to train using AMC if we don't have a pre-trained model of the network we are compressing.
The answer is that you must have a pre-trained model because "We aim to automatically find the redundancy for each layer, characterized by sparsity. We train a reinforcement learning agent to predict the action and give the sparsity, then perform the pruning. We quickly evaluate the accuracy after pruning but before fine-tuning as an effective delegate of final accuracy" (section 3, page 4). You can only "find the redundancy for each layer" if you are searching a pre-trained model. If the model is not trained, you cannot find any redundancy because the weights do not have any meaning (they are randomly distributed).
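To make this concrete, here is a toy sketch of pruning-by-magnitude, the kind of per-layer pruning action AMC takes (this is an illustration only, not Distiller's actual API): on a trained layer, the surviving large-magnitude weights preserve most of the layer's function, whereas on random weights the choice of what to prune is arbitrary.

```python
def prune_smallest(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude.

    Toy stand-in for a per-layer pruning action. Only meaningful on a
    trained layer, where magnitude correlates with importance.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

For example, `prune_smallest([0.1, -2.0, 0.05, 3.0], 0.5)` zeros the two smallest-magnitude weights, returning `[0.0, -2.0, 0.0, 3.0]`.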
I hope this helps,
Neta
Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?
I have some modifications:
Since I have only one GPU in my environment, I modified `conv_op = g.find_op(normalize_module_name(name))` to `conv_op = g.find_op(name)`.
Also, args.amc_target_density was None, so I added `args.amc_target_density = 0.5` in my code.
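As background, `normalize_module_name` is needed because `nn.DataParallel` prefixes submodule names with `module.`; with a single GPU the prefix is absent, which is why dropping the call works. A minimal sketch of the idea (my assumption, not Distiller's exact implementation):

```python
def normalize_module_name(name):
    """Strip the 'module.' prefix that nn.DataParallel adds to module names."""
    prefix = "module."
    return name[len(prefix):] if name.startswith(prefix) else name
```

So `normalize_module_name("module.conv1")` yields `"conv1"`, while names from a single-GPU model pass through unchanged.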
Hi @HKLee2040
> I have some modifications:
I will need to fix the code for the case of one GPU.
> Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?
I don't know which protocol you are using ("mac-constrained" or "accuracy-guaranteed"), but both are highly correlated with the Top1 accuracy.
So it makes sense that you will see an overlap when the graphs are smoothed (I smoothed using a simple moving average) because the signal noise is made less noticeable in both the reward and accuracy signals. You can see an example here.
Having said that, I think that you ask a good question. I think that this is a clue as to why the reward defined in the AMC paper, for accuracy-guaranteed-compression, is not so good. The solutions converge on maximum density for all layers (you can see this in the green bars here) - probably because the agent tries to maximize the Top1 accuracy - and not enough weight is given to the MACs (FLOPs) in the reward (5).
This is my conjecture at the moment.
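For reference, the two reward shapes discussed above can be sketched as follows, following the AMC paper's definitions (the exact scaling in Distiller's implementation may differ):

```python
import math

def reward_mac_constrained(top1):
    """Resource-constrained search: reward is simply the scaled Top1 accuracy."""
    return top1 / 100.0

def reward_accuracy_guaranteed(top1, macs):
    """Accuracy-guaranteed compression, reward (5) in the AMC paper:
    R_FLOPs = -Error * log(FLOPs). Small accuracy drops barely reduce the
    reward, which may explain the bias toward high-density solutions."""
    error = 1.0 - top1 / 100.0
    return -error * math.log(macs)
```

In the accuracy-guaranteed form the error term dominates: at equal accuracy a smaller MAC count only helps logarithmically, consistent with the conjecture above that not enough weight is given to the MACs.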
Thanks,
Neta
Hi @HKLee2040,
> My protocol is "mac-constrained". The reward fn should be top1/100.
> But why are the blue and green lines in your Performance Data so different?
Thanks for the persistence. The shift you see is an illusion (and causes confusion, I guess). It is caused by the fact that the reward and Top1 accuracy use different axes (Top1 on the right; reward on the left). The reward's range is [0..1] and the accuracy's range is [0..100], and because their values are exactly correlated (reward = top1/100, as you wrote above) they should align. However, when we draw the MAC values, also on the left axis, they distort the relative scaling of the axes (they shift relative to one another). You can see this if you disable the rendering of the MACs graphs, or if you set the ylim of the axes. For example:
```python
def plot_performance(alpha, window_size, top1, macs, params, reward, start=0, end=-1):
    plot_kwargs = {"figsize": (15, 7), "lw": 1, "alpha": alpha, "title": "Performance Data"}
    smooth_kwargs = {"lw": 2 if window_size > 0 else 1, "legend": True}
    if macs:
        ax = df['normalized_macs'][start:end].plot(**plot_kwargs, color="r")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0, 100])
        df['smooth_normalized_macs'] = smooth(df['normalized_macs'], window_size)
        df['smooth_normalized_macs'][start:end].plot(**smooth_kwargs, color="r")
    if top1:
        ax = df['top1'][start:end].plot(**plot_kwargs, color="b", grid=True)
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0, 100])
        df['smooth_top1'] = smooth(df['top1'], window_size)
        df['smooth_top1'][start:end].plot(**smooth_kwargs, color="b")
    if params:
        ax = df['normalized_nnz'][start:end].plot(**plot_kwargs, color="black")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0, 100])
        df['smooth_normalized_nnz'] = smooth(df['normalized_nnz'], window_size)
        df['smooth_normalized_nnz'][start:end].plot(**smooth_kwargs, color="black")
    if reward:
        ax = df['reward'][start:end].plot(**plot_kwargs, secondary_y=True, color="g")
        ax.set(xlabel="Episode", ylabel="reward", ylim=[0, 1.0])
        df['smooth_reward'] = smooth(df['reward'], window_size)
        df['smooth_reward'][start:end].plot(**smooth_kwargs, secondary_y=True, color="g")
    ax.grid(True, which='minor', axis='x', alpha=0.3)
```
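The `smooth` helper isn't shown in the snippet; a minimal trailing-moving-average version consistent with how it is used (my assumption of its behavior, not the original code) would be:

```python
def smooth(values, window_size):
    """Trailing moving average over a sequence.

    Roughly equivalent to the pandas idiom
    values.rolling(window_size, min_periods=1).mean().
    """
    if window_size <= 1:
        return list(values)
    out = []
    for i in range(len(values)):
        lo = max(0, i - window_size + 1)     # window shrinks at the start
        window = values[lo:i + 1]
        out.append(sum(window) / len(window))
    return out
```

For example, `smooth([1, 2, 3, 4], 2)` returns `[1.0, 1.5, 2.5, 3.5]`.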
I uploaded my raw log files here; you can load and try them.
Still, you ask why for you the graphs overlap and for me they don't. This is because, in my files, the big drop in the MACs (at episode 3474; to ~5%) causes the left and right axes to shift and they become unaligned.
Cheers
Neta
Hi @nzmora
Got it! It's my carelessness. I didn't check the scale of axes.
Thanks for your detailed reply.
Hi @nzmora
May I know why you set pi_lr = 1e-4 and q_lr = 1e-3 in ddpg?
Did you refer to arXiv:1811.08886, where they use a fixed learning rate of 1e-4 for the actor network and 1e-3 for the critic network?
```python
ddpg.ddpg(env=env1, test_env=env2, actor_critic=core.mlp_actor_critic,
          ac_kwargs=dict(hidden_sizes=[hid]*layers, output_activation=tf.sigmoid),
          gamma=1,  # discount rate
          seed=seed,
          epochs=400,
          replay_size=2000,
          batch_size=64,
          start_steps=env1.amc_cfg.num_heatup_epochs,
          steps_per_epoch=800 * env1.num_layers(),  # every 50 episodes perform 10 episodes of testing
          act_noise=0.5,
          pi_lr=1e-4,
          q_lr=1e-3,
          logger_kwargs=logger_kwargs)
```
Hi @HKLee2040,
I got these numbers from the DDPG paper, "Continuous control with deep reinforcement learning".
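In practice the two learning rates just mean that the actor and critic each take gradient steps of a different size (a real implementation would use two separate Adam optimizers). A plain-Python sketch of the idea:

```python
PI_LR = 1e-4  # actor (policy) learning rate, per the DDPG paper
Q_LR = 1e-3   # critic (Q-function) learning rate

def sgd_step(params, grads, lr):
    """One plain gradient step; actor and critic each use their own lr."""
    return [p - lr * g for p, g in zip(params, grads)]
```

The critic's larger step size lets the Q-function track targets quickly, while the smaller actor step keeps the policy from chasing a still-inaccurate critic.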
Cheers
Neta
@nzmora Hi, how do you generate the YAML pruning-schedule files? Could you share the pruning-schedule YAML file for ResNet trained on ImageNet? Thanks!
Hi @huxianer,
I'm not sure I understand which YAML file you refer to. AMC/ADC currently works w/o YAML.
There are some sample YAML files using other techniques. For example AGP.
Cheers
Neta
@nzmora @HKLee2040 I'm referring to the YAML files in general; the examples provide them directly, but don't explain how to produce them. You say AMC/ADC currently works without YAML; could you give an example that doesn't use a YAML file? Thank you for your help!
Hi @huxianer
You can refer to nzmora's message
https://github.com/NervanaSystems/distiller/issues/64#issuecomment-451766455
The command-line is:
```shell
python3 compress_classifier.py --arch=plain20_cifar ../../../data.cifar --amc --resume=checkpoint.plain20_cifar.pth.tar --lr=0.05 --amc-action-range 0.0 0.80 --vs=0.8
```
@nzmora Hi, does Distiller support detection models? If not, do you have any intention to support them?
I am also interested in using AMC for detection models. What is the progress on this?
Hi @levzlotnik @nzmora
Thank you for your great work!
Is there any update on the object-detection example with AMC? Or do you have any suggestions?
Thanks.