Comments (28)
After switching to Clipped PPO I'm getting very encouraging results. See: https://github.com/NervanaSystems/distiller/wiki/AutoML-for-Model-Compression-(AMC):-Trials-and-Tribulations
from distiller.
Hi @huxianer , @RizhaoCai ,
I merged the revised AMC implementation to 'master'. You can now try our auto-compression code.
I'll add more information on the setup soon.
It currently doesn't support object detection. @levzlotnik is working on adding an example of object detection, after which we will consider automating. If you happen to integrate object-detection with AMC, we'd be interested in considering it for integration into the Distiller code-base.
Cheers,
Neta
Hi,
Currently the status of ADC (now AMC: https://arxiv.org/abs/1802.03494) is unchanged. I'll update when we have something that can be shared.
Cheers
Neta
Thanks for the response :)
As far as I can tell, the implementation seems almost done. If the remaining work is clear and you're open to contributions, I can set aside some time to finish it up.
I have been using Distiller for a while now, and it has saved me a lot of time. It would be awesome to have AMC up and running on it.
Cheers,
Amjad
Hi Amjad,
I'm happy to hear that you're using Distiller and find it useful!
I'll be returning from Beijing in a couple of weeks and then I'll spend some time to synchronize Distiller with the public Coach APIs, and we can then see how to work together to get AMC working ASAP.
I appreciate the help!
Cheers,
Neta
Hey Neta,
Any update regarding this? I think I will have some time to work on it in the next couple of weeks.
Cheers,
Amjad
Sorry Amjad, I still haven't completed the move to the public v0.11.0 Coach. I'm currently pushing code that's still integrated with an older, private branch of Coach.
I'll let you know as soon as I commit a version that can work with public Coach.
Thanks,
Neta
Hi Amjad,
I pushed a commit that integrates Distiller with the Coach master branch (requires one PR I pushed to Coach - see details in the Distiller commit).
Currently only R_flops (AccuracyGuaranteed Compression) is enabled.
It converges to a solution quickly after finishing the first 100 exploration episodes, but the converged solution is unsatisfactory. I tried it on Plain-20 and VGG16 - both for CIFAR.
There are several open issues, which I won't enumerate right now - first, I need to try to better understand what's going on.
Cheers,
Neta
@nzmora
> NOTE: you may need to update TensorFlow to the expected version:
> `$ pip3 install tensorflow==1.9.0`
Does that mean I have to install CUDA 9.0 if I want to try AMC?
Hi @HKLee2040
No, installing TF 1.9.0 does not require upgrading CUDA.
Cheers
Neta
Work on AMC currently takes place in branch 'amc'. Your help is more than welcome.
Cheers
Neta
@nzmora Could you share plain20.checkpoint.pth.tar? Thanks!
@huxianer the schedule file for training Plain20 is here. Training took me about 33 minutes on 4 GPUs.
However, since you've asked :-), I've also uploaded the image here:
https://drive.google.com/file/d/1bBhjjxkXjFHmqfTWKnxop3n6QCN8QfZJ/view?usp=sharing
Cheers,
Neta
@nzmora Thank you very much! I have another question: I found that the top1 performance is essentially unchanged when I don't use a pretrained model. So if I don't have a pretrained model, what can I do?
Hi @huxianer,
I am not sure I understood your question, so I will answer according to what I understood.
I think you are asking how to train using AMC if we don't have a pre-trained model of the network we are compressing.
The answer is that you must have a pre-trained model because "We aim to automatically find the redundancy for each layer, characterized by sparsity. We train a reinforcement learning agent to predict the action and give the sparsity, then perform the pruning. We quickly evaluate the accuracy after pruning but before fine-tuning as an effective delegate of final accuracy" (section 3, page 4). You can only "find the redundancy for each layer" if you are searching a pre-trained model. If the model is not trained, you cannot find any redundancy because the weights do not have any meaning (they are randomly distributed).
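To make this concrete, here is a toy sketch of pruning-by-magnitude, the kind of per-layer pruning action AMC takes (this is an illustration only, not Distiller's actual API): on a trained layer, the surviving large-magnitude weights preserve most of the layer's function, whereas on random weights the choice of what to prune is arbitrary.

```python
def prune_smallest(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude.

    Toy stand-in for a per-layer pruning action. Only meaningful on a
    trained layer, where magnitude correlates with importance.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

For example, `prune_smallest([0.1, -2.0, 0.05, 3.0], 0.5)` zeros the two smallest-magnitude weights, returning `[0.0, -2.0, 0.0, 3.0]`.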
I hope this helps,
Neta
Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?
I have some modifications:
Since I have only one GPU in my environment, I modified `conv_op = g.find_op(normalize_module_name(name))` to `conv_op = g.find_op(name)`.
Also, args.amc_target_density was None, so I added `args.amc_target_density = 0.5` in my code.
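As background, `normalize_module_name` is needed because `nn.DataParallel` prefixes submodule names with `module.`; with a single GPU the prefix is absent, which is why dropping the call works. A minimal sketch of the idea (my assumption, not Distiller's exact implementation):

```python
def normalize_module_name(name):
    """Strip the 'module.' prefix that nn.DataParallel adds to module names."""
    prefix = "module."
    return name[len(prefix):] if name.startswith(prefix) else name
```

So `normalize_module_name("module.conv1")` yields `"conv1"`, while names from a single-GPU model pass through unchanged.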
Hi @HKLee2040
> I have some modifications:
I will need to fix the code for the case of one GPU.
> Why are smooth_top1 and smooth_reward overlapping in my "Performance Data" diagram?
I don't know which protocol you are using ("mac-constrained" or "accuracy-guaranteed"), but both are highly correlated with the Top1 accuracy.
So it makes sense that you will see an overlap when the graphs are smoothed (I smoothed using a simple moving average) because the signal noise is made less noticeable in both the reward and accuracy signals. You can see an example here.
Having said that, I think that you ask a good question. I think that this is a clue as to why the reward defined in the AMC paper, for accuracy-guaranteed-compression, is not so good. The solutions converge on maximum density for all layers (you can see this in the green bars here) - probably because the agent tries to maximize the Top1 accuracy - and not enough weight is given to the MACs (FLOPs) in the reward (5).
This is my conjecture at the moment.
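For reference, the two reward shapes discussed above can be sketched as follows, following the AMC paper's definitions (the exact scaling in Distiller's implementation may differ):

```python
import math

def reward_mac_constrained(top1):
    """Resource-constrained search: reward is simply the scaled Top1 accuracy."""
    return top1 / 100.0

def reward_accuracy_guaranteed(top1, macs):
    """Accuracy-guaranteed compression, reward (5) in the AMC paper:
    R_FLOPs = -Error * log(FLOPs). Small accuracy drops barely reduce the
    reward, which may explain the bias toward high-density solutions."""
    error = 1.0 - top1 / 100.0
    return -error * math.log(macs)
```

In the accuracy-guaranteed form the error term dominates: at equal accuracy a smaller MAC count only helps logarithmically, consistent with the conjecture above that not enough weight is given to the MACs.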
Thanks,
Neta
Hi @HKLee2040,
> My protocol is "mac-constrained". The reward fn should be top1/100.
> But why are the blue and green lines in your Performance Data so different?
Thanks for the persistence. The shift you see is an illusion (and causes confusion, I guess). It is caused by the fact that the reward and Top1 accuracy use different axes (Top1 on the right; reward on the left). The reward's range is [0..1] and the accuracy's range is [0..100], and because their values are exactly correlated (reward = top1/100, as you wrote above) they should align. However, when we draw the MAC values, also on the left axis, they distort the relative scaling of the axes (they shift relative to one another). You can see this if you disable the rendering of the MACs graphs, or if you set the ylim of the axes. For example:
```python
def plot_performance(alpha, window_size, top1, macs, params, reward, start=0, end=-1):
    plot_kwargs = {"figsize": (15, 7), "lw": 1, "alpha": alpha, "title": "Performance Data"}
    smooth_kwargs = {"lw": 2 if window_size > 0 else 1, "legend": True}
    if macs:
        ax = df['normalized_macs'][start:end].plot(**plot_kwargs, color="r")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0, 100])
        df['smooth_normalized_macs'] = smooth(df['normalized_macs'], window_size)
        df['smooth_normalized_macs'][start:end].plot(**smooth_kwargs, color="r")
    if top1:
        ax = df['top1'][start:end].plot(**plot_kwargs, color="b", grid=True)
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0, 100])
        df['smooth_top1'] = smooth(df['top1'], window_size)
        df['smooth_top1'][start:end].plot(**smooth_kwargs, color="b")
    if params:
        ax = df['normalized_nnz'][start:end].plot(**plot_kwargs, color="black")
        ax.set(xlabel="Episode", ylabel="(%)", ylim=[0, 100])
        df['smooth_normalized_nnz'] = smooth(df['normalized_nnz'], window_size)
        df['smooth_normalized_nnz'][start:end].plot(**smooth_kwargs, color="black")
    if reward:
        ax = df['reward'][start:end].plot(**plot_kwargs, secondary_y=True, color="g")
        ax.set(xlabel="Episode", ylabel="reward", ylim=[0, 1.0])
        df['smooth_reward'] = smooth(df['reward'], window_size)
        df['smooth_reward'][start:end].plot(**smooth_kwargs, secondary_y=True, color="g")
    ax.grid(True, which='minor', axis='x', alpha=0.3)
```
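The `smooth` helper isn't shown in the snippet; a minimal trailing-moving-average version consistent with how it is used (my assumption of its behavior, not the original code) would be:

```python
def smooth(values, window_size):
    """Trailing moving average over a sequence.

    Roughly equivalent to the pandas idiom
    values.rolling(window_size, min_periods=1).mean().
    """
    if window_size <= 1:
        return list(values)
    out = []
    for i in range(len(values)):
        lo = max(0, i - window_size + 1)     # window shrinks at the start
        window = values[lo:i + 1]
        out.append(sum(window) / len(window))
    return out
```

For example, `smooth([1, 2, 3, 4], 2)` returns `[1.0, 1.5, 2.5, 3.5]`.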
I uploaded my raw log files here; you can load and try them.
Still, you ask why for you the graphs overlap and for me they don't. This is because, in my files, the big drop in the MACs (at episode 3474; to ~5%) causes the left and right axes to shift and they become unaligned.
Cheers
Neta
Hi @nzmora
Got it! It's my carelessness. I didn't check the scale of axes.
Thanks for your detailed reply.
Hi @nzmora
May I know why you set pi_lr = 1e-4 and q_lr = 1e-3 in ddpg?
Did you refer to arXiv:1811.08886, where they use a fixed learning rate of 1e-4 for the actor network and 1e-3 for the critic network?
```python
ddpg.ddpg(env=env1, test_env=env2, actor_critic=core.mlp_actor_critic,
          ac_kwargs=dict(hidden_sizes=[hid]*layers, output_activation=tf.sigmoid),
          gamma=1,  # discount rate
          seed=seed,
          epochs=400,
          replay_size=2000,
          batch_size=64,
          start_steps=env1.amc_cfg.num_heatup_epochs,
          steps_per_epoch=800 * env1.num_layers(),  # every 50 episodes perform 10 episodes of testing
          act_noise=0.5,
          pi_lr=1e-4,
          q_lr=1e-3,
          logger_kwargs=logger_kwargs)
```
Hi @HKLee2040,
I got these numbers from the DDPG paper, "Continuous control with deep reinforcement learning".
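In practice the two learning rates just mean that the actor and critic each take gradient steps of a different size (a real implementation would use two separate Adam optimizers). A plain-Python sketch of the idea:

```python
PI_LR = 1e-4  # actor (policy) learning rate, per the DDPG paper
Q_LR = 1e-3   # critic (Q-function) learning rate

def sgd_step(params, grads, lr):
    """One plain gradient step; actor and critic each use their own lr."""
    return [p - lr * g for p, g in zip(params, grads)]
```

The critic's larger step size lets the Q-function track targets quickly, while the smaller actor step keeps the policy from chasing a still-inaccurate critic.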
Cheers
Neta
@nzmora Hi, how do you generate the YAML pruning-schedule files? Could you share the pruning-schedule YAML file for ResNet trained on ImageNet? Thanks!
Hi @huxianer,
I'm not sure I understand which YAML file you refer to. AMC/ADC currently works w/o YAML.
There are some sample YAML files using other techniques. For example AGP.
Cheers
Neta
@nzmora @HKLee2040 I'm referring to the YAML files in general; the examples provide them directly, but don't explain how to produce them. You say AMC/ADC currently works without YAML; could you give an example that doesn't use a YAML file? Thank you for your help!
Hi @huxianer
You can refer to nzmora's message
https://github.com/NervanaSystems/distiller/issues/64#issuecomment-451766455
The command-line is:
```shell
python3 compress_classifier.py --arch=plain20_cifar ../../../data.cifar --amc --resume=checkpoint.plain20_cifar.pth.tar --lr=0.05 --amc-action-range 0.0 0.80 --vs=0.8
```
@nzmora Hi, does Distiller support detection models? If not, do you have any intention to support them?
I am also interested in using AMC for detection models. What is the progress on this?
Hi @levzlotnik @nzmora
Thank you for your great work!
Is there any update on the object-detection example with AMC? Or do you have any suggestions?
Thanks.