
alecwangcq / grasp


Code for "Picking Winning Tickets Before Training by Preserving Gradient Flow" https://openreview.net/pdf?id=SkgsACVKPH

License: MIT License

Python 100.00%
deep-learning lottery-ticket-hypothesis machine-learning neural-networks pruning pytorch

grasp's People

Contributors

alecwangcq

grasp's Issues

Some questions about your paper and code

Hello, thank you very much for your code. I am a little bit confused about some details in your paper. Could you please help me with them?

In your Equation 8, you multiply Hg by -theta. I do not understand why you have to multiply Hg by -theta, since Hg is already a measure of importance.

  1. In your code, you use 1000 examples for CIFAR-100 and divide them into 4 batches of 250. While computing the Hessian-vector product, z += (grad_w[count].data * grad_f[count]).sum(), you use different numbers of examples for grad_w and grad_f: grad_w is the sum of gradients over all 4 batches of 250, but grad_f is the gradient for only the current batch of 250 examples. Why do you do it this way?

  2. If possible, could you please share a detailed proof of Eq. 7?

Thank you very much for your work!
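For context, the batched pattern being asked about can be sketched in PyTorch roughly as follows. This is a minimal toy example (made-up model and random data, not the repo's actual GraSP code); the two-pass structure and the z += (grad_w * grad_f).sum() accumulation are the point:

```python
import torch

# Toy setup: a single weight vector and two small "batches" of data.
# Illustrative sketch only, not the repo's GraSP implementation.
torch.manual_seed(0)
w = torch.randn(4, requires_grad=True)
batches = [(torch.randn(3, 4), torch.randn(3)) for _ in range(2)]

def loss_fn(w, x, y):
    return ((x @ w - y) ** 2).mean()

# Pass 1: accumulate the gradient over ALL batches, detached from the
# graph (this plays the role of grad_w, the stop-gradded full-data gradient).
grad_w = torch.zeros_like(w)
for x, y in batches:
    g = torch.autograd.grad(loss_fn(w, x, y), w)[0]
    grad_w += g.detach()

# Pass 2: per batch, build a differentiable gradient grad_f and take a
# second derivative of z = <grad_w, grad_f>. Summed over batches, this
# yields (sum of per-batch Hessians) @ grad_w, i.e. a Hessian-gradient
# product computed without ever forming the Hessian.
Hg = torch.zeros_like(w)
for x, y in batches:
    grad_f = torch.autograd.grad(loss_fn(w, x, y), w, create_graph=True)[0]
    z = (grad_w * grad_f).sum()
    Hg += torch.autograd.grad(z, w)[0]

# GraSP-style per-weight score: -theta * (H g), as in Eq. 8.
scores = -w.detach() * Hg
print(scores.shape)
```

Because the full-data grad_w is computed once and detached, pass 2 only needs to hold the second-order graph for one batch at a time, which keeps memory bounded regardless of how many examples contribute to grad_w.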

Question about Hessian-gradient product

Hi Chaoqi,
Thanks for sharing the awesome paper and code with us. I have a small question about the Hessian-gradient product.

  1. What is the stop_grad function in the 3rd line of Algorithm 2 of the original paper?

  2. I checked GraSP_ImageNet.py and found that grad_w and grad_f seem to be the same: both are the gradient of the CE loss w.r.t. the weights. What is the difference between them, and would it be possible to replace grad_w[count] * grad_f[count] (L102 of GraSP_ImageNet.py) with grad_w[count] * grad_w[count]?

  3. I'm also curious about the input split in GraSP_ImageNet.py. If there is not enough memory, why not simply decrease the input batch_size? I guess there may be a special reason not to decrease it?

Thanks,
Ziqi
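On the stop_grad question, the role it plays in a double-backward Hessian-vector product can be sketched like this (a hypothetical tiny least-squares problem; only the detach pattern is the point, not the repo's exact code):

```python
import torch

# Hypothetical tiny problem, purely to illustrate the stop_grad/detach trick.
torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
x = torch.randn(8, 5)
y = torch.randn(8)

loss = ((x @ w - y) ** 2).mean()

# First backward: g = dL/dw, keeping the graph so g itself can be
# differentiated a second time.
g = torch.autograd.grad(loss, w, create_graph=True)[0]

# Detaching one factor (the stop_grad) turns it into a constant vector
# v = g, so differentiating z = <g, v> gives the Hessian-vector product
# H v = H g. Without the detach, z = <g, g> would differentiate to 2*H*g,
# since the derivative would flow through both factors.
z = (g * g.detach()).sum()
Hg = torch.autograd.grad(z, w)[0]
print(Hg.shape)
```

So even though grad_w and grad_f are numerically the same gradient, they are not interchangeable in the graph: one is treated as a constant and one stays differentiable, which is exactly what stop_grad expresses in Algorithm 2.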

Test accuracy for tiny-imagenet

Hello, when I train tiny-imagenet with this repo using GraSP, the test accuracy never reaches even 1%, while the training accuracy reaches 99%. This behavior appeared out of the box. Any idea what might be going on?
