alecwangcq / GraSP

Code for "Picking Winning Tickets Before Training by Preserving Gradient Flow" https://openreview.net/pdf?id=SkgsACVKPH

License: MIT License
Hi Chaoqi,
I was wondering whether dividing by the gradient norm is necessary. It seems like it doesn't affect the ordering, and therefore doesn't affect the results. I might be missing something.
Line 146 in f17d87a
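To illustrate the point being raised: dividing every score by the same positive scalar rescales the scores but cannot change their ranking, so a top-k pruning mask built from the scores is unchanged. A minimal sketch with hypothetical score values:

```python
import numpy as np

# Hypothetical per-weight importance scores and a positive gradient norm.
scores = np.array([0.7, -1.2, 0.05, 3.4, -0.3])
grad_norm = 2.5  # any positive scalar

# Dividing by a positive constant preserves the ordering of the scores,
# so the indices selected for pruning are identical either way.
order_raw = np.argsort(scores)
order_scaled = np.argsort(scores / grad_norm)
print(np.array_equal(order_raw, order_scaled))  # True
```

This only holds if the divisor is strictly positive and shared across all scores, which is the case for a single global gradient norm.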
Hi, quick question: why are the inputs and targets split into different batches here?
https://github.com/alecwangcq/GraSP/blob/master/pruner/GraSP.py#L82
Did you not have enough memory to compute the gradients in a single batch?
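For context on why splitting need not change the result: the gradient of a mean loss over a full batch equals the size-weighted average of the gradients over its micro-batches, so accumulation is a standard way to trade memory for extra passes. A minimal sketch with a toy mean-squared-error loss (all names here are illustrative, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

def grad_mse(Xb, yb, w):
    # Gradient of 0.5 * mean((Xb @ w - yb)^2) with respect to w.
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient in one pass.
g_full = grad_mse(X, y, w)

# Same gradient accumulated over micro-batches of 2,
# weighting each chunk by its share of the full batch.
g_acc = np.zeros_like(w)
for i in range(0, len(y), 2):
    Xb, yb = X[i:i + 2], y[i:i + 2]
    g_acc += grad_mse(Xb, yb, w) * len(yb) / len(y)

print(np.allclose(g_full, g_acc))  # True
```

The two results agree to floating-point precision, which is why micro-batching is usually a pure memory optimization.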
Hello, thank you very much for your code. I am a little bit confused about some details in your paper. Could you please help me with them?
In your Equation 8, you multiply Hg by -theta. I do not understand why you have to multiply Hg by -theta, since Hg is already a measure of importance.
In your code, you use 1000 examples for CIFAR-100 and divide them into 4 batches of 250. While computing the Hessian-vector product, z += (grad_w[count].data * grad_f[count]).sum(), you use different numbers of examples for grad_w and grad_f: grad_w is the sum of gradients over all 4 batches of 250, but grad_f is only the gradient of the current batch of 250 examples. Why do you do it this way?
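For readers following this thread: the quantity being discussed is a Hessian-gradient product Hg, which can be computed without ever materializing H. A minimal numeric sketch on a toy quadratic loss, where the exact answer is known (the finite-difference step here stands in for the double-backprop pass the code performs with autograd):

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w, so grad(w) = A w and Hessian = A.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = M @ M.T  # symmetric PSD Hessian
w = rng.normal(size=4)

def grad(w):
    return A @ w

g = grad(w)

# Hessian-gradient product Hg computed two ways:
hg_exact = A @ g
# Directional-derivative view: H v ~= (grad(w + eps*v) - grad(w)) / eps.
eps = 1e-6
hg_fd = (grad(w + eps * g) - grad(w)) / eps

print(np.allclose(hg_exact, hg_fd, atol=1e-4))  # True
```

Because the toy loss is quadratic, the finite-difference estimate matches the exact product up to floating-point error.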
If possible, could you please share a detailed proof of Eq. 7?
Thank you very much for your work!
Hi Chaoqi,
Thanks for sharing the awesome paper and code with us. I have a small question about the Hessian-gradient product.
What is the stop_grad function in the third line of Algorithm 2 of the original paper?
I checked GraSP_ImageNet.py and found that grad_w and grad_f seem to be the same: both are the gradient of the CE loss w.r.t. the weights. What is the difference between them, and is it possible to replace grad_w[count] * grad_f[count] (L102 of GraSP_ImageNet.py) with grad_w[count] * grad_w[count]?
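As background on the stop_grad question: in PyTorch, stopping gradient flow through a tensor corresponds to `.detach()`. A minimal sketch, assuming a toy quadratic loss (this is an illustration of the stop-gradient trick in general, not a claim about the authors' exact implementation):

```python
import torch

# Toy loss L(w) = sum(w^2), so g = 2w and the Hessian is H = 2I.
w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
g, = torch.autograd.grad(loss, w, create_graph=True)  # g = 2w, kept in graph

# z = stop_grad(g) . g : only the second factor is differentiated,
# so dz/dw = stop_grad(g) * dg/dw = H g.
z = (g.detach() * g).sum()
hg, = torch.autograd.grad(z, w)
print(hg)  # tensor([4., 8.]), i.e. H g = 2I @ [2, 4]
```

Without the detach, both factors of g would be differentiated and the result would come out as 2*Hg instead of Hg, which is why the two tensors are not interchangeable even though their values coincide.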
I'm also curious about the input split in GraSP_ImageNet.py. If you don't have enough memory, why not just decrease the input batch_size? I guess there may be some special reason not to decrease batch_size?
Thanks,
Ziqi
Hello, when I train tiny-imagenet with this repo using GraSP, the test accuracy never reaches even 1%, while the training accuracy reaches 99%. This behavior occurs out of the box. Any idea what might be going on?
Hi Chaoqi,
Thanks for sharing the awesome paper and code with us. I have a small question about the code.
In configs/imagenet/resnet50/GraSP_80.json, "learning_rate" and "weight_decay" are both set to 0. Could you tell me the values of these two parameters in your experiment?
Thanks,
Pong