Comments (9)

abdur-n-tr commented on August 26, 2024

Thanks for considering the issue and for the quick fix. This repo is really great, and it keeps getting updated with awesome stuff!

from tez.

abhishekkrthakur commented on August 26, 2024

It should be fixed in the main branch now. Could you please confirm?

abdur-n-tr commented on August 26, 2024

This issue still exists in the main branch as well; I did not see any fix for it.

abhishekkrthakur commented on August 26, 2024

Hmm... then I might be missing something. Could you please share more information/code? The zero_grad calls look fine to me. Or did I miss something?

abdur-n-tr commented on August 26, 2024

I just ran the latest tez code and the same issue is still happening.

image

Here, you can see that when batch_index = 0, it first calls zero_grad and then the forward pass runs, which is fine. But when batch_index = 1, this condition is not met, so the forward pass runs without zero_grad being called first, as in the snapshot below.

image

So, one solution is to either remove the if condition on zero_grad in _step() on line # 336, or remove the self.batch_index == 0 check on line # 299, or you can come up with some other fix, since you know the code better.
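To make the ordering problem described above concrete, here is a minimal pure-Python sketch. It is not tez's actual code: `ToyTrainer`, `train_one_batch`, and `batches_missing_zero_grad` are hypothetical names, and the guard is only an assumed simplification of a zero_grad that runs for batch_index == 0 alone.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; it only records the order of calls."""

    def __init__(self, guard_zero_grad):
        self.guard_zero_grad = guard_zero_grad
        self.batch_index = 0
        self.events = []

    def train_one_batch(self):
        # Assumed shape of the reported behaviour: with the guard on,
        # zero_grad runs only for the very first batch.
        if not self.guard_zero_grad or self.batch_index == 0:
            self.events.append(("zero_grad", self.batch_index))
        self.events.append(("forward", self.batch_index))
        self.events.append(("backward", self.batch_index))
        self.events.append(("step", self.batch_index))
        self.batch_index += 1


def batches_missing_zero_grad(events):
    """Return batch indices whose backward pass had no zero_grad since the last step."""
    missing = []
    zeroed = False
    for name, batch_index in events:
        if name == "zero_grad":
            zeroed = True
        elif name == "backward" and not zeroed:
            missing.append(batch_index)
        elif name == "step":
            zeroed = False
    return missing


def trace(guard_zero_grad, num_batches):
    """Run the toy trainer and return its recorded events."""
    trainer = ToyTrainer(guard_zero_grad)
    for _ in range(num_batches):
        trainer.train_one_batch()
    return trainer.events
```

With the guard on, every batch after the first reaches its backward pass without a preceding zero_grad; with the guard removed, no batch does.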

Hope it helps.

abhishekkrthakur commented on August 26, 2024

It seems like one of us is confused. When batch_index = 1 or any value greater than zero, zero_grad is happening here: if self.batch_index > 0:. This zero_grad is before the forward pass.

When batch_index = 0, the _step function does the zero_grad. This zero_grad is before the very first forward pass.

abdur-n-tr commented on August 26, 2024

I tried to log the forward pass and zero_grad (wherever it appears in the code) like this:

image

image

Please point out if I am printing the logs the wrong way.
You will see logs like this:

image

Of course, zero_grad is happening for batch_index = 1, but only after the forward pass for batch_index = 1 completes.

abdur-n-tr commented on August 26, 2024

Also, I just refreshed my memory about zero_grad: it actually has nothing to do with the forward pass. Instead, it must happen before the backward pass (apologies for mistyping). So I logged the backward pass as well, but the same issue is still there, as you can see below.

zero_grad before backward pass:
https://discuss.pytorch.org/t/should-we-perform-optimizer-zero-grad-before-we-do-forward-pass-or-after-that/14254

image
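Why the placement relative to the backward pass matters can be shown with a small pure-Python sketch. It assumes only that gradients are accumulated into a per-parameter buffer on each backward pass, as PyTorch does with `.grad`. `ToyParam`, `backward`, and `applied_gradients` are hypothetical names for illustration, not part of tez or PyTorch.

```python
class ToyParam:
    """Hypothetical scalar parameter with an accumulating gradient buffer."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(param, x):
    # Toy loss is value * x, so d(loss)/d(value) = x.
    # The gradient is added to the buffer rather than overwriting it.
    param.grad += x


def applied_gradients(inputs, lr, zero_every_batch):
    """Return the gradient actually used by the optimizer step for each batch."""
    param = ToyParam(0.0)
    applied = []
    for batch_index, x in enumerate(inputs):
        # Clearing the buffer before backward keeps each step's gradient fresh.
        if zero_every_batch or batch_index == 0:
            param.grad = 0.0
        backward(param, x)
        applied.append(param.grad)
        param.value -= lr * param.grad  # plain SGD step
    return applied
```

For inputs `[1.0, 2.0, 3.0]`, clearing the buffer before every backward pass applies gradients `[1.0, 2.0, 3.0]`, while clearing it only for the first batch applies `[1.0, 3.0, 6.0]`, because the earlier gradients are still in the buffer.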

abhishekkrthakur commented on August 26, 2024

Ohh! Thanks for the code. I got it now and I've fixed it in the main branch. Many thanks for looking deep into the codebase 🙏🏽
