
Comments (9)

qgallouedec avatar qgallouedec commented on June 5, 2024 2

For TD3, I only found two runs where the losses explode, but neither of them led to the bug:
https://wandb.ai/openrlbenchmark/sb3/runs/2qdjqemd (Walker2DBulletEnv-v0)
https://wandb.ai/openrlbenchmark/sb3/runs/ffc7kx3m (BipedalWalkerHardcore-v0)
What a wonderful tool openrlbenchmark is, ping @vwxyzjn ;)

from rl-baselines3-zoo.

qgallouedec avatar qgallouedec commented on June 5, 2024

This may be due to a learning rate that is too high, see #156 (comment). Do you use the default hyperparams?

Also related (and probably duplicate): DLR-RM/stable-baselines3#1401 and DLR-RM/stable-baselines3#1418


ZJEast avatar ZJEast commented on June 5, 2024

Yes, I use the default hyperparams. I will try a different learning rate later.


araffin avatar araffin commented on June 5, 2024

Hello,
thanks for sharing the bug report.
Does the NaN happen only for some runs or for all runs?
Could you log and share a failed run using W&B? (that would allow us to take a look at all the logged data)

I also assume you are using the pybullet gymnasium repo?

I'll try to reproduce the issue in the meantime.

Also related: DLR-RM/stable-baselines3#1372. Changing to AdamW might solve the problem too.


ZJEast avatar ZJEast commented on June 5, 2024

I have tried TD3, SAC, and TQC on some pybullet envs. It only happens for the task I mentioned; the others are fine.
I installed the pybullet envs with 'pip install -r ./requirements.txt'.

I can upload some log files.

sac-AntBulletEnv-v0.zip
sac-HalfCheetahBulletEnv-v0.zip
tqc-AntBulletEnv-v0.zip
tqc-HalfCheetahBulletEnv-v0.zip


araffin avatar araffin commented on June 5, 2024

Thanks =)

Looking at the log, it seems to be due to an explosion of std (and you are using a much larger budget than the one we were using by default).
So, setting use_expln=True (and maybe using AdamW) should solve your issue.
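
To illustrate why use_expln can help with an exploding std, here is a minimal sketch in plain Python. It assumes the expln form used with gSDE, namely exp(x) for x <= 0 and log(1 + x) + 1 for x > 0. The function names std_exp and std_expln are hypothetical and are not part of the SB3 API; the actual implementation works on tensors and may differ in detail.

```python
import math


def std_exp(log_std: float) -> float:
    """Plain parameterization: std = exp(log_std), which grows exponentially."""
    return math.exp(log_std)


def std_expln(log_std: float) -> float:
    """Assumed expln form: exp(x) for x <= 0, log(1 + x) + 1 for x > 0.

    The two branches meet at x = 0 (both give 1.0), the result is always
    positive, and growth is only logarithmic for large log_std.
    """
    if log_std <= 0.0:
        return math.exp(log_std)
    return math.log1p(log_std) + 1.0


if __name__ == "__main__":
    # Compare both parameterizations as log_std drifts upward during training.
    for log_std in (-3.0, 0.0, 2.0, 10.0, 50.0):
        print(
            f"log_std={log_std:6.1f}  "
            f"exp={std_exp(log_std):.3e}  "
            f"expln={std_expln(log_std):.3e}"
        )
```

With the plain exponential, a log_std that drifts upward over a long training budget gives a std that grows without bound, while the expln form stays small.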

I would appreciate a PR that adds this parameter =)

Hmm, for TD3 it is weird if it happens as it doesn't rely on any distribution.

EDIT: I guess the issue is similar to Stable-Baselines-Team/stable-baselines3-contrib#146 by @qgallouedec


qgallouedec avatar qgallouedec commented on June 5, 2024

Bug already encountered in openrlbenchmark, I might have forgotten to report it: https://wandb.ai/openrlbenchmark/sb3/runs/27cez5ua
EDIT: I did report it, you're right @araffin ;)


ZJEast avatar ZJEast commented on June 5, 2024

After I changed the hyperparams from

policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300])"

to

policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300], use_expln=True)"

the problem no longer occurs, so let's close this issue.
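
For readers who use Stable-Baselines3 directly rather than the zoo config files, the same change could be sketched in Python roughly as follows. The helper names build_policy_kwargs and make_sac are hypothetical. use_sde=True is an assumption, since use_expln is only relevant when gSDE is enabled. The AdamW option reflects the suggestion made earlier in the thread and was not part of the reported fix.

```python
def build_policy_kwargs(use_adamw: bool = False) -> dict:
    """Return the policy kwargs from the fix above (hypothetical helper)."""
    kwargs = dict(log_std_init=-3, net_arch=[400, 300], use_expln=True)
    if use_adamw:
        # Optional: swap the optimizer to AdamW, as suggested in the thread.
        # torch is imported lazily so the default path does not need it.
        import torch

        kwargs["optimizer_class"] = torch.optim.AdamW
    return kwargs


def make_sac(env, use_adamw: bool = False):
    """Build a SAC model with gSDE and the expln std parameterization.

    `env` is a Gymnasium environment instance or a registered env id.
    use_sde=True is an assumption, not something stated in the thread.
    """
    from stable_baselines3 import SAC  # imported lazily, needs SB3 installed

    return SAC(
        "MlpPolicy",
        env,
        use_sde=True,
        policy_kwargs=build_policy_kwargs(use_adamw),
    )
```

Calling make_sac(env) and then model.learn(...) would train with the modified policy. All other hyperparameters are left at the library defaults here, which may differ from the zoo's tuned values.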


araffin avatar araffin commented on June 5, 2024

Thanks for trying it out =)
I'm reopening as we need to change the defaults (we would welcome a PR).

