Comments (9)
For TD3, I only found two runs where the losses explode, but neither of them led to the bug:
https://wandb.ai/openrlbenchmark/sb3/runs/2qdjqemd (Walker2DBulletEnv-v0)
https://wandb.ai/openrlbenchmark/sb3/runs/ffc7kx3m (BipedalWalkerHardcore-v0)
What a wonderful tool openrlbenchmark is, ping @vwxyzjn ;)
from rl-baselines3-zoo.
This may be due to a learning rate that is too high, see #156 (comment); are you using the default hyperparams?
Also related (and probably duplicate): DLR-RM/stable-baselines3#1401 and DLR-RM/stable-baselines3#1418
from rl-baselines3-zoo.
Yes, I use the default hyperparams. I will try a different learning rate later.
from rl-baselines3-zoo.
Hello,
thanks for sharing the bug report.
Does the NaN happen only for some runs or for all runs?
Could you log and share a failed run using W&B? (that would allow us to take a look at all the logged data)
I also assume you are using the pybullet gymnasium repo?
I'll try to reproduce the issue in the meantime.
Also related: DLR-RM/stable-baselines3#1372. Changing to AdamW might solve the problem too.
from rl-baselines3-zoo.
I have tried TD3, SAC, and TQC on some pybullet envs, and it only happens for the task I mentioned; the others are fine.
I installed the pybullet envs with 'pip install -r ./requirements.txt'.
I can upload some log files:
sac-AntBulletEnv-v0.zip
sac-HalfCheetahBulletEnv-v0.zip
tqc-AntBulletEnv-v0.zip
tqc-HalfCheetahBulletEnv-v0.zip
from rl-baselines3-zoo.
Thanks =)
Looking at the logs, it seems to be due to an explosion of the std (and you are using a much larger budget than the one we were using by default).
So, setting use_expln=True (and maybe using AdamW) should solve your issue.
I would appreciate a PR that adds this parameter =)
Hmm, for TD3 it is weird if it happens as it doesn't rely on any distribution.
EDIT: I guess the issue is similar to Stable-Baselines-Team/stable-baselines3-contrib#146 by @qgallouedec
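To illustrate why use_expln=True can help with an exploding std, here is a minimal pure-Python sketch. It assumes that expln keeps exp() for non-positive inputs and grows only logarithmically for positive ones; the function names `std_exp` and `std_expln` and the `epsilon` default are hypothetical, chosen for illustration, and this is not the library's actual code.

```python
import math


def std_exp(log_std: float) -> float:
    """Plain exponential mapping from log-std to std."""
    return math.exp(log_std)


def std_expln(log_std: float, epsilon: float = 1e-6) -> float:
    """Assumed expln mapping: exp() for log_std <= 0, log1p() + 1 above 0.

    The output stays positive, but it grows logarithmically instead of
    exponentially once log_std becomes positive.
    """
    if log_std <= 0:
        return math.exp(log_std)
    return math.log1p(log_std + epsilon) + 1.0


if __name__ == "__main__":
    # Compare both mappings as log_std drifts upward during training.
    for log_std in (-3.0, 0.0, 5.0, 50.0):
        print(f"log_std={log_std:6.1f}  exp={std_exp(log_std):.3e}  "
              f"expln={std_expln(log_std):.3e}")
```

With the plain exponential, a log-std that drifts to large positive values produces an enormous std, which is one plausible route to the NaN reported here; the expln variant keeps it bounded to a slowly growing value.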
from rl-baselines3-zoo.
Bug already encountered in openrlbenchmark, I might have forgotten to report it: https://wandb.ai/openrlbenchmark/sb3/runs/27cez5ua
EDIT: I did report it, you're right @araffin ;)
from rl-baselines3-zoo.
After I changed the hyperparams from
policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300])"
to
policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300], use_expln=True)"
the problem no longer occurs, so let's close this issue.
from rl-baselines3-zoo.
Thanks for trying it out =)
I'm reopening as we need to change the defaults (we would welcome a PR).
from rl-baselines3-zoo.