
dreamerv3-torch's People

Contributors

ktolnos · nm512 · truncs · webersamuel · zdx3578


dreamerv3-torch's Issues

Training Error. no attribute 'observation_space'

Thank you for this reproduction. I encountered the problem below when trying to run Atari training. Is my Atari version incorrect? The installed gym version is 0.19.0.

Traceback (most recent call last):
  File "/home/liangyb/git/dreamerv3-torch/dreamer.py", line 406, in <module>
    main(parser.parse_args(remaining))
  File "/home/liangyb/git/dreamerv3-torch/dreamer.py", line 353, in main
    train_envs[0].observation_space,
  File "/home/liangyb/git/dreamerv3-torch/envs/wrappers.py", line 177, in __getattr__
    return getattr(self._env, name)
  File "/home/liangyb/git/dreamerv3-torch/envs/wrappers.py", line 18, in __getattr__
    return getattr(self._env, name)
  File "/home/liangyb/git/dreamerv3-torch/envs/wrappers.py", line 203, in __getattr__
    return getattr(self._env, name)
  File "/home/liangyb/git/dreamerv3-torch/envs/wrappers.py", line 94, in __getattr__
    return getattr(self._env, name)
  File "/home/liangyb/git/dreamerv3-torch/envs/wrappers.py", line 143, in __getattr__
    return getattr(self._env, name)
AttributeError: 'Atari' object has no attribute 'observation_space' 
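For context, the chain of `__getattr__` frames above is the standard wrapper-delegation pattern: each wrapper forwards unknown attributes to the env it wraps, so if nothing in the chain defines `observation_space`, the innermost lookup raises. A minimal stdlib-only sketch (class names are illustrative, not the repo's actual classes):

```python
class BareEnv:
    """Stand-in for an env that does not define observation_space."""


class Wrapper:
    """Forwards unknown attribute lookups to the wrapped env."""

    def __init__(self, env):
        self._env = env

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the request
        # walks inward through every wrapper until something defines it.
        return getattr(self._env, name)


env = Wrapper(Wrapper(BareEnv()))
try:
    env.observation_space
except AttributeError as err:
    print(err)  # raised by the innermost getattr, mirroring the trace above
```

So the fix belongs on the innermost env class (here, the `Atari` wrapper), not on the delegating wrappers.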

Multigpus

Hi,
How can the codebase be extended to multiple GPUs? It seems non-trivial to do.
Thank you,

Error with Test Script

[nr620@gpu022 dreamerv3-torch]$ python3 dreamer.py --configs dmc_vision --task dmc_walker_walk --logdir ./logdir/dmc_walker_walk
Logdir logdir/dmc_walker_walk
Create envs.
Traceback (most recent call last):
  File "/cache/home/nr620/code/dreamerv3-torch/dreamer.py", line 365, in <module>
    main(parser.parse_args(remaining))
  File "/cache/home/nr620/code/dreamerv3-torch/dreamer.py", line 238, in main
    train_envs = [make("train", i) for i in range(config.envs)]
  File "/cache/home/nr620/code/dreamerv3-torch/dreamer.py", line 238, in <listcomp>
    train_envs = [make("train", i) for i in range(config.envs)]
  File "/cache/home/nr620/code/dreamerv3-torch/dreamer.py", line 237, in <lambda>
    make = lambda mode, id: make_env(config, mode, id)
  File "/cache/home/nr620/code/dreamerv3-torch/dreamer.py", line 151, in make_env
    env = dmc.DeepMindControl(
  File "/cache/home/nr620/code/dreamerv3-torch/envs/dmc.py", line 13, in __init__
    from dm_control import suite
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/suite/__init__.py", line 24, in <module>
    from dm_control.suite import acrobot
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/suite/acrobot.py", line 20, in <module>
    from dm_control import mujoco
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/mujoco/__init__.py", line 18, in <module>
    from dm_control.mujoco.engine import action_spec
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 41, in <module>
    from dm_control import _render
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/_render/__init__.py", line 86, in <module>
    Renderer = import_func()
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/_render/__init__.py", line 46, in _import_osmesa
    from dm_control._render.pyopengl.osmesa_renderer import OSMesaContext
  File "/home/nr620/.local/lib/python3.9/site-packages/dm_control/_render/pyopengl/osmesa_renderer.py", line 35, in <module>
    from OpenGL import GL
  File "/home/nr620/.local/lib/python3.9/site-packages/OpenGL/GL/__init__.py", line 4, in <module>
    from OpenGL.GL.VERSION.GL_1_1 import *
  File "/home/nr620/.local/lib/python3.9/site-packages/OpenGL/GL/VERSION/GL_1_1.py", line 14, in <module>
    from OpenGL.raw.GL.VERSION.GL_1_1 import *
  File "/home/nr620/.local/lib/python3.9/site-packages/OpenGL/raw/GL/VERSION/GL_1_1.py", line 7, in <module>
    from OpenGL.raw.GL import _errors
  File "/home/nr620/.local/lib/python3.9/site-packages/OpenGL/raw/GL/_errors.py", line 4, in <module>
    _error_checker = _ErrorChecker( _p, _p.GL.glGetError )
AttributeError: 'NoneType' object has no attribute 'glGetError'
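The `glGetError` failure means PyOpenGL could not actually load the OSMesa GL library that dm_control fell back to. On a GPU node, one common workaround (whether it applies depends on your drivers) is to force the EGL backend before dm_control, and hence PyOpenGL, is first imported:

```python
import os

# dm_control chooses its rendering backend from MUJOCO_GL at import time,
# so this must run before the first `from dm_control import ...`.
os.environ["MUJOCO_GL"] = "egl"

# from dm_control import suite  # import only after setting the variable
print(os.environ["MUJOCO_GL"])
```

Alternatively, installing the OSMesa system library lets the default software-rendering fallback succeed.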

Random sampling in tools::sample_episodes

Hi,

I was going over your dataset code and noticed that you sample from the episode buffer randomly. Randomizing is generally right because consecutive episodes are strongly correlated, but your technique picks a random episode at each step rather than guaranteeing that every episode is seen once before any episode is seen twice.

It's probably not a big deal since you'll sample uniformly on average, but I was wondering if you had a reason to make this implementation choice?

Thanks again for writing this repo.
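The distinction being asked about can be sketched in plain Python (illustrative code, not the repo's): uniform sampling with replacement versus per-epoch shuffling, which guarantees each episode is seen once before any repeats:

```python
import random


def sample_uniform(episodes, rng):
    # Pick an independent uniform episode at each step (with replacement).
    while True:
        yield rng.choice(episodes)


def sample_shuffled_epochs(episodes, rng):
    # Reshuffle each epoch: every episode appears exactly once per epoch.
    while True:
        order = list(episodes)
        rng.shuffle(order)
        yield from order


rng = random.Random(0)
episodes = ["ep0", "ep1", "ep2", "ep3", "ep4"]
it = sample_shuffled_epochs(episodes, rng)
epoch = [next(it) for _ in range(len(episodes))]
print(sorted(epoch))  # each episode exactly once per epoch
```

With-replacement sampling is uniform in expectation but can starve individual episodes over short horizons; the shuffled-epoch variant bounds the gap between visits at the cost of a little bookkeeping.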

multiple environments

Hi there, I'm still in the process of digging into this issue, but maybe you know a quick fix.
I get an error when I set --envs greater than 1, in both the Atari and dmc environments.

Here's the error + results of running with more than 1 env: python3 dreamer.py --configs defaults --task dmc_walker_walk --logdir results/walker --envs 2

Logdir results/walker
Create envs.
Prefill dataset (2500 steps).
Traceback (most recent call last):
  File "dreamer.py", line 375, in <module>
    main(parser.parse_args(remaining))
  File "dreamer.py", line 330, in main
    tools.simulate(random_agent, train_envs, prefill)
  File "/<path>/dreamerv3-torch/tools.py", line 146, in simulate
    action = [
  File "/<path>/dreamerv3-torch/tools.py", line 147, in <listcomp>
    {k: np.array(action[k][i].detach().cpu()) for k in action}
  File "/<path>/dreamerv3-torch/tools.py", line 147, in <dictcomp>
    {k: np.array(action[k][i].detach().cpu()) for k in action}
IndexError: index 1 is out of bounds for dimension 0 with size 1

EDIT: I got through that specific error, but there are shape errors all over the place when --envs > 1
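The IndexError suggests the agent returned an action batch of size 1 while simulate() tried to index one row per environment. A stdlib-only sketch of that indexing step (function name and keys are hypothetical, not the repo's exact code):

```python
def split_actions(action, n_envs):
    # simulate() expects each action[k] to carry a leading batch dimension
    # of n_envs and builds one per-env dict by indexing action[k][i].
    for k, v in action.items():
        if len(v) != n_envs:
            raise IndexError(
                f"action[{k!r}] has batch size {len(v)}, expected {n_envs}"
            )
    return [{k: action[k][i] for k in action} for i in range(n_envs)]


# Works when the agent batches over all envs:
print(split_actions({"action": [[0.1], [0.2]]}, n_envs=2))

# Fails like the report when the agent only produced a batch of one:
try:
    split_actions({"action": [[0.1]]}, n_envs=2)
except IndexError as err:
    print(err)
```

If this is the failure mode, the fix would be to make the agent (including the random prefill agent) emit a full batch of actions, one per env, rather than patching the indexing site.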

About parallel efficiency

Hi,

Thanks for the nice implementation.

I have one question about parallelism: in the original implementation, they use a server-client framework so that data collection (envs), the policy (parallel_actor), and policy + world-model learning (parallel_learner) run as truly asynchronous threads. Link

Meanwhile, in your implementation, if I understand correctly, you iterate over the environments (for env in envs, as in DreamerV2) in tools.py's simulate function (link).

I wonder whether this difference has a significant effect on wall-clock time and FPS?

Looking forward to your answer!
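As a rough, self-contained illustration of why this can matter (toy env class, not the repo's code): sequential stepping pays each environment's latency in series, while overlapping the steps, as an asynchronous server-client design allows, pays it roughly once:

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowEnv:
    """Stand-in for an env whose step() blocks on simulation or IO."""

    def step(self, action):
        time.sleep(0.05)
        return action


envs = [SlowEnv() for _ in range(4)]

# Sequential stepping, in the spirit of the `for env in envs` loop:
t0 = time.perf_counter()
seq = [env.step(i) for i, env in enumerate(envs)]
t_seq = time.perf_counter() - t0

# Overlapped stepping via a thread pool:
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(envs)) as pool:
    par = list(pool.map(lambda ie: ie[1].step(ie[0]), enumerate(envs)))
t_par = time.perf_counter() - t0

print(f"sequential {t_seq:.2f}s vs overlapped {t_par:.2f}s")
```

How much this shows up in practice depends on how expensive env.step is relative to the GPU forward/backward passes.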

Example from README not working

Hi, I tried to train the example on DMC Vision but got the following error:

python3 dreamer.py --configs dmc_vision --task dmc_walker_walk --logdir ./logdir/dmc_walker_walk
Logdir logdir/dmc_walker_walk
Create envs.
libEGL warning: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)

libEGL warning: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)

Traceback (most recent call last):
  File "/home/stuart/Github/dreamerv3-torch/dreamer.py", line 418, in <module>
    main(parser.parse_args(remaining))
  File "/home/stuart/Github/dreamerv3-torch/dreamer.py", line 332, in main
    train_envs = [make("train") for _ in range(config.envs)]
  File "/home/stuart/Github/dreamerv3-torch/dreamer.py", line 332, in <listcomp>
    train_envs = [make("train") for _ in range(config.envs)]
  File "/home/stuart/Github/dreamerv3-torch/dreamer.py", line 331, in <lambda>
    make = lambda mode: make_env(config, logger, mode, train_eps, eval_eps)
  File "/home/stuart/Github/dreamerv3-torch/dreamer.py", line 190, in make_env
    env = dmc.DeepMindControl(task, config.action_repeat, config.size)
  File "/home/stuart/Github/dreamerv3-torch/envs/dmc.py", line 11, in __init__
    from dm_control import suite
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/suite/__init__.py", line 24, in <module>
    from dm_control.suite import acrobot
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/suite/acrobot.py", line 20, in <module>
    from dm_control import mujoco
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/mujoco/__init__.py", line 18, in <module>
    from dm_control.mujoco.engine import action_spec
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 41, in <module>
    from dm_control import _render
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/_render/__init__.py", line 86, in <module>
    Renderer = import_func()
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/_render/__init__.py", line 36, in _import_egl
    from dm_control._render.pyopengl.egl_renderer import EGLContext
  File "/home/stuart/miniconda3/envs/dreamer3.9/lib/python3.9/site-packages/dm_control/_render/pyopengl/egl_renderer.py", line 75, in <module>
    raise ImportError('Cannot initialize a headless EGL display.')
ImportError: Cannot initialize a headless EGL display.
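A common workaround for this headless EGL failure (assuming software rendering via OSMesa is installed, e.g. the libosmesa6 package on Debian/Ubuntu) is to select the backend explicitly before dm_control is imported:

```python
import os

# dm_control reads MUJOCO_GL at import time; "osmesa" uses software
# rendering and needs no display, "egl" needs a working GPU/EGL driver.
os.environ["MUJOCO_GL"] = "osmesa"
os.environ["PYOPENGL_PLATFORM"] = "osmesa"

# from dm_control import suite  # import only after setting the variables
print(os.environ["MUJOCO_GL"])
```

Equivalently, exporting MUJOCO_GL=osmesa in the shell before launching dreamer.py has the same effect.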

The multi-task setup

Hi! This is awesome reimplementation work, thanks a lot! Could you point me to the multi-task setup/configuration for DM Control or other possible benchmarks?

Question of env

Hi NM512, I'm trying to run the training experiments but ran into the problem below. It seems to be caused by torch 2.0. How can I fix it? Looking forward to your reply.

(dreamerv3) cxy@amin001-SYS-7049GP-TRT:~/dreamerv3-torch-main$ python3 dreamer.py --configs dmc_vision --task dmc_walker_walk --logdir ./logdir/dmc_walker_walk
Logdir logdir/dmc_walker_walk
Create envs.
/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
Prefill dataset (500 steps).
Logger: (5000 steps).
Simulate agent.
Encoder CNN shapes: {'image': (64, 64, 3)}
Encoder MLP shapes: {}
Decoder CNN shapes: {'image': (64, 64, 3)}
Decoder MLP shapes: {}
[5000]
Start evaluation.
Traceback (most recent call last):
  File "/home/cxy/dreamerv3-torch-main/dreamer.py", line 392, in <module>
    main(parser.parse_args(remaining))
  File "/home/cxy/dreamerv3-torch-main/dreamer.py", line 334, in main
    tools.simulate(
  File "/home/cxy/dreamerv3-torch-main/tools.py", line 168, in simulate
    action, agent_state = agent(obs, done, agent_state)
  File "/home/cxy/dreamerv3-torch-main/dreamer.py", line 90, in __call__
    policy_output, state = self._policy(obs, state, training)
  File "/home/cxy/dreamerv3-torch-main/dreamer.py", line 107, in _policy
    embed = self._wm.encoder(obs)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cxy/dreamerv3-torch-main/networks.py", line 391, in forward
    outputs.append(self._cnn(inputs))
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cxy/dreamerv3-torch-main/networks.py", line 526, in forward
    x = self.layers(x)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cxy/dreamerv3-torch-main/networks.py", line 876, in forward
    ret = F.conv2d(
RuntimeError: GET was unable to find an engine to execute this computation
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
    self._call_locked(cleanup_callable)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
    return self._executor.submit(func, *args, **kwargs).result()
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
(The atexit traceback above repeats several more times.)
Exception ignored in: <function MjrContext.__del__ at 0x7f94a72fe790>
Traceback (most recent call last):
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in __del__
    self.free()
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
    ctx.call(ptr.free)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
    return self._call_locked(func, *args, **kwargs)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
    return self._executor.submit(func, *args, **kwargs).result()
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.__del__ at 0x7f94ad232550>
Traceback (most recent call last):
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
    self._call_locked(cleanup_callable)
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
    return self._executor.submit(func, *args, **kwargs).result()
  File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function MjrContext.del at 0x7f94a72fe790>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in del
self.free()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
ctx.call(ptr.free)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
return self._call_locked(func, *args, **kwargs)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.del at 0x7f94ad232550>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in del
self._free_unconditionally()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
self._render_executor.terminate(self._free_on_executor_thread)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
self._call_locked(cleanup_callable)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function MjrContext.del at 0x7f94a72fe790>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in del
self.free()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
ctx.call(ptr.free)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
return self._call_locked(func, *args, **kwargs)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.del at 0x7f94ad232550>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in del
self._free_unconditionally()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
self._render_executor.terminate(self._free_on_executor_thread)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
self._call_locked(cleanup_callable)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function MjrContext.del at 0x7f94a72fe790>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in del
self.free()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
ctx.call(ptr.free)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
return self._call_locked(func, *args, **kwargs)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.del at 0x7f94ad232550>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in del
self._free_unconditionally()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
self._render_executor.terminate(self._free_on_executor_thread)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
self._call_locked(cleanup_callable)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function MjrContext.del at 0x7f94a72fe790>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in del
self.free()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
ctx.call(ptr.free)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
return self._call_locked(func, *args, **kwargs)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.del at 0x7f94ad232550>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in del
self._free_unconditionally()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
self._render_executor.terminate(self._free_on_executor_thread)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
self._call_locked(cleanup_callable)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function MjrContext.del at 0x7f94a72fe790>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in del
self.free()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
ctx.call(ptr.free)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
return self._call_locked(func, *args, **kwargs)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.del at 0x7f94ad232550>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in del
self._free_unconditionally()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
self._render_executor.terminate(self._free_on_executor_thread)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
self._call_locked(cleanup_callable)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function MjrContext.del at 0x7f94a72fe790>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 633, in del
self.free()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/mujoco/wrapper/core.py", line 625, in free
ctx.call(ptr.free)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 196, in call
return self._call_locked(func, *args, **kwargs)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Exception ignored in: <function ContextBase.del at 0x7f94ad232550>
Traceback (most recent call last):
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 118, in del
self._free_unconditionally()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
self._render_executor.terminate(self._free_on_executor_thread)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 214, in terminate
self._call_locked(cleanup_callable)
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/site-packages/dm_control/_render/executor/render_executor.py", line 206, in _call_locked
return self._executor.submit(func, *args, **kwargs).result()
File "/home/cxy/miniconda3/envs/dreamerv3/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown

Error running example provided in README.md

Hello, I ran the training on DMC Vision as provided in the README:

python3 dreamer.py --configs dmc_vision --task dmc_walker_walk --logdir ./logdir/dmc_walker_walk

But I got the following error:

Traceback (most recent call last):
  File "dreamer.py", line 386, in <module>
    main(parser.parse_args(remaining))
  File "dreamer.py", line 276, in main
    config.num_actions = acts.n if hasattr(acts, "n") else acts.shape[0]
AttributeError: 'functools.partial' object has no attribute 'shape'

I then lowered the number of envs from 4 (default) to 1. I could then fix the error by changing line 274 from

acts = train_envs[0].action_space

to

acts = train_envs[0].action_space()

but this resulted in other errors. I suppose the problem lies elsewhere. Any help is appreciated!
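For what it's worth, a hedged workaround (not the repo's code; `make_space` is a hypothetical stand-in for whatever deferred object the parallel wrapper returns) is to resolve the space only when it comes back as a callable rather than a space object:

```python
import functools
from types import SimpleNamespace

def make_space():
    # stand-in for a discrete action space with 4 actions
    return SimpleNamespace(n=4)

# with envs > 1 the parallel wrapper may hand back a deferred constructor
acts = functools.partial(make_space)

# resolve the space only if it lacks space attributes but is callable
if not hasattr(acts, "n") and not hasattr(acts, "shape") and callable(acts):
    acts = acts()

num_actions = acts.n if hasattr(acts, "n") else acts.shape[0]
print(num_actions)  # 4
```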

Replay Buffer Question

Hi @NM512,

congrats on this implementation! Looks awesome.

I was reading your code and I got confused about your implementation of the replay buffer. Are you storing the buffer in RAM or on disk? Which code appends new transitions to the buffer? In the config file, dataset_size=0; isn't this a problem?

Can you please give further details about this topic?
Thanks in advance.

Plan2explore

Hi, thanks for your contribution.

I'm just wondering: have you tested your code in the plan2explore experiment setting, and can you show some results? I saw you updated your code with plan2explore code.

@NM512

Thanks.

networks.py line 808 variable "mean" not defined

    elif self._dist == "normal_1":
        x = self._dist_layer(x)
        dist = torchd.normal.Normal(mean, 1)
        dist = tools.ContDist(torchd.independent.Independent(dist, 1))

The variable "mean" is not defined.
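A minimal sketch of one plausible fix, assuming the dist layer output is meant to parameterize the mean (mirroring the neighboring branches; `dist_layer` here is a stand-in for `self._dist_layer`):

```python
import torch
import torch.distributions as torchd

x = torch.zeros(2, 8)
dist_layer = torch.nn.Linear(8, 3)  # stand-in for self._dist_layer

# interpret the layer output as the mean of a unit-variance Normal
mean = dist_layer(x)
dist = torchd.independent.Independent(torchd.normal.Normal(mean, 1.0), 1)
sample = dist.sample()
print(tuple(sample.shape))  # (2, 3)
```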

Question for Docker

Hi, NM512, I'm trying to use Docker to run dreamerv3 but I'm facing the issue below. It seems to be caused by MuJoCo; how could I fix it? Hoping for your reply.

root@cb0af21e2e22:/workspace/dreamerv3-torch# python3 dreamer.py --configs dmc_vision --task dmc_walker_walk --logdir ./logdir/dmc_walker_walk
Logdir logdir/dmc_walker_walk
Create envs.
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:132: UserWarning: A NumPy version >=1.21.6 and <1.28.0 is required for this version of SciPy (detected version 1.21.0)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
  File "/workspace/dreamerv3-torch/dreamer.py", line 392, in <module>
    main(parser.parse_args(remaining))
  File "/workspace/dreamerv3-torch/dreamer.py", line 266, in main
    train_envs = [make("train") for _ in range(config.envs)]
  File "/workspace/dreamerv3-torch/dreamer.py", line 266, in <listcomp>
    train_envs = [make("train") for _ in range(config.envs)]
  File "/workspace/dreamerv3-torch/dreamer.py", line 265, in <lambda>
    make = lambda mode: make_env(config, mode)
  File "/workspace/dreamerv3-torch/dreamer.py", line 179, in make_env
    env = dmc.DeepMindControl(
  File "/workspace/dreamerv3-torch/envs/dmc.py", line 15, in __init__
    self._env = suite.load(
  File "/opt/conda/lib/python3.10/site-packages/dm_control/suite/__init__.py", line 113, in load
    return build_environment(domain_name, task_name, task_kwargs,
  File "/opt/conda/lib/python3.10/site-packages/dm_control/suite/__init__.py", line 148, in build_environment
    env = domain.SUITE[task_name](**task_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dm_control/suite/walker.py", line 62, in walk
    physics = Physics.from_xml_string(*get_model_and_assets())
  File "/opt/conda/lib/python3.10/site-packages/dm_control/mujoco/engine.py", line 436, in from_xml_string
    return cls.from_model(model)
  File "/opt/conda/lib/python3.10/site-packages/dm_control/mujoco/engine.py", line 419, in from_model
    return cls(data)
  File "/opt/conda/lib/python3.10/site-packages/dm_control/mujoco/engine.py", line 123, in __init__
    self._reload_from_data(data)
  File "/opt/conda/lib/python3.10/site-packages/dm_control/mujoco/engine.py", line 400, in _reload_from_data
    model=index.struct_indexer(self.model, 'mjmodel', axis_indexers),
  File "/opt/conda/lib/python3.10/site-packages/dm_control/mujoco/index.py", line 628, in struct_indexer
    attr = getattr(struct, field_name)
AttributeError: 'MjModel' object has no attribute 'eq_active'. Did you mean: 'eq_active0'?

ConvDecoder without SymlogDist?

It seems that in the official JAX repo the loss function for images is 'mse' while for other vectors it is symlog. Does it make sense to have the dist function default to MSE? Here is the config from the original JAX implementation:

  decoder: {mlp_keys: '.*', cnn_keys: '.*', act: silu, norm: layer, mlp_layers: 5, mlp_units: 1024, cnn: resnet, cnn_depth: 96, cnn_blocks: 0, image_dist: mse, vector_dist: symlog_mse, inputs: [deter, stoch], resize: stride, winit: normal, fan: avg, outscale: 1.0, minres: 4, cnn_sigmoid: False}
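For reference, the symlog/symexp pair behind `vector_dist: symlog_mse` (as defined in the DreamerV3 paper) can be sketched as:

```python
import math

def symlog(x: float) -> float:
    # squashes large magnitudes while staying near-identity around zero
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x: float) -> float:
    # inverse of symlog
    return math.copysign(math.expm1(abs(x)), x)

roundtrip = symexp(symlog(123.0))
print(roundtrip)  # ~123.0
```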

Can you share training log of DMC envs?

Thank you for your work. Would you be willing to share training logs showing how the reward loss and the image loss change as the number of training steps increases, and to what values they eventually converge? When I used the algorithm on my own dataset, I noticed that the world model's reward loss looks large, and I'm not sure if that makes sense. It converges around 0.5, but the range of the reward is -1 to 1, so this seems like a relatively large prediction error.

Model sizes

Hey, thanks so much for the repo, I was looking for a streamlined pytorch implementation.

I'm looking to increase the model size and I'm wondering which config attributes correspond to which attributes from the Dreamer paper. Most are intuitive but some aren't so much.

Any help appreciated.

Question on the compute-efficiency of the implementation

Hi, thanks for a nice implementation of DreamerV3! This should be very useful. I have a quick question: how does it compare with the DreamerV3 JAX implementation in terms of compute cost? For example, how long does a single run take with this Torch implementation versus the JAX implementation? It would be very informative and helpful if you could provide this information!

Why use manually-written Conv2dSame over torch.nn.Conv2d(padding='same')?

Hi, I really appreciate this work, just a small question though.

In this project, the convolutional layers are from a Conv2dSame class that implements 'same' padding manually.
However, torch.nn.Conv2d does have 'same' padding mode built-in.
Why use the manually-written Conv2dSame over torch.nn.Conv2d(padding='same')?
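One likely reason (an educated guess, not confirmed by the author): `padding='same'` in `torch.nn.Conv2d` is only supported for stride 1, while the image encoder uses strided convolutions, so a manual same-padding class is needed there:

```python
import torch
import torch.nn as nn

# built-in 'same' padding works for stride 1
conv = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding="same")
out = conv(torch.zeros(1, 3, 64, 64))
print(tuple(out.shape))  # (1, 8, 64, 64)

# ...but raises for strided convolutions
try:
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding="same")
    strided_ok = True
except ValueError:
    strided_ok = False
print(strided_ok)  # False
```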

Questions :)

Hi,

I am currently trying to implement DreamerV2/V3 in PyTorch as well and have encountered some challenges/ am stuck a little bit at various points. I was wondering if you would be open to help me with a few questions. If you can, I would really appreciate it. If you're available, you can reach me on Discord at Till#1064 (preferred) or via email at [email protected].

Best,
Till

Wallclock comparison for the benchmarks

It would be helpful to have wallclock comparisons for the benchmarks you posted. I think Danijar's JAX implementation uses jax scan heavily to make the imagination/rollout loops efficient.

[Question] How to use parallel envs for crafter?

Great work with this repo!

I'm getting a KeyError: 'log_reward' when using parallel envs for crafter.

Config:

crafter:
task: crafter_reward
step: 1e6
parallel: True
# no eval
eval_episode_num: 0
eval_every: 1e4
action_repeat: 1
envs: 4
train_ratio: 512
video_pred_log: true
dyn_hidden: 1024
dyn_deter: 4096
units: 1024
reward_layers: 5
cont_layers: 5
value_layers: 5
actor_layers: 5
encoder: {mlp_keys: '$^', cnn_keys: 'image', cnn_depth: 96, mlp_layers: 5, mlp_units: 1024}
decoder: {mlp_keys: '$^', cnn_keys: 'image', cnn_depth: 96, mlp_layers: 5, mlp_units: 1024}
actor_dist: 'onehot'
imag_gradient: 'reinforce'

Result:

Logdir logdir/crafter
Create envs.
Prefill dataset (2500 steps).
[0] log_reward 1.1 / log_achievement_collect_coal 0.0 / log_achievement_collect_diamond 0.0 / log_achievement_collect_drink 0.0 / log_achievement_collect_iron 0.0 / log_achievement_collect_sapling 0.0 / log_achievement_collect_stone 0.0 / log_achievement_collect_wood 97.0 / log_achievement_defeat_skeleton 0.0 / log_achievement_defeat_zombie 0.0 / log_achievement_eat_cow 0.0 / log_achievement_eat_plant 0.0 / log_achievement_make_iron_pickaxe 0.0 / log_achievement_make_iron_sword 0.0 / log_achievement_make_stone_pickaxe 0.0 / log_achievement_make_stone_sword 0.0 / log_achievement_make_wood_pickaxe 0.0 / log_achievement_make_wood_sword 0.0 / log_achievement_place_furnace 0.0 / log_achievement_place_plant 0.0 / log_achievement_place_stone 0.0 / log_achievement_place_table 0.0 / log_achievement_wake_up 39.0 / dataset_size 584.0 / train_return 1.1 / train_length 146.0 / train_episodes 4.0
Traceback (most recent call last):
File "/content/drive/MyDrive/dreamerv3-torch/dreamer.py", line 396, in <module>
main(parser.parse_args(remaining))
File "/content/drive/MyDrive/dreamerv3-torch/dreamer.py", line 309, in main
state = tools.simulate(
File "/content/drive/MyDrive/dreamerv3-torch/tools.py", line 167, in simulate
obs = {k: np.stack([o[k] for o in obs]) for k in obs[0]}
File "/content/drive/MyDrive/dreamerv3-torch/tools.py", line 167, in <dictcomp>
obs = {k: np.stack([o[k] for o in obs]) for k in obs[0]}
File "/content/drive/MyDrive/dreamerv3-torch/tools.py", line 167, in <listcomp>
obs = {k: np.stack([o[k] for o in obs]) for k in obs[0]}
KeyError: 'log_reward'

[Question] How does episode sampling handle environment resets?

Hi, I'm confused about how episodes are sampled from the replay buffer, since episodes may have different lengths, and different episodes might be played in different environments due to resetting after a terminal state.

I still don't fully understand the sampling procedure, but from what I can tell based on sample_episodes(), it looks like episodes which end prematurely are padded with transitions from other episodes until you have sampled batch_size sequences of length batch_length.

For example, suppose batch_size=1 and batch_length=10, and the first episode you sample only has 3 transitions, e.g., (s_1, s_2), (s_2, s_3), (s_3, s_4). After the agent reaches terminal state s_4, the environment resets, and you obtain another episode of length 10, say, s'_1, ..., s'_10. Could we then train using a sequence such as (s_1, s_2), ..., (s_3, s_4), (s'_1, s'_2), ..., (s'_6, s'_7)? That is, is it okay to combine sequences from different episodes, even though the episodes may have been played in completely different environments?

Thanks for your time and for an amazing port of Dreamer!

Set agent to eval mode in simulate function

Hi! thank you for the great work!
In line 165 in tools.py (the simulate function), you call the agent as follows:
action, agent_state = agent(obs, done, agent_state)

I think this should be:
action, agent_state = agent(obs, done, agent_state, not is_eval)
right?

Thanks!

Mutable Default Argument

Hello!

The recursively_collect_optim_state_dict function has a mutable default argument. Because Python evaluates default arguments once, at function definition time, the default set itself gets updated, causing all future calls to this function to start with a pre-populated visited set.

Relevant stackoverflow link for this issue:
https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument?fbclid=IwAR1pR_luIkKqRzeLoDBYC38hUt3Qxnc_3wtnZfLAbEoAsiaFNuM1mFiJNbI

dreamerv3-torch/tools.py

Lines 965 to 967 in 2c7a81a

def recursively_collect_optim_state_dict(
obj, path="", optimizers_state_dicts=None, visited=set()
):

I propose the following change to avoid this issue:

def recursively_collect_optim_state_dict(
    obj, path="", optimizers_state_dicts=None, visited=None
):
    if visited is None:
        visited = set()
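A tiny demonstration of the pitfall (hypothetical function, not the repo's):

```python
def collect(item, visited=set()):  # BUG: the default set is created once and shared
    visited.add(item)
    return visited

first = collect("a")
second = collect("b")
print(second)           # {'a', 'b'} -- 'a' leaked in from the first call
print(first is second)  # True: both calls mutated the same default object
```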

'MjModel' object has no attribute 'eq_active' error caused by mujoco version

Running the example code after installing the dependencies may raise the corresponding error as follows:

Traceback (most recent call last):
  File "/home/pc/PlanInDream/dreamerv3-torch/dreamer.py", line 392, in <module>
    main(parser.parse_args(remaining))
  File "/home/pc/PlanInDream/dreamerv3-torch/dreamer.py", line 266, in main
    train_envs = [make("train") for _ in range(config.envs)]
  File "/home/pc/PlanInDream/dreamerv3-torch/dreamer.py", line 266, in <listcomp>
    train_envs = [make("train") for _ in range(config.envs)]
  File "/home/pc/PlanInDream/dreamerv3-torch/dreamer.py", line 265, in <lambda>
    make = lambda mode: make_env(config, mode)
  File "/home/pc/PlanInDream/dreamerv3-torch/dreamer.py", line 179, in make_env
    env = dmc.DeepMindControl(
  File "/home/pc/PlanInDream/dreamerv3-torch/envs/dmc.py", line 15, in __init__
    self._env = suite.load(
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/suite/__init__.py", line 113, in load
    return build_environment(domain_name, task_name, task_kwargs,
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/suite/__init__.py", line 148, in build_environment
    env = domain.SUITE[task_name](**task_kwargs)
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/suite/walker.py", line 62, in walk
    physics = Physics.from_xml_string(*get_model_and_assets())
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 436, in from_xml_string
    return cls.from_model(model)
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 419, in from_model
    return cls(data)
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 123, in __init__
    self._reload_from_data(data)
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/mujoco/engine.py", line 400, in _reload_from_data
    model=index.struct_indexer(self.model, 'mjmodel', axis_indexers),
  File "/home/pc/anaconda3/envs/PID/lib/python3.9/site-packages/dm_control/mujoco/index.py", line 628, in struct_indexer
    attr = getattr(struct, field_name)
AttributeError: 'MjModel' object has no attribute 'eq_active'

This error is caused by the mujoco version and can be solved by downgrading mujoco (e.g. to 2.3.4).

Resuming training resets optimizer parameters

Hi, first of all, great job with this implementation, it is truly useful!
I wanted to ask about the procedure to resume a training job. I see that the model is saved after each "epoch" and that the state dictionary is loaded when resuming training if we saved the model. Nonetheless, it seems that the optimizer of each of the models is not actually saved and loaded, right? This forces a reset of the optimizer every time we resume a training job, which kind of messes up the training.

Please let me know if I am missing something and thank you in advance!
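If optimizer state is indeed missing from the checkpoint, the usual PyTorch pattern is to save and restore `optimizer.state_dict()` alongside the model (a generic sketch, not this repo's exact checkpointing code):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# take one step so Adam accumulates moment estimates
model(torch.zeros(1, 4)).sum().backward()
opt.step()

# checkpoint both the model and the optimizer, not just the model
ckpt = {"model": model.state_dict(), "optim": opt.state_dict()}

# on resume, rebuild both and load their states to keep Adam's moments
new_model = torch.nn.Linear(4, 2)
new_opt = torch.optim.Adam(new_model.parameters(), lr=1e-3)
new_model.load_state_dict(ckpt["model"])
new_opt.load_state_dict(ckpt["optim"])

restored = new_opt.state_dict()["state"]
print(len(restored) > 0)  # True: Adam's per-parameter state survived the reload
```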

Questions

Hi,

I'm curious whether there are modifications in this repo compared to the TensorFlow version, because I find this repo outperforms the original in some tasks such as quadruped-walk and acrobot-swingup. Is this repo tweaked for hyperparameters, or is there a slight difference in model structure?

Thanks in advance.

Training error: change config.size to [256, 256] result in error

Thank you for your work. I encountered the following error when I changed config.size to [256, 256]; could you help me?

Start evaluation. 
Traceback (most recent call last):
  File "dreamer.py", line 445, in <module>
    main(parser.parse_args(remaining))
  File "dreamer.py", line 408, in main
    tools.simulate(eval_policy, eval_envs, episodes=config.eval_episode_num)
  File "/disk/users/jk639/Jing_ws/rl_rule_exceptions/dreamerv3-torch/tools.py", line 147, in simulate
    action, agent_state = agent(obs, done, agent_state, reward)
  File "dreamer.py", line 103, in __call__
    policy_output, state = self._policy(obs, state, training)
  File "dreamer.py", line 121, in _policy
    latent, _ = self._wm.dynamics.obs_step(
  File "/disk/users/jk639/Jing_ws/rl_rule_exceptions/dreamerv3-torch/networks.py", line 229, in obs_step
    x = self._obs_out_layers(x)
  File "/disk/no_backup/jk639/env_dreamer3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/disk/no_backup/jk639/env_dreamer3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/disk/no_backup/jk639/env_dreamer3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/disk/no_backup/jk639/env_dreamer3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x66048 and 4608x512)

Question about reversed() during deleting cache

Thank you for the great work!

As I understand it, the function erase_over_episodes should erase the oldest episodes.
However, when deleting episodes, the code uses reversed(), which might delete the latest episodes instead?

def erase_over_episodes(cache, dataset_size):
    step_in_dataset = 0
    for key, ep in reversed(sorted(cache.items(), key=lambda x: x[0])):
        if (
            not dataset_size
            or step_in_dataset + (len(ep["reward"]) - 1) <= dataset_size
        ):
            step_in_dataset += len(ep["reward"]) - 1
        else:
            del cache[key]
    return step_in_dataset

Does the code have a bug or did I make any mistake? Thank you!
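A minimal self-contained check (episode keys and lengths are made up; keys sort chronologically, like the timestamped filenames do) suggests the code is correct: `reversed(sorted(...))` iterates newest-first, the newest episodes fill the budget, and deletion only kicks in for the oldest ones once the budget is exceeded:

```python
def erase_over_episodes(cache, dataset_size):
    # reproduces the posted function, with the truncated return line restored
    step_in_dataset = 0
    for key, ep in reversed(sorted(cache.items(), key=lambda x: x[0])):
        if not dataset_size or step_in_dataset + (len(ep["reward"]) - 1) <= dataset_size:
            step_in_dataset += len(ep["reward"]) - 1
        else:
            del cache[key]
    return step_in_dataset

# four episodes of 5 steps each; keep at most 10 steps
cache = {f"ep{t}": {"reward": [0.0] * 6} for t in range(4)}
erase_over_episodes(cache, dataset_size=10)
print(sorted(cache))  # ['ep2', 'ep3'] -- the two oldest episodes were erased
```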

More inputs other than image

Hi, how can I give the agent more input information, like some scalar inputs? I found different input modalities in the original paper (see the attached image). How can I achieve this, and is it possible?

[image from the original paper showing the different inputs]

Shifted value for return computation

Hi,

Thanks for the implementation!

I noticed a small difference with the original implementation while looking at the code.
When computing the target returns, the official DreamerV3 implementation uses the classic bellman equation with rewards[1:], discounts[1:] etc such that Value(t) = reward(t+1) + discount(t+1) * Value(t+1). here

However, in your implementation I noticed that you use the DreamerV2 way of shifting the value with Value(t) = reward(t) + discount(t) * Value(t+1) using rewards[:-1], discounts[:-1] etc. here where reward(t) is the reward of the current state (same as the value) instead of being the one of next state.

So I wanted to know if you also experimented using the DreamerV3 way of predicting next rewards in the value function.
Both approaches seem to work similarly in practice.

Also, I suppose the normal weight init should be:
nn.init.trunc_normal_(m.weight.data, mean=0.0, std=std, a=-2.0*std, b=2.0*std)
instead of:
nn.init.trunc_normal_(m.weight.data, mean=0.0, std=std, a=-2.0, b=2.0)
to correctly truncate weight values outside 2 standard deviations here.
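The arithmetic behind that suggestion (illustrative `std` value):

```python
std = 0.02  # illustrative init scale

# In nn.init.trunc_normal_, a and b are absolute cutoffs, not multiples of std.
# With a=-2.0, b=2.0 the cutoff sits 2.0/std standard deviations away,
# so for small std effectively nothing is truncated.
loose_cutoff_in_stds = 2.0 / std
# With a=-2.0*std, b=2.0*std the cutoff is the intended two standard deviations.
tight_cutoff_in_stds = (2.0 * std) / std

print(loose_cutoff_in_stds, tight_cutoff_in_stds)  # 100.0 2.0
```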

Best,
Maxime

Any plan to test on MineRL?

Thanks for your great contribution to this reimplementation. Is there any plan to test this implementation on MineRL? Also, is there any plan to release the crafter and memory-maze results compared with the JAX version?

[Feature Request] Add “set seed everywhere” feature to ensure reproducibility of experiments

Hi,
Thanks very much for this implementation!
Currently, the project does not have a consistent way of setting the random seed for torch and the other random number generators. This can lead to different results for the same experiment. To address this, I propose adding a feature that allows the user to set the seed for all relevant modules and libraries at the beginning of the experiment, like this: https://github.com/thuml/Flowformer/blob/bc8c4d22b48dde62519f086ba6a9f22463277741/Flowformer_RL/utils.py#L16
This way, we can ensure that the results are reproducible and reliable.
I hope this suggestion will be helpful for the project.
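A sketch of such a helper (names are hypothetical; the torch lines assume torch is installed, so they are guarded here):

```python
import os
import random

import numpy as np

def set_seed_everywhere(seed: int) -> None:
    """Seed every common RNG source at the start of an experiment."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # the torch lines are optional in this sketch

set_seed_everywhere(0)
a = random.random()
set_seed_everywhere(0)
b = random.random()
print(a == b)  # True: the run is reproducible
```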

Errors in exploration.py

Thanks very much for this implementation! I found that there are still "tf" functions in exploration.py.

Error when training with onehot agents

Hello, thank you for this great repo.
When running: python3 dreamer.py --configs atari100k --task atari_breakout --logdir ~/logdir/atari_breakout_v3
I am getting the following error:

Logger: (10000 steps).
Simulate agent.
Encoder CNN shapes: {'image': (64, 64, 3)}
Encoder MLP shapes: {}
Decoder CNN shapes: {'image': (64, 64, 3)}
Decoder MLP shapes: {}
Optimizer model_opt has 15686787 variables.
Traceback (most recent call last):
  File "/local/home/argesp/dreamerv3-torch/dreamer.py", line 365, in <module>
    main(parser.parse_args(remaining))
  File "/local/home/argesp/dreamerv3-torch/dreamer.py", line 287, in main
    agent = Dreamer(
  File "/local/home/argesp/dreamerv3-torch/dreamer.py", line 45, in __init__
    self._task_behavior = models.ImagBehavior(config, self._wm)
  File "/local/home/argesp/dreamerv3-torch/models.py", line 223, in __init__
    self.actor = networks.MLP(
  File "/local/home/argesp/dreamerv3-torch/networks.py", line 654, in __init__
    assert dist in ("tanh_normal", "normal", "trunc_normal", "huber"), dist
AssertionError: onehot

By replacing the hardcoded "learned" parameter, the code seems to run, but I do not know if training happens as intended.

self.actor = networks.MLP(
            feat_size,
            (config.num_actions,),
            config.actor["layers"],
            config.units,
            config.act,
            config.norm,
            config.actor["dist"],
            # "learned",
            1.0,
            config.actor["min_std"],
            config.actor["max_std"],
            absmax=1.0,
            temp=config.actor["temp"],
            unimix_ratio=config.actor["unimix_ratio"],
            outscale=config.actor["outscale"],
            name="Actor",
        )

Could you tell me if I am getting the intended behaviour with this fix?
