
danijar / director

Stars: 80 · Watchers: 3 · Forks: 20 · Size: 6.25 MB

Deep Hierarchical Planning from Pixels

Home Page: https://danijar.com/director/

Languages: Python 99.34% · Dockerfile 0.51% · Shell 0.16%
Topics: algorithms, decision-making, deep-learning, hierarchical-reinforcement-learning, planning, reinforcement-learning, sparse-rewards, world-models

director's Introduction

Hi there 👋

🤖  AI Algorithms

dreamerv3 Mastering Diverse Domains through World Models
daydreamer DayDreamer: World Models for Physical Robot Learning
director Deep Hierarchical Planning from Pixels
dreamerv2* Mastering Atari with Discrete World Models
dreamer* Dream to Control: Learning Behaviors by Latent Imagination
planet* Learning Latent Dynamics for Planning from Pixels
batch-ppo* Efficient Batched Reinforcement Learning in TensorFlow

📈  Benchmarks

crafter Benchmarking the Spectrum of Agent Capabilities
diamond_env Standardized Minecraft Diamond task for reinforcement learning

🛠️  Tools

zerofun Remote function calls for array data using ZMQ
elements Building blocks for productive research
ninjax General Modules for JAX
handout Turn Python scripts into handouts with Markdown and figures

* Archived

director's People

Contributors

danijar


director's Issues

Visualizing decoded skills in hierarchy.py

Hi!

I've been trying to directly visualize the goals the manager generates, but I can't figure out how to take the one-hot skill grid and turn it into an image of the scene. I can visualize the latent properly using the WorldModel's decoder head, but is the 1024-entry vector produced by dec(skill) in hierarchy.py viewable the same way the decoded latent vector is? I'm using Pong so I can train and run locally.

[attached image: director_pong]

Thanks for publishing this repo, it's great work!
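
A hedged sketch of one plausible answer: in Director the goal autoencoder reconstructs the deterministic (deter) part of the RSSM state, so the 1024-entry output of dec(skill) is not a full feature vector by itself. One way to view it is to treat it as deter, fill in the matching stochastic prior, and run the image decoder. The names below (wm, dec, skill, get_stoch, heads['decoder'], .mode()) follow the DreamerV2-style world-model API and are assumptions that may differ in this codebase.

import tensorflow as tf

# Hedged sketch, not verified against this repo: decode a manager goal to pixels.
# `wm` is the trained WorldModel and `dec` is the goal decoder from hierarchy.py.
goal_deter = dec(skill).mode()                  # (B, 1024) reconstructed deter state
goal_stoch = wm.rssm.get_stoch(goal_deter)      # stochastic prior implied by deter
feat = tf.concat([goal_stoch, goal_deter], -1)  # DreamerV2 feature layout: [stoch, deter]
goal_image = wm.heads['decoder'](feat)['image'].mode()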

How to reproduce fig. A.1

Hi Danijar,

Reading the appendix of Director, I couldn't understand what you mean by providing the reward to the worker. Is there a config I can use to do that? In the description of the figure you write "When additionally providing task reward to the worker". Does this mean that you change the context variable defined in hierarchy.py to include the reward as well? Also, if it works so well, why isn't it the default? Have you tried the same for other tasks (e.g. the Ant Mazes)?

Thank you so much!

Best,
Cristian
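
A hedged guess at part of the answer, stated as an assumption rather than the author's confirmed setup: "providing task reward to the worker" in fig. A.1 most plausibly refers to the worker's reward mix rather than its input context. If the config exposes per-source reward weights for the worker (extrinsic, exploration, goal), the ablation would correspond to raising the extrinsic weight, roughly:

# Hypothetical override; the key names are assumptions, so verify against configs.yaml.
worker_rews = {'extr': 0.0, 'expl': 0.0, 'goal': 1.0}  # assumed default: goal reward only
worker_rews = {'extr': 1.0, 'expl': 0.0, 'goal': 1.0}  # fig. A.1 variant: also task reward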

Reconstruction loss for goal autoencoder

Hi,

The goal autoencoder's reconstruction loss in your code is the negative log-probability of the world model's representation under the goal decoder's distribution:

rec = -dec.log_prob(tf.stop_gradient(goal))

But the paper describes it as the mean squared error between the decoded state and the original state, i.e. something like:

rec = ((dec(feat.detach()) - feat) ** 2).mean()

Is the former a better measure of reconstruction loss than the latter?

Also, you use only the deterministic part of the RSSM state as the representation for training the goal autoencoder (in hierarchy.py::train_vae_replay). Why use only the deterministic part and not include the stochastic part as well?

Apologies if you've written about this somewhere, and thank you for making your extremely interesting work public.
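
One note that may resolve the first question: if the goal decoder outputs a Gaussian with fixed unit variance, the negative log-probability and the squared error differ only by an additive constant, so the two objectives produce identical gradients. A self-contained check of that identity (the unit-variance decoder head is an assumption about the repo, not confirmed):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

mean = tf.random.normal([8, 1024])                    # stand-in for the decoder mean
goal = tf.random.normal([8, 1024])                    # stand-in for the deter target
dist = tfd.Independent(tfd.Normal(mean, 1.0), 1)      # assumed unit-variance decoder head

nll = -dist.log_prob(goal)                            # what the code optimizes
sq_err = 0.5 * tf.reduce_sum((goal - mean) ** 2, -1)  # what the paper describes
const = 0.5 * 1024 * np.log(2 * np.pi)                # the only difference
print(np.allclose(nll, sq_err + const, rtol=1e-4))    # True: identical gradients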

Slow training or OOM error on a single GPU

Hi Danijar, thank you so much for helping me run the code.
It took me some time to run the different tasks in order to provide more information.
I think there are two problems in the code at the moment:

  1. For vision tasks, GPU memory is exhausted right after collecting the pre-training samples, even on GPUs with 16/24 GB of VRAM.
  2. Even for tasks that do not require vision (such as dmc_proprio), training is about 20x slower than expected. To be more specific, the "fps" value in the logger is between 2 and 5. I checked the other parts of the training code and measured timings to see if there were any bottlenecks, but I was not able to find any.

Here, I will attach the outputs of each task (each links to a gist):

  1. dmc_vision / dmc_walker_walk: RESOURCE_EXHAUSTED Error after collecting pre-train samples

  2. dmc_proprio / dmc_walker_walk: In line 821, you can see that fps is 3.1. It took about 15 hours to collect 200k steps. (Also, in line 809, train/duration is 3220.91, which means each train step takes approx. 50 minutes?)

  3. loconav / loconav_ant_maze_m: RESOURCE_EXHAUSTED Error after collecting pre-train samples

I also tried changing the number of envs to 1 and the batch size to 1, but it did not make a difference. It would be amazing if you could help me figure out what causes this problem. Thank you so much.
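
A note for others debugging this: TensorFlow reserves nearly all GPU memory up front by default, which can both trigger and obscure RESOURCE_EXHAUSTED errors when anything else touches the GPU. Enabling memory growth before any ops run is a low-risk first thing to try (standard TF 2.x API, not specific to this repo):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup;
# this must run before any tensors are placed on the GPU.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)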


Below is the list of Python packages I installed in my virtual env (Python 3.10).

Package                            Version
---------------------------------- ---------
absl-py                            1.4.0
astunparse                         1.6.3
atari-py                           0.2.9
backports.shutil-get-terminal-size 1.0.0
bcrypt                             4.0.1
cachetools                         5.3.0
certifi                            2022.12.7
cffi                               1.15.1
charset-normalizer                 3.1.0
cloudpickle                        1.6.0
colorama                           0.4.6
contourpy                          1.0.7
crafter                            1.8.0
cryptography                       40.0.1
cycler                             0.11.0
decorator                          5.1.1
dm-control                         1.0.11
dm-env                             1.6
dm-sonnet                          2.0.1
dm-tree                            0.1.8
flatbuffers                        23.3.3
fonttools                          4.39.3
gast                               0.4.0
glfw                               2.5.9
google-auth                        2.17.1
google-auth-oauthlib               0.4.6
google-pasta                       0.2.0
grpcio                             1.53.0
gym                                0.19.0
gym-minigrid                       1.0.3
h5py                               3.8.0
idna                               3.4
imageio                            2.27.0
keras                              2.8.0
Keras-Preprocessing                1.1.2
kiwisolver                         1.4.4
labmaze                            1.0.6
libclang                           16.0.0
llvmlite                           0.39.1
lxml                               4.9.2
Markdown                           3.4.3
markdown-it-py                     2.2.0
MarkupSafe                         2.1.2
matplotlib                         3.7.1
mdurl                              0.1.2
mujoco                             2.3.3
numba                              0.56.4
numpy                              1.23.5
nvidia-cublas-cu12                 12.1.0.26
nvidia-cuda-runtime-cu12           12.1.55
nvidia-cudnn-cu12                  8.9.0.131
oauthlib                           3.2.2
opencv-python                      4.7.0.72
opensimplex                        0.4.4
opt-einsum                         3.3.0
packaging                          23.0
paramiko                           3.1.0
Pillow                             9.5.0
pip                                22.0.2
protobuf                           3.19.6
pyasn1                             0.4.8
pyasn1-modules                     0.2.8
pycparser                          2.21
Pygments                           2.14.0
PyNaCl                             1.5.0
PyOpenGL                           3.1.6
pyparsing                          3.0.9
python-dateutil                    2.8.2
reprint                            0.6.0
requests                           2.28.2
requests-oauthlib                  1.3.1
rich                               13.3.3
rsa                                4.9
ruamel.yaml                        0.17.21
ruamel.yaml.clib                   0.2.7
scipy                              1.10.1
setuptools                         59.6.0
six                                1.16.0
tabulate                           0.9.0
tensorboard                        2.8.0
tensorboard-data-server            0.6.1
tensorboard-plugin-wit             1.8.1
tensorflow                         2.8.3
tensorflow-estimator               2.8.0
tensorflow-io-gcs-filesystem       0.32.0
tensorflow-probability             0.16.0
tensorrt                           8.6.0
termcolor                          2.2.0
tqdm                               4.65.0
typing_extensions                  4.5.0
urllib3                            1.26.15
Werkzeug                           2.2.3
wheel                              0.37.1
wrapt                              1.15.0

"multi_gpu" and "multi_worker" configurations not working

Hi, first of all, thank you so much for sharing such amazing work and code.
I really love the idea and the results of this paper, and I am trying to build some ideas on top of it.
However, I have run into some problems. I trained the model on the dmc_vision dmc_walker_walk task using GPUs with 16 GB and 24 GB of VRAM, but received an out-of-memory error. Changing the batch size to 1 did not help fix the problem.
Also, when I ran this on GPUs with less VRAM (8 GB or 12 GB), the training process got stuck after 8008 steps (about 3-5 minutes after training starts). The paper says training can be done in one day on a V100 GPU, which has 32 GB of VRAM, so I wonder whether I need a GPU with more VRAM to train this model. This seems plausible because running dmc_proprio had no problems, so I suspect the CNN in the vision model is the cause. Is there a way to run training on a GPU with less VRAM?

Assuming that lack of VRAM is the problem, I also tried using multiple GPUs via the "multi_gpu" and "multi_worker" configurations in tfagent.py, but now I am getting a new error:

metrics.update(self.model_opt(model_tape, model_loss, modules))
  File "/vol/bitbucket/jk3417/explainable-mbhrl/embodied/agents/director/tfutils.py", line 246, in __call__  *
    self._opt.apply_gradients(
  File "/vol/bitbucket/xmbhrl/lib/python3.10/site-packages/keras/optimizer_v2/optimizer_v2.py", line 671, in apply_gradients
    return tf.__internal__.distribute.interim.maybe_merge_call(
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.

There is a good chance that I am using the wrong TensorFlow version, so please bear with me if my dependencies are off.
I checked the Dockerfile and saw that it uses TensorFlow 2.8 or 2.9, but with 2.9, JIT compilation failed.
It would be amazing if someone could share whether they are facing similar issues or know a solution to this problem. Thank you so much.

I am using

  • Python: 3.10.6
  • TensorFlow: 2.8.2
  • CUDA: 11.4 with CUDNN 8.2.4
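
For anyone hitting the same merge_call error: the message boils down to a nested @tf.function reaching a synchronization point (optimizer.apply_gradients) inside strategy.run. A minimal, generic illustration of the structure TF expects, written in plain TF 2.x rather than this repo's code, is to leave the per-replica step undecorated and wrap only the outer distributed call:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.layers.Dense(1)
    model.build([None, 4])              # create variables up front, inside the scope
    opt = tf.keras.optimizers.SGD(0.1)

def train_step(x, y):
    # Per-replica step: contains a synchronization point (apply_gradients),
    # so it must not carry its own @tf.function decorator.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((model(x) - y) ** 2)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function  # the only tf.function: it wraps the entire strategy.run call
def distributed_step(x, y):
    return strategy.run(train_step, args=(x, y))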

How to "run" agent after training or visualize results

Hi Danijar,
I know this might sound like a dumb question, but after training Director on a few tasks, I'd like to see how it performs, either by rendering the environment while running the agent or by running headless and looking at plots or a dashboard of performance metrics.
Just FYI, I used the Docker container to train (which, BTW, I had to update: it actually requires TensorFlow 2.11.0rc1-gpu, libgles2-mesa-dev, an upgraded PyOpenGL and matplotlib, and a few more changes to run smoothly).
Thanks for your help.
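
A sketch that may help, resting on assumptions about the log directory rather than documented behavior: scalar metrics are written to the logdir and can be browsed with TensorBoard, and embodied-style agents typically store replay episodes as .npz files under the same directory, with an 'image' key for pixel tasks. If that layout holds for your run, frames can be dumped to a GIF:

import pathlib

import imageio
import numpy as np

# Assumptions: episodes are saved as .npz under the logdir and contain an
# 'image' key with (T, H, W, C) uint8 frames; check data.files if it differs.
logdir = pathlib.Path('~/logdir').expanduser()
episode = sorted(logdir.glob('**/*.npz'))[-1]   # latest episode by filename
with np.load(episode) as data:
    frames = data['image']
imageio.mimsave('episode.gif', list(frames), fps=20)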
