
danijar / director

Stars: 80 · Watchers: 3 · Forks: 20 · Size: 6.25 MB

Deep Hierarchical Planning from Pixels

Home Page: https://danijar.com/director/

Languages: Python 99.34% · Dockerfile 0.51% · Shell 0.16%
Topics: algorithms, decision-making, deep-learning, hierarchical-reinforcement-learning, planning, reinforcement-learning, sparse-rewards, world-models

director's Introduction

Hi there 👋

🤖  AI Algorithms

dreamerv3 Mastering Diverse Domains through World Models
daydreamer DayDreamer: World Models for Physical Robot Learning
director Deep Hierarchical Planning from Pixels
dreamerv2* Mastering Atari with Discrete World Models
dreamer* Dream to Control: Learning Behaviors by Latent Imagination
planet* Learning Latent Dynamics for Planning from Pixels
batch-ppo* Efficient Batched Reinforcement Learning in TensorFlow

📈  Benchmarks

crafter Benchmarking the Spectrum of Agent Capabilities
diamond_env Standardized Minecraft Diamond task for reinforcement learning

🛠️  Tools

zerofun Remote function calls for array data using ZMQ
elements Building blocks for productive research
ninjax General Modules for JAX
handout Turn Python scripts into handouts with Markdown and figures

* Archived

director's People

Contributors

danijar


director's Issues

Visualizing decoded skills in hierarchy.py

Hi!

I've been trying to directly visualize the goals the manager generates, but I can't figure out how to take the one-hot skill grid and turn it into an image of the scene. I can visualize the latent properly using the WorldModel's decoder head, but is the 1024-entry vector produced by dec(skill) in hierarchy.py viewable the same way the decoded latent vector is? I'm using Pong so I can train and run locally.

[attached image: director_pong]

Thanks for publishing this repo, it's great work!
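
A hedged sketch of one plausible answer: in Director the goal autoencoder reconstructs the deterministic (deter) part of the RSSM state, so the 1024-entry output of dec(skill) is not a full feature vector by itself. One way to view it is to treat it as deter, fill in the matching stochastic prior, and run the image decoder. The names below (wm, dec, skill, get_stoch, heads['decoder'], .mode()) follow the DreamerV2-style world-model API and are assumptions that may differ in this codebase.

import tensorflow as tf

# Hedged sketch, not verified against this repo: decode a manager goal to pixels.
# `wm` is the trained WorldModel and `dec` is the goal decoder from hierarchy.py.
goal_deter = dec(skill).mode()                  # (B, 1024) reconstructed deter state
goal_stoch = wm.rssm.get_stoch(goal_deter)      # stochastic prior implied by deter
feat = tf.concat([goal_stoch, goal_deter], -1)  # DreamerV2 feature layout: [stoch, deter]
goal_image = wm.heads['decoder'](feat)['image'].mode()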

How to reproduce fig. A.1

Hi Danijar,

Reading the appendix of Director, I couldn't understand what you mean by providing the reward to the worker. Is there a config I can use to do that? In the description of the figure you write "When additionally providing task reward to the worker". Does this mean that you change the context variable defined in hierarchy.py to include the reward as well? Also, if it works so well, why isn't it the default? Have you tried the same for other tasks (e.g. the Ant Mazes)?

Thank you so much!

Best,
Cristian
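
A hedged guess at part of the answer, stated as an assumption rather than the author's confirmed setup: "providing task reward to the worker" in fig. A.1 most plausibly refers to the worker's reward mix rather than its input context. If the config exposes per-source reward weights for the worker (extrinsic, exploration, goal), the ablation would correspond to raising the extrinsic weight, roughly:

# Hypothetical override; the key names are assumptions, so verify against configs.yaml.
worker_rews = {'extr': 0.0, 'expl': 0.0, 'goal': 1.0}  # assumed default: goal reward only
worker_rews = {'extr': 1.0, 'expl': 0.0, 'goal': 1.0}  # fig. A.1 variant: also task reward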

Reconstruction loss for goal autoencoder

Hi,

The goal autoencoder's reconstruction loss in your code is the negative log-probability of the world model's representation under the goal decoder's distribution:

rec = -dec.log_prob(tf.stop_gradient(goal))

But the paper describes it as the mean squared error between the decoded state and the original state, i.e. something like:

rec = ((dec(feat.detach()) - feat) ** 2).mean()

Is the former a better measure of reconstruction loss than the latter?

Also, you use only the deterministic part of the RSSM state as the representation for training the goal autoencoder (in hierarchy.py::train_vae_replay). Why use only the deterministic part and not include the stochastic part as well?

Apologies if you've written about this somewhere, and thank you for making your extremely interesting work public.
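
One note that may resolve the first question: if the goal decoder outputs a Gaussian with fixed unit variance, the negative log-probability and the squared error differ only by an additive constant, so the two objectives produce identical gradients. A self-contained check of that identity (the unit-variance decoder head is an assumption about the repo, not confirmed):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

mean = tf.random.normal([8, 1024])                    # stand-in for the decoder mean
goal = tf.random.normal([8, 1024])                    # stand-in for the deter target
dist = tfd.Independent(tfd.Normal(mean, 1.0), 1)      # assumed unit-variance decoder head

nll = -dist.log_prob(goal)                            # what the code optimizes
sq_err = 0.5 * tf.reduce_sum((goal - mean) ** 2, -1)  # what the paper describes
const = 0.5 * 1024 * np.log(2 * np.pi)                # the only difference
print(np.allclose(nll, sq_err + const, rtol=1e-4))    # True: identical gradients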

Slow training or OOM error on a single GPU

Hi Danijar, thank you so much for helping me run the code.
It took me some time to run the different tasks in order to provide more information.
I think there are two problems in the code at the moment:

  1. For vision tasks, GPU memory is exhausted right after collecting the pre-training samples, even on GPUs with 16/24 GB of VRAM.
  2. Even for tasks that do not require vision (such as dmc_proprio), training is about 20x slower than expected. To be more specific, the "fps" value in the logger is between 2 and 5. I checked the other parts of the training code and measured timings to see if there were any bottlenecks, but I was not able to find any.

Here, I will attach the outputs of each task (each links to a gist):

  1. dmc_vision / dmc_walker_walk: RESOURCE_EXHAUSTED Error after collecting pre-train samples

  2. dmc_proprio / dmc_walker_walk: In line 821, you can see that fps is 3.1. It took about 15 hours to collect 200k steps. (Also, in line 809, train/duration is 3220.91, which means each train step takes approx. 50 minutes?)

  3. loconav / loconav_ant_maze_m: RESOURCE_EXHAUSTED Error after collecting pre-train samples

I also tried changing the number of envs to 1 and the batch size to 1, but it did not make a difference. It would be amazing if you could help me figure out what causes this problem. Thank you so much.
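
A note for others debugging this: TensorFlow reserves nearly all GPU memory up front by default, which can both trigger and obscure RESOURCE_EXHAUSTED errors when anything else touches the GPU. Enabling memory growth before any ops run is a low-risk first thing to try (standard TF 2.x API, not specific to this repo):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup;
# this must run before any tensors are placed on the GPU.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)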


Below is the list of Python packages I installed in my virtual env (Python 3.10).

Package                            Version
---------------------------------- ---------
absl-py                            1.4.0
astunparse                         1.6.3
atari-py                           0.2.9
backports.shutil-get-terminal-size 1.0.0
bcrypt                             4.0.1
cachetools                         5.3.0
certifi                            2022.12.7
cffi                               1.15.1
charset-normalizer                 3.1.0
cloudpickle                        1.6.0
colorama                           0.4.6
contourpy                          1.0.7
crafter                            1.8.0
cryptography                       40.0.1
cycler                             0.11.0
decorator                          5.1.1
dm-control                         1.0.11
dm-env                             1.6
dm-sonnet                          2.0.1
dm-tree                            0.1.8
flatbuffers                        23.3.3
fonttools                          4.39.3
gast                               0.4.0
glfw                               2.5.9
google-auth                        2.17.1
google-auth-oauthlib               0.4.6
google-pasta                       0.2.0
grpcio                             1.53.0
gym                                0.19.0
gym-minigrid                       1.0.3
h5py                               3.8.0
idna                               3.4
imageio                            2.27.0
keras                              2.8.0
Keras-Preprocessing                1.1.2
kiwisolver                         1.4.4
labmaze                            1.0.6
libclang                           16.0.0
llvmlite                           0.39.1
lxml                               4.9.2
Markdown                           3.4.3
markdown-it-py                     2.2.0
MarkupSafe                         2.1.2
matplotlib                         3.7.1
mdurl                              0.1.2
mujoco                             2.3.3
numba                              0.56.4
numpy                              1.23.5
nvidia-cublas-cu12                 12.1.0.26
nvidia-cuda-runtime-cu12           12.1.55
nvidia-cudnn-cu12                  8.9.0.131
oauthlib                           3.2.2
opencv-python                      4.7.0.72
opensimplex                        0.4.4
opt-einsum                         3.3.0
packaging                          23.0
paramiko                           3.1.0
Pillow                             9.5.0
pip                                22.0.2
protobuf                           3.19.6
pyasn1                             0.4.8
pyasn1-modules                     0.2.8
pycparser                          2.21
Pygments                           2.14.0
PyNaCl                             1.5.0
PyOpenGL                           3.1.6
pyparsing                          3.0.9
python-dateutil                    2.8.2
reprint                            0.6.0
requests                           2.28.2
requests-oauthlib                  1.3.1
rich                               13.3.3
rsa                                4.9
ruamel.yaml                        0.17.21
ruamel.yaml.clib                   0.2.7
scipy                              1.10.1
setuptools                         59.6.0
six                                1.16.0
tabulate                           0.9.0
tensorboard                        2.8.0
tensorboard-data-server            0.6.1
tensorboard-plugin-wit             1.8.1
tensorflow                         2.8.3
tensorflow-estimator               2.8.0
tensorflow-io-gcs-filesystem       0.32.0
tensorflow-probability             0.16.0
tensorrt                           8.6.0
termcolor                          2.2.0
tqdm                               4.65.0
typing_extensions                  4.5.0
urllib3                            1.26.15
Werkzeug                           2.2.3
wheel                              0.37.1
wrapt                              1.15.0

"multi_gpu" and "multi_worker" configurations not working

Hi, first of all, thank you so much for sharing such amazing work and code.
I really love the idea and the results of this paper, and I am trying to build some ideas on top of it.
However, I have run into some problems. I trained the model on the dmc_vision dmc_walker_walk task using GPUs with 16 GB and 24 GB of VRAM, but received an out-of-memory error. Changing the batch size to 1 did not help fix the problem.
Also, when I ran this on GPUs with less VRAM (8 GB or 12 GB), the training process got stuck after 8008 steps (about 3-5 minutes after training starts). The paper says training can be done in one day on a V100 GPU, which has 32 GB of VRAM, so I wonder whether I need a GPU with more VRAM to train this model. This seems plausible because running dmc_proprio had no problems, so I suspect the CNN in the vision model is the cause. Is there a way to run training on a GPU with less VRAM?

Assuming that lack of VRAM is the problem, I also tried using multiple GPUs via the "multi_gpu" and "multi_worker" configurations in tfagent.py, but now I am getting a new error:

metrics.update(self.model_opt(model_tape, model_loss, modules))
  File "/vol/bitbucket/jk3417/explainable-mbhrl/embodied/agents/director/tfutils.py", line 246, in __call__  *
    self._opt.apply_gradients(
  File "/vol/bitbucket/xmbhrl/lib/python3.10/site-packages/keras/optimizer_v2/optimizer_v2.py", line 671, in apply_gradients
    return tf.__internal__.distribute.interim.maybe_merge_call(
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.

There is a good chance that I am using the wrong TensorFlow version, so please bear with me if my dependencies are off.
I checked the Dockerfile and saw that it uses TensorFlow 2.8 or 2.9, but with 2.9, JIT compilation failed.
It would be amazing if someone could share whether they are facing similar issues or know a solution to this problem. Thank you so much.

I am using

  • Python: 3.10.6
  • TensorFlow: 2.8.2
  • CUDA: 11.4 with CUDNN 8.2.4
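
For anyone hitting the same merge_call error: the message boils down to a nested @tf.function reaching a synchronization point (optimizer.apply_gradients) inside strategy.run. A minimal, generic illustration of the structure TF expects, written in plain TF 2.x rather than this repo's code, is to leave the per-replica step undecorated and wrap only the outer distributed call:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.layers.Dense(1)
    model.build([None, 4])              # create variables up front, inside the scope
    opt = tf.keras.optimizers.SGD(0.1)

def train_step(x, y):
    # Per-replica step: contains a synchronization point (apply_gradients),
    # so it must not carry its own @tf.function decorator.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((model(x) - y) ** 2)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function  # the only tf.function: it wraps the entire strategy.run call
def distributed_step(x, y):
    return strategy.run(train_step, args=(x, y))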

How to "run" agent after training or visualize results

Hi Danijar,
I know this might sound like a dumb question, but after training Director on a few tasks, I'd like to see how it performs, either by rendering the environment while running the agent or by running headless and looking at plots or a dashboard of performance metrics.
Just FYI, I used the Docker container to train (which, BTW, I had to update: it actually requires TensorFlow 2.11.0rc1-gpu, libgles2-mesa-dev, an upgraded PyOpenGL and matplotlib, and a few more changes to run smoothly).
Thanks for your help.
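
A sketch that may help, resting on assumptions about the log directory rather than documented behavior: scalar metrics are written to the logdir and can be browsed with TensorBoard, and embodied-style agents typically store replay episodes as .npz files under the same directory, with an 'image' key for pixel tasks. If that layout holds for your run, frames can be dumped to a GIF:

import pathlib

import imageio
import numpy as np

# Assumptions: episodes are saved as .npz under the logdir and contain an
# 'image' key with (T, H, W, C) uint8 frames; check data.files if it differs.
logdir = pathlib.Path('~/logdir').expanduser()
episode = sorted(logdir.glob('**/*.npz'))[-1]   # latest episode by filename
with np.load(episode) as data:
    frames = data['image']
imageio.mimsave('episode.gif', list(frames), fps=20)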
