
magi's Introduction

Magi RL library in JAX

Installation | Agents | Examples | Contributing | Documentation


Note: Future development of JAX agents in Magi has moved to Corax.

Magi is an RL library in JAX that is fully compatible with Acme.

In addition to the features provided by Acme, Magi offers implementations of RL agents that are not found in the Acme repository, as well as useful tools for integrating experiment logging services such as WandB.

Note: Magi is in alpha development, so expect breaking changes!

Magi currently depends on the HEAD version of dm-acme rather than the latest release on PyPI, which is fairly old.

Installation

  1. Create a new Python virtual environment
python3 -m venv venv
source venv/bin/activate
  2. Install the dependencies with the following commands.
pip install -U pip setuptools wheel
# Magi depends on the latest version of dm-acme.
# The dependencies in setup.py are abstract, which allows you to pin
# a specific version of dm-acme.
# The following command installs the latest version of dm-acme:
pip install 'git+https://github.com/deepmind/acme.git#egg=dm-acme[jax,tf,examples]'
# Install magi in editable mode, with additional dependencies.
# If you need to run the examples on GPU, install the GPU version of
# JAX with a command like the following:
pip install 'jax[cuda]<0.4' -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -e '.[jax]'

The base installation of magi does not list TensorFlow or JAX as dependencies. Note, however, that JAX requires a platform-specific installation (CPU/GPU and CUDA versions). Furthermore, Acme depends on Reverb and LaunchPad, which must be pinned against specific versions of TensorFlow. This is handled for you if you install dm-acme with the [jax,tf] extras. You can also install different versions of TensorFlow/Reverb/LaunchPad; in that case, omit the extras, find compatible versions, and pin them accordingly.
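Because the dm-acme dependency in setup.py is abstract, you can also pin dm-acme to a specific commit rather than tracking HEAD. For example (the commit hash below is a placeholder, not a known-good revision):

pip install 'git+https://github.com/deepmind/acme.git@<commit-sha>#egg=dm-acme[jax,tf,examples]'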

If installation fails for some reason, first check the GitHub Actions badge to see whether the latest CI run also fails. If CI is passing, then the problem most likely lies in your own environment setup. Refer to .github/workflows/ci.yaml as the authoritative reference for how to set up the environment.

Agents

Magi includes implementations of popular RL algorithms such as SAC, DrQ, SAC-AE, and PETS. Refer to magi/agents for a full list of agents.

Examples

Check out the examples directory, which contains examples of running our RL agents on popular benchmark tasks.

Testing

On Linux, you can run tests with

nox test

Contributing

Refer to CONTRIBUTING.md.

Acknowledgements

Magi is inspired by many of the open-source RL projects out there. Here is a (non-exhaustive) list of related libraries and packages that Magi references:

License

Apache License 2.0

Citation

If you use Magi in your work, please cite us according to the CITATION file. You can learn more about CITATION files here.

magi's People

Contributors

ethanluoyc

magi's Issues

Improve Wandb logging

  • Run wandb.init in the logger instead of outside of it
  • Configure wandb to use step_key for the step, similar to the TF summary logger
  • Handle wandb.finish in the logger's close function (see the sketch below)
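Below is a minimal sketch of what such a logger could look like, assuming the acme.utils.loggers.base.Logger interface; the class name and constructor arguments are hypothetical.

import wandb
from acme.utils.loggers import base


class WandbLogger(base.Logger):
    """Hypothetical logger that owns the wandb run end to end."""

    def __init__(self, project: str, step_key: str = "steps", **wandb_kwargs):
        # Run wandb.init inside the logger instead of at the call site.
        self._run = wandb.init(project=project, **wandb_kwargs)
        self._step_key = step_key

    def write(self, data: base.LoggingData):
        # Use the value under step_key as the wandb step, mirroring the TF summary logger.
        step = int(data[self._step_key]) if self._step_key in data else None
        self._run.log(dict(data), step=step)

    def close(self):
        # Finish the wandb run when the logger is closed.
        self._run.finish()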

Moving the cost function inside the OptimizerBasedActor

Given that different OptimizerBasedActor subclasses may need different cost functions (e.g. scalar-valued and vector-valued), we should consider making the cost function a method of OptimizerBasedActor in magi/agents/pets/acting.py. That way, users can override the method to implement their own cost functions; a rough sketch of the idea follows.
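The sketch is purely illustrative: the class and method signatures are hypothetical and would need to be reconciled with the actual OptimizerBasedActor in magi/agents/pets/acting.py.

import jax.numpy as jnp


class OptimizerBasedActor:
    """Hypothetical base class: the cost function becomes an overridable method."""

    def cost_fn(self, observations, actions):
        # Default (scalar-valued) cost; subclasses override this.
        raise NotImplementedError


class ReacherActor(OptimizerBasedActor):
    """Example subclass with a custom scalar-valued cost."""

    def __init__(self, goal):
        self._goal = goal

    def cost_fn(self, observations, actions):
        # Distance-to-goal cost plus a small action penalty.
        distance = jnp.linalg.norm(observations[..., :3] - self._goal, axis=-1)
        return distance + 0.01 * jnp.sum(actions**2, axis=-1)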

TD3

Implement TD3 in Magi

PETS

We would love to have a good implementation of PETS as described in the paper:

Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine, Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, NIPS 2018, arXiv:1805.12114

I will put up a WIP PR to track the status of a proof-of-concept implementation; we can then iterate on the design and test it against the environments used in the original paper.

CQL performance

While I have previously benchmarked CQL on the locomotion tasks, the performance on antmaze does not seem great; we should figure out why.

Pin dependencies.

Our requirements are currently very flaky due to the rapid development of Acme. Let's pin the versions of the corresponding nightly libraries: reverb, tf, and tfp.

Provide benchmark results

We should provide benchmark results for SAC/SAC-AE/DrQ, as well as IMPALA, on some benchmark suite.

For SAC/SAC-AE/DrQ, we can consider comparing against the PlaNet benchmark suite. For IMPALA, I am not inclined to do Atari, but we could. The IMPALA architecture has been tested to work on gym_minigrid.

Distributed and Single Process IMPALA

This is a direct port of the implementation in acme.agents.jax.impala, but it allows the sequences to be truncated. In addition, a distributed agent is added.

Adopt a configuration library

Our current examples use different approaches to configuring the RL agents. We can make things more scalable and easier to maintain by adopting a library for writing configurations.

There are two options I considered: Hydra and ml_collections.

  • ml_collections (https://github.com/google/ml_collections): a configuration library from Google, adopted by many Google Research and DeepMind projects. Configuration files are just Python modules with a get_config function. It is very non-intrusive and requires minimal changes to the rest of the codebase other than the entry point. Compared to Hydra, it does not provide sweeps or multi-run functionality out of the box; however, this should be easy to achieve with some metaprogramming that generates the sweeps.
  • Hydra (http://hydra.cc/docs/): the configuration library used by FAIR projects. Hydra is nice since it provides a lot of useful utilities out of the box (e.g., sweeps), and configs are written in YAML. I initially thought Hydra would be a good fit; however, after adopting it for some of my personal projects, I found that tailoring it to specific needs becomes difficult.

I expect that we will incrementally move to ml_collections for writing the configurations in the examples; a minimal example is sketched below. Users of Magi can easily opt out of ml_collections if they prefer their own approach to configuration.
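As an illustration, a config file under ml_collections is just a Python module exposing get_config; the file path and field names below are hypothetical.

# experiments/sac_config.py (hypothetical path)
from ml_collections import config_dict


def get_config():
    config = config_dict.ConfigDict()
    config.env_name = "HalfCheetah-v2"
    config.seed = 0
    config.batch_size = 256
    config.learning_rate = 3e-4
    return config

An entry point can then load such a file with ml_collections.config_flags.DEFINE_config_file and override individual fields from the command line.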

Moving on, we should also start thinking about how to let users easily sweep over different hyperparameters. I have some experience with closed-source approaches to this and would like to transfer some of it into a (sub)package; a possible starting point is sketched below.
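One possible shape for this, assuming ml_collections-style configs, is a small helper that expands a grid into concrete configs; the helper name is hypothetical.

import copy
import itertools


def generate_sweep(base_config, grid):
    """Yield one config per point in a grid mapping field name -> list of values."""
    names, values = zip(*sorted(grid.items()))
    for combination in itertools.product(*values):
        config = copy.deepcopy(base_config)
        for name, value in zip(names, combination):
            setattr(config, name, value)
        yield config


# Example usage:
# sweeps = list(generate_sweep(get_config(), {"seed": [0, 1, 2], "learning_rate": [1e-4, 3e-4]}))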

Use Poetry for dependency management

We currently have a hand-rolled setup for managing dependencies, which is a real maintenance burden.

Let's use poetry for dependency management.

https://python-poetry.org/

Some notes may be added on setting up Poetry on UCL machines. We also need to investigate how people can opt out of Poetry if they want to; this should be fine with more recent versions of pip, since Poetry can be used purely as the build backend (see the sketch below).
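Roughly, the two workflows could look like this (a sketch, assuming a standard Poetry setup with poetry-core as the PEP 517 build backend):

# Contributors who want Poetry's workflow:
pip install poetry
poetry install
# Users who prefer plain pip can still install via the build backend
# (requires a reasonably recent pip):
pip install .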

Conservative Q Learning

Let's use this issue to track the progress of a CQL learner implementation.

https://arxiv.org/abs/2006.04779

The official implementation seems to be

https://github.com/aviralkumar2907/CQL

which includes experiments for both the discrete and continuous settings. As a first step, I think we should implement a continuous-control version of CQL so that we can compare it with the other offline RL algorithms we currently have (i.e., CRR and TD3-BC).
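For reference, the core of the continuous-control variant is the conservative penalty added to the critic loss. The sketch below is simplified: it samples only uniform random actions, omits the importance-sampling correction and policy-sampled actions used in the full algorithm, and the q_fn signature is an assumption.

import jax
import jax.numpy as jnp


def cql_penalty(q_fn, params, observations, dataset_actions, key,
                num_random=10, action_dim=6):
    """Simplified CQL(H) penalty: logsumexp over random actions minus Q on dataset actions."""
    batch_size = observations.shape[0]
    random_actions = jax.random.uniform(
        key, (num_random, batch_size, action_dim), minval=-1.0, maxval=1.0)
    # Q-values of the sampled actions, shape [num_random, batch_size].
    q_random = jax.vmap(lambda a: q_fn(params, observations, a))(random_actions)
    # Q-values of the actions that actually appear in the dataset, shape [batch_size].
    q_data = q_fn(params, observations, dataset_actions)
    # Push Q down on out-of-distribution actions and up on dataset actions.
    return jnp.mean(jax.scipy.special.logsumexp(q_random, axis=0) - q_data)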

Support MacOS

Currently, the agents do not work on macOS because of the dependency on dm-reverb, which only supports Linux.

Fortunately, there is an open PR in Reverb (google-deepmind/reverb#24) that aims to add macOS support, and it looks close to being merged. We can advertise macOS support once that PR is merged.

Error when running tests on machines with a GPU

__________________________ ERROR collecting magi/agents/drq/agent_test.py ___________________________
magi/agents/drq/agent_test.py:7: in <module>
    from magi.agents.drq import networks
magi/agents/drq/networks.py:9: in <module>
    orthogonal_init = hk.initializers.Orthogonal(scale=jnp.sqrt(2.0))
venv/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py:373: in <lambda>
    fn = lambda x: lax_fn(*_promote_args_inexact(numpy_fn.__name__, x))
venv/lib/python3.8/site-packages/jax/_src/lax/lax.py:312: in sqrt
    return sqrt_p.bind(x)
venv/lib/python3.8/site-packages/jax/core.py:259: in bind
    out = top_trace.process_primitive(self, tracers, params)
venv/lib/python3.8/site-packages/jax/core.py:597: in process_primitive
    return primitive.impl(*tracers, **params)
venv/lib/python3.8/site-packages/jax/interpreters/xla.py:230: in apply_primitive
    compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
venv/lib/python3.8/site-packages/jax/_src/util.py:197: in wrapper
    return cached(bool(config.x64_enabled), *args, **kwargs)
venv/lib/python3.8/site-packages/jax/_src/util.py:190: in cached
    return f(*args, **kwargs)
venv/lib/python3.8/site-packages/jax/interpreters/xla.py:280: in xla_primitive_callable
    compiled = backend_compile(backend, built_c, options)
venv/lib/python3.8/site-packages/jax/interpreters/xla.py:344: in backend_compile
    return backend.compile(built_c, compile_options=options)
E   RuntimeError: Internal: libdevice not found at ./libdevice.10.bc
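A common workaround for this class of error (untested against this exact setup, and the CUDA path is an assumption) is to point XLA at the CUDA installation that contains libdevice:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda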

Asynchronous Subprocess wrappers

Some environments can only run a single instance per process (e.g. PyBullet environments with pixel observations when EGL acceleration is enabled). This prevents us from running multiple copies of the environment in the same experiment (for training and evaluation). We should introduce an environment wrapper that creates these environments in a subprocess.
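A minimal sketch of such a wrapper is shown below, assuming dm_env environments; the class and function names are hypothetical and error handling is omitted.

import multiprocessing as mp

import dm_env


def _worker(pipe, env_factory):
    # Create the environment inside the child process and serve requests over the pipe.
    env = env_factory()
    while True:
        command, payload = pipe.recv()
        if command == "reset":
            pipe.send(env.reset())
        elif command == "step":
            pipe.send(env.step(payload))
        elif command == "observation_spec":
            pipe.send(env.observation_spec())
        elif command == "action_spec":
            pipe.send(env.action_spec())
        elif command == "close":
            env.close()
            pipe.close()
            break


class SubprocessEnvironment(dm_env.Environment):
    """Runs an environment instance in a subprocess and proxies calls to it."""

    def __init__(self, env_factory):
        self._pipe, child_pipe = mp.Pipe()
        self._process = mp.Process(
            target=_worker, args=(child_pipe, env_factory), daemon=True)
        self._process.start()

    def _call(self, command, payload=None):
        self._pipe.send((command, payload))
        return self._pipe.recv()

    def reset(self):
        return self._call("reset")

    def step(self, action):
        return self._call("step", action)

    def observation_spec(self):
        return self._call("observation_spec")

    def action_spec(self):
        return self._call("action_spec")

    def close(self):
        self._pipe.send(("close", None))
        self._process.join()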
