instadeepai / jumanji

🕹️ A diverse suite of scalable reinforcement learning environments in JAX

Home Page: https://instadeepai.github.io/jumanji

License: Apache License 2.0

JavaScript 0.01% Python 100.00%
jax python reinforcement-learning research

jumanji's Introduction

Jumanji @ ICLR 2024

Jumanji has been accepted at ICLR 2024, check out our research paper.

Welcome to the Jungle! 🌴

Jumanji is a diverse suite of scalable reinforcement learning environments written in JAX. It now features 22 environments!

Jumanji is helping pioneer a new wave of hardware-accelerated research and development in the field of RL. Jumanji's high-speed environments enable faster iteration and large-scale experimentation while simultaneously reducing complexity. Originating in the research team at InstaDeep, Jumanji is now developed jointly with the open-source community. To join us in these efforts, reach out, raise issues and read our contribution guidelines or just star 🌟 to stay up to date with the latest developments!

Goals 🚀

  1. Provide a simple, well-tested API for JAX-based environments.
  2. Make research in RL more accessible.
  3. Facilitate the research on RL for problems in the industry and help close the gap between research and industrial applications.
  4. Provide environments whose difficulty can be scaled to be arbitrarily hard.

Overview 🦜

  • 🥑 Environment API: core abstractions for JAX-based environments.
  • 🕹️ Environment Suite: a collection of RL environments ranging from simple games to NP-hard combinatorial problems.
  • 🍬 Wrappers: easily connect to your favourite RL frameworks and libraries such as Acme, Stable Baselines3, RLlib, OpenAI Gym and DeepMind-Env through our dm_env and gym wrappers.
  • 🎓 Examples: guides to facilitate Jumanji's adoption and highlight the added value of JAX-based environments.
  • 🏎️ Training: example agents that can be used as inspiration for the agents one may implement in their research.

Environments 🌍

Jumanji provides a diverse range of environments ranging from simple games to NP-hard combinatorial problems.

Environment | Category | Registered Version(s) | Source | Description
🔢 Game2048 | Logic | Game2048-v1 | code | doc
🎨 GraphColoring | Logic | GraphColoring-v0 | code | doc
💣 Minesweeper | Logic | Minesweeper-v0 | code | doc
🎲 RubiksCube | Logic | RubiksCube-v0, RubiksCube-partly-scrambled-v0 | code | doc
🔀 SlidingTilePuzzle | Logic | SlidingTilePuzzle-v0 | code | doc
✏️ Sudoku | Logic | Sudoku-v0, Sudoku-very-easy-v0 | code | doc
📦 BinPack (3D BinPacking Problem) | Packing | BinPack-v1 | code | doc
🧩 FlatPack (2D Grid Filling Problem) | Packing | FlatPack-v0 | code | doc
🏭 JobShop (Job Shop Scheduling Problem) | Packing | JobShop-v0 | code | doc
🎒 Knapsack | Packing | Knapsack-v1 | code | doc
▒ Tetris | Packing | Tetris-v0 | code | doc
🧹 Cleaner | Routing | Cleaner-v0 | code | doc
🔗 Connector | Routing | Connector-v2 | code | doc
🚚 CVRP (Capacitated Vehicle Routing Problem) | Routing | CVRP-v1 | code | doc
🚚 MultiCVRP (Multi-Agent Capacitated Vehicle Routing Problem) | Routing | MultiCVRP-v0 | code | doc
🔍 Maze | Routing | Maze-v0 | code | doc
🤖 RobotWarehouse | Routing | RobotWarehouse-v0 | code | doc
🐍 Snake | Routing | Snake-v1 | code | doc
📬 TSP (Travelling Salesman Problem) | Routing | TSP-v1 | code | doc
Multi Minimum Spanning Tree Problem | Routing | MMST-v0 | code | doc
ᗧ•••ᗣ•• PacMan | Routing | PacMan-v1 | code | doc
👾 Sokoban | Routing | Sokoban-v0 | code | doc

Installation 🎬

You can install the latest release of Jumanji from PyPI:

pip install -U jumanji

Alternatively, you can install the latest development version directly from GitHub:

pip install git+https://github.com/instadeepai/jumanji.git

Jumanji has been tested on Python 3.8 and 3.9. Note that because the installation of JAX differs depending on your hardware accelerator, we advise users to explicitly install the correct JAX version (see the official installation guide).

Rendering: Matplotlib is used for rendering all the environments. To visualize the environments you will need a GUI backend. For example, on Linux, you can install Tk via: apt-get install python3-tk, or using conda: conda install tk. Check out Matplotlib backends for a list of backends you can use.

Quickstart ⚡

RL practitioners will find Jumanji's interface familiar as it combines the widely adopted OpenAI Gym and DeepMind Environment interfaces. From OpenAI Gym, we adopted the idea of a registry and the render method, while our TimeStep structure is inspired by DeepMind Environment.

Basic Usage 🧑‍💻

import jax
import jumanji

# Instantiate a Jumanji environment using the registry
env = jumanji.make('Snake-v1')

# Reset your (jit-able) environment
key = jax.random.PRNGKey(0)
state, timestep = jax.jit(env.reset)(key)

# (Optional) Render the env state
env.render(state)

# Interact with the (jit-able) environment
action = env.action_spec.generate_value()          # Action selection (dummy value here)
state, timestep = jax.jit(env.step)(state, action)   # Take a step and observe the next state and time step
  • state represents the internal state of the environment: it contains all the information required to take a step when executing an action. This should not be confused with the observation contained in the timestep, which is the information perceived by the agent.
  • timestep is a dataclass containing step_type, reward, discount, observation and extras. This structure is similar to dm_env.TimeStep except for the extras field, which was added so that users can log environment metrics that are neither part of the agent's observation nor part of the environment's internal state.

Advanced Usage 🧑‍🔬

Being written in JAX, Jumanji's environments benefit from many of its features including automatic vectorization/parallelization (jax.vmap, jax.pmap) and JIT-compilation (jax.jit), which can be composed arbitrarily. We provide an example of a more advanced usage in the advanced usage guide.
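
As a rough illustration of how these transformations compose, the sketch below batches a toy environment with jax.vmap and compiles it with jax.jit. The functions toy_reset and toy_step are hypothetical stand-ins written for this example; they are not part of Jumanji's API.

```python
import jax
import jax.numpy as jnp


def toy_reset(key):
    """Hypothetical reset: sample a scalar integer position in [0, 10)."""
    return jax.random.randint(key, (), 0, 10)


def toy_step(state, action):
    """Hypothetical step: move by `action` and return (next_state, reward)."""
    next_state = state + action
    reward = -jnp.abs(next_state).astype(jnp.float32)
    return next_state, reward


batch_size = 4
keys = jax.random.split(jax.random.PRNGKey(0), batch_size)

# vmap adds a leading batch axis; jit compiles the batched function.
batched_reset = jax.jit(jax.vmap(toy_reset))
batched_step = jax.jit(jax.vmap(toy_step))

states = batched_reset(keys)
actions = jnp.ones((batch_size,), dtype=states.dtype)
next_states, rewards = batched_step(states, actions)
```

The same pattern should apply to any pair of pure reset and step functions, which is what makes running many environments in parallel straightforward.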

Registry and Versioning 📖

Like OpenAI Gym, Jumanji keeps a strict versioning of its environments for reproducibility reasons. We maintain a registry of standard environments with their configuration. For each environment, a version suffix is appended, e.g. Snake-v1. When changes are made to environments that might impact learning results, the version number is incremented by one to prevent potential confusion. For a full list of registered versions of each environment, check out the documentation.

Training 🏎️

To showcase how to train RL agents on Jumanji environments, we provide a random agent and a vanilla actor-critic (A2C) agent. These agents can be found in jumanji/training/.

Because the environment framework in Jumanji is so flexible, it allows pretty much any problem to be implemented as a Jumanji environment, giving rise to very diverse observations. For this reason, environment-specific networks are required to capture the symmetries of each environment. Alongside the A2C agent implementation, we provide examples of such environment-specific actor-critic networks in jumanji/training/networks.

⚠️ The example agents in jumanji/training are only meant to serve as inspiration for how one can implement an agent. Jumanji is first and foremost a library of environments - as such, the agents and networks will not be maintained to a production standard.

For more information on how to use the example agents, see the training guide.

Contributing 🤝

Contributions are welcome! See our issue tracker for good first issues. Please read our contributing guidelines for details on how to submit pull requests, our Contributor License Agreement, and community guidelines.

Citing Jumanji ✏️

If you use Jumanji in your work, please cite the library using:

@misc{bonnet2024jumanji,
    title={Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX},
    author={Clément Bonnet and Daniel Luo and Donal Byrne and Shikha Surana and Sasha Abramowitz and Paul Duckworth and Vincent Coyette and Laurence I. Midgley and Elshadai Tegegn and Tristan Kalloniatis and Omayma Mahjoub and Matthew Macfarlane and Andries P. Smit and Nathan Grinsztajn and Raphael Boige and Cemlyn N. Waters and Mohamed A. Mimouni and Ulrich A. Mbou Sob and Ruan de Kock and Siddarth Singh and Daniel Furelos-Blanco and Victor Le and Arnu Pretorius and Alexandre Laterre},
    year={2024},
    eprint={2306.09884},
    url={https://arxiv.org/abs/2306.09884},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

See Also 🔎

Other works have embraced the approach of writing RL environments in JAX. In particular, we suggest users check out the following sister repositories:

  • 🤖 Qdax is a library to accelerate Quality-Diversity and neuro-evolution algorithms through hardware accelerators and parallelization.
  • 🌳 Evojax provides tools to enable neuroevolution algorithms to work with neural networks running across multiple TPU/GPUs.
  • 🦾 Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators.
  • 🏋️ Gymnax implements classic environments including classic control, bsuite, MinAtar and a collection of meta RL tasks.
  • 🎲 Pgx provides classic board game environments like Backgammon, Shogi, and Go.

Acknowledgements 🙏

The development of this library was supported with Cloud TPUs from Google's TPU Research Cloud (TRC) 🌀.

jumanji's People

Contributors

aar65537, alaterre, arnupretorius, biogeek, callumtilbury, clement-bonnet, coyettev, dantp-ai, dluo96, driessmit, egiob, elshadaik, eltociear, george-ogden, iamunr4v31, lollcat, medalimimouni, mvmacfarlane, mwolinska, pduckworth, raphaelavalos, rodsiry, ruanjohn, sash-a, siddarthsingh1, surana01, taodav, tristankalloniatis, ulricharmel, wang-r-j

jumanji's Issues

Make `wrappers.jumanji_specs_to_dm_env_specs` compatible with PyTrees

Is your feature request related to a problem? Please describe

In jumanji.wrappers the conversion of jumanji.specs to dm_env.specs does not accept general PyTree Nodes. As mentioned in: google-deepmind/dm_env#10, this should simply be compatible.

Describe the solution you'd like

Replace the else statement within jumanji.wrappers.jumanji_specs_to_dm_env_specs (line 465), changing it from this:

def jumanji_specs_to_dm_env_specs(
    spec: Spec,
) -> Union[dm_env.specs.DiscreteArray, dm_env.specs.BoundedArray, dm_env.specs.Array]:
    if isinstance(spec, DiscreteArray):
        ...
    elif ...:
        ...
    else:
        raise ValueError(
            f"spec {spec} of type {type(spec)} is not available in a deepmind environment. "
            "Please override the observation_spec or action_spec method to output spec of type "
            "`dm_env.specs.Array`."
        )

to something like this:

def jumanji_specs_to_dm_env_specs(
    spec: Spec,
) -> Union[dm_env.specs.DiscreteArray, dm_env.specs.BoundedArray, dm_env.specs.Array]:
    if isinstance(spec, DiscreteArray):
        ...
    elif ...:
        ...
    else:
        try:
            # Recursively call this function for nested Specs
            return jax.tree_map(jumanji_specs_to_dm_env_specs, spec)
        except ...:
            raise ValueError(...)
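
The recursive idea can be sketched with jax.tree_util.tree_map and its is_leaf argument. The classes ToyJumanjiSpec and ToyDmEnvSpec below are hypothetical stand-ins for the real spec types, so this only illustrates the mechanism and is not the wrapper's actual code.

```python
from dataclasses import dataclass
from typing import Any, Tuple

import jax


@dataclass(frozen=True)
class ToyJumanjiSpec:
    """Hypothetical stand-in for a leaf `jumanji.specs` object."""
    shape: Tuple[int, ...]
    name: str


@dataclass(frozen=True)
class ToyDmEnvSpec:
    """Hypothetical stand-in for a `dm_env.specs.Array`."""
    shape: Tuple[int, ...]
    name: str


def convert_leaf(spec: ToyJumanjiSpec) -> ToyDmEnvSpec:
    """Convert a single (non-nested) spec."""
    return ToyDmEnvSpec(shape=spec.shape, name=spec.name)


def convert_nested(spec_tree: Any) -> Any:
    """Convert every spec found in a nested dict/tuple/list structure."""
    return jax.tree_util.tree_map(
        convert_leaf,
        spec_tree,
        is_leaf=lambda node: isinstance(node, ToyJumanjiSpec),
    )


nested = {
    "grid": ToyJumanjiSpec((4, 4), "grid"),
    "agent": (ToyJumanjiSpec((2,), "position"),),
}
converted = convert_nested(nested)
```

The container structure (here a dict holding a tuple) is preserved, and only the leaves are replaced.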

bug: import jumanji crash from a console that is not a notebook

Description

If I run import jumanji from a console (a PyCharm console in my case), I get the following error from this line:

IPython.core.error.UsageError: Invalid GUI request 'notebook', valid ones are:dict_keys(['none', 'osx', 'tk', 'gtk', 'wx', 'qt', 'qt4', 'qt5', 'glut', 'pyglet', 'gtk3'])

I think it comes from the fact that we are checking the backend to set up the proper matplotlib backend accordingly. However, the current version seems to assume a jupyter notebook when it is a python console, thus breaking at import time.

The solution would be to set up the backend optionally, i.e. if an error is encountered, then a default backend is set. Or to figure out how to properly differentiate jupyter notebooks from something else. In any case, we should not break at import time because of rendering!
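
A minimal sketch of the "optional backend" idea follows. The helper configure_backend and the fake setter are hypothetical names used for illustration; in practice one would presumably pass matplotlib.use as set_backend, and "agg" is assumed here as a safe non-interactive default.

```python
from typing import Callable


def configure_backend(
    set_backend: Callable[[str], None],
    preferred: str,
    fallback: str = "agg",
) -> str:
    """Try the preferred backend and fall back instead of failing at import time."""
    try:
        set_backend(preferred)
        return preferred
    except Exception:
        # Rendering problems should never prevent the package from importing.
        set_backend(fallback)
        return fallback


def fake_set_backend(name: str) -> None:
    """Hypothetical setter that rejects 'notebook', as a plain console would."""
    if name == "notebook":
        raise ValueError("Invalid GUI request 'notebook'")


chosen = configure_backend(fake_set_backend, preferred="notebook")
```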

What Jumanji version are you using?

v0.1.1

Which accelerator(s) are you using?

No response

Additional System Info

Linux

Additional Context

No response

(Optional) Suggestion

No response

refactor(cleaner): return state from the generator

Is your feature request related to a problem? Please describe

The instance generator for the cleaner environment currently returns an array containing the initial grid. It should instead return the full environment state to be consistent with other environments.

Describe the solution you'd like

Return an instance of cleaner.State in the instance generator.

docs: add example notebook

Is your feature request related to a problem? Please describe

People are more likely to use Jumanji if they can quickly try it out without a lot of admin. We should have an example notebook (we previously had the anakin snake notebook but it was removed because it was outdated).

docs: make contributing link and menu bar shortcut work on the doc

Is your feature request related to a problem? Please describe

On the documentation website, there are hyperlinks/shortcuts that do not work properly, although they are working on the readme on GitHub.

Describe the solution you'd like

Clicking the shortcuts on the menu bar (Installation | Quickstart | ... | Reference Docs) should redirect the user to the corresponding locations on the page. There is a problem with the emojis that are sometimes part of the link (on the GitHub's readme I think) and sometimes not (on the website I believe). Also, the "contributing guidelines" link does not work on the website because there is no page hosted for it.

Additional context

It seems that one has to reconcile the way GitHub renders the readme and the way mkdocs builds the doc when it comes to hyperlinks.

docs: state that we use the google style guide

Is your feature request related to a problem? Please describe

Make it clear to the community what style guide they should be abiding by which could reduce the number of review iterations required to get a pull request in.

Describe the solution you'd like

Add a reference to the google style guide in the developer documentation or readme.

feat(registration): default to latest version

Is your feature request related to a problem? Please describe

jumanji.make("TSP") should default to the latest version of TSP, e.g. "TSP-v1". This way, a user who sees the codebase and agrees with the current version of TSP can use TSP without having to look up which version we are at.
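
One possible way to resolve an unversioned name is sketched below. The function resolve_env_id is a hypothetical helper, not Jumanji's registry code, and it assumes that registered IDs end with a "-v<number>" suffix.

```python
from typing import Iterable


def resolve_env_id(env_id: str, registered: Iterable[str]) -> str:
    """Return `env_id` if it is registered, else the highest registered version of it."""
    registered = list(registered)
    if env_id in registered:
        return env_id
    prefix = env_id + "-v"
    versions = [
        int(name[len(prefix):])
        for name in registered
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    ]
    if not versions:
        raise ValueError(f"No registered environment matches '{env_id}'.")
    # Compare versions numerically so that v10 ranks above v2.
    return f"{env_id}-v{max(versions)}"


latest_tsp = resolve_env_id("TSP", ["TSP-v0", "TSP-v1", "Snake-v1"])
```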

docs: show test coverage in badge

Is your feature request related to a problem? Please describe

We should display the test coverage in a badge on the README. This is important to show, since providing a well-tested API is one of our goals.

feat: deprecate Connect4

Is your feature request related to a problem? Please describe

Connect4 will be removed in a future release (v0.2).

Describe the solution you'd like

Raise a deprecation warning when using Connect4 as it will be removed soon.
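
Such a warning could be raised with the standard warnings module, as in this sketch. The Connect4 class here is an empty stand-in, and the exact message is an assumption.

```python
import warnings


class Connect4:
    """Hypothetical stand-in for the environment class."""

    def __init__(self) -> None:
        warnings.warn(
            "Connect4 is deprecated and will be removed in a future release (v0.2).",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this constructor
        )


# Record the warning so that it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Connect4()
```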

Remark

Bug in the reset function: action_mask = jnp.ones((BOARD_WIDTH,), dtype=jnp.int8) should be changed to action_mask = jnp.ones((BOARD_WIDTH,), dtype=bool).

refactor: remove environment factory

Is your feature request related to a problem? Please describe

The environment factory ENV_FACTORY in setup_train.py is no longer used and can be removed.

refactor: use named tuples instead of dataclasses for timestep and state

Is your feature request related to a problem? Please describe

Since the dataclasses we use are mutable, some side effects may occur when working with environment State and TimeStep.

Describe the solution you'd like

Use NamedTuple instead.

Describe alternatives you've considered

We could also freeze the dataclasses to be immutable but the NamedTuple option is preferred.
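
The two options can be compared on a toy state, as sketched below. ToyState and FrozenToyState are hypothetical; the real State and TimeStep have other fields.

```python
import dataclasses
from typing import NamedTuple


class ToyState(NamedTuple):
    """Immutable by construction; updated copies are made with `_replace`."""
    step_count: int
    position: int


@dataclasses.dataclass(frozen=True)
class FrozenToyState:
    """Alternative: a dataclass made immutable with `frozen=True`."""
    step_count: int
    position: int


state = ToyState(step_count=0, position=3)
next_state = state._replace(step_count=state.step_count + 1)

frozen = FrozenToyState(step_count=0, position=3)
next_frozen = dataclasses.replace(frozen, step_count=1)

# In-place mutation is rejected, which rules out the side effects described above.
try:
    state.step_count = 5  # type: ignore[misc]
    mutated = True
except AttributeError:
    mutated = False
```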

docs: add environment speeds to readme

Describe the solution you'd like

Add to the README environment table the environment speed (in steps/s).

feat(jobshop): create instance generator with known optimal solution

Is your feature request related to a problem? Please describe

Create a JobShop instance generator whose optimal makespan is known. This will be useful to benchmark agents better since the best solution will be known in advance.

Describe the solution you'd like

The generator will generate a random schedule based on a specified makespan (length of the schedule), number of machines, number of jobs, and max number of operations per job.
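
One way such a generator might work is sketched below, under stated assumptions: it needs num_jobs >= num_machines, it keeps every machine busy at every time step (so each machine carries exactly makespan units of work and no schedule can be shorter), and it does not enforce the maximum number of operations per job. The function name and the (machine_id, duration) encoding are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Operation = Tuple[int, int]  # (machine_id, duration)


def generate_dense_instance(
    num_jobs: int, num_machines: int, makespan: int, seed: int = 0
) -> List[List[Operation]]:
    """Build a job list from a random schedule in which no machine is ever idle."""
    if num_jobs < num_machines:
        raise ValueError("This sketch assumes num_jobs >= num_machines.")
    rng = random.Random(seed)
    jobs: List[List[Operation]] = [[] for _ in range(num_jobs)]
    # (time, machine) of the latest unit of work done for each job.
    last_slot: List[Optional[Tuple[int, int]]] = [None] * num_jobs
    for t in range(makespan):
        # Each machine gets a distinct job at time t.
        assigned_jobs = rng.sample(range(num_jobs), num_machines)
        for machine, job in enumerate(assigned_jobs):
            if last_slot[job] == (t - 1, machine):
                # Same machine as in the previous time step: extend the operation.
                prev_machine, duration = jobs[job][-1]
                jobs[job][-1] = (prev_machine, duration + 1)
            else:
                jobs[job].append((machine, 1))
            last_slot[job] = (t, machine)
    return jobs


instance = generate_dense_instance(num_jobs=5, num_machines=3, makespan=10)
machine_loads = [0, 0, 0]
for operations in instance:
    for machine, duration in operations:
        machine_loads[machine] += duration
```

Because each machine's total load equals the requested makespan, that value is a lower bound for any schedule, and the schedule used to build the instance attains it.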

bug: protocol not supported in Python 3.7

Description

Jumanji supports Python 3.7; however, the Protocol class imported from typing is not available in Python 3.7.

What Jumanji version are you using?

v0.1.3

Which accelerator(s) are you using?

No response

Additional System Info

No response

Additional Context

No response

(Optional) Suggestion

Import Protocol from typing_extensions instead of typing.
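
A version-guarded import along these lines might look as follows. The Renderable protocol and TextViewer class are hypothetical examples, and the fallback branch assumes the typing_extensions backport is installed.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    # Assumes the `typing_extensions` backport package is installed.
    from typing_extensions import Protocol


class Renderable(Protocol):
    """Structural type: anything with a matching `render` method qualifies."""

    def render(self) -> str:
        ...


class TextViewer:
    """Does not inherit from Renderable, yet satisfies it structurally."""

    def render(self) -> str:
        return "frame"


def show(viewer: Renderable) -> str:
    return viewer.render()


output = show(TextViewer())
```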

feat: expose state and key as attributes in DMEnvWrapper

Is your feature request related to a problem? Please describe

When working with a dm_env.Environment version of a Jumanji environment using the JumanjiToDMEnvWrapper, one may need to get/set the state and key of the environment e.g. to allow planning and "restart" the environment to its previous state.

Describe the solution you'd like

Right now, this is possible by accessing wrapped_env._state and wrapped_env._key, which should not be allowed since key and state are private attributes.
A solution would be to properly expose them as properties and to implement setters for these properties (in the common style).


Checklist:

  • Expose state and key of JumanjiToDMEnvWrapper as properties
  • Implement setters for these properties
  • Test that one can instantiate a Jumanji environment (e.g. jumanji.make("Snake-6x6-v0")), wrap it with JumanjiToDMEnvWrapper and then get and set the corresponding state and key attributes without leading underscores
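
The checklist could be met with ordinary Python properties, as in this sketch. ToyDMEnvWrapper is a hypothetical stand-in that only stores the two attributes; the real wrapper does much more.

```python
from typing import Any


class ToyDMEnvWrapper:
    """Hypothetical stand-in for JumanjiToDMEnvWrapper."""

    def __init__(self, state: Any, key: Any) -> None:
        self._state = state
        self._key = key

    @property
    def state(self) -> Any:
        return self._state

    @state.setter
    def state(self, value: Any) -> None:
        self._state = value

    @property
    def key(self) -> Any:
        return self._key

    @key.setter
    def key(self, value: Any) -> None:
        self._key = value


wrapped_env = ToyDMEnvWrapper(state={"position": 0}, key=0)

# Save the state, explore from another state, then restore it (e.g. for planning).
saved_state = wrapped_env.state
wrapped_env.state = {"position": 5}
wrapped_env.state = saved_state
wrapped_env.key = 42
```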

ci: support python 3.10

Is your feature request related to a problem? Please describe

We would like Jumanji to support Python 3.10.


Checklist

  • Add 3.10 checks in the GitHub pipeline.
  • Update the documentation to reflect this extended support.

feat: use future annotations for same class type

Is your feature request related to a problem? Please describe

This issue is about cleaning the type hinting in the repository. Currently, when a method (inside a given class) returns an object whose type is the class itself (e.g. in Environment), the return type is given with quotes:

class Environment:

    ...

    def unwrapped(self) -> "Environment":
        ...

According to this thread, this is needed for Python<3.7, but can be removed with future annotations from Python 3.7+.

Describe the solution you'd like

Since Jumanji supports Python>=3.8, we could get rid of these quotes throughout the repository and use future annotations instead.

refactor: create env viewer interface

Is your feature request related to a problem? Please describe

Create an abstract base class Viewer which all environment viewers inherit from. This will help standardise how rendering is done across all environments.
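
Such an interface might be sketched as below. The method names render and close, and the GridTextViewer subclass, are assumptions made for illustration; the issue does not specify the final interface.

```python
import abc
from typing import Generic, List, TypeVar

State = TypeVar("State")


class Viewer(abc.ABC, Generic[State]):
    """Abstract viewer: every environment viewer implements the same methods."""

    @abc.abstractmethod
    def render(self, state: State) -> str:
        """Render a single environment state."""

    @abc.abstractmethod
    def close(self) -> None:
        """Release any resources held by the viewer."""


class GridTextViewer(Viewer[List[List[int]]]):
    """Hypothetical viewer that draws a binary grid as text."""

    def render(self, state: List[List[int]]) -> str:
        return "\n".join(
            "".join("#" if cell else "." for cell in row) for row in state
        )

    def close(self) -> None:
        pass


picture = GridTextViewer().render([[1, 0], [0, 1]])
```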

fix: type checking of dataclasses

Is your feature request related to a problem? Please describe

At multiple places in the code, a workaround is used to check typing of chex dataclasses. This involves doing a conditional import using if TYPE_CHECKING - we would like to avoid this if possible. It is related to this issue.

Describe the solution you'd like

We would like a solution which avoids doing the conditional import.

Alternatives considered

Use a built-in dataclass instead of a chex dataclass. Example implementation where mypy doesn't complain:

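import dataclasses
from typing import Dict, Generic, Optional

# StepType, Array and Observation are assumed to be defined as in the existing code.
@dataclasses.dataclass(init=False)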
class TimeStep(Generic[Observation]):
    step_type: StepType
    reward: Array
    discount: Array
    observation: Observation
    extras: Optional[Dict]

    def __init__(self, step_type: StepType, reward: Array, discount: Array, observation: Observation,
                 extras: Optional[Dict] = None):
        self.step_type = step_type
        self.reward = reward
        self.discount = discount
        self.observation = observation
        self.extras = extras

    def first(self) -> Array:
        return self.step_type == StepType.FIRST

    def mid(self) -> Array:
        return self.step_type == StepType.MID

    def last(self) -> Array:
        return self.step_type == StepType.LAST

fix: step type in autoreset wrapper

Is your feature request related to a problem? Please describe

When using AutoResetWrapper, the sequence of timestep.step_type returned during a rollout shows 0 values where the env has been auto reset instead of 2 values. Recall that

  • 0 corresponds to the first step
  • 1 corresponds to a mid step
  • 2 corresponds to the last step

Describe the solution you'd like

Fix the auto-reset bug: the step types [1, 1, 1, 2, 0, 1, 1, 1] are merged into [1, 1, 1, X, 1, 1, 1], with X being 0 or 2. Right now X is 0, but it should be 2 to warn the user that the episode has terminated. This matters e.g. when looping with while not timestep.last().

timestep = timestep.replace(  # type: ignore
    observation=reset_timestep.observation
)
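
The intended behaviour of the snippet above can be illustrated on a toy time step. ToyTimeStep and auto_reset are hypothetical and use plain Python values instead of JAX arrays.

```python
from typing import NamedTuple

FIRST, MID, LAST = 0, 1, 2


class ToyTimeStep(NamedTuple):
    step_type: int
    reward: float
    observation: int


def auto_reset(timestep: ToyTimeStep, reset_timestep: ToyTimeStep) -> ToyTimeStep:
    """On termination, keep the LAST step type and only swap in the new observation."""
    if timestep.step_type != LAST:
        return timestep
    # The step type and final reward of the finished episode are preserved.
    return timestep._replace(observation=reset_timestep.observation)


terminal = ToyTimeStep(step_type=LAST, reward=1.0, observation=99)
fresh = ToyTimeStep(step_type=FIRST, reward=0.0, observation=0)
merged = auto_reset(terminal, fresh)
```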

feat: improve environment reset speed by changing jax.random.choice

Is your feature request related to a problem? Please describe

jax.random.choice seems to be slow especially when sampling without replacement. Sampling with replacement seems to be much faster, but even without replacement is probably slower than jax.random.randint (to be verified).

Describe the solution you'd like

Find a way to use jax.random.choice(replace=False) as little as possible to improve environment speed.

Alternatives considered

As a first study towards this, it turns out that jax.random.choice(..., replace=True) is faster than jax.random.categorical for sampling with replacement. jax.random.choice(..., replace=False) appears much slower than the other two. When sampling without replacement is needed, we still have to study what the best approach is.
Source: notebook

Alternatives to jax.random.choice(..., replace=False) that could be considered and assessed include:

  • Sampling once from the joint distribution where the joint gathers all the valid pairs
  • Creating two random partitions p1 and p2, sampling one index i only, and get the two samples by p1[i] and p2[i]
  • Sequentially sampling one index and then a second one from the conditional distribution given the first one is not available

It may be that jax.random.choice(..., replace=False) ends up being the most optimised version. In any case, the solution may depend on how many samples we need to sample without replacement (e.g. 2 in the case of Snake).
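
For the two-sample case, the sequential approach from the list above could be sketched with two jax.random.randint calls. This is an untested-for-speed illustration: whether it is actually faster than jax.random.choice(..., replace=False) would still need to be benchmarked, and sample_two_distinct is a hypothetical helper.

```python
import jax
import jax.numpy as jnp


def sample_two_distinct(key, n: int):
    """Sample an ordered pair of distinct indices in [0, n), assuming n >= 2."""
    key_first, key_second = jax.random.split(key)
    first = jax.random.randint(key_first, (), 0, n)
    # Draw from the n - 1 remaining slots, then shift past `first` to skip it.
    second = jax.random.randint(key_second, (), 0, n - 1)
    second = jnp.where(second >= first, second + 1, second)
    return first, second


keys = jax.random.split(jax.random.PRNGKey(0), 1000)
firsts, seconds = jax.jit(jax.vmap(lambda k: sample_two_distinct(k, 6)))(keys)
```

The shift step makes the second index uniform over the n - 1 values that differ from the first, so the pair is distributed like a draw without replacement.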

Remarks

We need to take this into account for random policies. It is likely that the random action selection influences the environment speed by a lot, hence biasing speed benchmarks.

fix: vmap-ed brax environment with BraxToJumanjiWrapper

Is your feature request related to a problem? Please describe

Currently, the BraxToJumanjiWrapper cannot accept a Brax environment that has been wrapped by Brax's VmapWrapper, because jax.lax.cond does not broadcast in the case of a vmap-ed environment. A current workaround is to not use Brax's VmapWrapper: convert the environment to Jumanji with BraxToJumanjiWrapper and then use Jumanji's VmapWrapper, which produces the desired outcome.
To summarize,

from jumanji import wrappers as jumanji_wrappers
from brax.envs import create, wrappers as brax_wrappers

brax_env = create("ant")

# This does not work
jumanji_env = jumanji_wrappers.BraxToJumanjiWrapper(brax_wrappers.VmapWrapper(brax_env))

# This works
jumanji_env = jumanji_wrappers.VmapWrapper(jumanji_wrappers.BraxToJumanjiWrapper(brax_env))

The goal of this issue is to make the first solution work as well for consistency.

Describe the solution you'd like

The jax.lax.cond line in the step function of BraxToJumanjiWrapper should be changed to handle the case where
state.done is a vector and not a scalar (i.e., when the Brax environment originates from a VmapWrapper).
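
One possible direction is to replace the branch with an element-wise selection, which works whether done is a scalar or a vector. The helper select_per_env below is a hypothetical sketch and not the wrapper's code; it assumes both candidate outputs are pytrees with the same structure and a leading batch axis.

```python
import jax
import jax.numpy as jnp


def select_per_env(done, if_done, if_not_done):
    """Pick `if_done` where `done` is True, leaf by leaf."""

    def pick(a, b):
        # Append singleton axes so that `done` broadcasts over trailing dimensions.
        extra_dims = jnp.ndim(a) - jnp.ndim(done)
        mask = jnp.reshape(done, jnp.shape(done) + (1,) * extra_dims)
        return jnp.where(mask, a, b)

    return jax.tree_util.tree_map(pick, if_done, if_not_done)


done = jnp.array([True, False, True])
terminal = {"discount": jnp.zeros(3), "obs": jnp.zeros((3, 2))}
running = {"discount": jnp.ones(3), "obs": jnp.ones((3, 2))}
selected = select_per_env(done, terminal, running)
```

Unlike a branch, this evaluates both candidates for every environment, which is the usual trade-off when a predicate is batched.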

Issue reproduction

import jax
from brax.envs import create
from brax.envs import wrappers as brax_wrappers
from jumanji.wrappers import BraxToJumanjiWrapper

brax_env = create("ant")
# Wrapping a vmap-ed Brax environment is the failing case
jumanji_env = BraxToJumanjiWrapper(brax_wrappers.VmapWrapper(brax_env))
# Reset a batch of size 1, then take one step with a dummy action
state, timestep = jax.jit(jumanji_env.reset)(jax.random.split(jax.random.PRNGKey(0), num=1))
action = jumanji_env.action_spec().generate_value()[None, ...]
state, timestep = jax.jit(jumanji_env.step)(state, action)

Checklist

  • jumanji_env = jumanji_wrappers.BraxToJumanjiWrapper(brax_wrappers.VmapWrapper(brax_env)). Calling jumanji_env.reset and jumanji_env.step should work without raising exceptions.
  • Write a unit test to check for this feature

refactor: improve consistency of extras

Is your feature request related to a problem? Please describe

The extras field of TimeStep can contain environment information useful for decision-making (e.g. Connect4's current player ID) or environment metrics (e.g. BinPack's volume utilisation). There is an inconsistency in what the extras field is used for as it is sometimes meant to be used by the algorithm and sometimes just logged as a metric.

Describe the solution you'd like

We should move any algorithm-related information from extras to the environment observation (e.g. Connect4's observation could have another field called current_player or something). We should update the documentation/docstrings accordingly to explicitly mention that TimeStep.extras does not contain stuff that is meant to be observed by the agent as those should be in the observation.


TODOs

  • adapt docstrings, doc, codes, etc to make explicit the fact that TimeStep.extras does not contain any info meant to be observed
  • move agent-specific extras (e.g. Connect4's current player ID) to environment observations

bug: flake8 fails in ci pipeline

Description

The CI pipeline fails at the linting step, specifically flake8 fails with the following error:

An unexpected error has occurred: CalledProcessError: command: ('/usr/bin/git', 'fetch', 'origin', '--tags')
return code: 128
expected return code: 0
stdout: (none)
stderr:
    fatal: could not read Username for 'https://gitlab.com/': No such device or address
    
Check the log at /home/runner/.cache/pre-commit/pre-commit.log

What Jumanji version are you using?

v0.1.1

Which accelerator(s) are you using?

N/A

Additional System Info

N/A

Additional Context

No response

(Optional) Suggestion

This error occurs because flake8 took down their GitLab repository in favour of their GitHub repository.
However, by default, pre-commit uses the GitLab link. Thus, we need to replace https://gitlab.com/PyCQA/flake8 with https://github.com/PyCQA/flake8 in the .pre-commit-config.yaml. For more information, see this

feat(registry): retrieve the latest version of an environment

Is your feature request related to a problem? Please describe

As of now, when calling make with an environment name omitting the version, the current behaviour is to fetch version v0. This will become a problem when we start having more than one version per environment. Getting v0 doesn't make sense.

Describe the solution you'd like

I suggest we throw an error if the version number is missing. This would simplify the code and force users to be explicit about the version they want. It is also better for reproducibility. It is slightly less user-friendly because they would need to look up the listing of the registered environments before calling make.

Describe alternatives you've considered

An alternative solution is to change the described behaviour to fetch the latest version of an environment if the version number is omitted.


Misc

  • Check for duplicate requests.

feat(specs): make environment specs properties instead of methods

Is your feature request related to a problem? Please describe

env.observation_spec is a method and creates a new spec each time it is called. If we make it a property, we can JIT a function that gets these specs.

Describe the solution you'd like

def __init__(self):
    self.observation_spec = self._make_observation_spec()

def _make_observation_spec(self):
    return spec.Spec(...)
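
A variant of the same idea uses functools.cached_property, so that the spec is built once on first access. ToySpec and ToyEnvironment are hypothetical stand-ins, and whether a cached property or an attribute set in __init__ is preferable is left open here.

```python
import functools
from typing import Tuple


class ToySpec:
    """Hypothetical stand-in for a spec object."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


class ToyEnvironment:
    @functools.cached_property
    def observation_spec(self) -> ToySpec:
        # Built on first access, then reused on every later access.
        return self._make_observation_spec()

    def _make_observation_spec(self) -> ToySpec:
        return ToySpec(shape=(6, 6))


env = ToyEnvironment()
first_access = env.observation_spec
second_access = env.observation_spec
```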

style: move root files to their own folder

Is your feature request related to a problem? Please describe

The root is quite messy with lots of files (e.g. commitlint.config.js, mkdocs.yml, license_header.txt).

Describe the solution you'd like

We could move a lot of these files to separate folders and keep only necessary files in the root directory (e.g. README, and setup.py).

docs: mention request for Pgx

Sorry for the comment from out of the blue. Jumanji's sophisticated API is great, and its application to problems like TSP is really interesting.

Today, we released Pgx, a collection of JAX-based RL environments dedicated to classic board games like Go. We have implemented over 15 environments, including Backgammon, Shogi, and Go, and confirmed that they are considerably faster than existing C++/Python implementations. We also plan to implement Chess and Contract Bridge in the coming weeks.

We believe Jumanji and Pgx can complement each other as both implement JAX-based RL environments but focus on different domains. We would be happy if you could kindly mention Pgx in the README like other JAX-based RL environments if you like it. For example,

🎲 Pgx provides classic board game environments like Backgammon, Shogi, and Go.

🎲 [Pgx](https://github.com/sotetsuk/pgx) provides classic board game environments like Backgammon, Shogi, and Go.

Thanks!

feat(specs): implement sample method

Is your feature request related to a problem? Please describe

Jumanji specs have a generate_value method which essentially returns zeros of the correct pytree/shape. It would be nice to be able to sample values (with an optional mask) from the action spec. This would give us random policies for free for all environments.

Describe the solution you'd like

Add a sample method to the specs similar to Gym.

Remarks

We need to decide whether it is redundant to have both sample and generate_value.
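
For a discrete action spec, masked sampling could be sketched as follows. The function sample_discrete is a hypothetical free function, not an existing spec method, and it assumes that the mask allows at least one action.

```python
import jax
import jax.numpy as jnp


def sample_discrete(key, num_values: int, mask=None):
    """Sample an index in [0, num_values), optionally restricted to mask == True."""
    if mask is None:
        return jax.random.randint(key, (), 0, num_values)
    # Masked-out entries get a logit of -inf, hence zero probability.
    logits = jnp.where(mask, 0.0, -jnp.inf)
    return jax.random.categorical(key, logits)


mask = jnp.array([False, True, False, True])
keys = jax.random.split(jax.random.PRNGKey(0), 100)
samples = jax.vmap(lambda k: sample_discrete(k, 4, mask))(keys)
unmasked = sample_discrete(jax.random.PRNGKey(1), 4)
```

With such a function, a uniformly random policy over valid actions reduces to a single call per step.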


Misc

  • Check for duplicate requests.

feat(binpack): improve speed of the environment

Is your feature request related to a problem? Please describe

There is a potential for speedup in the BinPack environment's step method. If the computation is quite sparse, using jax.lax.map instead of jax.vmap may speed the environment up when lots of EMSs are not alive.

Describe the solution you'd like

  • POC to be done with the timer.
  • Faster version of BinPack's step method.
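To make the idea concrete, here is a small sketch comparing the two approaches on a toy per-EMS update. The function process_ems and its "expensive" branch are placeholders, not BinPack's actual step logic. The relevant JAX behaviour is that under jax.vmap a jax.lax.cond with a batched predicate is lowered to a select, so both branches are evaluated for every element, whereas under jax.lax.map the elements are processed sequentially and the conditional stays a real branch.

```python
import jax
import jax.numpy as jnp


def process_ems(ems, alive):
    """Toy per-EMS update (placeholder): do the costly work only if the EMS is alive."""

    def expensive(e):
        return jnp.tanh(e) * 2.0

    return jax.lax.cond(alive, expensive, lambda e: e, ems)


@jax.jit
def step_vmap(ems_batch, alive_batch):
    # Vectorised: cond becomes a select, so `expensive` runs for dead EMSs too.
    return jax.vmap(process_ems)(ems_batch, alive_batch)


@jax.jit
def step_map(ems_batch, alive_batch):
    # Sequential: cond stays a branch, so dead EMSs skip `expensive`.
    return jax.lax.map(lambda args: process_ems(*args), (ems_batch, alive_batch))


ems_batch = jnp.arange(12.0).reshape(4, 3)
alive_batch = jnp.array([True, False, False, True])
out_vmap = step_vmap(ems_batch, alive_batch)
out_map = step_map(ems_batch, alive_batch)
```

Both versions return the same values. Whether the sequential version is actually faster depends on how sparse the alive mask is and on the hardware, which is what the timer POC would need to establish.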

bug: make pull requests trigger workflow

Description

Previously the workflow was not triggered.

What Jumanji version are you using?

No response

Which accelerator(s) are you using?

No response

Additional System Info

No response

Additional Context

No response

(Optional) Suggestion

No response

bug: protocol not supported in Python 3.7

Description

Jumanji supports Python 3.7; however, Protocol was only added to typing in Python 3.8, so importing it from typing fails on Python 3.7.

What Jumanji version are you using?

v0.1.3

Which accelerator(s) are you using?

No response

Additional System Info

No response

Additional Context

No response

(Optional) Suggestion

Import Protocol from typing_extensions instead of typing.
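One possible shape for the fix is a version-guarded import, sketched below. It assumes the typing_extensions backport is installed on Python 3.7; the SupportsReset protocol and the Counter class are made-up examples, not Jumanji types.

```python
import sys
from typing import List

if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    # Backport for Python 3.7 (assumes `pip install typing_extensions`).
    from typing_extensions import Protocol


class SupportsReset(Protocol):
    """Illustrative protocol: anything with a reset() method returning an int."""

    def reset(self) -> int:
        ...


class Counter:
    """Satisfies SupportsReset structurally, without inheriting from it."""

    def reset(self) -> int:
        return 0


def reset_all(items: List[SupportsReset]) -> List[int]:
    return [item.reset() for item in items]
```

Importing unconditionally from typing_extensions also works on newer Python versions, at the cost of an extra dependency.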

bug: images are not displayed on PyPI

Description

The Jumanji logo, badges and environment GIF do not appear on the PyPI description page. This is probably due to not exporting the images when uploading to PyPI.

What Jumanji version are you using?

v0.1.1

Which accelerator(s) are you using?

No response

Additional System Info

No response

Additional Context

No response

(Optional) Suggestion

No response

bug: importing jumanji makes pytest tests segfault

Description

Good morning,

Importing jumanji makes my other tests segfault.

Even when no test calls or imports code that relies on Jumanji, simply adding import jumanji makes the tests segfault; removing the import makes them pass.

If I comment out, in jumanji/__init__.py, the import of the binpack sub-module (from jumanji.environments.combinatorial import binpack as _binpack) and the associated env registrations, the tests pass.

I suspect it has to do with something in jumanji\environments\__init__.py, which is loaded when importing binpack. Since there are a lot of imports in that __init__, it's hard to find the culprit.

I can't disclose the dependencies I am using.

Thanks,

Cyprien

What Jumanji version are you using?

0.1.3

Which accelerator(s) are you using?

CPU/GPU

Additional System Info

Python 3.10.8 - WSL - Ubuntu 20.04 LTS

Additional Context

No response

(Optional) Suggestion

I have forked the repo and am working on making the registration of the binpack env not require any import from the env.

Instead of:

register(
    id="BinPack-rand20-v0",
    entry_point="jumanji.environments:BinPack",
    kwargs={
        "instance_generator": _binpack.instance_generator.RandomInstanceGenerator(
            max_num_items=20,
            max_num_ems=80,
        ),
        "obs_num_ems": 40,
    },
)

We could have:

register(
    id="BinPack-rand20-v0",
    entry_point="jumanji.environments:BinPack",
    kwargs={
        "instance_generator": {
            "type": "random",
            "max_num_items": 20,
            "max_num_ems": 80,
        },
        "obs_num_ems": 40,
    },
)

or,

register(
    id="BinPack-rand20-v0",
    entry_point="jumanji.environments:BinPack",
    kwargs={
        "instance_generator": "random",
        "max_num_items": 20,
        "max_num_ems": 80,
        "obs_num_ems": 40,
    },
)
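For the first alternative, the environment would need to turn the plain dictionary into a generator object when it is constructed. Here is a minimal sketch of that resolution step. The names _GENERATORS and build_instance_generator are hypothetical, and RandomInstanceGenerator below is a stand-in dataclass rather than the real Jumanji class.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class RandomInstanceGenerator:
    """Stand-in for the real generator class; only stores its parameters."""

    max_num_items: int
    max_num_ems: int


# Hypothetical lookup table from the "type" string to a generator factory.
_GENERATORS: Dict[str, Callable[..., Any]] = {
    "random": RandomInstanceGenerator,
}


def build_instance_generator(spec: Dict[str, Any]) -> Any:
    """Build a generator from a spec such as {"type": "random", ...}."""
    params = dict(spec)  # copy so the registered kwargs are not mutated
    kind = params.pop("type")
    if kind not in _GENERATORS:
        raise ValueError(f"Unknown instance generator type: {kind!r}")
    return _GENERATORS[kind](**params)


generator = build_instance_generator(
    {"type": "random", "max_num_items": 20, "max_num_ems": 80}
)
```

With this approach the registration call only holds strings and integers, so nothing from the environment package has to be imported until the environment is actually created.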

feat: constrain all environment states to have a key attribute

Is your feature request related to a problem? Please describe

One assumption about environment states is that they have a key (jax random key) attribute to manage stochasticity in the environment step function. This key is then used in wrappers such as JumanjiToDMEnvWrapper.

Describe the solution you'd like

I am not sure which solution to go for. I have identified two possible ways: using protocols to make it explicit that states have a key attribute, or using an abstract State class that makes the key attribute mandatory.

What is the best way of forcing the environment's State to have a key attribute?
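The two options could look roughly like the sketch below. All class names are hypothetical, Python 3.8+ is assumed for typing.Protocol, and key is typed as Any to keep the example free of a JAX dependency (in practice it would be a JAX random key).

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


# Option 1: a structural protocol. Any state with a `key` attribute conforms.
@runtime_checkable
class HasKey(Protocol):
    key: Any


# Option 2: a base class that every state inherits from, making `key` mandatory.
@dataclass
class BaseState:
    key: Any


@dataclass
class GridState(BaseState):
    """Inherits the mandatory `key` field from BaseState."""

    position: int


@dataclass
class KeylessState:
    """Has no `key`, so it does not satisfy HasKey."""

    position: int


good = GridState(key=0, position=3)
bad = KeylessState(position=3)
```

Here isinstance(good, HasKey) is true and isinstance(bad, HasKey) is false. The protocol keeps existing state classes unchanged and lets a static type checker flag wrappers that receive a state without a key, while the base class enforces the attribute at construction time.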
