Can this code run on TPU?

JAX MuZero

A JAX implementation of the MuZero agent.

Everything is implemented in JAX, including the Monte Carlo tree search (MCTS). The entire search process can be JIT-compiled with jax.jit and run on accelerators such as GPUs.
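The key to jitting a search is writing its loops with JAX control-flow primitives instead of Python loops. The following is a minimal sketch of that general pattern and is not code from this repository. `toy_dynamics` is a hypothetical stand-in for a learned transition network, and `unroll` rolls it forward along a sequence of actions with `jax.lax.scan` inside a single `jax.jit`-compiled function. JAX compiles the function for whichever backend it was installed with.

```python
import jax
import jax.numpy as jnp


def toy_dynamics(state, action):
    # Hypothetical stand-in for a learned transition network (not from this repo):
    # a squashed linear update of the state driven by the action.
    next_state = jnp.tanh(0.9 * state + action)
    reward = jnp.sum(next_state)
    return next_state, reward


@jax.jit
def unroll(initial_state, actions):
    """Roll the toy model forward along a sequence of actions.

    jax.lax.scan keeps the loop inside the traced program, so jax.jit compiles
    the whole rollout once and runs it on the default backend.
    """

    def step(state, action):
        next_state, reward = toy_dynamics(state, action)
        return next_state, reward  # (carry, per-step output)

    # Returns (final carry, per-step rewards stacked along the leading axis).
    return jax.lax.scan(step, initial_state, actions)


if __name__ == "__main__":
    final_state, rewards = unroll(jnp.zeros(4), jnp.ones((5, 4)))
    # Backend name ("cpu", "gpu", or "tpu"), then the state and reward shapes.
    print(jax.default_backend(), final_state.shape, rewards.shape)
```

A plain Python `for` loop inside a jitted function would be unrolled at trace time. `jax.lax.scan`, `jax.lax.fori_loop`, and `jax.lax.while_loop` keep the loop as a single compiled construct on the device.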

Requirements

Run the following command to create a new conda environment with all dependencies:

conda env create -f conda_env.yml

Then activate the conda environment with:

conda activate muzero

Alternatively, if you prefer to use your own Python environment, install the dependencies with:

pip install -r requirements.txt

Training

Run the following command to train an agent on the Atari game Breakout:

python -m experiments.breakout

Atari 100K Benchmark Results

Median human-normalized score:

Raw game scores:
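For reference, the human-normalized score of a game is conventionally defined as (agent score − random score) / (human score − random score), so 0 corresponds to a uniform-random policy and 1 to the human baseline. The median is then taken over the 26 games of the Atari 100K benchmark. The following is a minimal sketch of that computation. The game names and baseline numbers in the usage example are hypothetical and are not results from this repository.

```python
import statistics


def human_normalized_score(agent, random, human):
    """Standard Atari normalization: 0 = random policy, 1 = human baseline."""
    return (agent - random) / (human - random)


def median_hns(scores, baselines):
    """Median human-normalized score over a set of games.

    scores:    {game: agent score}
    baselines: {game: (random score, human score)}
    """
    normalized = [human_normalized_score(scores[g], *baselines[g]) for g in scores]
    return statistics.median(normalized)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    baselines = {"GameA": (0.0, 100.0), "GameB": (10.0, 110.0), "GameC": (-20.0, 20.0)}
    scores = {"GameA": 50.0, "GameB": 210.0, "GameC": -10.0}
    print(median_hns(scores, baselines))  # per-game scores 0.5, 2.0, 0.25 -> median 0.5
```

The median is used instead of the mean because a few games with very large normalized scores would otherwise dominate the aggregate.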

Repository Structure

.
├── algorithms              # Files for the MuZero algorithm.
│   ├── actors.py           # Agent-environment interaction.
│   ├── agents.py           # An RL agent that plans with a learned model using MCTS.
│   ├── haiku_nets.py       # Neural networks.
│   ├── muzero.py           # The training pipeline.
│   ├── replay_buffers.py   # Experience replay.
│   ├── types.py            # Customized data structures.
│   └── utils.py            # Helper functions.
├── environments            # The Atari environment interface and wrappers.
├── experiments             # Experiment configuration files.
├── vec_env                 # Vectorized environment interfaces.
├── conda_env.yml           # Conda environment specification.
├── requirements.txt        # Python dependencies.
├── LICENSE
└── README.md
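agents.py plans with MCTS. At each node, MuZero's search selects a child with the pUCT rule from the MuZero paper: Q(s,a) + P(s,a) · √N(s) / (1 + N(s,a)) · (c1 + log((N(s) + c2 + 1) / c2)), where N(s) is the sum of the child visit counts, c1 = 1.25, and c2 = 19652. The following is a minimal, vectorized JAX sketch of that formula. It is not code taken from agents.py, and it assumes Q has already been min-max normalized as described in the paper.

```python
import jax.numpy as jnp

# Constants of the pUCT rule as given in the MuZero paper.
C1 = 1.25
C2 = 19652.0


def puct_scores(q, prior, visit_counts):
    """pUCT score for every child action of a single node.

    q:            [A] value estimate of each child (assumed already normalized)
    prior:        [A] policy prior from the prediction network
    visit_counts: [A] number of visits to each child
    """
    parent_visits = jnp.sum(visit_counts)  # N(s) = sum over children of N(s, b)
    exploration = (
        prior
        * jnp.sqrt(parent_visits)
        / (1.0 + visit_counts)
        * (C1 + jnp.log((parent_visits + C2 + 1.0) / C2))
    )
    return q + exploration


def select_action(q, prior, visit_counts):
    # Index of the child with the highest pUCT score.
    return jnp.argmax(puct_scores(q, prior, visit_counts))
```

Because the rule uses only array operations, it can be traced inside a jitted search loop and batched with jax.vmap across environments.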

Resources

NeurIPS 2020: JAX Ecosystem Meetup, video and slides
MuZero pseudocode: https://arxiv.org/src/1911.08265v2/anc/pseudocode.py
EfficientZero: https://github.com/YeWR/EfficientZero

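	import chex
	import haiku as hk
	from algorithms import utils  # algorithms/utils.py in this repository

	def fn(state: chex.Array, action: chex.Array):  # nested in an agent method; uses its `self` and `params`
		# One step of the learned dynamics: one-hot encode the action, then apply the transition network.
		one_hot_action = hk.one_hot(action, self._action_space.n)
		next_state = self._transit_fn.apply(params.transition, one_hot_action, state)
		next_state = utils.scale_gradient(next_state, 0.5)  # halve the gradient flowing back through the state
		return next_state, next_state  # (carry, output) pair, the form a jax.lax.scan step returns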
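The snippet calls utils.scale_gradient, whose body is not shown here. The following is a minimal sketch of one way to write such a helper in JAX. It uses the stop-gradient formula from the MuZero paper's published pseudocode (`x * scale + stop_gradient(x) * (1 - scale)`). The function below is a hypothetical stand-in, not necessarily the repository's implementation.

```python
import jax
import jax.numpy as jnp


def scale_gradient(x, scale):
    """Identity in the forward pass; multiplies the gradient by `scale` in the backward pass.

    Hypothetical stand-in for utils.scale_gradient, written with the
    stop-gradient trick used in the MuZero pseudocode.
    """
    return x * scale + jax.lax.stop_gradient(x) * (1.0 - scale)


if __name__ == "__main__":
    x = jnp.array([1.0, 2.0])
    f = lambda v: jnp.sum(scale_gradient(v, 0.5) ** 2)
    print(scale_gradient(x, 0.5))  # forward value is unchanged: equals x
    print(jax.grad(f)(x))          # 0.5 * 2x, i.e. equals x here
```

Because the forward value is unchanged, predictions are identical and only the learning signal flowing back through the dynamics function is reduced. The MuZero paper motivates the factor of 1/2 at the start of the dynamics function as keeping the total gradient applied to that function constant.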

hwhitetooth / jax_muzero

