Soft-Actor-Critic-and-Extensions

PyTorch implementation of Soft Actor-Critic (SAC) with the extensions PER + ERE + Munchausen RL and the option of multiple environments for parallel data collection and faster training.


This repository includes the newest Soft Actor-Critic version (2019 paper) as well as the following extensions for SAC:

  • Prioritized Experience Replay (PER)
  • Emphasizing Recent Experience without Forgetting the Past (ERE)
  • Munchausen Reinforcement Learning (paper)
  • D2RL: Deep Dense Architectures in Reinforcement Learning (paper)
  • N-step Bootstrapping
  • Parallel Environments

In the paper implementation of ERE the authors used an older version of SAC, whereas this repository contains the newest version of SAC as well as a proportional-prioritization implementation of PER.
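
As a quick reference, ERE performs the K gradient updates after each episode on shrinking sampling ranges: the k-th update samples uniformly from only the most recent c_k transitions. A rough sketch of that schedule, assuming the formula from the ERE paper (variable names are illustrative, not taken from this repository):

def ere_range(buffer_size, k, num_updates, eta=0.996, c_min=5000):
    """Number of most recent transitions the k-th of num_updates
    post-episode gradient updates samples from."""
    c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
    return max(min(c_k, buffer_size), c_min)

# Early updates (small k) draw from almost the whole buffer,
# later updates concentrate on the most recent experience.
print(ere_range(1_000_000, k=1, num_updates=64))    # ~940k transitions
print(ere_range(1_000_000, k=64, num_updates=64))   # ~18k transitions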

TODO:

  • [X] Add IQN critic (training with the IQN critic is ~10x slower; this needs to be fixed)
  • [ ] Add a D2RL IQN critic
  • [ ] Create a distributed SAC version with Ray
  • [X] Add N-step bootstrapping
  • [ ] Check performance with all add-ons
  • [X] Add PyBullet Gym environments

Dependencies

Trained and tested on:

Python 3.6
PyTorch 1.7.0  
Numpy 1.15.2 
gym 0.10.11 
pybulletgym

How to use:

The new script run.py combines all extensions; the add-ons can be enabled simply by setting the corresponding flags.

python run.py -info sac

Parameters: to see all options, run python run.py -h

-env, Environment name, default = Pendulum-v0
-per, Adding Prioritized Experience Replay to the agent if set to 1, default = 0
-munchausen, Adding Munchausen RL to the agent if set to 1, default = 0
-dist, --distributional, Using a distributional IQN Critic network if set to 1, default = 0
-d2rl, Uses Deep Actor and Deep Critic Networks if set to 1, default = 0
-n_step, Using n-step bootstrapping, default = 1
-ere, Adding Emphasizing Recent Experience to the agent if set to 1, default = 0
-info, Information or name of the run
-frames, The amount of training interactions with the environment, default is 100000
-eval_every, Number of interactions after which the evaluation runs are performed, default = 5000
-eval_runs, Number of evaluation runs performed, default = 1
-seed, Seed for the env and torch network weights, default is 0
-lr_a, Actor learning rate, default is 3e-4
-lr_c, Critic learning rate, default is 3e-4
-a, --alpha, Entropy alpha value; if not set, the value is learned by the agent
-layer_size, Number of nodes per neural network layer, default is 256
-repm, --replay_memory, Size of the Replay memory, default is 1e6
-bs, --batch_size, Batch size, default is 256
-t, --tau, Soft-update factor tau, default is 0.005
-g, --gamma, discount factor gamma, default is 0.99
--saved_model, Load a saved model to perform a test run!
-w, --worker, Number of parallel workers (note: the effective batch size increases proportionally with the number of workers!), default = 1
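
For example, to train on the default Pendulum-v0 environment with several extensions enabled and two parallel workers:

python run.py -env Pendulum-v0 -per 1 -munchausen 1 -ere 1 -d2rl 1 -n_step 3 -w 2 -info sac_all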

old scripts

With the old scripts you can still run three different SAC versions:

Run regular SAC: python SAC.py -env Pendulum-v0 -ep 200 -info sac

Run SAC + PER: python SAC_PER.py -env Pendulum-v0 -ep 200 -info sac_per

Run SAC + ERE + PER: python SAC_ERE_PER.py -env Pendulum-v0 -frames 20000 -info sac_per_ere

For further input arguments and hyperparameters, check the code.

Observe training results

tensorboard --logdir=runs

Results

It can be seen that the extensions do not always improve the algorithm; whether they help depends on the environment and varies from environment to environment, as the authors of the ERE paper also mention.

[Figures: Pendulum and LLC results]

  • All runs without hyperparameter-tuning

PyBullet Environments

[Figures: HalfCheetah and Hopper results]

Comparison SAC and D2RL-SAC

[Figure: SAC vs. D2RL-SAC on Pendulum]

Comparison SAC and M-SAC

[Figures: SAC vs. M-SAC]

Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research.

@misc{SAC,
  author = {Dittert, Sebastian},
  title = {PyTorch Implementation of Soft-Actor-Critic-and-Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/Soft-Actor-Critic-and-Extensions}},
}


Issues

Use of sum trees

Hello, in the original PER paper I believe sum tree was used to speed up sampling, and I believe ERE also mentions using it in their PER implementation and PER + ERE implementation as well. It seems that your code uses simple np.random.choice to sample instead.

Have you tried implementing the tree data structure to see if that speeds up the code at all?

Thanks!
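
For reference, a minimal sum tree of the kind the PER paper describes could look like this (an illustrative sketch, not code from this repository):

import numpy as np

class SumTree:
    """Leaves hold priorities, internal nodes hold the sum of their
    children, so sampling proportional to priority is O(log n) instead
    of the O(n) of np.random.choice over explicit probabilities."""

    def __init__(self, capacity):
        self.capacity = capacity                 # number of leaves
        self.tree = np.zeros(2 * capacity - 1)
        self.write = 0                           # next leaf to overwrite

    def add(self, priority):
        self.update(self.write + self.capacity - 1, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def total(self):
        return self.tree[0]

    def sample(self, value):
        """Return (buffer index, priority) of the leaf whose cumulative
        priority interval contains value, with 0 <= value < total()."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):      # descend until a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1), self.tree[idx]

# Usage: draw one uniform value in [0, total()) per sampled transition, e.g.
# tree.sample(np.random.uniform(0, tree.total()))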

understanding alpha learning

Hi, there,
I am confused about how alpha learning is done here:

alpha_loss = - (self.log_alpha.cpu() * (log_pis.cpu() + self.target_entropy).detach().cpu()).mean()

I thought line 244 here should use alpha instead of self.log_alpha to compute alpha_loss; the dependency would go self.log_alpha --> alpha --> alpha_loss, so that Adam optimizes self.log_alpha automatically for us.

Thanks.

Shuang
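
For context, here is a minimal sketch of SAC's automatic entropy tuning with dummy data (assumed names and shapes, not the exact code from this repository). Multiplying the detached term by log_alpha (as in the quoted line) or by its exponential changes only the gradient's scale, not its sign, since alpha > 0 and in both cases the update is applied to the leaf tensor log_alpha:

import torch

action_dim = 3
log_pis = torch.randn(256, 1)                      # stand-in for policy log-probs
target_entropy = -float(action_dim)                # common heuristic: -|A|

log_alpha = torch.zeros(1, requires_grad=True)     # leaf parameter Adam updates
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

alpha_loss = -(log_alpha * (log_pis + target_entropy).detach()).mean()
alpha_optim.zero_grad()
alpha_loss.backward()
alpha_optim.step()

alpha = log_alpha.exp()    # value used in the actor and critic losses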

How do you plot figures?

Your implementation is perfect!
I just have one question.
How do you plot figures? How do you test the performance of the agent?
Can you opensource your plot code?

UserWarning

Hi, when I run your code, I get the following warning:

UserWarning: Using a target size (torch.Size([256, 3])) that is different to the input size (torch.Size([256, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  critic2_loss = 0.5*F.mse_loss(Q_2, Q_targets.detach())

My python is 3.6 and pytorch is 1.4.0 with numpy 1.19.2.
The env is hopper-v2.
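
A common cause of this warning in SAC implementations is that some term in the TD target keeps the action dimension (for example, per-dimension policy log-probabilities that are not summed; Hopper has 3 action dimensions), so the target broadcasts to [256, 3] against the critics' [256, 1] output. A small sketch of keeping every term at shape [batch, 1] (assumed names and shapes, not the exact code from this repository):

import torch

batch, action_dim, gamma, alpha = 256, 3, 0.99, 0.2
rewards = torch.randn(batch, 1)
dones   = torch.zeros(batch, 1)
q_next  = torch.randn(batch, 1)                     # min of the two target critics
log_pis_next = torch.randn(batch, action_dim)       # per-dimension log-probs

# Sum over the action dimension so the entropy term is [batch, 1]
log_pi_next = log_pis_next.sum(dim=1, keepdim=True)

q_targets = rewards + gamma * (1.0 - dones) * (q_next - alpha * log_pi_next)
assert q_targets.shape == (batch, 1)                # matches the Q_1 / Q_2 output shape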

minor errors in the code of SAC.py

Thanks for your work and efforts. Your code is easily understood and reproducible.

I found a few minor issues in the actor-loss update and the Agent class:

  1. It seems that you use critic network 1 to compute the actor loss rather than the minimum of the two critic networks.
  2. If a fixed alpha is used, there is an error in the actor-loss update because actions_pred is only defined in the auto-tuning-temperature case.
  3. In the Agent class: 1) the parameter add_noise in the act method is not used; 2) the weight initialization cannot be done since the parameter init_w is not given.

Thank you again,
Best regards.

Tao
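
Regarding point 1, a minimal sketch of an actor update that uses the minimum of the two critics, as the issue suggests (assumed names and shapes, not the exact code from SAC.py):

import torch

batch, alpha = 256, 0.2
q1_pred = torch.randn(batch, 1, requires_grad=True)   # critic_1(states, actions_pred)
q2_pred = torch.randn(batch, 1, requires_grad=True)   # critic_2(states, actions_pred)
log_pis = torch.randn(batch, 1)                       # log pi(actions_pred | states)

q_min = torch.min(q1_pred, q2_pred)                   # clipped double-Q for the actor
actor_loss = (alpha * log_pis - q_min).mean()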

Why are the results not good?

Meaningful work for studying RL. Do you have further analysis of the experimental results? Why are the results not good, especially for the replay-buffer extensions and M-SAC?
