Giter Site home page Giter Site logo

denisyarats / pytorch_sac Goto Github PK

View Code? Open in Web Editor NEW
476.0 6.0 102.0 14.86 MB

PyTorch implementation of Soft Actor-Critic (SAC)

License: MIT License

Python 0.56% Jupyter Notebook 99.44%
reinforcement-learning dm-control soft-actor-critic pytorch deep-reinforcement-learning actor-critic mujoco gym deep-learning sac

pytorch_sac's Introduction

Soft Actor-Critic (SAC) implementation in PyTorch

This is PyTorch implementation of Soft Actor-Critic (SAC) [ArXiv].

If you use this code in your research project please cite us as:

@misc{pytorch_sac,
  author = {Yarats, Denis and Kostrikov, Ilya},
  title = {Soft Actor-Critic (SAC) implementation in PyTorch},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/denisyarats/pytorch_sac}},
}

Requirements

We assume you have access to a gpu that can run CUDA 9.2. Then, the simplest way to install all required dependencies is to create an anaconda environment and activate it:

conda env create -f conda_env.yml
source activate pytorch_sac

Instructions

To train an SAC agent on the cheetah run task run:

python train.py env=cheetah_run

This will produce exp folder, where all the outputs are going to be stored including train/eval logs, tensorboard blobs, and evaluation episode videos. One can attacha tensorboard to monitor training by running:

tensorboard --logdir exp

Results

An extensive benchmarking of SAC on the DM Control Suite against D4PG. We plot an average performance of SAC over 3 seeds together with p95 confidence intervals. Importantly, we keep the hyperparameters fixed across all the tasks. Note that results for D4PG are reported after 10^8 steps and taken from the original paper. Results

pytorch_sac's People

Contributors

denisyarats avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

pytorch_sac's Issues

Continuous action space?

Hi, thanks for sharing! 
I would like to ask a question, is the output of the SAC algorithm you implemented a continuous action space?

Question about implementation details

Hey,

First, I want to thank you for your implementation of SAC which is of high quality.

I want to ask you about two details of implementation which I don't understand well.

First, why here instead of clamping the log_std you use this constrain here here? How do you choose 2, -5?

        # constrain log_std inside [log_std_min, log_std_max]
        log_std = torch.tanh(log_std)
        log_std_min, log_std_max = self.log_std_bounds
        log_std = log_std_min + 0.5 * (log_std_max - log_std_min) * (log_std + 1)

Second, why you approximate atanh with 0.5 * (x.log1p() - (-x).log1p()) here instead of torch.atanh() ?

Thank you for your responses ๐Ÿ˜„

data

Hello, author. I was inspired by the code of your project. I want to ask you for leave. I don't quite understand the meaning of the parameters in the code data. Can you explain it?

Saved model weights

Hi, do you have saved model weights? That would be very helpful for me. Thanks!

Help to speed up the code

Hi, the code currently takes around 15hours for a million epochs, could you guide me into increasing this speed and not deteriorating the performance. As the compute required for the code is pretty less, i would like to utilize them to speed up. Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.