denisyarats / pytorch_sac

PyTorch implementation of Soft Actor-Critic (SAC)

License: MIT License

Python 0.56% Jupyter Notebook 99.44%

reinforcement-learning dm-control soft-actor-critic pytorch deep-reinforcement-learning actor-critic mujoco gym deep-learning sac

pytorch_sac's Introduction

Soft Actor-Critic (SAC) implementation in PyTorch

This is a PyTorch implementation of Soft Actor-Critic (SAC) [ArXiv].

If you use this code in your research project please cite us as:

@misc{pytorch_sac,
  author = {Yarats, Denis and Kostrikov, Ilya},
  title = {Soft Actor-Critic (SAC) implementation in PyTorch},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/denisyarats/pytorch_sac}},
}

Requirements

We assume you have access to a GPU that can run CUDA 9.2. The simplest way to install all required dependencies is then to create an Anaconda environment and activate it:

conda env create -f conda_env.yml
source activate pytorch_sac

Instructions

To train an SAC agent on the cheetah run task, run:

python train.py env=cheetah_run

This will produce an exp folder, where all the outputs are stored, including train/eval logs, tensorboard blobs, and evaluation episode videos. You can attach a tensorboard to monitor training by running:

tensorboard --logdir exp

Results

We benchmark SAC extensively on the DM Control Suite against D4PG. We plot the average performance of SAC over 3 seeds together with p95 confidence intervals. Importantly, we keep the hyperparameters fixed across all the tasks. Note that the results for D4PG are reported after 10^8 steps and are taken from the original paper.
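As a rough illustration of how returns from several seeds can be summarized into a mean with an interval, the sketch below uses only the Python standard library. The function name `mean_and_ci`, the default critical value of 1.96 (normal approximation), and the numbers in the usage example are assumptions made for illustration; the README does not state how its intervals were computed, and the numbers are not results from this repository.

```python
import math
import statistics


def mean_and_ci(returns_per_seed, z=1.96):
    """Return (mean, half_width) of an approximate confidence interval.

    returns_per_seed: one episode return per seed at a fixed training step.
    z: critical value; 1.96 is the normal approximation for 95% (assumption).
    """
    n = len(returns_per_seed)
    if n < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.mean(returns_per_seed)
    # Sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(returns_per_seed)
    # Standard error of the mean, scaled by the critical value.
    half_width = z * std / math.sqrt(n)
    return mean, half_width


# Placeholder returns for 3 seeds (made-up numbers, not benchmark results).
mean, half_width = mean_and_ci([700.0, 750.0, 800.0])
print(f"{mean:.1f} +/- {half_width:.1f}")
```

With only 3 seeds, a Student t critical value (about 4.30 for 2 degrees of freedom) can be passed as `z` instead, which gives a noticeably wider interval.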


pytorch_sac's Issues

Continuous action space?

Hi, thanks for sharing! 
I would like to ask a question: does the SAC algorithm you implemented output actions in a continuous action space?

Question about implementation details

Hey,

First, I want to thank you for your high-quality implementation of SAC.

I want to ask about two implementation details that I don't understand well.

First, why do you use this constraint here instead of clamping log_std? How did you choose the bounds 2 and -5?

        # constrain log_std inside [log_std_min, log_std_max]
        log_std = torch.tanh(log_std)
        log_std_min, log_std_max = self.log_std_bounds
        log_std = log_std_min + 0.5 * (log_std_max - log_std_min) * (log_std + 1)
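To make the snippet above concrete, the sketch below reproduces the same rescaling on plain Python floats and compares it with a hard clamp. The bounds -5 and 2 are the values mentioned in the question; the helper names `squash_log_std` and `clamp_log_std` are hypothetical and not part of the repository.

```python
import math


def squash_log_std(raw, log_std_min=-5.0, log_std_max=2.0):
    """Map an unbounded value smoothly into [log_std_min, log_std_max]."""
    t = math.tanh(raw)  # squashed into (-1, 1)
    return log_std_min + 0.5 * (log_std_max - log_std_min) * (t + 1)


def clamp_log_std(raw, log_std_min=-5.0, log_std_max=2.0):
    """Hard clamp: constant (zero slope) outside the bounds."""
    return max(log_std_min, min(log_std_max, raw))


for raw in (-10.0, 0.0, 10.0):
    # Columns: raw value, tanh-rescaled value, clamped value.
    print(raw, squash_log_std(raw), clamp_log_std(raw))
```

One visible difference is that the tanh mapping stays smooth near the bounds, while a clamp has zero slope outside the range. Whether that is the author's reason for the choice is not stated in the thread.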

Second, why do you approximate atanh with 0.5 * (x.log1p() - (-x).log1p()) here instead of using torch.atanh()?

Thank you for your responses 😄
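For reference, the expression in the second question matches the standard identity atanh(x) = 0.5 * (log(1 + x) - log(1 - x)) for -1 < x < 1, so it is an exact rewriting rather than an approximation. A quick check with Python's math module, as a sketch (the function name `atanh_via_log1p` is hypothetical):

```python
import math


def atanh_via_log1p(x):
    """atanh(x) written with log1p, valid for -1 < x < 1."""
    return 0.5 * (math.log1p(x) - math.log1p(-x))


for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
    # The two values should agree up to floating-point rounding.
    print(x, atanh_via_log1p(x), math.atanh(x))
```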

could you share the csv file of the benchmark data?

Thank you for sharing excellent work!
State-based training is also expensive but necessary. Could you share the state-based benchmark data?

data

Hello, author. I was inspired by the code of your project, and I would like to ask for your advice. I don't quite understand the meaning of the parameters in the code data. Can you explain them?

Saved model weights

Hi, do you have saved model weights? That would be very helpful for me. Thanks!

Could you share the code you used to plot the results of your experiment?

Help to speed up the code

Hi, the code currently takes around 15 hours for a million epochs. Could you guide me on how to increase this speed without degrading performance? Since the code requires fairly little compute, I would like to use the spare resources to speed it up. Thanks!