sfujim / td3_bc

310.0 310.0 47.0 8 KB

Author's PyTorch implementation of TD3+BC, a simple variant of TD3 for offline RL

License: MIT License

Python 95.28% Shell 4.72%

td3_bc's Issues

use of expl_noise

I couldn't find where expl_noise is used in the actual TD3 implementation.
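
For context, in online TD3 an exploration-noise parameter is typically used only while collecting data from the environment: Gaussian noise with standard deviation expl_noise * max_action is added to the policy's action and the result is clipped to the action bounds. An offline method trains from a fixed dataset and never collects data, which may be why the parameter appears unused here. The sketch below only illustrates that convention; `select_noisy_action` and `toy_policy` are hypothetical names, not functions from this repository.

```python
import random


def select_noisy_action(policy, state, max_action, expl_noise, rng=random):
    """Hypothetical helper: perturb each action dimension with Gaussian
    noise of std max_action * expl_noise, then clip to the action bounds."""
    action = policy(state)
    noisy = [a + rng.gauss(0.0, max_action * expl_noise) for a in action]
    return [max(-max_action, min(max_action, a)) for a in noisy]


def toy_policy(state):
    """Stand-in deterministic policy used only for this illustration."""
    return [0.5 * s for s in state]


rng = random.Random(0)  # seeded so the sketch is repeatable
action = select_noisy_action(
    toy_policy, [1.0, -2.0], max_action=1.0, expl_noise=0.1, rng=rng
)
```

With expl_noise set to 0 the helper reduces to the clipped deterministic action, which is what an evaluation rollout would use.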

Questions about performance metric d4rl_score.

I ran the code on halfcheetah-expert-v0 and it seems to work well, but the reported d4rl_score is only about 1.1-1.2, whereas the paper reports about 110-120. I am confused. (My MuJoCo version is 200.)
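
One possible explanation for the gap is a factor of 100. A normalized score is usually computed as (return - random return) / (expert return - random return), which is close to 1.0 for expert-level behaviour, and papers commonly report that fraction multiplied by 100. The sketch below shows the arithmetic under that assumption; the function name and the reference returns are made-up placeholders, not actual D4RL values.

```python
def normalized_score(avg_return, random_return, expert_return, as_percentage=True):
    """Hypothetical helper: rescale a raw return so that the random policy
    maps to 0 and the expert policy maps to 1 (or 100 as a percentage)."""
    fraction = (avg_return - random_return) / (expert_return - random_return)
    return 100.0 * fraction if as_percentage else fraction


# Placeholder reference returns, chosen only to make the arithmetic obvious.
random_return, expert_return = 0.0, 1000.0
avg_return = 1150.0

fraction = normalized_score(avg_return, random_return, expert_return, as_percentage=False)
percent = normalized_score(avg_return, random_return, expert_return)
```

Here `fraction` is about 1.15 and `percent` is about 115, the same relationship as between the 1.1-1.2 seen in the logs and the 110-120 in the paper.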

Unable to reproduce results of Antmaze in Table 8

Hi! I have the same question as a previously closed issue: I wasn't able to reproduce the Antmaze results in Table 8. I made the following adjustments in run_experiments.sh: (1) changed envs to the Antmaze environments; (2) set normalize to False. The .sh file looks like this:

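envs=(
    "antmaze-umaze-v0"
    "antmaze-umaze-diverse-v0"
    # ...... (further Antmaze environments, elided in the original post)
)

# Run every environment with seeds 0 through 4.
for ((i=0;i<5;i+=1))
do
    for env in "${envs[@]}"
    do
        python main.py \
            --env "$env" \
            --normalize False \
            --seed "$i"
    done
done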

But the normalized scores, averaged over the final 10 evaluations and 5 seeds, are much lower than the numbers reported in Table 8 of the paper. What is going wrong here? Are there other places I should modify? It would be great if you could explain how to reproduce the Antmaze results. Thanks!
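
One thing worth checking is how `--normalize False` is parsed. This is an assumption, since the argument parser in main.py is not shown in this thread. If a flag is declared with argparse without a `type`, the value arrives as the string "False", which is truthy in Python, so a check such as `if args.normalize:` would still enable normalization. The sketch below reproduces the pitfall with two hypothetical parsers; `str2bool` is an illustrative helper, not part of this repository.

```python
import argparse


def str2bool(value):
    """Illustrative converter: map common textual booleans to bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Hypothetical parser with no type: the value stays a string.
naive = argparse.ArgumentParser()
naive.add_argument("--normalize", default=True)
naive_args = naive.parse_args(["--normalize", "False"])

# Same flag with an explicit converter: the value becomes a real bool.
strict = argparse.ArgumentParser()
strict.add_argument("--normalize", type=str2bool, default=True)
strict_args = strict.parse_args(["--normalize", "False"])

print(repr(naive_args.normalize), bool(naive_args.normalize))
print(repr(strict_args.normalize))
```

If the script behaves like the first parser, printing `args.normalize` and its type at startup would confirm whether normalization was actually disabled in these runs.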

The results in Antmaze

Hi,

I would like to ask about the experimental settings for Antmaze. Do I need to tune the hyperparameters used for MuJoCo locomotion?

I find that I cannot reproduce the Antmaze results reported in the paper.

Best
