Clarification on Ant/Humanoid environments (mbpo issue, closed)

jannerm commented on August 23, 2024

Clarification on Ant/Humanoid environments

from mbpo.

Comments (6)

bamos commented on August 23, 2024

Ah yeah, makes sense! Also, I copied the new envs into my code and had to add 'max_episode_steps': 1000 to the specs to get the time-limited versions of the environments by default.
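For context on the `max_episode_steps` field: in Gym, setting it on an environment spec makes `gym.make` wrap the environment in a time limit, so an episode is cut off after that many steps even if the environment never signals `done` itself. The sketch below mimics that behavior in plain Python so it runs without Gym installed. `CountingEnv`, `TimeLimitSketch`, and `episode_length` are hypothetical names, not Gym classes, and the four-tuple `(obs, reward, done, info)` step signature assumes the older Gym API that the v2 environments use.

```python
class CountingEnv:
    """Hypothetical toy environment that never terminates on its own."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # (observation, reward, done, info), the older Gym step signature
        return self.t, 1.0, False, {}


class TimeLimitSketch:
    """Ends an episode after max_episode_steps steps, in the spirit of Gym's TimeLimit wrapper."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            # Record that the episode was cut off by the time limit,
            # not by the environment's own termination condition.
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info


def episode_length(env):
    """Run one episode with a dummy action and return the number of steps taken."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(action=None)
        steps += 1
    return steps


print(episode_length(TimeLimitSketch(CountingEnv(), max_episode_steps=1000)))  # 1000
```

In Gym itself, the corresponding knob is the `max_episode_steps` argument of `gym.envs.registration.register`; without it, an environment that never terminates would run indefinitely.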

bamos commented on August 23, 2024

Ah, and also, this config file points to Humanoid-v2, so perhaps the mbpo/env version is being used somewhere?

bamos commented on August 23, 2024

Sorry, one last question -- the configs for the hopper/walker also point to the Gym v2 versions of these environments, which have early termination and alive bonuses. Do you actually use the parameterized v3 environments, configured without early termination or alive bonuses?
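To make the distinction in this question concrete: with early termination, 1000 steps is only an upper bound on episode length, and an alive bonus adds reward for every step the agent stays "healthy"; without early termination, every episode runs the full 1000 steps. The rollout below is a hypothetical toy (made-up per-step rewards and a made-up `unhealthy_from` step), not the actual reward or termination logic of Gym's Hopper or Walker2d.

```python
def rollout(unhealthy_from, terminate_when_unhealthy, alive_bonus, max_episode_steps=1000):
    """Return (episode_length, total_reward) for a hypothetical toy task.

    The agent earns 1.0 forward reward on each step while healthy (t < unhealthy_from)
    and 0.0 afterwards; alive_bonus is added only on healthy steps.
    """
    total, t = 0.0, 0
    while t < max_episode_steps:
        healthy = t < unhealthy_from
        if not healthy and terminate_when_unhealthy:
            break  # early termination: the episode ends before the step cap
        total += (1.0 + alive_bonus) if healthy else 0.0
        t += 1
    return t, total


# v2-style: early termination plus an alive bonus
print(rollout(200, terminate_when_unhealthy=True, alive_bonus=1.0))   # (200, 400.0)
# No early termination, no alive bonus: always runs to the 1000-step cap
print(rollout(200, terminate_when_unhealthy=False, alive_bonus=0.0))  # (1000, 200.0)
```

The same policy produces very different episode lengths and returns under the two settings, which is why it matters which variant a "1000-step" benchmark refers to.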

jannerm commented on August 23, 2024

Thanks for the questions!

We're using the v2 environments. The changes in Ant and Humanoid should be limited to truncating the observations, and the termination conditions should be the same as in the originals (Ant, Humanoid for reference).

I just pushed a commit that hopefully makes this clearer from reading the code. It changes the environment registration so that the modified environments have unique names instead of overwriting the defaults, and removes environment parameters that are not actually used because we test on different versions of the environments. (It is a bit unfortunate that those unused parameters were there in the first place; thanks for catching that.)
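The registration change described above can be sketched with a toy registry: the standard id keeps pointing at the unmodified environment, the modified variant is registered under its own id, and the variant differs only in returning a shorter observation. Everything here is hypothetical: `register`, `make`, `ToyEnv`, `TruncatedObsEnv`, `KEEP`, and the id `AntTruncatedObs-v2` are illustrative stand-ins (the ids in the actual commit may differ), and the refusal to overwrite an existing id is a design choice of this sketch, not a statement about how Gym's registry behaves.

```python
from typing import Callable, Dict, List


class ToyEnv:
    """Stand-in for a standard environment with a 6-dimensional observation."""

    def observe(self) -> List[float]:
        return [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]


class TruncatedObsEnv(ToyEnv):
    """Same dynamics and termination as ToyEnv; only the observation is shortened."""

    KEEP = 4  # hypothetical number of leading entries to keep

    def observe(self) -> List[float]:
        return super().observe()[: self.KEEP]


# Hypothetical minimal registry mapping an environment id to a factory.
REGISTRY: Dict[str, Callable[[], ToyEnv]] = {}


def register(env_id: str, factory: Callable[[], ToyEnv]) -> None:
    """Register a factory, refusing to silently overwrite an existing id."""
    if env_id in REGISTRY:
        raise ValueError(f"{env_id} is already registered")
    REGISTRY[env_id] = factory


def make(env_id: str) -> ToyEnv:
    return REGISTRY[env_id]()


register("Ant-v2", ToyEnv)                       # the default id stays untouched
register("AntTruncatedObs-v2", TruncatedObsEnv)  # the modified variant gets its own id

print(len(make("Ant-v2").observe()), len(make("AntTruncatedObs-v2").observe()))  # 6 4
```

Giving the variant a distinct id means a config that names `Ant-v2` can only ever get the unmodified environment, which removes the ambiguity raised earlier in the thread.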

bamos commented on August 23, 2024

Great, thanks for the quick response and clarification! It may be worth making this difference more visible somewhere, as the paper says the 1000-step versions of these environments are used, but most of these v2 environments use early termination.

jannerm commented on August 23, 2024

We included the sentence about using the standard 1000-step benchmarks because it's common to modify them to have a shorter horizon (e.g., here and here), and this caused some of the baselines to look like they had different performance than originally reported. I can see how this is a bit underspecified now that there are newer versions of the environments that always run for 1000 steps, so I'll make a note of this in the paper. Thanks for the catch!
