Comments (6)

bamos commented on August 23, 2024

Ah yeah, makes sense! Also, I copied the new envs into my code and had to add 'max_episode_steps': 1000 to the specs to get the time-limited versions of the environments by default.
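A minimal sketch of what adding 'max_episode_steps': 1000 to the specs might look like. The environment ids, entry points, and the `with_time_limit` helper are hypothetical placeholders, not names from the mbpo repository; the only detail taken from the comment is the 'max_episode_steps': 1000 entry.

```python
from copy import deepcopy

# Hypothetical registration specs; the ids and entry points are placeholders.
ENV_SPECS = [
    {"id": "ExampleAnt-v0", "entry_point": "my_envs.ant:AntEnv"},
    {"id": "ExampleHumanoid-v0", "entry_point": "my_envs.humanoid:HumanoidEnv"},
]


def with_time_limit(specs, max_episode_steps=1000):
    """Return copies of the specs with an episode step limit set."""
    limited = []
    for spec in specs:
        spec = deepcopy(spec)
        # setdefault keeps an explicit limit if a spec already defines one.
        spec.setdefault("max_episode_steps", max_episode_steps)
        limited.append(spec)
    return limited


LIMITED_SPECS = with_time_limit(ENV_SPECS)
```

With Gym installed, each dict could then presumably be passed to `gym.envs.registration.register(**spec)`, which accepts `id`, `entry_point`, and `max_episode_steps` keyword arguments.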

from mbpo.

bamos commented on August 23, 2024

Ah, and also this config file points to Humanoid-v2, so perhaps the mbpo/env version is being used somewhere?

bamos commented on August 23, 2024

Sorry, one last question -- the configs for the hopper/walker also point to the gym v2 versions of these environments, which have early termination and alive bonuses. Do you actually use the parameterized v3 environments that don't have early termination or alive bonuses?

jannerm commented on August 23, 2024

Thanks for the questions!

We're using the v2 environments. The changes in Ant and Humanoid should be limited to truncating the observations, and the termination conditions should be the same as in the originals (Ant, Humanoid for reference).

I just pushed a commit that hopefully makes this clearer from reading the code. It changes the environment registration so that the modified environments have unique names instead of overwriting the defaults, and removes environment parameters that are not actually used because we test on different versions of the environments. (It is a bit unfortunate that those unused parameters were there in the first place; thanks for catching that.)
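As a rough illustration of registering modified environments under unique names rather than overwriting the defaults, here is a toy registry sketch. `EnvRegistry`, the entry-point strings, and the "ExampleHumanoidTruncatedObs-v2" id are all hypothetical and are not taken from the commit; only "Humanoid-v2" appears in this thread.

```python
class EnvRegistry:
    """Toy registry that refuses to overwrite an existing environment id."""

    def __init__(self):
        self._specs = {}

    def register(self, env_id, entry_point, **kwargs):
        # Reject duplicates so a modified env cannot silently shadow a default.
        if env_id in self._specs:
            raise ValueError(f"{env_id!r} is already registered")
        self._specs[env_id] = {"entry_point": entry_point, **kwargs}

    def spec(self, env_id):
        return self._specs[env_id]


registry = EnvRegistry()
# Default environment (placeholder entry point).
registry.register("Humanoid-v2", "default_envs.humanoid:HumanoidEnv")
# Modified environment under its own, hypothetical id.
registry.register(
    "ExampleHumanoidTruncatedObs-v2",
    "my_envs.humanoid:HumanoidTruncatedObsEnv",
)
```

Giving the modified environment its own id means a config that names "Humanoid-v2" unambiguously refers to the default one.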

bamos commented on August 23, 2024

Great, thanks for the quick response and clarification! It may be worth making this difference more visible somewhere, as the paper says the 1000-step versions of these environments are used, but most of these v2 environments use early termination.
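To make the distinction concrete, here is a toy sketch (not the MuJoCo environments themselves) contrasting an episode that ends through early termination with one that only ends at the 1000-step limit. `ToyEnv`, `fall_step`, and `run_episode` are hypothetical names introduced for illustration.

```python
class ToyEnv:
    """Toy stand-in for a locomotion task that 'falls' at a fixed step."""

    def __init__(self, fall_step, early_termination):
        self.fall_step = fall_step
        self.early_termination = early_termination
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        fallen = self.t >= self.fall_step
        # With early termination the episode ends as soon as the agent falls.
        done = self.early_termination and fallen
        return float(self.t), 1.0, done, {}


def run_episode(env, max_episode_steps=1000):
    """Step until the env reports done or the step limit is reached."""
    env.reset()
    for t in range(1, max_episode_steps + 1):
        _, _, done, _ = env.step(None)
        if done:
            return t
    return max_episode_steps


length_early = run_episode(ToyEnv(fall_step=37, early_termination=True))   # 37
length_fixed = run_episode(ToyEnv(fall_step=37, early_termination=False))  # 1000
```

Both runs share the same 1000-step limit, yet the episode lengths differ, which is why "1000-step" alone does not pin down which variant was used.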

jannerm commented on August 23, 2024

We included the sentence about using the standard 1000-step benchmarks because it's common to modify them to have a shorter horizon (e.g., here and here), and this caused some of the baselines to look like they had different performance than originally reported. I can see how this is a bit underspecified now that there are newer versions of the environments that always run for 1000 steps, so I'll make a note of this in the paper. Thanks for the catch!
