Hi, great work on the paper and code! I am working on a project that builds on top of

Reproducing agent performance in MovementBandits about mlsh HOT 3 OPEN

openai commented on August 10, 2024

Reproducing agent performance in MovementBandits

from mlsh.

Comments (3)

chaonan99 commented on August 10, 2024

In my case, the two sub-policies learn to aim for different goal points, but the master policy leads the agent to a single goal.

SiyuanLee commented on August 10, 2024

I reproduced the results too, but it took many more samples to converge than reported in the paper. Has anyone else run into this?

jhejna commented on August 10, 2024

@SiyuanLee I am also trying to reproduce the results given in the paper. I ran the command from the README for AntBandits directly. After more than 1400 iterations, it has not converged or shown any sign of improvement. Did you observe high sensitivity to random seeds, and were you able to reproduce AntBandits?
