As far as I can see, model hyperparameters are different. Thanks.

Can you please list out the difference between your code and ikostrikov/pytorch-a3c (rl_a3c_pytorch issue, 14 comments, closed)

dgriff777 commented on May 18, 2024

Can you please list out the difference between your code and ikostrikov/pytorch-a3c

from rl_a3c_pytorch.

Comments (14)

dgriff777 commented on May 18, 2024

In contrast, though, on Breakout-v0 it's scoring over 400 in 4-5 hrs on 32 threads, which is far faster than the other model.

dgriff777 commented on May 18, 2024

Well, to start, we have different input sizes: his is 42x42 and mine is 80x80. His model is an exact replica of the universe starter agent. That model is good but obviously very fine-tuned for Pong specifically. I'm using a 4-layer Conv2d model with 32 filters of size 5 × 5, 32 filters of size 5 × 5, 64 filters of size 4 × 4, and 32 filters of size 3 × 3, with stride 1 for all and max pooling after each. I'm also using a 512-unit LSTM cell as opposed to his 256-unit LSTM cell. I also have a shared RMSprop optimizer implemented. My model is obviously larger, so it's slower to train, but it's more robust and has much higher final performance, as it's designed for the tough gym v0 environments.
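
To get a feel for the sizes involved, here is a rough sketch of the feature-map arithmetic for the conv stack described above. The filter counts, kernel sizes, stride 1, and the 80x80 input come from the comment; the zero padding and the 2x2 max pooling with stride 2 are assumptions made only for this illustration, so the real repo's shapes (and the size of the LSTM input) may well differ.

```python
# Shape arithmetic for the 4-layer conv stack described in the comment.
# ASSUMPTIONS (not stated in the thread): padding=0 and 2x2 max pooling, stride 2.

def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a square convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width of a square max-pooling layer."""
    return (size - kernel) // stride + 1

# (filters, kernel size) per layer, as listed in the comment.
LAYERS = [(32, 5), (32, 5), (64, 4), (32, 3)]

def trace_shapes(size=80):
    """Return (channels, height, width) after each conv + pool pair."""
    shapes = []
    for filters, kernel in LAYERS:
        size = pool_out(conv_out(size, kernel))
        shapes.append((filters, size, size))
    return shapes

shapes = trace_shapes(80)
channels, height, width = shapes[-1]
flat_features = channels * height * width  # what would feed the LSTM cell

# Raw input size comparison: 80x80 versus 42x42 frames.
pixel_ratio = (80 * 80) / (42 * 42)

for shape in shapes:
    print(shape)
print("flattened features:", flat_features)
print("pixel ratio 80x80 / 42x42:", round(pixel_ratio, 2))
```

Under these assumptions an 80x80 frame carries roughly 3.6 times as many pixels as a 42x42 one, which is in line with the larger model being slower to train.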

slowbull commented on May 18, 2024

Thanks! In your experiments, does the shared RMSprop optimizer work better than Adam?

dgriff777 commented on May 18, 2024

The two models are actually quite different, even though both are obviously A3C LSTM.

dgriff777 commented on May 18, 2024

I fine-tuned Adam more, so I've been using that to train, but with some tinkering RMSprop should give similar results, judging from the few times I played with it. Changing the default Adam epsilon was a must; there was a big improvement from just that.

slowbull commented on May 18, 2024

Thanks for your quick reply!

dgriff777 commented on May 18, 2024

Both shared optimizers show the benefit of being more robust and acting as a steadying factor on learning compared to the non-shared versions.

ethancaballero commented on May 18, 2024

@dgriff777 @ppwwyyxx Why did increasing the Adam epsilon from 1e-8 to 1e-3 help? The purpose of epsilon is to prevent division by zero by adding it to the denominator. 1e-8 is already large enough to prevent division by zero (I think), so changing it to 1e-3 would just add more arbitrary bias.

dgriff777 commented on May 18, 2024

The default epsilon for Adam is often not the best choice in my experience. As to why it works better in this case, several things could be the cause, but it's hyperparameter searching, which always has a fuzzy factor.
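
For what it's worth, here is a small sketch of what a larger epsilon does to the size of an Adam update. It only illustrates the textbook Adam formula; it is not taken from the repo, and the thread does not confirm that this is why 1e-3 helped. The gradient magnitude `small_grad` and the helper name `relative_adam_step` are made up for the example.

```python
import math

def relative_adam_step(grad, eps, beta1=0.9, beta2=0.999, steps=100):
    """Adam update divided by the learning rate, after `steps` identical gradients."""
    m = v = 0.0
    for _ in range(steps):
        m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** steps)               # bias correction
    v_hat = v / (1 - beta2 ** steps)
    return m_hat / (math.sqrt(v_hat) + eps)

small_grad = 1e-4  # hypothetical gradient magnitude, chosen for illustration
for eps in (1e-8, 1e-3):
    print(f"eps={eps:g}: relative step = {relative_adam_step(small_grad, eps):.4f}")
```

With a gradient this small, the relative step is close to 1 for eps=1e-8 but only about 0.09 for eps=1e-3, so the larger epsilon does more than guard against division by zero: it damps updates whenever the gradient scale is comparable to or below epsilon, while leaving large-gradient updates almost unchanged.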

slowbull commented on May 18, 2024

How long does it take to train Pong-v0? I used 16 threads, and after 7 hours the episode reward is about 10, which is far slower and worse than the original network.

dgriff777 commented on May 18, 2024

Well, as I said before, the universe starter agent / ikostrikov/pytorch-a3c model is highly optimized for Pong. Also, that model uses 42x42 input while mine uses 80x80, which means more data to crunch. Mine is also a larger, more robust model so that it can perform well on all Atari games, not just Pong, which is quite a simple one. For Pong-v0 it's gonna take about 6-7 hrs to start scoring 21 pts, as opposed to around 2 hrs for the other model, I believe, but my model has a better overall performance limit.

dgriff777 commented on May 18, 2024

That's for 32 threads. I haven't trained it on 16 threads, but a rough estimate would be around 10 hrs at most, I believe.

slowbull commented on May 18, 2024

After about 8 hours, I got expected results on Pong-v0. Thanks!

dgriff777 commented on May 18, 2024

You're welcome :)
