Comments (14)
In contrast, though, on Breakout-v0 it's scoring over 400 in 4-5 hrs on 32 threads, which is far faster than the other model.
from rl_a3c_pytorch.
Well, to start, we have different input sizes: his is 42x42 and mine is 80x80. His model is an exact replica of the universe starter agent. That model is good but obviously very fine-tuned for Pong specifically. I'm using a 4-layer conv2d model with 32 filters of size 5 × 5, 32 filters of size 5 × 5, 64 filters of size 4 × 4, and 32 filters of size 3 × 3, with stride 1 for all and max pooling after each. I'm also using a 512-unit LSTM cell as opposed to a 256-unit LSTM cell. I also have a shared RMSprop optimizer implemented. My model is obviously larger, so slower to train, but it is more robust and has much higher final performance, as it's designed for the tough gym v0 environments.
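To make the layer description above concrete, here is a rough sketch that traces the spatial size of the feature maps through the four conv + max-pool stages for an 80x80 input. The comment does not state the padding or the pooling window, so this assumes no padding and 2x2 pooling with stride 2; the helper names are made up for illustration, and the real model's numbers will differ if it pads its convolutions.

```python
# Filter counts and kernel sizes as listed in the comment above.
FILTERS = [32, 32, 64, 32]
KERNELS = [5, 5, 4, 3]


def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a square convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pool layer (assumed 2x2, stride 2)."""
    return (size - kernel) // stride + 1


def trace_sizes(size, padding=0):
    """Return the feature-map size after each conv + max-pool stage.

    padding=0 is an assumption; the thread does not state it.
    """
    sizes = []
    for kernel in KERNELS:
        size = pool_out(conv_out(size, kernel, padding=padding))
        sizes.append(size)
    return sizes


def flattened_features(size, padding=0):
    """Number of values fed to the LSTM cell after the last stage."""
    final = trace_sizes(size, padding)[-1]
    return FILTERS[-1] * final * final


if __name__ == "__main__":
    print("sizes per stage:", trace_sizes(80))
    print("flattened features:", flattened_features(80))
```

Changing `padding` in the calls shows how sensitive the size of the LSTM input is to that one unstated detail.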
Thanks! In your experiments, does the shared RMSprop optimizer work better than Adam?
They are actually quite different, considering both are obviously A3C LSTM models.
I fine-tuned Adam more, so I've been using that to train, but with some tinkering RMSprop should give similar results, judging from the few times I played with it. The default Adam epsilon was a must-change. Big improvement from just that.
Thanks for your quick reply!
They both show the benefit of being more robust and act as a steadying factor for learning compared to non-shared optimizers.
@dgriff777 @ppwwyyxx Why did increasing Adam epsilon from 1e-8 to 1e-3 help? The purpose of epsilon is to prevent division by zero by adding it to the denominator. 1e-8 is already large enough to prevent division by zero (I think), so changing it to 1e-3 would just add more arbitrary bias.
The default epsilon for Adam is often not the best choice in my experience. As to why it works better in this case, several things could be the cause, but it's hyperparameter searching, which always has a fuzzy factor.
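One general property of Adam is relevant to the question, though it is not offered as the confirmed reason for the improvement in this repo: epsilon sits next to sqrt(v_hat) in the denominator, so once gradients are small it does more than guard against division by zero and starts to damp the update. A minimal pure-Python sketch, where the function name and the constant-gradient setup are illustrative assumptions rather than anything from the repo:

```python
import math


def adam_unit_step(grad, eps, beta1=0.9, beta2=0.999, steps=1000):
    """Adam update divided by the learning rate, after `steps`
    identical gradients (a deliberately simplified, hypothetical setup).

    Follows the standard Adam rule: m_hat / (sqrt(v_hat) + eps).
    """
    m = 0.0
    v = 0.0
    for _ in range(steps):
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** steps)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** steps)  # bias-corrected second moment
    return m_hat / (math.sqrt(v_hat) + eps)


if __name__ == "__main__":
    for grad in (1.0, 1e-2, 1e-4):
        small = adam_unit_step(grad, eps=1e-8)
        large = adam_unit_step(grad, eps=1e-3)
        print(f"grad={grad:g}  eps=1e-8: {small:.4f}  eps=1e-3: {large:.4f}")
```

With a constant gradient g the result reduces to g / (|g| + eps), so eps=1e-8 gives a step of nearly full size regardless of gradient scale, while eps=1e-3 shrinks the step roughly elevenfold when g is 1e-4 and leaves it almost untouched when g is 1.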
How long does it take to train Pong-v0? I used 16 threads, and after 7 hours the episode reward is about 10, which is far slower and worse than the original network.
Well, as I said before, the universe starter agent / ikostrikov/pytorch-a3c model is highly optimized for Pong. That model also uses 42x42 input while mine uses 80x80, which means more data to crunch, and mine is also a larger, more robust model so that it can perform well on all Atari games, not just Pong, which is also quite simple. For Pong-v0 it's going to take about 6-7 hrs to start scoring 21 pts, as opposed to around 2 hrs for the other model, I believe, but my model has a better overall performance limit.
That's for 32 threads. I have not trained it on 16 threads, but a rough estimate would be around 10 hrs at most, I believe.
After about 8 hours, I got expected results on Pong-v0. Thanks!
You're welcome :)