This is a repository where I benchmark my implementations of reinforcement learning algorithms on Atari games. The algorithms are implemented in PyTorch.
You can train the model by executing the following command:
python atariPOO.py
Currently it takes roughly 2 h 20 min to run 100k time steps with 8 parallel environments on an i5-4300U CPU using Proximal Policy Optimization (PPO). This achieves an average score of 7 (over the past 100 time steps) on OpenAI's Atari BreakoutNoFrameskip-v4.
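As a rough illustration of how an average over the most recent 100 recorded values can be tracked, here is a minimal sketch using only the standard library. The class name `RunningScore` and its methods are hypothetical and not taken from this repository; what exactly gets recorded (per-step reward, per-rollout return, or per-game score) is left open.

```python
from collections import deque


class RunningScore:
    """Tracks the mean of the most recent `window` recorded scores (hypothetical helper)."""

    def __init__(self, window: int = 100):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self.scores = deque(maxlen=window)

    def add(self, score: float) -> None:
        self.scores.append(score)

    def mean(self) -> float:
        # Return 0.0 before anything has been recorded to avoid dividing by zero.
        if not self.scores:
            return 0.0
        return sum(self.scores) / len(self.scores)


tracker = RunningScore(window=100)
for score in [5.0, 7.0, 9.0]:
    tracker.add(score)
print(tracker.mean())
```

Using a bounded `deque` keeps the memory cost constant no matter how long training runs.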
Currently, episodes are arbitrarily defined as 128 time steps. This allows me to rigidly define the memory buffer size.
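The idea of a fixed 128-step rollout giving a fixed buffer size can be sketched as follows. This is a simplified, pure-Python illustration under my own assumptions, not the code in this repository: the `RolloutBuffer` class and its fields are hypothetical, it stores only rewards and done flags, and a real PPO buffer would likely hold PyTorch tensors for observations, actions, log-probabilities and values as well.

```python
NUM_STEPS = 128  # fixed rollout length per environment
NUM_ENVS = 8     # number of parallel environments


class RolloutBuffer:
    """Preallocated storage for one fixed-length rollout (hypothetical sketch)."""

    def __init__(self, num_steps: int = NUM_STEPS, num_envs: int = NUM_ENVS):
        self.num_steps = num_steps
        self.num_envs = num_envs
        # Allocate everything up front: the shape never changes during training.
        self.rewards = [[0.0] * num_envs for _ in range(num_steps)]
        self.dones = [[False] * num_envs for _ in range(num_steps)]
        self.step = 0

    @property
    def capacity(self) -> int:
        # Total number of transitions stored per rollout.
        return self.num_steps * self.num_envs

    def is_full(self) -> bool:
        return self.step >= self.num_steps

    def add(self, rewards, dones) -> None:
        # Store one time step for all environments at once.
        if self.is_full():
            raise IndexError("rollout buffer is full; call reset() first")
        if len(rewards) != self.num_envs or len(dones) != self.num_envs:
            raise ValueError("expected one entry per environment")
        self.rewards[self.step] = list(rewards)
        self.dones[self.step] = list(dones)
        self.step += 1

    def reset(self) -> None:
        # Old entries are simply overwritten by the next rollout.
        self.step = 0


buffer = RolloutBuffer()
print(buffer.capacity)
```

Because the rollout length is fixed, the buffer can be allocated once and reused for every update instead of growing with variable-length episodes.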
- This is not complete work. Expect a few more updates.
- The CleanRL benchmark on Weights & Biases obtains an average score of about 15 at about 100k time steps: https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Atari--VmlldzoxMTExNTI
- More tuning needs to be done: increasing the buffer size to 1024, changing the learning rate to 0.00025, and deciding what score is recorded.
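To make the effect of the planned tuning concrete, here is a small sketch of how these settings could be grouped and how the rollout length changes the number of policy updates in a 100k-step run. The `PPOConfig` class is hypothetical, and it assumes that "buffer size" means steps collected per environment, which may not match how the script defines it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container for the settings mentioned above."""

    learning_rate: float
    rollout_steps: int           # assumed: steps collected per environment
    num_envs: int = 8
    total_timesteps: int = 100_000

    @property
    def batch_size(self) -> int:
        # Transitions gathered across all environments before each update.
        return self.num_envs * self.rollout_steps

    @property
    def num_updates(self) -> int:
        # Whole rollouts that fit into the time-step budget.
        return self.total_timesteps // self.batch_size


proposed = PPOConfig(learning_rate=0.00025, rollout_steps=1024)
# Same settings with the 128-step rollout, for comparison only.
short_rollout = replace(proposed, rollout_steps=128)

print(short_rollout.batch_size, short_rollout.num_updates)
print(proposed.batch_size, proposed.num_updates)
```

Under these assumptions, a longer rollout means far fewer updates within the same 100k-step budget, so the budget probably needs to grow along with the buffer for the comparison to be fair.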
PPO-Implementation-Deep-Dive. Great starting point: Proximal Policy Optimization - PPO in PyTorch.