yfeng997 / madmario

Interactive tutorial to build a learning Mario, for first-time RL learners

Python 8.37% Jupyter Notebook 91.63%

madmario's Issues

Why does class SkipFrame(gym.Wrapper) always repeat env.step() 4 times?

https://github.com/YuansongFeng/MadMario/blob/6c7bfc2cfe40baae78d9a9aeb62f68d651887ea0/main.py#L29
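
For context, frame skipping is a common trick in game-playing RL: consecutive emulator frames differ very little, so the agent chooses one action and the wrapper repeats it for several frames, summing the rewards. The block below is a minimal sketch of that idea and not the code at the linked line. CountingEnv is a hypothetical stand-in environment, and the wrapper is written as a plain class so the example runs without gym or nes_py.

```python
class CountingEnv:
    """Hypothetical stand-in for a gym environment.

    Every frame gives reward 1.0; the episode ends after `max_frames` frames.
    """

    def __init__(self, max_frames=10):
        self.max_frames = max_frames
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        done = self.frame >= self.max_frames
        return self.frame, 1.0, done, {}


class SkipFrame:
    """Repeat the chosen action for `skip` frames and sum the rewards."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop early if the episode ended mid-skip
        # Only the last observation is returned; intermediate frames are dropped.
        return obs, total_reward, done, info


env = SkipFrame(CountingEnv(max_frames=10), skip=4)
env.reset()
obs, reward, done, info = env.step(action=0)
print(obs, reward, done)  # 4 4.0 False
```

With skip=4 the agent makes a decision once every four frames, so it needs roughly a quarter as many forward passes and stored transitions for the same amount of game time.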

Environment missing some libraries

The environment is missing scikit-image and matplotlib. Both seem to be required to run main.py.

Render video in Colab?

Error when running the tutorial v2.ipynb on Colab

ModuleNotFoundError Traceback (most recent call last)
in ()
15
16 #NES Emulator for OpenAI Gym
---> 17 from nes_py.wrappers import JoypadSpace
18
19 # Super Mario environment for OpenAI Gym

ModuleNotFoundError: No module named 'nes_py'

Pretrained weight is really bad?

Hi, thanks for the amazing repo!

I downloaded the trained weights from the link mentioned in the README:
https://drive.google.com/file/d/1RRwhSMUrpBBRyAsfHLPGt1rlYFoiuus2/view?usp=sharing

I then loaded the state dict into the Mario network successfully:

file_id = '1RRwhSMUrpBBRyAsfHLPGt1rlYFoiuus2'
url = f'https://drive.google.com/uc?id={file_id}'
!gdown {url} # I run in Colab

ckp = torch.load('./trained_mario.chkpt', map_location=('cuda' if use_cuda else 'cpu'))
mario.exploration_rate = ckp.get('exploration_rate')
mario.net.load_state_dict(ckp.get('model'))

<All keys matched successfully>

However, when I try to play using this trained model, Mario always dies very quickly at the beginning (e.g. within 40 frames).
Is the path above still the correct pretrained checkpoint?

Running out of GPU memory after several minutes of training

Hi,

I get a CUDA out-of-memory error after several minutes of training. Is there a way to fix it?

(py38) C:\Src\GitHub\MadMario>python main.py
Loading model at checkpoints\2021-02-20T16-13-06\trained_mario.chkpt with exploration rate 0.1
Episode 0 - Step 660 - Epsilon 0.1 - Mean Reward 2990.0 - Mean Length 660.0 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 10.198 - Time 2021-02-20T16:29:03
Episode 20 - Step 5262 - Epsilon 0.1 - Mean Reward 1311.095 - Mean Length 250.571 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 61.936 - Time 2021-02-20T16:30:05
Episode 40 - Step 9888 - Epsilon 0.1 - Mean Reward 1149.829 - Mean Length 241.171 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 62.843 - Time 2021-02-20T16:31:08
Episode 60 - Step 13407 - Epsilon 0.1 - Mean Reward 1072.361 - Mean Length 219.787 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 47.898 - Time 2021-02-20T16:31:56
Episode 80 - Step 19197 - Epsilon 0.1 - Mean Reward 1144.407 - Mean Length 237.0 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 77.715 - Time 2021-02-20T16:33:14
Episode 100 - Step 22474 - Epsilon 0.1 - Mean Reward 1060.12 - Mean Length 218.14 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 44.237 - Time 2021-02-20T16:33:58
Episode 120 - Step 26864 - Epsilon 0.1 - Mean Reward 1015.29 - Mean Length 216.02 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 58.86 - Time 2021-02-20T16:34:57
Episode 140 - Step 32109 - Epsilon 0.1 - Mean Reward 1094.56 - Mean Length 222.21 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 71.322 - Time 2021-02-20T16:36:08
Traceback (most recent call last):
File "main.py", line 59, in
action = mario.act(state)
File "C:\Src\GitHub\MadMario\agent.py", line 57, in act
state = torch.FloatTensor(state).cuda() if self.use_cuda else torch.FloatTensor(state)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 10.00 GiB total capacity; 7.56 GiB already allocated; 0 bytes free; 7.74 GiB reserved in total by PyTorch)
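
One possible explanation, offered as a guess and not as a confirmed diagnosis, is that every transition is kept on the GPU in the replay memory, so usage grows with the step count. The sketch below estimates that growth under loudly stated assumptions: each state is a stack of 4 frames of 84x84 values (a hypothetical shape not given in this issue), values are 4-byte floats (matching torch.FloatTensor in the traceback), and each step stores both state and next_state.

```python
def replay_memory_gib(num_transitions, state_shape=(4, 84, 84), bytes_per_value=4):
    """Estimate memory used by stored (state, next_state) pairs, in GiB.

    state_shape is an assumption for illustration, not taken from the issue.
    """
    values_per_state = 1
    for dim in state_shape:
        values_per_state *= dim
    # Each transition stores two observations: state and next_state.
    bytes_per_transition = 2 * values_per_state * bytes_per_value
    return num_transitions * bytes_per_transition / 2**30


steps_before_crash = 32109  # last "Step" value in the log above
print(f"{replay_memory_gib(steps_before_crash):.2f} GiB")  # 6.75 GiB
```

Under these assumptions the estimate is the same order of magnitude as the 7.56 GiB reported as already allocated. If that is the cause, keeping stored transitions in CPU memory or capping the replay memory size would be the things to try.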

How to load the trained model?

How can I load the trained model? When I run replay.py it always creates new checkpoints, why?

GPU Speed Up Benchmark

Here we compare training time on a MacBook Pro (CPU) vs. Google Colab (GPU). In the terminal outputs below, pay attention to Step Time: it is the average time per iteration, including act(), step(), learn() and remember().

MacBook Pro CPU

Episode 20 - Step 3603 - Step Time 0.065 - Epsilon 0.999 - Mean Reward 578.905 - Mean Length 171.571 - Mean Loss 2.127 - Mean Q Value 4.123 - Time 2020-06-05T20:23:33
Episode 21 - Step 3643 - Step Time 0.066 - Epsilon 0.999 - Mean Reward 563.091 - Mean Length 165.591 - Mean Loss 2.056 - Mean Q Value 4.087 - Time 2020-06-05T20:23:36
Episode 22 - Step 4097 - Step Time 0.068 - Epsilon 0.999 - Mean Reward 581.696 - Mean Length 178.13 - Mean Loss 1.994 - Mean Q Value 4.063 - Time 2020-06-05T20:24:06
Episode 23 - Step 4195 - Step Time 0.07 - Epsilon 0.999 - Mean Reward 583.542 - Mean Length 174.792 - Mean Loss 1.934 - Mean Q Value 4.041 - Time 2020-06-05T20:24:13
Episode 24 - Step 4235 - Step Time 0.071 - Epsilon 0.999 - Mean Reward 569.44 - Mean Length 169.4 - Mean Loss 1.877 - Mean Q Value 4.019 - Time 2020-06-05T20:24:16
Episode 25 - Step 4493 - Step Time 0.068 - Epsilon 0.999 - Mean Reward 576.231 - Mean Length 172.808 - Mean Loss 1.824 - Mean Q Value 4.001 - Time 2020-06-05T20:24:34

Google Colab GPU

Episode 41 - Step 9149 - Step Time 0.018 - Epsilon 0.998 - Mean Reward 733.976 - Mean Length 217.833 - Mean Loss 0.859 - Mean Q Value 2.953 - Time 2020-06-06T03:22:58
Episode 42 - Step 9568 - Step Time 0.019 - Epsilon 0.998 - Mean Reward 742.163 - Mean Length 222.512 - Mean Loss 0.848 - Mean Q Value 2.963 - Time 2020-06-06T03:23:06
Episode 43 - Step 10622 - Step Time 0.019 - Epsilon 0.997 - Mean Reward 745.114 - Mean Length 241.409 - Mean Loss 0.842 - Mean Q Value 3.004 - Time 2020-06-06T03:23:26
Episode 44 - Step 10662 - Step Time 0.019 - Epsilon 0.997 - Mean Reward 733.689 - Mean Length 236.933 - Mean Loss 0.837 - Mean Q Value 3.054 - Time 2020-06-06T03:23:26
Episode 45 - Step 10702 - Step Time 0.018 - Epsilon 0.997 - Mean Reward 722.761 - Mean Length 232.652 - Mean Loss 0.832 - Mean Q Value 3.104 - Time 2020-06-06T03:23:27
Episode 46 - Step 10828 - Step Time 0.018 - Epsilon 0.997 - Mean Reward 719.851 - Mean Length 230.383 - Mean Loss 0.827 - Mean Q Value 3.16 - Time 2020-06-06T03:23:30

We see a speedup of roughly 4x from training on the Colab GPU: Step Time drops from about 0.068 on the CPU to about 0.018 on the GPU.
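
As a quick check of that figure, the Step Time values from the two logs above can be averaged and compared. This is only arithmetic on the twelve logged values; the variable names are illustrative.

```python
# Step Time values copied from the two terminal outputs above.
cpu_step_times = [0.065, 0.066, 0.068, 0.07, 0.071, 0.068]
gpu_step_times = [0.018, 0.019, 0.019, 0.019, 0.018, 0.018]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


speedup = mean(cpu_step_times) / mean(gpu_step_times)
print(f"CPU {mean(cpu_step_times):.4f}, GPU {mean(gpu_step_times):.4f}, "
      f"speedup {speedup:.1f}x")
```

The ratio comes out a little under 4x. Note that the two runs are at different points in training (episodes 20 to 25 vs. 41 to 46), so the comparison is approximate.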

When we call replay, should we be saving anything at all?

When we call replay, should we be saving anything at all? Since in replay, we're only using the model for inference/serving.

We probably should take a look at replay.py to ensure it's running in eval mode

Originally posted by @suraj813 in #4 (comment)

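// Click the Colab connect button and log each click.
function ClickConnect() {
    console.log("Clicked on connect button");
    document.querySelector("colab-connect-button").click();
}
// Repeat every 60000 ms (once a minute).
setInterval(ClickConnect, 60000);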

colab-connect-button is the button on the upper right that shows RAM and Disk usage.