yfeng997 / madmario
Interactive tutorial to build a learning Mario, for first-time RL learners
The environment is missing scikit-image and matplotlib. Both seem to be required to run main.py.
ModuleNotFoundError Traceback (most recent call last)
in ()
15
16 #NES Emulator for OpenAI Gym
---> 17 from nes_py.wrappers import JoypadSpace
18
19 # Super Mario environment for OpenAI Gym
ModuleNotFoundError: No module named 'nes_py'
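Both reports above come down to packages that are not installed in the active environment. As a rough sketch (not part of the repo), the check below lists which of the modules named in these reports cannot be found before main.py is run. The helper name `missing_modules` and the `REQUIRED` list are assumptions made for illustration; note that scikit-image is imported as `skimage`.

```python
from importlib.util import find_spec

# Import names, not distribution names: scikit-image is imported as `skimage`.
# This list only covers the modules mentioned in the reports above.
REQUIRED = ["skimage", "matplotlib", "nes_py"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running this in the same interpreter (or Colab cell) that launches main.py shows whether the packages were installed into a different environment.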
Hi, thanks for the amazing repo!
I downloaded the trained weights mentioned in the README from
https://drive.google.com/file/d/1RRwhSMUrpBBRyAsfHLPGt1rlYFoiuus2/view?usp=sharing
and then loaded the state dict into the Mario network successfully:
file_id = '1RRwhSMUrpBBRyAsfHLPGt1rlYFoiuus2'
url = f'https://drive.google.com/uc?id={file_id}'
!gdown {url} # I run in Colab
ckp = torch.load('./trained_mario.chkpt', map_location=('cuda' if use_cuda else 'cpu'))
mario.exploration_rate = ckp.get('exploration_rate')
mario.net.load_state_dict(ckp.get('model'))
<All keys matched successfully>
However, when I try to play using this trained model, Mario always dies very quickly at the start (e.g. within 40 frames).
Is the link above still the correct path to the pretrained checkpoint?
Hi,
I hit a CUDA out-of-memory error after several minutes of training. Is there a way to fix it?
(py38) C:\Src\GitHub\MadMario>python main.py
Loading model at checkpoints\2021-02-20T16-13-06\trained_mario.chkpt with exploration rate 0.1
Episode 0 - Step 660 - Epsilon 0.1 - Mean Reward 2990.0 - Mean Length 660.0 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 10.198 - Time 2021-02-20T16:29:03
Episode 20 - Step 5262 - Epsilon 0.1 - Mean Reward 1311.095 - Mean Length 250.571 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 61.936 - Time 2021-02-20T16:30:05
Episode 40 - Step 9888 - Epsilon 0.1 - Mean Reward 1149.829 - Mean Length 241.171 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 62.843 - Time 2021-02-20T16:31:08
Episode 60 - Step 13407 - Epsilon 0.1 - Mean Reward 1072.361 - Mean Length 219.787 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 47.898 - Time 2021-02-20T16:31:56
Episode 80 - Step 19197 - Epsilon 0.1 - Mean Reward 1144.407 - Mean Length 237.0 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 77.715 - Time 2021-02-20T16:33:14
Episode 100 - Step 22474 - Epsilon 0.1 - Mean Reward 1060.12 - Mean Length 218.14 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 44.237 - Time 2021-02-20T16:33:58
Episode 120 - Step 26864 - Epsilon 0.1 - Mean Reward 1015.29 - Mean Length 216.02 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 58.86 - Time 2021-02-20T16:34:57
Episode 140 - Step 32109 - Epsilon 0.1 - Mean Reward 1094.56 - Mean Length 222.21 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 71.322 - Time 2021-02-20T16:36:08
Traceback (most recent call last):
File "main.py", line 59, in <module>
action = mario.act(state)
File "C:\Src\GitHub\MadMario\agent.py", line 57, in act
state = torch.FloatTensor(state).cuda() if self.use_cuda else torch.FloatTensor(state)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 10.00 GiB total capacity; 7.56 GiB already allocated; 0 bytes free; 7.74 GiB reserved in total by PyTorch)
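One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the replay memory keeps every cached transition on the GPU. The arithmetic below is a back-of-the-envelope estimate under assumptions that the issue does not state: each observation is a stack of 4 grayscale 84x84 frames stored as float32, and each transition keeps both `state` and `next_state` on the GPU. Under those assumptions, the 32109 steps in the log above would occupy about 6.75 GiB, which is in the same range as the 7.56 GiB reported as already allocated.

```python
# Assumptions (not stated in the issue): each observation is a stack of
# 4 grayscale 84x84 frames stored as float32, and every cached transition
# keeps both `state` and `next_state` on the GPU.
FRAMES, HEIGHT, WIDTH = 4, 84, 84
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def replay_bytes(num_transitions: int) -> int:
    """Estimate the bytes held by `num_transitions` cached transitions."""
    per_observation = FRAMES * HEIGHT * WIDTH * BYTES_PER_FLOAT32
    # Two observations per transition: state and next_state.
    return num_transitions * 2 * per_observation


if __name__ == "__main__":
    steps = 32109  # last step count in the log above
    print(f"{replay_bytes(steps) / GIB:.2f} GiB")
```

If this estimate matches what the agent actually does, keeping cached transitions in CPU memory and moving only the sampled batch to the GPU, or reducing the replay memory size, should bound GPU usage.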
How can I load the trained model? When I run replay.py, it always creates new checkpoints. Why?
Here we compare training time on a MacBook Pro (CPU) vs. Google Colab (GPU). In the terminal outputs below, pay attention to the Step Time. It is the average iteration time, including act(), step(), learn() and remember().
Macbook Pro CPU
Episode 20 - Step 3603 - Step Time 0.065 - Epsilon 0.999 - Mean Reward 578.905 - Mean Length 171.571 - Mean Loss 2.127 - Mean Q Value 4.123 - Time 2020-06-05T20:23:33
Episode 21 - Step 3643 - Step Time 0.066 - Epsilon 0.999 - Mean Reward 563.091 - Mean Length 165.591 - Mean Loss 2.056 - Mean Q Value 4.087 - Time 2020-06-05T20:23:36
Episode 22 - Step 4097 - Step Time 0.068 - Epsilon 0.999 - Mean Reward 581.696 - Mean Length 178.13 - Mean Loss 1.994 - Mean Q Value 4.063 - Time 2020-06-05T20:24:06
Episode 23 - Step 4195 - Step Time 0.07 - Epsilon 0.999 - Mean Reward 583.542 - Mean Length 174.792 - Mean Loss 1.934 - Mean Q Value 4.041 - Time 2020-06-05T20:24:13
Episode 24 - Step 4235 - Step Time 0.071 - Epsilon 0.999 - Mean Reward 569.44 - Mean Length 169.4 - Mean Loss 1.877 - Mean Q Value 4.019 - Time 2020-06-05T20:24:16
Episode 25 - Step 4493 - Step Time 0.068 - Epsilon 0.999 - Mean Reward 576.231 - Mean Length 172.808 - Mean Loss 1.824 - Mean Q Value 4.001 - Time 2020-06-05T20:24:34
Google Colab GPU
Episode 41 - Step 9149 - Step Time 0.018 - Epsilon 0.998 - Mean Reward 733.976 - Mean Length 217.833 - Mean Loss 0.859 - Mean Q Value 2.953 - Time 2020-06-06T03:22:58
Episode 42 - Step 9568 - Step Time 0.019 - Epsilon 0.998 - Mean Reward 742.163 - Mean Length 222.512 - Mean Loss 0.848 - Mean Q Value 2.963 - Time 2020-06-06T03:23:06
Episode 43 - Step 10622 - Step Time 0.019 - Epsilon 0.997 - Mean Reward 745.114 - Mean Length 241.409 - Mean Loss 0.842 - Mean Q Value 3.004 - Time 2020-06-06T03:23:26
Episode 44 - Step 10662 - Step Time 0.019 - Epsilon 0.997 - Mean Reward 733.689 - Mean Length 236.933 - Mean Loss 0.837 - Mean Q Value 3.054 - Time 2020-06-06T03:23:26
Episode 45 - Step 10702 - Step Time 0.018 - Epsilon 0.997 - Mean Reward 722.761 - Mean Length 232.652 - Mean Loss 0.832 - Mean Q Value 3.104 - Time 2020-06-06T03:23:27
Episode 46 - Step 10828 - Step Time 0.018 - Epsilon 0.997 - Mean Reward 719.851 - Mean Length 230.383 - Mean Loss 0.827 - Mean Q Value 3.16 - Time 2020-06-06T03:23:30
We see a speed-up of roughly 3.7x (about 4x) when training on the Colab GPU.
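The speed-up figure can be checked from the logs themselves. The sketch below pulls the Step Time field out of each log line and compares the averages. The log lines are shortened copies of the ones above, and the helper names `mean_step_time` and `speedup` are made up for this example. Since Step Time is already an average, averaging it again is only a rough summary.

```python
import re
from statistics import mean

STEP_TIME = re.compile(r"Step Time (\d+(?:\.\d+)?)")

# Shortened copies of the log lines above (only the leading fields are kept).
CPU_LOG = """\
Episode 20 - Step 3603 - Step Time 0.065
Episode 21 - Step 3643 - Step Time 0.066
Episode 22 - Step 4097 - Step Time 0.068
Episode 23 - Step 4195 - Step Time 0.07
Episode 24 - Step 4235 - Step Time 0.071
Episode 25 - Step 4493 - Step Time 0.068
"""

GPU_LOG = """\
Episode 41 - Step 9149 - Step Time 0.018
Episode 42 - Step 9568 - Step Time 0.019
Episode 43 - Step 10622 - Step Time 0.019
Episode 44 - Step 10662 - Step Time 0.019
Episode 45 - Step 10702 - Step Time 0.018
Episode 46 - Step 10828 - Step Time 0.018
"""


def mean_step_time(log_text: str) -> float:
    """Average the 'Step Time' values found in a block of log lines."""
    times = [float(m.group(1)) for m in STEP_TIME.finditer(log_text)]
    if not times:
        raise ValueError("no 'Step Time' fields found")
    return mean(times)


def speedup(slow_log: str, fast_log: str) -> float:
    """Ratio of the mean step time in slow_log to that in fast_log."""
    return mean_step_time(slow_log) / mean_step_time(fast_log)


if __name__ == "__main__":
    print(f"CPU {mean_step_time(CPU_LOG):.4f} s, GPU {mean_step_time(GPU_LOG):.4f} s")
    print(f"speed-up {speedup(CPU_LOG, GPU_LOG):.2f}x")
```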
When we call replay, should we be saving anything at all? Since in replay, we're only using the model for inference/serving.
We probably should take a look at replay.py to ensure it's running in eval mode
Originally posted by @suraj813 in #4 (comment)
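One way to keep replay read-only, sketched here as a design idea rather than as the repo's actual code, is to make checkpoint writing conditional on a save directory being configured. The `Checkpointer` class, its `maybe_save` method and the `step_<n>.chkpt` file name are all hypothetical. A training script would pass a real directory, while a replay script would pass `None` so that nothing is written.

```python
import tempfile
from pathlib import Path
from typing import Optional


class Checkpointer:
    """Hypothetical helper: writes checkpoints only when save_dir is set."""

    def __init__(self, save_dir: Optional[Path], save_every: int):
        self.save_dir = save_dir
        self.save_every = save_every

    def maybe_save(self, step: int, payload: bytes) -> Optional[Path]:
        """Write payload every `save_every` steps; do nothing in replay mode."""
        if self.save_dir is None or step % self.save_every != 0:
            return None
        self.save_dir.mkdir(parents=True, exist_ok=True)
        path = self.save_dir / f"step_{step}.chkpt"  # hypothetical file name
        path.write_bytes(payload)
        return path


if __name__ == "__main__":
    # Replay: no directory configured, so nothing is ever written.
    print(Checkpointer(None, save_every=10).maybe_save(10, b"weights"))

    # Training: a checkpoint file appears on every 10th step.
    with tempfile.TemporaryDirectory() as tmp:
        trainer = Checkpointer(Path(tmp) / "checkpoints", save_every=10)
        print(trainer.maybe_save(10, b"weights"))
```

In a PyTorch replay script this would typically be paired with calling `eval()` on the network and running inference under `torch.no_grad()`, which addresses the eval-mode concern raised above.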
When I run main.py, I get the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
How can I solve this error?
Training on Colab takes around 8 hours, so we need to keep the session active. Run this snippet in the browser's JavaScript console to keep the session alive.
function ClickConnect(){
console.log("Clicked on connect button");
document.querySelector("colab-connect-button").click()
}
setInterval(ClickConnect,60000)
colab-connect-button is the button on the upper right that shows RAM and Disk usage.