Hi, I tried to evaluate the baselines using the pretrained models provided in the

Not getting the success rate mentioned in the paper using the pretrained models. about babyai HOT 6 CLOSED

akshay-sharma1995 commented on August 25, 2024

Not getting the success rate mentioned in the paper using the pretrained models.

Comments (6)

maximecb commented on August 25, 2024

Hi, just to make sure, did you use the models in the docker image with the code that is also in the docker image? We've made changes to the master branch since, so if you're using the models from the docker image with code from the current github repo, it's to be expected that this won't work.

If that's not the problem, can you tell us what shell commands you used to run evaluate.py?

akshay-sharma1995 commented on August 25, 2024

Hi Maxime,
Thanks for the quick reply.
For the results I posted, I used the models from the docker image with the code from the GitHub repo. After you pointed this out, I switched to the code provided in the docker image, and the results did not change much. I am now getting a 12% success rate for the GoToSeq-v0 environment and a 69% success rate for PickupLoc-v0. I used the same models as in the earlier post.
I used the following shell commands:
python3 evaluate.py --env BabyAI-GoToSeq-v0 --model GoToSeq-1
python3 evaluate.py --env BabyAI-PickupLoc-v0 --model PickupLoc-4_best

maximecb commented on August 25, 2024

The commands that you provided don't work as-is; you'd have to use:

python3 -m scripts.evaluate --env BabyAI-GoToSeq-v0 --model il_baselines/GoToSeq-1
python3 -m scripts.evaluate --env BabyAI-PickupLoc-v0 --model rl_baselines/PickupLoc-4_best

They seem to be working on my end. Evaluating with just 50 episodes because this laptop has a slow GPU:

root@2f97e72b8915:/babyai# python3 -m scripts.evaluate --env BabyAI-PickupLoc-v0 --model rl_baselines/PickupLoc-4_best --episodes 50
F 362 | FPS 136 | D 0:00:02 | R:xsmM 0.90 0.07 0.70 0.97 | S 1.00 | F:xsmM 7.2 4.9 2 21
10 worst episodes:
- episode 29: R=0.7046874999999999, F=21
- episode 45: R=0.7046874999999999, F=21
- episode 25: R=0.7328125, F=19
- episode 2: R=0.775, F=16
- episode 46: R=0.775, F=16
- episode 20: R=0.803125, F=14
- episode 15: R=0.8171875, F=13
- episode 10: R=0.83125, F=12
- episode 17: R=0.83125, F=12
- episode 40: R=0.83125, F=12

root@2f97e72b8915:/babyai# python3 -m scripts.evaluate --env BabyAI-GoToSeq-v0 --model il_baselines/GoToSeq-1 --episodes 50
F 7813 | FPS 240 | D 0:00:32 | R:xsmM 0.86 0.16 0.00 1.00 | S 0.98 | F:xsmM 156.3 193.7 2 1152
10 worst episodes:
- episode 28: R=0.0, F=1152
- episode 22: R=0.5640624999999999, F=279
- episode 38: R=0.5757812499999999, F=543
- episode 2: R=0.675, F=416
- episode 36: R=0.7015625, F=191
- episode 46: R=0.7234375, F=177
- episode 19: R=0.7375, F=168
- episode 5: R=0.746875, F=162
- episode 15: R=0.759375, F=154
- episode 32: R=0.759765625, F=615

Do you have nvidia-docker installed, or are you using plain docker?
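
For anyone comparing runs, the summary lines in the output above can be parsed rather than read by eye. The following is a minimal sketch, not part of babyai: `parse_summary` is a hypothetical helper, and it assumes that `S` is the success rate and that the four values after `R:xsmM` and `F:xsmM` are mean, standard deviation, minimum and maximum (the thread does not spell this out, although the minimum return of 0.70 matches the worst episode listed).

```python
def parse_summary(line):
    """Parse one evaluation summary line into a dict.

    Assumption: fields are separated by '|' and each field is a key
    followed by one or more whitespace-separated values.
    """
    fields = {}
    for part in line.split("|"):
        key, *values = part.split()
        fields[key] = values
    return {
        "frames": int(fields["F"][0]),
        # Assumed to be the fraction of successful episodes.
        "success_rate": float(fields["S"][0]),
        # Assumed order: mean, std, min, max.
        "return_stats": [float(v) for v in fields["R:xsmM"]],
        "frames_per_episode_stats": [float(v) for v in fields["F:xsmM"]],
    }


if __name__ == "__main__":
    line = ("F 362 | FPS 136 | D 0:00:02 | R:xsmM 0.90 0.07 0.70 0.97 "
            "| S 1.00 | F:xsmM 7.2 4.9 2 21")
    print(parse_summary(line))
```

Running the same parser on the output from inside and outside the container makes the gap in `success_rate` easy to see side by side.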

akshay-sharma1995 commented on August 25, 2024

I used the commands like that because I was running them from the script directory and had remapped the model directory so that it worked that way.
Anyway, I tried it the way you suggested. If I run the shell commands inside the docker image, they work fine.
Earlier, though, I had copied everything out of the docker container onto my local machine and was running it from there. Even now, when I run the shell commands you mentioned on my local copy, I still get the poor results I mentioned earlier.
I am not sure why this is happening. Does it have something to do with running everything inside the docker container? Sorry, I have never used docker before, so I am not sure about it.

maximecb commented on August 25, 2024

The docker image also contains an installation of torch, python, numpy, etc. It keeps everything in a "frozen" configuration so the results stay replicable, i.e. you're running exactly the setup we had when we trained those models. It's hard to say exactly what's different with your configuration outside of the docker container.
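
One way to narrow down what differs is to record the installed package versions in both places and compare them. The sketch below is only an illustration: `package_versions` and `diff_versions` are hypothetical helpers, the package list is an assumption based on the libraries named above, and `importlib.metadata` needs Python 3.8 or newer, which the interpreter inside the image may not satisfy.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(first, second):
    """Return {name: (version in first, version in second)} where they differ."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }


if __name__ == "__main__":
    # Run this once inside the container and once on the local machine,
    # then compare the two printouts (or feed both dicts to diff_versions).
    print("python", platform.python_version())
    for name, version in package_versions(["torch", "numpy", "gym"]).items():
        print(name, version)
```

Any package that shows up in the diff is a candidate for the mismatch, although a difference alone does not prove it is the cause.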

akshay-sharma1995 commented on August 25, 2024

I did not know that. Seems like that might have been the issue then.
Thanks for the help!
