toshikwa / fqf-iqn-qrdqn.pytorch
PyTorch implementation of FQF, IQN and QR-DQN.
License: MIT License
Hi,
First, thank you for providing such a clear and easy-to-follow implementation of several important distributional RL algorithms.
I found the following line to be incorrect according to the QR-DQN paper:
I believe it should have been:
q = 1 / 200 * torch.sum(quantiles, dim=1)
which corresponds to the following equation in the paper:
Here, each q_j is fixed to 1 / 200 (i.e., 1 / N with N = 200 quantiles).
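To illustrate the proposed fix, here is a minimal, self-contained sketch; the tensor shapes are illustrative assumptions, not taken from the repo:

```python
import torch

N = 200                            # number of quantiles (QR-DQN default)
quantiles = torch.randn(4, N, 6)   # assumed shape: (batch, N, num_actions)

# Q(s, a) = sum_j q_j * theta_j(s, a) with each q_j fixed to 1 / N,
# which is simply the mean over the quantile dimension.
q = quantiles.mean(dim=1)
assert torch.allclose(q, 1 / N * torch.sum(quantiles, dim=1))
```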
Hi, I ran the FQF agent on Breakout, but ended up with a learning curve that collapses partway through training (like in the attachment).
The command I was using is
python3 -u train_fqf.py --cuda --env_id BreakoutNoFrameskip-v4 --seed 0 --config config/fqf.yaml
and the hyperparameters are the default ones.
Should I adjust some hyperparameters to reproduce the curve you achieved?
I think you're forgetting the element-wise product from the IQN paper (see the end of the first paragraph of Section 3.1). Am I wrong?
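For reference, a minimal sketch of the combination described there (the sizes are illustrative assumptions): the state embedding psi(x) and the quantile embedding phi(tau) are combined by an element-wise (Hadamard) product before the final layers.

```python
import torch

batch, num_taus, embed_dim = 4, 8, 3136                   # illustrative sizes
state_embeddings = torch.randn(batch, 1, embed_dim)       # psi(x)
tau_embeddings = torch.randn(batch, num_taus, embed_dim)  # phi(tau)

# Element-wise (Hadamard) product, broadcast across the tau dimension.
combined = state_embeddings * tau_embeddings
assert combined.shape == (batch, num_taus, embed_dim)
```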
Hello,
Could you please tell me how to run 'BreakoutDeterministic-v4'?
BTW, I ran it with: python train_iqn.py --cuda --env_id BreakoutNoFrameskip-v4 --seed 0 --config config/iqn.yaml
and
python train_iqn.py --cuda --env_id BreakoutDeterministic-v4 --seed 0 --config config/iqn.yaml
I wanted to compare the two, but the second run's return seems strange and does not match mine. I commented out this assertion (fqf_iqn_qrdqn/env.py, line 275):
assert 'NoFrameskip' in env.spec.id
That seems to make it run, but the return is still strange. At the beginning of training, my code's score is larger than yours, which puzzles me.
Hi Toshiki
I'm having problems running your code. It seems to me that there is a problem with the CNN architecture of the base DQN.
For all three algorithms (FQF, IQN, QR-DQN), the experiments fail at different episode numbers.
For instance running the following:
python train_qrdqn.py --cuda --env_id PongNoFrameskip-v4 --seed 0 --config config/qrdqn.yaml
returns
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 15, 8, 8], but got 3-dimensional input of size [32, 15, 7] instead
Other errors I've observed are similar. For instance, at episode 4788:
Traceback (most recent call last):
File "train_qrdqn.py", line 46, in <module>
run(args)
File "train_qrdqn.py", line 35, in run
agent.run()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/base_agent.py", line 89, in run
self.train_episode()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/base_agent.py", line 176, in train_episode
self.train_step_interval()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/base_agent.py", line 197, in train_step_interval
self.learn()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/qrdqn_agent.py", line 71, in learn
quantile_loss, mean_q, errors = self.calculate_loss(
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/qrdqn_agent.py", line 94, in calculate_loss
self.online_net(states=states),
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/model/qrdqn.py", line 48, in forward
state_embeddings = self.dqn_net(states)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/network.py", line 48, in forward
state_embedding = self.net(states)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 15, 8, 8], but got 3-dimensional input of size [32, 15, 7] instead
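For what it's worth, the failing call can be reproduced in isolation, which suggests the problem is the shape of the state tensor reaching the network rather than the CNN weights themselves (the sizes below are the standard Atari ones, an assumption):

```python
import torch
import torch.nn as nn

# nn.Conv2d expects a 4-D (batch, channels, height, width) input; a state
# tensor that has lost a dimension raises the same RuntimeError as above.
conv = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4)

ok = conv(torch.randn(32, 4, 84, 84))   # proper (batch, C, H, W) input works
assert ok.shape == (32, 32, 20, 20)

failed = False
try:
    conv(torch.randn(32, 4, 84))        # one spatial dimension missing
except RuntimeError:
    failed = True
assert failed                           # Conv2d rejects the malformed input
```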
I've tried using PyTorch's unsqueeze and squeeze methods to change the dimensions and work around this, but I think the CNN network itself may be causing it. Have you come across this before?
PS: torch version is 1.7.1
Thanks
Brian
I am using PyTorch 1.5.0 and I am getting the error "one of the variables needed for gradient computation has been modified by an inplace operation".
When I enabled torch anomaly detection to find which tensor was being modified in place, I got the error at this line:
https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/542a6e57cdbc8c467495215c5348800942037bfa/fqf_iqn_qrdqn/network.py#L71
Note: it works when I downgrade to PyTorch 1.4.0.
I am unable to find the source of the issue so as to make it work on torch 1.5.0.
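For anyone hitting the same error, this is the anomaly-detection setup referred to above (a generic sketch, not the repo's code):

```python
import torch

# torch.autograd's anomaly detection reports, at backward time, the
# forward operation that produced a failing gradient; enabling it is how
# the offending in-place line can be located.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(3, requires_grad=True)
y = x * 2
# An in-place op such as y.add_(1) on a tensor whose value is still needed
# for backward is what triggers "one of the variables needed for gradient
# computation has been modified by an inplace operation".
y.sum().backward()
assert torch.allclose(x.grad, torch.full((3,), 2.0))
```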
Hi, guys!
I have some questions about the fraction proposal network in FQF:
Hi, guys!
I often run into a problem where training terminates unexpectedly on my machine. Would it be possible to add a way for an agent that terminated unexpectedly to resume and complete its training steps?
Another question: how long does it take to train for 200M frames per game?
thanks
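A resume feature like the one requested above could be sketched as follows (this is not the repo's API; `online_net`, `optimizer`, and the checkpoint layout are illustrative assumptions):

```python
import torch

# Saving a checkpoint periodically lets an interrupted run continue
# from the last saved step instead of restarting from scratch.
def save_checkpoint(path, steps, online_net, optimizer):
    torch.save({
        'steps': steps,
        'model_state': online_net.state_dict(),
        'optim_state': optimizer.state_dict(),
    }, path)

def load_checkpoint(path, online_net, optimizer):
    ckpt = torch.load(path)
    online_net.load_state_dict(ckpt['model_state'])
    optimizer.load_state_dict(ckpt['optim_state'])
    return ckpt['steps']  # resume the training loop from this step count
```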
I use the following command to run the three algorithms on Pong (replacing <algo> with fqf, iqn, or qrdqn), but returns are always around -20:
python train_<algo>.py --cuda --env_id PongNoFrameskip-v4 --seed 0 --config config/<algo>.yaml
Is there anything wrong at the current master branch (b4928f9)?
Hi, I ran the code for roughly 4M steps, and suddenly it stopped training and output the model. Do you have any idea what's wrong here?
Hello @ku2482
May I ask about several implementation details and why you made these decisions?
1. You compare sa_quantiles[i] with sa_quantiles[i-1] (except the first one). Why don't you use values_1 > 0 as the signs?
2. gain=0.01: what makes you choose this initialization?

I have been learning FQF in recent days. Thanks for the repo, which lets me learn the algorithm more efficiently! I found that the fraction proposal net's input in FQF is (s, a), as mentioned in the paper (Algorithm 1), but your implementation makes all actions share quantiles/taus for the same state. I'm looking forward to your reply about this discrepancy. Thank you very much!
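A hedged sketch of the shared-taus design the question refers to (names and sizes are illustrative, not the repo's exact code): the fraction proposal network maps a state embedding to a single monotone set of taus, which the quantile values of every action then share.

```python
import torch
import torch.nn as nn

embedding_dim, num_taus = 64, 32
fraction_net = nn.Linear(embedding_dim, num_taus)

state_embeddings = torch.randn(8, embedding_dim)   # batch of state embeddings
probs = torch.softmax(fraction_net(state_embeddings), dim=1)
taus = torch.cumsum(probs, dim=1)                  # monotone taus in (0, 1]

assert taus.shape == (8, num_taus)   # one set per state, not per (s, a)
assert bool((taus[:, 1:] >= taus[:, :-1]).all())
```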
When calculating the quantile huber loss in QR-DQN (here), the whole term torch.abs(taus[..., None] - (td_errors.detach() < 0).float()) * element_wise_huber_loss is divided by self.kappa.
I cannot find this division in the paper. Is there any reason for this implementation?
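For context, a hedged sketch of the loss in question. The QR-DQN paper defines rho^kappa_tau(u) = |tau - 1{u < 0}| * L_kappa(u); dividing by kappa is an extra implementation choice, commonly made so that the slope for |u| > kappa matches the plain quantile loss |tau - 1{u < 0}| * |u|:

```python
import torch

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    abs_u = td_errors.abs()
    # Element-wise huber loss L_kappa(u).
    huber = torch.where(abs_u <= kappa,
                        0.5 * td_errors ** 2,
                        kappa * (abs_u - 0.5 * kappa))
    # Asymmetric quantile weighting, with huber divided by kappa.
    return torch.abs(taus - (td_errors.detach() < 0).float()) * huber / kappa
```

With kappa=1, tau=0.5, and a large error u=5, this gives 0.5 * (5 - 0.5) = 2.25, close to the plain quantile loss 0.5 * 5 = 2.5; without the division by kappa the two would diverge as kappa grows.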
Hi,
This is actually more of a question than an issue.
In utils.py, at line 39 where the quantile huber loss is calculated, there is a .detach() call on the td_errors.
Could you please explain the reason?
Thanks
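A small illustration of what the detach affects (a sketch, not the repo's exact code): the indicator (u < 0).float() is piecewise constant, so no gradient flows through it either way; detaching u there makes that explicit and keeps the sign term out of the autograd graph.

```python
import torch

tau = 0.7
u = torch.tensor([1.5, -0.5], requires_grad=True)

# Quantile loss |tau - 1{u < 0}| * |u| with the indicator detached.
loss = (torch.abs(tau - (u.detach() < 0).float()) * u.abs()).sum()
loss.backward()

# d/du of |tau - 1{u < 0}| * |u| is sign(u) * |tau - 1{u < 0}|.
assert torch.allclose(u.grad, torch.tensor([0.7, -0.3]))
```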
Hello @ku2482,
First off, great work on the repo, the code is very well written.
I do have a question regarding the default value of the update_interval argument defined here. As the environment setup mentioned here implies we are already skipping to every 4th frame, doesn't update_interval=4 mean that learning occurs every 16th frame?
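The arithmetic in the question can be made explicit (both values are the question's stated assumptions):

```python
# With an Atari frame skip of 4 and a gradient update every 4 agent steps,
# a learning update happens every frame_skip * update_interval emulator frames.
frame_skip = 4        # each agent action is repeated for 4 emulator frames
update_interval = 4   # agent steps between gradient updates
frames_per_update = frame_skip * update_interval
assert frames_per_update == 16
```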