toshikwa / fqf-iqn-qrdqn.pytorch
PyTorch implementation of FQF, IQN and QR-DQN.
License: MIT License
Hi,
First, thank you for providing such a clear and easy-to-follow implementation of several important distributional RL algorithms.
I found the following line to be incorrect according to the QR-DQN paper:
I believe it should have been:
q = 1 / 200 * torch.sum(quantiles, dim=1)
which corresponds to the following equation in the paper:
Here, each q_j is fixed to 1 / 200 (i.e., 1 / N with N = 200 quantiles).
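To illustrate the proposed fix, here is a minimal, self-contained sketch; the tensor shapes are illustrative assumptions, not taken from the repo:

```python
import torch

N = 200                            # number of quantiles (QR-DQN default)
quantiles = torch.randn(4, N, 6)   # assumed shape: (batch, N, num_actions)

# Q(s, a) = sum_j q_j * theta_j(s, a) with each q_j fixed to 1 / N,
# which is simply the mean over the quantile dimension.
q = quantiles.mean(dim=1)
assert torch.allclose(q, 1 / N * torch.sum(quantiles, dim=1))
```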
Hi, I ran the FQF agent on Breakout, but ended up with a learning curve that collapses partway through training (like in the attachment).
The command I was using is
python3 -u train_fqf.py --cuda --env_id BreakoutNoFrameskip-v4 --seed 0 --config config/fqf.yaml
and the hyperparameters are the default ones.
Should I adjust some hyperparameters to reproduce the curve you achieved?
I think you're forgetting the element-wise product from the IQN paper (see the end of the first paragraph of Section 3.1). Am I wrong?
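For reference, a minimal sketch of the combination described there (the sizes are illustrative assumptions): the state embedding psi(x) and the quantile embedding phi(tau) are combined by an element-wise (Hadamard) product before the final layers.

```python
import torch

batch, num_taus, embed_dim = 4, 8, 3136                   # illustrative sizes
state_embeddings = torch.randn(batch, 1, embed_dim)       # psi(x)
tau_embeddings = torch.randn(batch, num_taus, embed_dim)  # phi(tau)

# Element-wise (Hadamard) product, broadcast across the tau dimension.
combined = state_embeddings * tau_embeddings
assert combined.shape == (batch, num_taus, embed_dim)
```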
Hello,
Could you please tell me how to run 'BreakoutDeterministic-v4'?
BTW, I ran it with: python train_iqn.py --cuda --env_id BreakoutNoFrameskip-v4 --seed 0 --config config/iqn.yaml
and
python train_iqn.py --cuda --env_id BreakoutDeterministic-v4 --seed 0 --config config/iqn.yaml
I wanted to compare the two, but the second run's return seems strange and does not match mine. I commented out this assertion (fqf_iqn_qrdqn/env.py, line 275):
assert 'NoFrameskip' in env.spec.id
That seems to make it run, but the return is still strange. At the beginning of training, my code's score is larger than yours, which puzzles me.
Hi Toshiki
I'm having problems running your code. It seems to me that there is a problem with the CNN architecture of the base DQN.
For all three algorithms (FQF, IQN, QR-DQN), the experiments fail at different episode numbers.
For instance running the following:
python train_qrdqn.py --cuda --env_id PongNoFrameskip-v4 --seed 0 --config config/qrdqn.yaml
returns
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 15, 8, 8], but got 3-dimensional input of size [32, 15, 7] instead
Other errors I've observed are similar. For instance, at episode 4788:
Traceback (most recent call last):
File "train_qrdqn.py", line 46, in <module>
run(args)
File "train_qrdqn.py", line 35, in run
agent.run()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/base_agent.py", line 89, in run
self.train_episode()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/base_agent.py", line 176, in train_episode
self.train_step_interval()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/base_agent.py", line 197, in train_step_interval
self.learn()
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/qrdqn_agent.py", line 71, in learn
quantile_loss, mean_q, errors = self.calculate_loss(
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/agent/qrdqn_agent.py", line 94, in calculate_loss
self.online_net(states=states),
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/model/qrdqn.py", line 48, in forward
state_embeddings = self.dqn_net(states)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/ML/Distribute-Reinforcement-Learning/fqf_iqn_qrdqn/network.py", line 48, in forward
state_embedding = self.net(states)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/home/brian/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 15, 8, 8], but got 3-dimensional input of size [32, 15, 7] instead
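For what it's worth, the failing call can be reproduced in isolation, which suggests the problem is the shape of the state tensor reaching the network rather than the CNN weights themselves (the sizes below are the standard Atari ones, an assumption):

```python
import torch
import torch.nn as nn

# nn.Conv2d expects a 4-D (batch, channels, height, width) input; a state
# tensor that has lost a dimension raises the same RuntimeError as above.
conv = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4)

ok = conv(torch.randn(32, 4, 84, 84))   # proper (batch, C, H, W) input works
assert ok.shape == (32, 32, 20, 20)

failed = False
try:
    conv(torch.randn(32, 4, 84))        # one spatial dimension missing
except RuntimeError:
    failed = True
assert failed                           # Conv2d rejects the malformed input
```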
I've tried using PyTorch's unsqueeze and squeeze methods to change the dimensions and work around this, but I think the CNN network itself may be causing it. Have you come across this before?
PS: torch version is 1.7.1
Thanks
Brian
I am using PyTorch 1.5.0 and I am getting the error "one of the variables needed for gradient computation has been modified by an inplace operation".
When I enabled torch anomaly detection to find which tensor was being modified in place, I got the error at this line:
https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/542a6e57cdbc8c467495215c5348800942037bfa/fqf_iqn_qrdqn/network.py#L71
Note: it works when I downgrade to PyTorch 1.4.0.
I am unable to find the source of the issue so as to make it work on torch 1.5.0.
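For anyone hitting the same error, this is the anomaly-detection setup referred to above (a generic sketch, not the repo's code):

```python
import torch

# torch.autograd's anomaly detection reports, at backward time, the
# forward operation that produced a failing gradient; enabling it is how
# the offending in-place line can be located.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(3, requires_grad=True)
y = x * 2
# An in-place op such as y.add_(1) on a tensor whose value is still needed
# for backward is what triggers "one of the variables needed for gradient
# computation has been modified by an inplace operation".
y.sum().backward()
assert torch.allclose(x.grad, torch.full((3,), 2.0))
```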
Hi, guys!
I have some questions about the fraction proposal network in FQF:
Hi, guys!
I often run into a problem where training terminates unexpectedly on my machine. Would it be possible to add a way for an agent that terminated unexpectedly to resume and complete its training steps?
Another question: how long does it take to train for 200M frames per game?
thanks
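A resume feature like the one requested above could be sketched as follows (this is not the repo's API; `online_net`, `optimizer`, and the checkpoint layout are illustrative assumptions):

```python
import torch

# Saving a checkpoint periodically lets an interrupted run continue
# from the last saved step instead of restarting from scratch.
def save_checkpoint(path, steps, online_net, optimizer):
    torch.save({
        'steps': steps,
        'model_state': online_net.state_dict(),
        'optim_state': optimizer.state_dict(),
    }, path)

def load_checkpoint(path, online_net, optimizer):
    ckpt = torch.load(path)
    online_net.load_state_dict(ckpt['model_state'])
    optimizer.load_state_dict(ckpt['optim_state'])
    return ckpt['steps']  # resume the training loop from this step count
```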
I use the following command to run the three algorithms on Pong (replacing <algo> with fqf, iqn, or qrdqn), but returns are always around -20:
python train_<algo>.py --cuda --env_id PongNoFrameskip-v4 --seed 0 --config config/<algo>.yaml
Is there anything wrong at the current master branch (b4928f9)?
Hi, I ran the code for roughly 4M steps, and suddenly it stopped training and output the model. Do you have any idea what's wrong here?
Hello @ku2482
May I ask about several implementation details and why you made these decisions?
1. You compare sa_quantiles[i] with sa_quantiles[i-1] (except the first one). Why don't you use values_1 > 0 as the signs?
2. gain=0.01: what makes you choose this initialization?

I have been learning FQF in recent days. Thanks for the repo, which lets me learn the algorithm more efficiently! I found that the fraction proposal net's input in FQF is (s, a), as mentioned in the paper (Algorithm 1), but your implementation makes all actions share quantiles/taus for the same state. I'm looking forward to your reply about this discrepancy. Thank you very much!
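A hedged sketch of the shared-taus design the question refers to (names and sizes are illustrative, not the repo's exact code): the fraction proposal network maps a state embedding to a single monotone set of taus, which the quantile values of every action then share.

```python
import torch
import torch.nn as nn

embedding_dim, num_taus = 64, 32
fraction_net = nn.Linear(embedding_dim, num_taus)

state_embeddings = torch.randn(8, embedding_dim)   # batch of state embeddings
probs = torch.softmax(fraction_net(state_embeddings), dim=1)
taus = torch.cumsum(probs, dim=1)                  # monotone taus in (0, 1]

assert taus.shape == (8, num_taus)   # one set per state, not per (s, a)
assert bool((taus[:, 1:] >= taus[:, :-1]).all())
```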
When calculating the quantile huber loss in QR-DQN (here), the whole term torch.abs(taus[..., None] - (td_errors.detach() < 0).float()) * element_wise_huber_loss is divided by self.kappa.
I cannot find this division in the paper. Is there any reason for this implementation?
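For context, a hedged sketch of the loss in question. The QR-DQN paper defines rho^kappa_tau(u) = |tau - 1{u < 0}| * L_kappa(u); dividing by kappa is an extra implementation choice, commonly made so that the slope for |u| > kappa matches the plain quantile loss |tau - 1{u < 0}| * |u|:

```python
import torch

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    abs_u = td_errors.abs()
    # Element-wise huber loss L_kappa(u).
    huber = torch.where(abs_u <= kappa,
                        0.5 * td_errors ** 2,
                        kappa * (abs_u - 0.5 * kappa))
    # Asymmetric quantile weighting, with huber divided by kappa.
    return torch.abs(taus - (td_errors.detach() < 0).float()) * huber / kappa
```

With kappa=1, tau=0.5, and a large error u=5, this gives 0.5 * (5 - 0.5) = 2.25, close to the plain quantile loss 0.5 * 5 = 2.5; without the division by kappa the two would diverge as kappa grows.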
Hi,
This is actually more of a question than an issue.
In utils.py, at line 39 where the quantile huber loss is calculated, there is a .detach() call on the td_errors.
Could you please explain the reason?
Thanks
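A small illustration of what the detach affects (a sketch, not the repo's exact code): the indicator (u < 0).float() is piecewise constant, so no gradient flows through it either way; detaching u there makes that explicit and keeps the sign term out of the autograd graph.

```python
import torch

tau = 0.7
u = torch.tensor([1.5, -0.5], requires_grad=True)

# Quantile loss |tau - 1{u < 0}| * |u| with the indicator detached.
loss = (torch.abs(tau - (u.detach() < 0).float()) * u.abs()).sum()
loss.backward()

# d/du of |tau - 1{u < 0}| * |u| is sign(u) * |tau - 1{u < 0}|.
assert torch.allclose(u.grad, torch.tensor([0.7, -0.3]))
```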
Hello @ku2482,
First off, great work on the repo, the code is very well written.
I do have a question regarding the default value of the update_interval argument defined here. As the environment setup mentioned here implies we are already skipping to every 4th frame, doesn't update_interval=4 mean that learning occurs every 16th frame?
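The arithmetic in the question can be made explicit (both values are the question's stated assumptions):

```python
# With an Atari frame skip of 4 and a gradient update every 4 agent steps,
# a learning update happens every frame_skip * update_interval emulator frames.
frame_skip = 4        # each agent action is repeated for 4 emulator frames
update_interval = 4   # agent steps between gradient updates
frames_per_update = frame_skip * update_interval
assert frames_per_update == 16
```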