
microsoft / fqf

FQF (Fully parameterized Quantile Function for distributional reinforcement learning) is a general reinforcement learning framework for Atari games. It learns to play Atari games automatically by predicting the return distribution in the form of a fully parameterized quantile function.

License: Other



Fully parameterized Quantile Function (FQF)

TensorFlow implementation of the paper

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-yan Liu

If you use this code in your research, please cite

@inproceedings{yang2019fully,
  title={Fully Parameterized Quantile Function for Distributional Reinforcement Learning},
  author={Yang, Derek and Zhao, Li and Lin, Zichuan and Qin, Tao and Bian, Jiang and Liu, Tie-Yan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={6190--6199},
  year={2019}
}

Requirements

  • python==3.6
  • tensorflow
  • gym
  • absl-py
  • atari-py
  • gin-config
  • opencv-python

Installation on Ubuntu

sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config==0.1.4 gym opencv-python tensorflow-gpu==1.12.0
cd FQF
pip install -e .

Experiments

  • Our experiments and hyper-parameter search can be run as follows:
cd FQF/dopamine/discrete_domains
bash run-fqf.sh

Bug Fix

  • It is recommended to use an L2 loss on the gradient of the probability proposal network, or to clip the largest proposed probability to 0.98 (a sketch of the clipping follows). The reason is as follows: in a quantile function, as the probability approaches 1 the quantile value goes to infinity (or a very large number). Although a very large quantile value is reasonable for a probability such as 0.9999999, the limited approximation ability of the neural network means the quantile values for the other probabilities are dragged up as well, leading to a performance drop.
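
A minimal sketch of the clipping workaround, assuming tau is the tensor of cumulative probabilities produced by the fraction proposal network (the function name, tensor name, and shapes are illustrative, not the repo's actual code):

import tensorflow as tf  # TF 1.x, matching the pinned tensorflow-gpu==1.12.0

def clip_proposed_fractions(tau, max_prob=0.98):
    # Cap the largest proposed probability so the quantile network never has to
    # represent quantile values at probabilities arbitrarily close to 1.
    return tf.clip_by_value(tau, 0.0, max_prob)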

Acknowledgement

  • Our implementation is built on top of dopamine.

Code of Conduct

Contributors

linzichuan, microsoft-github-operations[bot], microsoftopensource, waterblue13

Issues

Some Instructions Required

Hi, I'm trying to build some new work on top of yours. Could you please give me some instructions or suggestions on how to use the code here?

tf.gather_nd error

Hi!

I'm trying to run FQF using the script run-fqf.sh, but I'm getting an error that I couldn't resolve. It only happens when the agent starts training.

I'm running the code on CPU rather than GPU. Could that be the problem?

Thanks for your attention!

File "train.py", line 65, in <module>
    app.run(main)
[elided 14 identical lines from previous traceback]
File "../../dopamine/agents/dqn/dqn_agent.py", line 205, in __init__
    self._train_op = self._build_train_op()
File "../../dopamine/agents/fqf/fqf_agent.py", line 377, in _build_train_op
    chosen_action_L_tau = tf.gather_nd(self._replay_net_outputs.L_tau, reshaped_actions)
File "/home/julio/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3647, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
File "/home/julio/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
File "/home/julio/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
File "/home/julio/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
File "/home/julio/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[31] = [31, 1] does not index into shape [31,32,9]
	 [[node gradients_2/GatherNd_3_grad/ScatterNd (defined at ../../dopamine/agents/fqf/fqf_agent.py:410) ### ]]
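
For reference, a minimal sketch of the tf.gather_nd contract that this error describes: every index must fall inside the leading dimensions of params, so with a first dimension of 31 the index [31, 1] is out of range. The shapes below are copied from the traceback; the dimension names are only a guess.

import numpy as np
import tensorflow as tf  # TF 1.x graph mode, matching the pinned tensorflow-gpu==1.12.0

params = tf.placeholder(tf.float32, [31, 32, 9])  # e.g. [batch, quantiles, actions]
indices = tf.placeholder(tf.int32, [None, 2])     # each row indexes the first two dims
gathered = tf.gather_nd(params, indices)          # each valid row yields a length-9 slice

with tf.Session() as sess:
    values = np.zeros([31, 32, 9], np.float32)
    sess.run(gathered, {params: values, indices: [[30, 1]]})    # ok: 30 < 31
    # sess.run(gathered, {params: values, indices: [[31, 1]]})  # InvalidArgumentError, as above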

stale gradients problem

If I didn't get it wrong, there might be a subtle problem in applying gradients to FPN's trainable variables.

When optimizing FPN, the application of gradients w.r.t. FPN's trainable variables is separated into 2 stages: first dW1 (from the 1-Wasserstein loss) and then the entropy.
After the first optimization, the trainables would have changed.
What I mean is: entropy is calculated based on the old trainables but applied to the new trainables.
I'm not sure, but is this the so-called stale gradients problem?

Hope to respond
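
For what it is worth, here is a minimal sketch (toy tensors and hypothetical names, not the repo's actual code) of the single-step alternative this report hints at: build both loss terms from the same forward pass and apply them in one optimizer step, so the entropy gradient is taken with respect to the same, pre-update FPN variables.

import tensorflow as tf  # TF 1.x graph mode

fqf_params = [tf.Variable(tf.random_normal([4, 4]), name='fpn_w')]  # stand-in FPN weights
tau = tf.nn.softmax(tf.matmul(tf.ones([1, 4]), fqf_params[0]))      # stand-in proposed fractions

w1_loss = tf.reduce_mean(tau)                          # stand-in for the 1-Wasserstein term
q_entropy = tf.reduce_sum(-tau * tf.log(tau), axis=1)  # stand-in for the proposal entropy
ent_coef = 0.001

# One minimize() over the combined objective instead of two sequential calls.
train_op = tf.train.RMSPropOptimizer(5e-5).minimize(
    w1_loss + ent_coef * tf.reduce_mean(-q_entropy), var_list=fqf_params)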

Reproducing paper results

Hi,
I am trying to evaluate FQF, to use it as a baseline on some discrete environments. However, I encountered an issue: the script run-iqn.sh [EDIT: run-fqf.sh] does not seem to evaluate FQF, but actually IQN. I think the problem is that the function create_agent in dopamine/discrete_domains/run_experiment.py can only create Rainbow, DQN and IQN agents (and not FQF). It is possible I missed something; could you explain how I can use this code to evaluate FQF?
Thanks,
Nino
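
A hypothetical sketch of the missing branch: the module path comes from the traceback in the tf.gather_nd issue above, but the FQFAgent class name and constructor arguments are assumptions, mirroring how dopamine's create_agent builds its other agents, and are not verified against the repo.

from dopamine.agents.fqf import fqf_agent  # module path as seen in the traceback above

def create_fqf_agent(sess, environment, summary_writer=None):
    # Return an FQF agent the same way create_agent returns DQN/Rainbow/IQN agents.
    return fqf_agent.FQFAgent(                # class name is an assumption
        sess,
        num_actions=environment.action_space.n,
        summary_writer=summary_writer)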

entropy coefficient problem

If I didn't get it wrong, there might be a subtle problem in applying gradients to FPN's trainable variables.

The entropy coefficient (0.001, i.e. fqf_ent or self.ent in the code) appears to be applied twice.

First, at fqf_agent.py line 399, via the magic number 0.001:

q_entropy = tf.reduce_sum(-quantile_tau * tf.log(quantile_tau), axis=1) * 0.001

Then at line 419 of the same file it is applied a second time, via self.ent:

self.optimizer1.minimize(self.ent * tf.reduce_mean(-q_entropy), var_list=fqf_params), \
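
If both scalings are indeed active, the effective entropy weight is the product of the two factors rather than 0.001 itself. A quick check, assuming self.ent also holds 0.001 as the wording above suggests:

hard_coded = 0.001            # the magic number at fqf_agent.py line 399
self_ent = 0.001              # self.ent / fqf_ent applied again at line 419 (assumed value)
print(hard_coded * self_ent)  # 1e-06, far smaller than either coefficient alone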

fraction proposal network of FQF

Hi,
I have some questions about the fraction proposal network (FPN) of FQF:
1. Why set fraction_lr = 5e-5 * fqf_factor (0.000001) = 5e-11, which is very small? I also found that the tau_hats distribution barely changed during training.
2. Why apply initialize_weights_xavier(x, gain=0.01)? When I trained without this initialization, gradient explosion sometimes happened.
3. Why use RMSprop with alpha=0.95 and eps=0.00001, when the default values are 0.99 and 1e-8 respectively?
4. I also found that the tau_hats distribution barely changed while training on Qbert. Is this the key to the algorithm?
Thanks!
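
The names in these questions (alpha, eps, xavier gain) follow PyTorch conventions, so here is a sketch of the configuration they describe in torch syntax. The layer size is made up, fpn is a hypothetical stand-in for the fraction proposal network, and all hyper-parameter values are taken from the questions, not verified against the repo.

import torch

fpn = torch.nn.Linear(7 * 7 * 64, 32)                 # hypothetical FPN layer
torch.nn.init.xavier_uniform_(fpn.weight, gain=0.01)  # the gain=0.01 initialization in question 2

fqf_factor = 0.000001
fpn_optimizer = torch.optim.RMSprop(
    fpn.parameters(),
    lr=5e-5 * fqf_factor,  # = 5e-11, the very small fraction_lr in question 1
    alpha=0.95,            # question 3: vs. the default 0.99
    eps=0.00001)           # question 3: vs. the default 1e-8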
