
datawhalechina / easy-rl

8.9K stars · 78 watchers · 1.8K forks · 527.59 MB

A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at: https://datawhalechina.github.io/easy-rl/

License: Other

Languages: Jupyter Notebook 99.32%, Python 0.68%
Topics: deep-reinforcement-learning, reinforcement-learning, dqn, ppo, a3c, q-learning, sarsa, imitation-learning, policy-gradient, ddpg

easy-rl's People

Contributors

bebravebecurious, gorgeouswang, johnjim0816, olianate, qiwang067, sm1les, ssccinng, taojunhui, xyw5vplus1, ynjxsjmh, yyysjz1997, zh4men9


easy-rl's Issues

A question about the notes

[screenshot of the passage from the notes being questioned]

The wording here seems a little off, although the meaning can mostly be inferred from the preceding text; a character may be missing.
What I don't understand is this:
That a good policy π makes the great majority of V(s) large, I can accept. But can you explain why a single π can make every V(s) maximal? I think I see it now: if the action I take at every step is optimal, then every V(s) is maximized. But that has an obvious problem: I might have to give up the currently optimal action for a while so that the value of some later state becomes large. In other words, one cannot blindly act greedily. So the phrasing "make V(state) maximal for every state" leaves me a bit confused.
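For readers puzzling over the same point: the greediness is with respect to the optimal value function V*, which already prices in long-term reward, not with respect to immediate reward. Below is a minimal value-iteration sketch on a hypothetical two-state MDP (all numbers invented for illustration) in which the policy greedy w.r.t. V* maximizes V(s) at every state simultaneously, even though it forgoes the larger immediate reward in s0:

import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers made up for illustration).
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([[[0.0, 1.0],    # s0, a0 -> s1
               [1.0, 0.0]],   # s0, a1 -> s0
              [[0.0, 1.0],    # s1, a0 -> s1
               [1.0, 0.0]]])  # s1, a1 -> s0
R = np.array([[0.0, 1.0],     # a1 in s0 pays 1 now but stays in s0
              [2.0, 0.0]])    # a0 in s1 pays 2 and stays in s1
gamma = 0.9

# Value iteration: V converges to V*, the fixed point of the Bellman
# optimality equation V*(s) = max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V*(s') ].
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V     # Q[s, a]
    V = Q.max(axis=1)

# The policy greedy w.r.t. V* maximizes V(s) for *every* s at once,
# even though in s0 it takes reward 0 now (a0) instead of reward 1 (a1).
pi = Q.argmax(axis=1)
print("V* =", V, "pi* =", pi)  # V* ~ [18, 20], pi* = [0, 0]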

A question about Policy Gradient

Chapter 4, Policy Gradient, contains the following passage:

[screenshot of the quoted passage from Chapter 4]

Suppose you execute a_t in s_t and the reward of the whole trajectory τ turns out to be positive; then you should increase the probability of that term.
Conversely, if executing a_t in s_t makes the reward of τ negative, you should decrease the probability of that term.

I understand that these two sentences state our goal, but which part of the formula actually achieves it?

[screenshot of the policy-gradient formula in question]
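For context on the question: the mechanism is the R(τ) weight multiplying ∇ log p_θ(a_t|s_t). Gradient ascent on R(τ) log p_θ(a_t|s_t) pushes the probability of a_t up when R(τ) > 0 and down when R(τ) < 0. A minimal PyTorch sketch with a hypothetical three-action policy (illustration only, not the book's code):

import torch

# Hypothetical toy policy: logits over 3 actions (illustration only).
logits = torch.zeros(3, requires_grad=True)

def surrogate(action, R_tau):
    # REINFORCE surrogate: R(tau) * log p_theta(a_t | s_t).
    # Its gradient is R(tau) * grad log p_theta(a_t | s_t) -- the term in question.
    log_prob = torch.log_softmax(logits, dim=0)[action]
    return R_tau * log_prob

# Positive trajectory reward: gradient ascent raises p(a=1).
surrogate(action=1, R_tau=+1.0).backward()
print(logits.grad)   # positive at index 1 -> probability of action 1 goes up

logits.grad = None

# Negative trajectory reward: the same step lowers p(a=1).
surrogate(action=1, R_tau=-1.0).backward()
print(logits.grad)   # negative at index 1 -> probability of action 1 goes down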

Error when running A2C

Thanks for this project.

An error is reported when running main.py in the A2C folder:

Start to train ! 
tensor([ 0.0307,  0.0015, -0.0309, -0.0313], device='cuda:0')
Traceback (most recent call last):
  File "G:/My First Paper/Code/easy-rl/codes/A2C/main.py", line 105, in <module>
    train(cfg,env,agent)
  File "G:/My First Paper/Code/easy-rl/codes/A2C/main.py", line 64, in train
    dist, value = agent.model(state)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "G:/My First Paper/Code/easy-rl/codes\A2C\model.py", line 34, in forward
    probs = self.actor(x)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\activation.py", line 1200, in forward
    return F.softmax(input, self.dim, _stacklevel=5)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\functional.py", line 1583, in softmax
    ret = input.softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Process finished with exit code 1

Looking forward to your guidance, thanks in advance.
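A likely cause, judging from the traceback: the state reaching the actor is a 1-D tensor (shape [4] for CartPole), while the model's final Softmax layer uses dim=1 and therefore expects a (batch, n_actions) input. A minimal self-contained sketch of the problem and the fix, assuming a Softmax(dim=1) head like the one the traceback implies (not the repo's exact code):

import torch
import torch.nn as nn

# Hypothetical actor head matching the shape of the error.
actor = nn.Sequential(nn.Linear(4, 2), nn.Softmax(dim=1))

state = torch.rand(4)         # 1-D state, shape (4,) -- this is what crashes
batched = state.unsqueeze(0)  # add a batch dimension -> shape (1, 4)
probs = actor(batched)        # works: output shape (1, 2)

In main.py's train loop the analogous change would be to unsqueeze the state (e.g. torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)) before calling agent.model(state); alternatively, nn.Softmax(dim=-1) would let the model accept both batched and unbatched states.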

Answers to the Chapter 5 exercises

In the Chapter 5 exercises, for the question on the difference between "on-policy" and "off-policy", you only describe on-policy and its strengths and weaknesses; you never actually draw the distinction.
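For anyone else looking for the missing distinction: on-policy methods improve the same policy that collects the data, while off-policy methods learn about one policy (the target) from data collected by another (the behavior policy). A minimal sketch contrasting the SARSA (on-policy) and Q-learning (off-policy) update targets, with a hypothetical Q-table and a made-up transition:

import numpy as np

n_states, n_actions, alpha, gamma, eps = 5, 2, 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))   # hypothetical Q-table
s, a, r, s_next = 0, 1, 1.0, 2        # one made-up transition

# epsilon-greedy behavior policy (collects the data in both cases)
def eps_greedy(s):
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return Q[s].argmax()

# On-policy (SARSA): the target uses the action the behavior policy
# actually takes next, so the policy being improved IS the one acting.
a_next = eps_greedy(s_next)
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Off-policy (Q-learning): the target uses max_a Q(s', a), the greedy
# policy's choice, regardless of what the behavior policy does next.
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])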

A thought on translating bootstrapping (拔靴自助)

Hello, and thank you for this project; I have learned a lot from it.
In Chapter 2's "Iterative Algorithm for Computing Value of a MRP", should bootstrapping (拔靴自助) perhaps be translated as 自举 instead?
I am going by the Chinese edition of Reinforcement Learning: An Introduction (2nd edition), page 87, paragraph 3, line 3. 自举 reads more naturally, while 拔靴自助 feels a bit odd.
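For readers unfamiliar with the term: bootstrapping here means that each update builds the new value estimate out of the current estimates themselves, i.e. V_{k+1}(s) = R(s) + γ Σ_{s'} P(s'|s) V_k(s'). A minimal sketch of that iterative computation on a hypothetical two-state MRP (numbers invented for illustration):

import numpy as np

# Hypothetical 2-state MRP (numbers made up for illustration).
P = np.array([[0.8, 0.2],   # P[s, s'] transition matrix
              [0.1, 0.9]])
R = np.array([1.0, -0.5])   # immediate reward per state
gamma = 0.9

# Bootstrapping: each sweep updates the value estimate using the
# previous estimate itself, V_{k+1} = R + gamma * P @ V_k.
V = np.zeros(2)
for _ in range(500):
    V = R + gamma * P @ V

print(V)  # converges to the solution of V = R + gamma * P @ V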
