
datawhalechina / easy-rl

8.9K stars · 78 watchers · 1.8K forks · 527.59 MB

A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at: https://datawhalechina.github.io/easy-rl/

License: Other

Languages: Jupyter Notebook 99.32%, Python 0.68%
Topics: deep-reinforcement-learning, reinforcement-learning, dqn, ppo, a3c, q-learning, sarsa, imitation-learning, policy-gradient, ddpg

easy-rl's People

Contributors

bebravebecurious, gorgeouswang, johnjim0816, olianate, qiwang067, sm1les, ssccinng, taojunhui, xyw5vplus1, ynjxsjmh, yyysjz1997, zh4men9


easy-rl's Issues

A question about the notes

[screenshot of the passage from the notes being questioned]

The wording here seems a little off, although the meaning can mostly be inferred from the preceding text; a character may be missing.
What I don't understand is this:
That a good policy π makes the great majority of V(s) large, I can accept. But can you explain why a single π can make every V(s) maximal? I think I see it now: if the action I take at every step is optimal, then every V(s) is maximized. But that has an obvious problem: I might have to give up the currently optimal action for a while so that the value of some later state becomes large. In other words, one cannot blindly act greedily. So the phrasing "make V(state) maximal for every state" leaves me a bit confused.
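For readers puzzling over the same point: the greediness is with respect to the optimal value function V*, which already prices in long-term reward, not with respect to immediate reward. Below is a minimal value-iteration sketch on a hypothetical two-state MDP (all numbers invented for illustration) in which the policy greedy w.r.t. V* maximizes V(s) at every state simultaneously, even though it forgoes the larger immediate reward in s0:

import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers made up for illustration).
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([[[0.0, 1.0],    # s0, a0 -> s1
               [1.0, 0.0]],   # s0, a1 -> s0
              [[0.0, 1.0],    # s1, a0 -> s1
               [1.0, 0.0]]])  # s1, a1 -> s0
R = np.array([[0.0, 1.0],     # a1 in s0 pays 1 now but stays in s0
              [2.0, 0.0]])    # a0 in s1 pays 2 and stays in s1
gamma = 0.9

# Value iteration: V converges to V*, the fixed point of the Bellman
# optimality equation V*(s) = max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V*(s') ].
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V     # Q[s, a]
    V = Q.max(axis=1)

# The policy greedy w.r.t. V* maximizes V(s) for *every* s at once,
# even though in s0 it takes reward 0 now (a0) instead of reward 1 (a1).
pi = Q.argmax(axis=1)
print("V* =", V, "pi* =", pi)  # V* ~ [18, 20], pi* = [0, 0]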

A question about Policy Gradient

Chapter 4, Policy Gradient, contains the following passage:

[screenshot of the quoted passage from Chapter 4]

Suppose you execute a_t in s_t and the reward of the whole trajectory τ turns out to be positive; then you should increase the probability of that term.
Conversely, if executing a_t in s_t makes the reward of τ negative, you should decrease the probability of that term.

I understand that these two sentences state our goal, but which part of the formula actually achieves it?

[screenshot of the policy-gradient formula in question]
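For context on the question: the mechanism is the R(τ) weight multiplying ∇ log p_θ(a_t|s_t). Gradient ascent on R(τ) log p_θ(a_t|s_t) pushes the probability of a_t up when R(τ) > 0 and down when R(τ) < 0. A minimal PyTorch sketch with a hypothetical three-action policy (illustration only, not the book's code):

import torch

# Hypothetical toy policy: logits over 3 actions (illustration only).
logits = torch.zeros(3, requires_grad=True)

def surrogate(action, R_tau):
    # REINFORCE surrogate: R(tau) * log p_theta(a_t | s_t).
    # Its gradient is R(tau) * grad log p_theta(a_t | s_t) -- the term in question.
    log_prob = torch.log_softmax(logits, dim=0)[action]
    return R_tau * log_prob

# Positive trajectory reward: gradient ascent raises p(a=1).
surrogate(action=1, R_tau=+1.0).backward()
print(logits.grad)   # positive at index 1 -> probability of action 1 goes up

logits.grad = None

# Negative trajectory reward: the same step lowers p(a=1).
surrogate(action=1, R_tau=-1.0).backward()
print(logits.grad)   # negative at index 1 -> probability of action 1 goes down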

Error when running A2C

Thanks for this project.

An error is reported when running main.py in the A2C folder:

Start to train ! 
tensor([ 0.0307,  0.0015, -0.0309, -0.0313], device='cuda:0')
Traceback (most recent call last):
  File "G:/My First Paper/Code/easy-rl/codes/A2C/main.py", line 105, in <module>
    train(cfg,env,agent)
  File "G:/My First Paper/Code/easy-rl/codes/A2C/main.py", line 64, in train
    dist, value = agent.model(state)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "G:/My First Paper/Code/easy-rl/codes\A2C\model.py", line 34, in forward
    probs = self.actor(x)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\modules\activation.py", line 1200, in forward
    return F.softmax(input, self.dim, _stacklevel=5)
  File "D:\2019Download\Anaconda\Anaconda\envs\IL\lib\site-packages\torch\nn\functional.py", line 1583, in softmax
    ret = input.softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Process finished with exit code 1

Looking forward to your guidance, thanks in advance.
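A likely cause, judging from the traceback: the state reaching the actor is a 1-D tensor (shape [4] for CartPole), while the model's final Softmax layer uses dim=1 and therefore expects a (batch, n_actions) input. A minimal self-contained sketch of the problem and the fix, assuming a Softmax(dim=1) head like the one the traceback implies (not the repo's exact code):

import torch
import torch.nn as nn

# Hypothetical actor head matching the shape of the error.
actor = nn.Sequential(nn.Linear(4, 2), nn.Softmax(dim=1))

state = torch.rand(4)         # 1-D state, shape (4,) -- this is what crashes
batched = state.unsqueeze(0)  # add a batch dimension -> shape (1, 4)
probs = actor(batched)        # works: output shape (1, 2)

In main.py's train loop the analogous change would be to unsqueeze the state (e.g. torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)) before calling agent.model(state); alternatively, nn.Softmax(dim=-1) would let the model accept both batched and unbatched states.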

Answers to the Chapter 5 exercises

In the Chapter 5 exercises, for the question on the difference between "on-policy" and "off-policy", you only describe on-policy and its strengths and weaknesses; you never actually draw the distinction.
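For anyone else looking for the missing distinction: on-policy methods improve the same policy that collects the data, while off-policy methods learn about one policy (the target) from data collected by another (the behavior policy). A minimal sketch contrasting the SARSA (on-policy) and Q-learning (off-policy) update targets, with a hypothetical Q-table and a made-up transition:

import numpy as np

n_states, n_actions, alpha, gamma, eps = 5, 2, 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))   # hypothetical Q-table
s, a, r, s_next = 0, 1, 1.0, 2        # one made-up transition

# epsilon-greedy behavior policy (collects the data in both cases)
def eps_greedy(s):
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return Q[s].argmax()

# On-policy (SARSA): the target uses the action the behavior policy
# actually takes next, so the policy being improved IS the one acting.
a_next = eps_greedy(s_next)
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Off-policy (Q-learning): the target uses max_a Q(s', a), the greedy
# policy's choice, regardless of what the behavior policy does next.
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])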

A thought on translating bootstrapping (拔靴自助)

Hello, and thank you for this project; I have learned a lot from it.
In Chapter 2's "Iterative Algorithm for Computing Value of a MRP", should bootstrapping (拔靴自助) perhaps be translated as 自举 instead?
I am going by the Chinese edition of Reinforcement Learning: An Introduction (2nd edition), page 87, paragraph 3, line 3. 自举 reads more naturally, while 拔靴自助 feels a bit odd.
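For readers unfamiliar with the term: bootstrapping here means that each update builds the new value estimate out of the current estimates themselves, i.e. V_{k+1}(s) = R(s) + γ Σ_{s'} P(s'|s) V_k(s'). A minimal sketch of that iterative computation on a hypothetical two-state MRP (numbers invented for illustration):

import numpy as np

# Hypothetical 2-state MRP (numbers made up for illustration).
P = np.array([[0.8, 0.2],   # P[s, s'] transition matrix
              [0.1, 0.9]])
R = np.array([1.0, -0.5])   # immediate reward per state
gamma = 0.9

# Bootstrapping: each sweep updates the value estimate using the
# previous estimate itself, V_{k+1} = R + gamma * P @ V_k.
V = np.zeros(2)
for _ in range(500):
    V = R + gamma * P @ V

print(V)  # converges to the solution of V = R + gamma * P @ V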
