
Comments (5)

starry-sky6688 avatar starry-sky6688 commented on May 18, 2024

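(1) COMA is on-policy, so the next action has to be chosen by the current policy; you can't take the max the way DQN does.
(2) If it is padding, the mask should wipe it out there, so it won't affect the network update.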
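To illustrate point (2), here is a minimal sketch of how a mask keeps padded timesteps out of the loss. The values and the helper name `masked_td_loss` are made up for illustration and are not taken from the repository; the only assumption is that the mask is 1 on real steps and 0 on padded ones.

```python
# Hypothetical per-timestep values for one episode padded to length 5.
# The real episode has 3 steps; the last 2 entries are padding.
q_evals = [1.0, 0.5, 2.0, 0.0, 0.0]
targets = [0.8, 0.9, 1.0, 7.0, -3.0]  # padded targets hold arbitrary junk
mask = [1.0, 1.0, 1.0, 0.0, 0.0]      # 1 = real step, 0 = padding


def masked_td_loss(q_evals, targets, mask):
    """Mean squared TD error averaged over real (unmasked) steps only."""
    masked_errors = [m * (t - q) for q, t, m in zip(q_evals, targets, mask)]
    return sum(e * e for e in masked_errors) / sum(mask)


loss = masked_td_loss(q_evals, targets, mask)
print(loss)
```

Because the error is multiplied by the mask before squaring, whatever sits in the padded slots contributes nothing to the loss or to its gradient.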

from marl-algorithms.

yywe avatar yywe commented on May 18, 2024

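(1) Makes sense.
(2) I thought the same at first, but I don't think the mask wipes it out here.
The target at the last step exists precisely to compute the TD error of the last step.
Say the action taken at the last step is a, so q_evals[-1] is Q[a] (one particular entry of Q), and then the episode terminated.
The corresponding q_target has no next step, so it is padding.
So the last step has a TD error: (q_evals[-1][a] - q_target[-1][0]) ** 2
The last entry of q_target is padding, but none of q_evals is padding, right?
In other words, mask and terminated don't cut off the last step, because the last step is real.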

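Where this differs from Q-learning: in Q-learning the last step has no stage where a value is selected from a predicted target. The last-step target is simply the final reward, which never passes through the neural network. So it feels different to me.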


yywe avatar yywe commented on May 18, 2024

On (1), could you say a bit more? That is, what would go wrong if we took the max the way DQN does?

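Also, policy-gradient algorithms are generally on-policy, and value-based methods are generally trained off-policy. But the TD error itself is mostly used in off-policy methods.
COMA here really has two parts: the eval part is an on-policy policy gradient, except that the advantage is estimated by a critic network, and that critic network is optimized with the TD error.
So this counts as actor-critic, and actor-critic is itself a combination of policy-based and value-based methods.
That's why I don't quite understand why the critic can't select the action the way DQN does.

Thanks a lot! 🙏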


starry-sky6688 avatar starry-sky6688 commented on May 18, 2024

The last step is real and u_next is padding, but because of terminated, the Q-value for u_next is wiped out when q_target is computed, leaving only r.
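A minimal sketch of that effect, using a plain one-step target as a simplification. The function name `one_step_target`, the discount factor and the numbers are assumptions for illustration, not the repository's code.

```python
GAMMA = 0.99  # assumed discount factor


def one_step_target(r, q_next, terminated, gamma=GAMMA):
    """r + gamma * Q(s', u'), with the bootstrap term dropped at terminal steps."""
    return r + gamma * (1.0 - terminated) * q_next


# Final real step: u_next is padding, so q_next is junk, but terminated = 1 removes it.
target_last = one_step_target(r=1.5, q_next=123.0, terminated=1.0)

# Ordinary mid-episode step: the bootstrap term is kept.
target_mid = one_step_target(r=0.0, q_next=2.0, terminated=0.0)

print(target_last, target_mid)
```

So at the last real step the target reduces to the reward alone, just as in Q-learning, and the padded Q-value never reaches the TD error.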

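COMA's critic also has to be updated on-policy: the advantage used to update the actor has to follow the distribution induced by the policy, and that advantage is computed by the critic.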
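A small single-agent sketch of the difference, with made-up numbers. The on-policy target bootstraps from the action the current policy actually took, while the DQN-style target bootstraps from the greedy action. The advantage below uses a policy-weighted baseline in the spirit of COMA's counterfactual baseline; the other agents' actions, which COMA holds fixed, are left out here for brevity.

```python
q_next = [1.0, 3.0, 2.0]  # hypothetical critic outputs Q(s', .) for 3 actions
u_next = 2                # action the current policy actually took at s'
r, gamma = 0.5, 0.9       # assumed reward and discount factor

on_policy_target = r + gamma * q_next[u_next]  # evaluates the current policy
greedy_target = r + gamma * max(q_next)        # evaluates the greedy policy instead

# Advantage of the taken action against a baseline weighted by the policy.
q_s = [1.0, 3.0, 2.0]  # hypothetical Q(s, .)
pi = [0.2, 0.5, 0.3]   # hypothetical current policy pi(. | s)
u = 2                  # action taken at s
baseline = sum(p * q for p, q in zip(pi, q_s))
advantage = q_s[u] - baseline

print(on_policy_target, greedy_target, advantage)
```

If the critic were trained with the greedy target, its Q-values would describe the greedy policy rather than the policy being improved, and the advantage fed to the actor would no longer match the policy's own action distribution.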


yywe avatar yywe commented on May 18, 2024

That clears it up, thank you very much.
I had overlooked the TD(lambda) target.
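For reference, a generic sketch of TD(lambda) targets computed by the standard backward recursion. This is not the repository's implementation; the function name `lambda_targets` and all numbers are illustrative assumptions. It shows that the terminated flag drops the padded bootstrap value at the last real step here as well.

```python
def lambda_targets(rewards, q_next, terminated, gamma=0.99, lam=0.8):
    """Backward recursion for TD(lambda) targets over one episode.

    q_next[t] is the critic's value for the step after t (junk where padded).
    terminated[t] is 1.0 if the episode ended at step t, else 0.0.
    """
    targets = [0.0] * len(rewards)
    g_next = q_next[-1]  # no later return exists, so start from the bootstrap value
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the lambda-return of the next step.
        blended = (1.0 - lam) * q_next[t] + lam * g_next
        targets[t] = rewards[t] + gamma * (1.0 - terminated[t]) * blended
        g_next = targets[t]
    return targets


# 3-step episode that terminates at the last step; q_next[-1] is padding junk.
targets = lambda_targets(
    rewards=[0.0, 0.0, 1.0],
    q_next=[0.5, 0.7, 9.9],
    terminated=[0.0, 0.0, 1.0],
    gamma=0.9,
    lam=0.5,
)
print(targets)
```

With `lam=0.0` this reduces to the one-step target, and the last entry equals the final reward no matter what the padded `q_next[-1]` contains.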
