double_q-quickstart

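Double Q-learning, introduced by Hado van Hasselt (2010) and later popularized by DeepMind in its deep-network variant (Double DQN), is a modification of Q-learning that addresses the overestimation problem of standard Q-learning. Standard Q-learning uses the same value estimates both to pick the maximizing action and to evaluate it, so estimation noise pushes the target upward. Double Q-learning reduces this bias by keeping two Q-value estimators and separating the action-selection step from the action-evaluation step. The algorithm consists of the following core steps: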
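The overestimation effect can be reproduced without any reinforcement learning at all. The sketch below is an illustration under stated assumptions, not part of the original project: every action has a true value of 0, the estimates are corrupted by unit Gaussian noise, and the function name `max_bias_demo` and its parameters are hypothetical. It compares taking the max of one set of estimates with selecting on one set and evaluating on an independent second set.

```python
import random
import statistics


def max_bias_demo(n_actions=10, n_trials=2000, seed=0):
    """Compare single and double estimators when every true action value is 0."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        # Two independent noisy estimates of the same true values (all zero).
        est_a = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        est_b = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        # Single estimator: select and evaluate with the same estimates.
        single.append(max(est_a))
        # Double estimator: select with A, evaluate with B.
        best = max(range(n_actions), key=lambda i: est_a[i])
        double.append(est_b[best])
    return statistics.mean(single), statistics.mean(double)


single_mean, double_mean = max_bias_demo()
print(f"single estimator mean: {single_mean:.3f}")
print(f"double estimator mean: {double_mean:.3f}")
```

Under these assumptions the single estimator averages well above the true value of 0, while the double estimator stays close to it, because the noise that drove the selection is independent of the noise in the evaluation.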

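  1. Initialize two Q-functions: Double Q-learning maintains two action-value functions, \( Q_A \) and \( Q_B \). They are learned separately, and each one is updated on a different subset of the experience.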

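  2. Sample experience: at each time step, the agent selects an action \( a \) in the current state \( s \) according to its behavior policy (typically ε-greedy, commonly taken with respect to the sum or average of \( Q_A \) and \( Q_B \)), executes it, and observes the next state \( s' \) and the reward \( r \).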

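  3. Randomly choose which Q-function to update: at each step, pick either \( Q_A \) or \( Q_B \) at random, for example with probability 0.5 each. The remaining steps assume \( Q_A \) was chosen; the \( Q_B \) case is identical with the two roles swapped.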

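  4. Decouple action selection from action evaluation: use \( Q_A \), the function being updated, to select the greedy action in state \( s' \), i.e. \( a^* = \arg\max_a Q_A(s', a) \); then use the other function, \( Q_B \), to evaluate that action and form the update target \( y = r + \gamma Q_B(s', a^*) \), where \( \gamma \) is the discount factor. If \( s' \) is terminal, the target is simply \( y = r \).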

  5. Update the chosen Q-function: move \( Q_A \) toward the target \( y \) computed in the previous step: \[ Q_A(s, a) \leftarrow Q_A(s, a) + \alpha \left( y - Q_A(s, a) \right) \] where \( \alpha \) is the learning rate.

  6. Repeat: repeat the steps above until the algorithm converges or a preset number of iterations is reached.
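Written out for both tables, the updates in steps 4 and 5 take a symmetric form. This is a restatement in the notation used above, assuming \( s' \) is non-terminal; the symbol \( b^* \) is introduced here only to distinguish the action selected by \( Q_B \):

\[
\begin{aligned}
\text{if } Q_A \text{ is chosen:}\quad a^* &= \arg\max_{a'} Q_A(s', a'), &
Q_A(s, a) &\leftarrow Q_A(s, a) + \alpha \left( r + \gamma\, Q_B(s', a^*) - Q_A(s, a) \right) \\
\text{if } Q_B \text{ is chosen:}\quad b^* &= \arg\max_{a'} Q_B(s', a'), &
Q_B(s, a) &\leftarrow Q_B(s, a) + \alpha \left( r + \gamma\, Q_A(s', b^*) - Q_B(s, a) \right)
\end{aligned}
\]

For comparison, standard Q-learning uses the single target \( r + \gamma \max_{a'} Q(s', a') \), in which the same estimates both select and evaluate the action.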

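By using different Q-functions to select the best action and to evaluate its value, Double Q-learning avoids the upward bias that the max operator introduces when both roles share the same noisy estimates. This tends to make learning more stable and can improve performance, particularly when rewards or transitions are noisy, although the resulting estimates may instead be biased slightly downward.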
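The procedure can be sketched in tabular form as follows. This is a minimal illustration, not the project's own implementation: the five-state chain environment (`chain_step`), the ε-greedy behavior policy on \( Q_A + Q_B \) with random tie-breaking, and all hyperparameter values are assumptions chosen to keep the example small.

```python
import random
from collections import defaultdict

N_STATES, N_ACTIONS = 5, 2  # hypothetical chain: states 0..4, actions 0=left, 1=right


def argmax(values):
    """Index of the largest value (the first index wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def behaviour_action(q_a, q_b, state, epsilon, rng):
    """Epsilon-greedy on Q_A + Q_B, breaking ties at random (an assumption)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    total = [x + y for x, y in zip(q_a[state], q_b[state])]
    best = max(total)
    return rng.choice([i for i, v in enumerate(total) if v == best])


def double_q_update(q_a, q_b, s, a, r, s_next, done, alpha, gamma, rng):
    """Apply one Double Q-learning update; return the name of the updated table."""
    # A fair coin flip decides which table is updated.
    if rng.random() < 0.5:
        q_upd, q_other, name = q_a, q_b, "A"
    else:
        q_upd, q_other, name = q_b, q_a, "B"
    if done:
        target = r  # no bootstrapping from a terminal state
    else:
        a_star = argmax(q_upd[s_next])                # select with the updated table
        target = r + gamma * q_other[s_next][a_star]  # evaluate with the other table
    q_upd[s][a] += alpha * (target - q_upd[s][a])
    return name


def chain_step(state, action):
    """Toy environment: reward 1 and termination on reaching the last state."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Double Q-learning on the toy chain and return both tables."""
    rng = random.Random(seed)
    q_a = defaultdict(lambda: [0.0] * N_ACTIONS)
    q_b = defaultdict(lambda: [0.0] * N_ACTIONS)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = behaviour_action(q_a, q_b, state, epsilon, rng)
            next_state, reward, done = chain_step(state, action)
            double_q_update(q_a, q_b, state, action, reward,
                            next_state, done, alpha, gamma, rng)
            state = next_state
    return q_a, q_b


q_a, q_b = train()
for s in range(N_STATES - 1):
    # Print the average of both tables for each non-terminal state.
    print(s, [round((x + y) / 2, 3) for x, y in zip(q_a[s], q_b[s])])
```

The key lines are the two inside the `else` branch of `double_q_update`: the table being updated picks the action, and the other table supplies its value. Swapping in `max(q_upd[s_next])` as the bootstrap value would turn the update back into standard Q-learning on that table.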

double_q-quickstart's People

Contributors

zgimszhd61
