zgimszhd61 / double_q-quickstart

License: Apache License 2.0

double_q-quickstart

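Double Q-learning is an improvement on Q-learning that targets the overestimation problem of standard Q-learning. It was introduced by Hado van Hasselt in 2010 and later combined with deep Q-networks at DeepMind as Double DQN. Standard Q-learning bootstraps from \(\max_a Q(s', a)\), which uses the same noisy estimates both to pick the action and to value it. As a result, estimation errors are systematically biased upward. Double Q-learning reduces this overestimation by keeping two Q-value estimators and using them to separate the action-selection step from the action-evaluation step. It consists of the following core steps: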

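1. Initialize two Q-value functions: Double Q-learning initializes two Q-value functions, \(Q_A\) and \(Q_B\). The two can be regarded as learning independently of each other, because each one is updated with a different subset of the experience.
2. Sample experience: at each time step, the agent is in state \(s\) and selects an action \(a\) according to its current policy. This is usually an ε-greedy policy, for example with respect to \(Q_A + Q_B\). The agent executes \(a\) and observes the reward \(r\) and the next state \(s'\).
3. Randomly choose which Q-function to update: at each step, randomly pick \(Q_A\) or \(Q_B\) for updating, for example each with probability 0.5. Suppose \(Q_A\) is chosen.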
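4. Separate action selection from action evaluation: use \(Q_A\) to select the best action in state \(s'\), that is, \(a^* = \arg\max_a Q_A(s', a)\). Then use \(Q_B\) to evaluate that action and compute the update target \(y = r + \gamma Q_B(s', a^*)\), where \(\gamma\) is the discount factor. If \(s'\) is terminal, \(y = r\). If \(Q_B\) was chosen for updating instead, the roles of the two functions are swapped.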
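5. Update the selected Q-function: use the target \(y\) computed above to update \(Q_A\): \[ Q_A(s, a) \leftarrow Q_A(s, a) + \alpha \left(y - Q_A(s, a)\right) \] where \(\alpha\) is the learning rate.
6. Repeat: repeat the steps above until the algorithm converges or a predefined number of iterations is reached.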
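The steps above can be sketched as a small tabular implementation. This is a minimal illustration under stated assumptions, not code from this repository. The chain environment (`N_STATES`, `step`), the helper names (`epsilon_greedy`, `double_q_update`, `train`) and the hyperparameters are all hypothetical choices made for the example. The sketch follows the standard formulation: the table being updated selects \(a^*\), and the other table evaluates it.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (not from the repository): a 5-state chain.
# Moving right from state 3 into state 4 (terminal) gives reward 1; all else 0.
N_STATES = 5
ACTIONS = (1, -1)  # +1 = move right, -1 = move left (max() ties break toward +1)


def step(s, a):
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def epsilon_greedy(q_a, q_b, s, actions, epsilon, rng):
    """Step 2: behaviour policy, epsilon-greedy on Q_A + Q_B."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_a[(s, a)] + q_b[(s, a)])


def double_q_update(q_a, q_b, s, a, r, s_next, done, actions, alpha, gamma, rng):
    """Steps 3-5: one tabular Double Q-learning update."""
    # Step 3: update Q_A or Q_B, each with probability 0.5.
    if rng.random() < 0.5:
        q_upd, q_eval = q_a, q_b
    else:
        q_upd, q_eval = q_b, q_a
    if done:
        target = r  # no bootstrapping from a terminal state
    else:
        # Step 4: the table being updated selects a*, the other table evaluates it.
        a_star = max(actions, key=lambda b: q_upd[(s_next, b)])
        target = r + gamma * q_eval[(s_next, a_star)]
    # Step 5: move the selected estimate toward the target y.
    q_upd[(s, a)] += alpha * (target - q_upd[(s, a)])


def train(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Step 6: repeat for a fixed number of episodes (hypothetical settings)."""
    rng = random.Random(seed)
    q_a, q_b = defaultdict(float), defaultdict(float)  # Step 1: both tables start at 0
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = epsilon_greedy(q_a, q_b, s, ACTIONS, epsilon, rng)
            s_next, r, done = step(s, a)
            double_q_update(q_a, q_b, s, a, r, s_next, done, ACTIONS, alpha, gamma, rng)
            s = s_next
            if done:
                break
    return q_a, q_b


if __name__ == "__main__":
    q_a, q_b = train()
    policy = [max(ACTIONS, key=lambda a: q_a[(s, a)] + q_b[(s, a)])
              for s in range(N_STATES - 1)]
    print("greedy action per state:", policy)
```

On this toy chain, the greedy policy on \(Q_A + Q_B\) should learn to move right (`1`) in every non-terminal state. The behaviour policy uses both tables, but each individual update changes only one of them. That is what keeps the selection and evaluation estimates decoupled.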

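By using different Q-functions to select the best action and to evaluate its value, Double Q-learning aims to avoid this overestimation. This tends to make learning more stable and can improve the performance of the learning algorithm.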
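The source of the overestimation, and why decoupling helps, can be seen without any environment at all. The sketch below compares two ways of estimating a maximum. The first takes the maximum of one set of estimates, as the standard Q-learning target does. The second selects the action with one set of estimates and evaluates it with an independent second set. The setup uses hypothetical parameters: 10 actions whose true values are all 0, each estimated from 5 noisy samples. The function name `max_estimates` is illustrative only.

```python
import random
import statistics


def max_estimates(n_actions=10, n_samples=5, trials=5000, seed=0):
    """Estimate max_a E[X_a] where every X_a ~ N(0, 1), so the true value is 0."""
    rng = random.Random(seed)

    def sample_means():
        # One noisy estimate per action: the mean of n_samples draws.
        return [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))
                for _ in range(n_actions)]

    single, double = [], []
    for _ in range(trials):
        est_a, est_b = sample_means(), sample_means()  # two independent estimators
        # Single estimator (as in the standard Q-learning target):
        # the same estimates both select and evaluate the action.
        single.append(max(est_a))
        # Double estimator: est_a selects the action, est_b evaluates it.
        a_star = max(range(n_actions), key=lambda i: est_a[i])
        double.append(est_b[a_star])
    return statistics.fmean(single), statistics.fmean(double)


if __name__ == "__main__":
    single_avg, double_avg = max_estimates()
    print(f"single estimator: {single_avg:+.3f}   double estimator: {double_avg:+.3f}")
```

The single estimator should come out clearly positive even though every true value is 0. The double estimator should average close to 0. The double estimator is not unbiased in every case, though: when the true values differ, it can underestimate the maximum instead.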
