In Example 5.3 the book states: "As the initial policy we use the policy evaluated in the previous blackjack example, that which sticks only on 20 or 21."


Chapter 5: Monte Carlo ES initial policy (reinforcement-learning-an-introduction, CLOSED)

jerome-white commented on August 23, 2024

Chapter 5: Monte Carlo ES initial policy

from reinforcement-learning-an-introduction.

Comments (5)

ShangtongZhang commented on August 23, 2024

It seems I didn't notice this sentence when I implemented this, so I just used a random one.

As the initial policy we use the policy evaluated in the previous blackjack example, that which sticks only on 20 or 21.

I don't know exactly how you used targetPolicyPlayer as the initial policy, so I can't give further advice. I will also try it myself in the future.


jerome-white commented on August 23, 2024

230c230
<         initialAction = np.random.choice(actions)
---
>         initialAction = int(targetPolicyPlayer(*initialState))


ShangtongZhang commented on August 23, 2024

This won't work. By "initial policy" the book means the policy that the whole training process starts from, not the policy used to pick the first action of each episode.
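A minimal sketch of that distinction, assuming a dictionary-based policy table rather than the repository's arrays. The names `make_initial_policy`, `select_action`, and `improve_policy` are hypothetical and are not taken from blackjack.py. The table is seeded once, before training, with the stick-on-20-or-21 rule, while the first action of every episode is still drawn at random, as exploring starts requires.

```python
import random

# Hypothetical action encoding for this sketch.
ACTION_HIT, ACTION_STICK = 0, 1
ACTIONS = (ACTION_HIT, ACTION_STICK)


def all_states():
    """Yield every (player_sum, usable_ace, dealer_card) decision state."""
    for player_sum in range(12, 22):
        for usable_ace in (False, True):
            for dealer_card in range(1, 11):
                yield (player_sum, usable_ace, dealer_card)


def make_initial_policy():
    """Seed the policy once, before training: stick only on 20 or 21."""
    return {s: ACTION_STICK if s[0] >= 20 else ACTION_HIT for s in all_states()}


def select_action(policy, state, first_step, rng=random):
    """Exploring start: random first action; afterwards follow the table."""
    if first_step:
        return rng.choice(ACTIONS)
    return policy[state]


def improve_policy(policy, q_values, visited_states):
    """Make the policy greedy w.r.t. Q in the states seen this episode.

    q_values maps (state, action) to an estimate; missing pairs count as 0.
    """
    for state in visited_states:
        policy[state] = max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))
```

With this layout the seeded rule only matters until a state is first improved. After that, the greedy action replaces it.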


jerome-white commented on August 23, 2024

Another interpretation would be to use targetPolicyPlayer once for each state, the first time that state has been explored. In this case, the policy array could be viewed as a set: if the state is not in the set, use targetPolicyPlayer to decide the action; use behaviorPolicy otherwise.

However, in my personal implementation I wasn't able to get this logic to work either. I have the feeling that random is the correct thing to do; I just wish I knew what the authors had in mind when they wrote what they did in the exercise.
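The set-based reading could be sketched as below. This is only an illustration of the lookup logic under assumed names (`LazyPolicy`, `fallback_policy`); it does not reproduce `targetPolicyPlayer` or `behaviorPolicy` from the repository, and it takes the "otherwise" branch to mean the action learned so far for that state.

```python
# Hypothetical action encoding for this sketch.
ACTION_HIT, ACTION_STICK = 0, 1


def fallback_policy(state):
    """Fixed rule from the earlier blackjack example: stick only on 20 or 21."""
    player_sum, usable_ace, dealer_card = state
    return ACTION_STICK if player_sum >= 20 else ACTION_HIT


class LazyPolicy:
    """Use the fixed rule for a state until a learned action is stored for it."""

    def __init__(self, fallback):
        self.fallback = fallback
        self.learned = {}  # state -> action; its keys play the role of the "set"

    def action(self, state):
        if state not in self.learned:
            return self.fallback(state)
        return self.learned[state]

    def update(self, state, action):
        """Record the improved action; the fallback is no longer used here."""
        self.learned[state] = action
```

Under these assumptions a state keeps the fixed rule until its first update, which yields the same actions as pre-filling the whole policy table with that rule.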


ShangtongZhang commented on August 23, 2024

https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/blob/latest/chapter05/blackjack.py#L230

This line may explain how it works.

