ucla-rlcourse / rlexample

Some basic examples of playing with RL

Python 85.42% TeX 14.58%

rlexample's Issues

No module named 'gym.envs.atari'

Question about the policy evaluation function and policy extraction of the MDP code

Hi, thanks for your course! It helps me a lot.

I have some questions about the code in frozenlake_policy_iteration.py. Why is the expression for the value function in compute_policy_v (line 52) the same as the expression for the state-action function in compute_policy_v (line 37)?

And why is the value function expression v[s] = sum([p * (r + gamma * prev_v[s_]) for p, s_, r, _ in env.env.P[s][policy_a]]) different from formula (17) in the slides? It seems that the expression in the code ignores the transition probability P(s'|s,a).

Thanks! Looking forward to your reply.
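
A note on the expression quoted above, as a minimal sketch rather than the repository's code: in Gym's tabular environments each entry of P[s][a] is a (probability, next_state, reward, done) tuple, so the p in the comprehension is the transition probability P(s'|s,a). The same sum gives Q(s, a) for an arbitrary action a and V(s) when a is the action the policy picks in s, which is why the two expressions look identical. The table P, the helper name backup and all numbers below are made up for illustration.

```python
# Toy transition table in the same layout as env.env.P[s][a]:
# a list of (probability, next_state, reward, done) tuples.
# The states, actions and numbers are hypothetical.
P = {
    0: {
        0: [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)],
        1: [(1.0, 0, 0.0, False)],
    },
    1: {
        0: [(1.0, 1, 1.0, True)],
        1: [(1.0, 1, 1.0, True)],
    },
}


def backup(P, s, a, prev_v, gamma):
    """Expected one-step return: sum over s' of P(s'|s,a) * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * prev_v[s_]) for p, s_, r, _ in P[s][a])


prev_v = [0.0, 2.0]
gamma = 0.9
policy = [0, 0]  # deterministic policy: action taken in each state

# Q(s=0, a) for every action: the probabilities weight each outcome.
q_s0 = [backup(P, 0, a, prev_v, gamma) for a in (0, 1)]

# V(s=0) under the policy is the same backup with a = policy[0].
v_s0 = backup(P, 0, policy[0], prev_v, gamma)
```

Here v_s0 equals q_s0[0] because the policy picks action 0 in state 0; only the choice of action differs between the two uses.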

BEETLE Algorithm

Is there an implementation of the BEETLE algorithm from the paper "An Analytic Solution to Discrete Bayesian RL"? Thanks!

cliffwalk.py running issue

The q_texts assignment in _draw_grid (shown below the error messages) raises the following error under Python 3.6:

ValueError: Missing category information for StrCategoryConverter; this might be caused by unintendedly mixing categorical and numeric data
ConversionError: Failed to convert value(s) to axis units: '0'

self.q_texts = [self.ax.text( '0',*self._id_to_position(i)[::-1],
                                     fontsize=11, verticalalignment='center', 
                                     horizontalalignment='center') for i in range(12 * 4)]

Swapping the arguments, so that the position comes first and '0' comes after it, works. Could you please check and correct it?

self.q_texts = [self.ax.text(*self._id_to_position(i)[::-1], '0',
                                     fontsize=11, verticalalignment='center', 
                                     horizontalalignment='center') for i in range(12 * 4)]

Thanks!

policy iteration doesn't work for deterministic frozen_lake env

I am trying to use the code as an example. Strangely, when I changed the frozen lake env to the deterministic version, i.e. env = gym.make("FrozenLake-v0", is_slippery=False), the policy iteration algorithm no longer works correctly. I checked the code and found nothing wrong. One possible reason is insufficient exploration by the agent; however, the env is simple enough and the default number of iterations is set to 200000. The problem still cannot be solved.
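
For anyone trying to reproduce this, here is a small self-contained sketch of tabular policy iteration on a hypothetical 4-state deterministic chain (not FrozenLake, and not the repository's code). One thing that may be worth checking, and this is an assumption since the issue does not show the discount used, is gamma: with gamma = 1.0 in a deterministic environment, several actions can tie at the same value, and tie-breaking in the argmax can pick an action that never reaches the goal. All names below (build_transitions, evaluate, improve, policy_iteration) are illustrative.

```python
N_STATES, N_ACTIONS, GOAL = 4, 2, 3  # actions: 0 = left, 1 = right


def build_transitions():
    """Deterministic chain in Gym's P[s][a] layout: (prob, next_state, reward, done)."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a in range(N_ACTIONS):
            if s == GOAL:
                # Terminal state: stays put, no further reward.
                P[s][a] = [(1.0, s, 0.0, True)]
                continue
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == GOAL else 0.0
            P[s][a] = [(1.0, s_next, reward, s_next == GOAL)]
    return P


P = build_transitions()


def q_value(s, a, v, gamma):
    return sum(p * (r + gamma * v[s_]) for p, s_, r, _ in P[s][a])


def evaluate(policy, gamma, tol=1e-10, max_sweeps=1000):
    """Iterative policy evaluation with synchronous sweeps."""
    v = [0.0] * N_STATES
    for _ in range(max_sweeps):
        new_v = [q_value(s, policy[s], v, gamma) for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def improve(v, gamma):
    """Greedy policy; ties go to the lowest action index."""
    return [
        max(range(N_ACTIONS), key=lambda a: q_value(s, a, v, gamma))
        for s in range(N_STATES)
    ]


def policy_iteration(gamma, max_iters=100):
    policy = [0] * N_STATES  # start with "always left"
    for _ in range(max_iters):
        v = evaluate(policy, gamma)
        new_policy = improve(v, gamma)
        if new_policy == policy:
            return policy, True  # policy is stable
        policy = new_policy
    return policy, False  # did not stabilise within max_iters


policy_09, converged_09 = policy_iteration(gamma=0.9)
policy_10, converged_10 = policy_iteration(gamma=1.0)
```

In this toy chain the run with gamma = 0.9 settles on moving right in every non-terminal state, while the run with gamma = 1.0 never stabilises because all actions tie once every state has value 1. Whether the same thing happens in the repository's script would need to be checked against the gamma it actually uses.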

Question

The small problem in value_iteration code

On line 79, sum([p*(r + prev_v[s_]) is missing gamma (with gamma=1.0 the result is not affected). The correct expression is sum([p*(r + gamma*prev_v[s_]), as on line 70.
Thanks.

Policy loss computes gradients for value network

This issue exists in ac-pong-pytorch.py and pgb-pong-pytorch.

I am investigating this problem. It is a possible cause of the training failure.

Problems with code in pgb-pong-pytorch and pg-pong-pytorch

Hi, I think there are some errors in the code for the above two algorithms. The two are basically similar.

In both of them, the professor uses args.batch_size to update the model parameters every batch_size episodes, which corresponds to what was presented in lecture slide 5. But in the function finish_episode(), G is calculated as if for a single episode. I guess you may have forgotten to separate the rewards of each episode, since the ac-pong code has a comment about this and flattens the rewards and values used in the calculation.

If the model is updated once every batch_size episodes, then policy.rewards should append a separate [] for each episode. I hope my understanding is correct.
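
To make the concern concrete, here is a minimal sketch, assuming rewards are stored as one inner list per episode (the layout suggested above). It is not the repository's finish_episode(); discounted_returns, batch_rewards and the reward values are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t for every step of ONE reward sequence, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical buffer: one inner list of rewards per episode in the batch.
batch_rewards = [[0.0, 0.0, 1.0], [0.0, -1.0]]
gamma = 0.5

# Returns computed separately for each episode.
per_episode = [discounted_returns(ep, gamma) for ep in batch_rewards]

# Returns computed over the flattened batch: the second episode's rewards
# leak backwards into the first episode's returns.
flattened = discounted_returns([r for ep in batch_rewards for r in ep], gamma)
```

With these numbers the first episode's returns are [0.25, 0.5, 1.0] when computed per episode but [0.1875, 0.375, 0.75] when the batch is flattened, which is the kind of mixing the comment above is worried about.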

Jupyter Notebook Vs Anaconda's Spyder

I used to code in Spyder which is a much better IDE. Could I use Spyder instead of Jupyter?
